Amazon S3 Ingestion Target
Amazon S3 stores data as objects within resources called buckets. The S3 target writes processed objects to the specified Amazon S3 bucket.
Target Configuration
Configure the target parameters explained below.
Connection Name
Connections are the service identifiers. Select a connection name from the list if you have previously created and saved connection details for Amazon S3, or create one as explained in the topic - Amazon S3 Connection →
Use the Test Connection option to verify that the connection with the Amazon S3 channel is established successfully.
A success message confirms that the connection is available. If the test connection fails, edit the connection to resolve the issue before proceeding further.
Bucket Name
Buckets are storage units used to store objects, which consist of data and metadata describing that data. Specify the name of the target S3 bucket.
Path
Provide the file/directory path within the target bucket where the processed objects will be emitted.
A path pointing directly to the root folder of a bucket is not supported.
A directory path can be given as shown below:
Example: /sales/2022-07-14
Single statement MVEL expressions can be used to create custom folders in the bucket.
Example: The expression sales/@{java.time.LocalDate.now()} will create a folder named with the current date inside the sales directory.
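To illustrate what the date-based MVEL expression above resolves to, here is a minimal Python sketch of the equivalent path construction. This is illustrative only, not Gathr's internal logic; the function name is hypothetical.

```python
from datetime import date

def resolve_path(base: str) -> str:
    """Build the S3 prefix that an expression like
    sales/@{java.time.LocalDate.now()} resolves to:
    a folder named with the current ISO date inside the base directory."""
    return f"{base}/{date.today().isoformat()}"

# resolve_path("sales") yields e.g. "sales/2022-07-14" on that date.
```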
Custom File Name
Enable this option to create a custom file name in the specified S3 path.
File Name
Provide a custom file name.
Example: The input sales_@{java.time.LocalDate.now()} will create a file named sales_<current_date>.<output_format>.
Use a unique file name when the same target location is used across multiple pipeline runs, as in the case of incremental reads; otherwise, the data will be overwritten in each iteration.
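One way to keep file names unique across repeated runs is to embed a timestamp finer than the date. The sketch below shows this idea in Python; the helper name is hypothetical and this is not how Gathr names files internally.

```python
from datetime import datetime, timezone

def unique_file_name(prefix: str, output_format: str) -> str:
    """Append a UTC timestamp (to microsecond precision) so repeated
    pipeline runs writing to the same S3 path do not overwrite
    each other's output files."""
    stamp = datetime.now(timezone.utc).strftime("%Y%m%d%H%M%S%f")
    return f"{prefix}_{stamp}.{output_format}"
```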
File Format
The output format in which the result will be written.
Delimiter
Select the message field separator for the CSV (Delimited) file format.
Header Included
Option to write a header row as the first line of the data file.
Output Fields
Select the message fields to be emitted.
Output Mode
Output mode specifies how the data is written.
Add Configuration: Additional properties can be added using the Add Configuration link.
More Configurations
Save Mode
Specify the expected behavior of saving data to the target S3 path in the bucket, with the following options:
Append: Adds new data to the existing target table, leaving current records untouched.
ErrorifExists: When persisting data, if the data already exists, an exception is expected to be thrown.
Ignore: When persisting data, if data/table already exists, the save operation is expected to not save the contents of the Data and to not change the existing data.
Overwrite: Completely replaces all current data in the target table with the new data set.
Update: Updates existing rows in the target table based on specified join keys. No new records will be inserted.
When Save Mode is set to Update, additional configuration fields appear:
Update Type
Select a method to manage records between the current and incoming data:
Keep latest with Overwrite: Substitutes existing records with new data, adding a column to record the timestamp of the last modification.
Latest data with version: Retains the existing record and adds a new version of the latest modified record, incorporating columns to track start date, end date, and deletion status.
Note: The target table should not have a primary key to execute updates with the “latest data with version” option, as this will cause the update to fail.
Note: Employ the incremental data ingestion method from the source when using the Update method to prevent overwriting the entire dataset.
Join Columns
Specify the key columns used to align incoming source data with existing records in the target database.
If the Save Mode field is set to any option other than Update, proceed with the following fields.
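The save-mode behaviors described above can be sketched on in-memory rows. The function below is a minimal illustration of the semantics only, not Gathr's implementation, and the names are hypothetical.

```python
def save(existing, incoming, mode, join_keys=None):
    """Illustrate save-mode semantics on lists of row dicts."""
    if mode == "Append":
        # Add new data, leaving current records untouched.
        return existing + incoming
    if mode == "Overwrite":
        # Completely replace current data with the new data set.
        return list(incoming)
    if mode == "Ignore":
        # If data already exists, do not save the incoming contents.
        return existing if existing else list(incoming)
    if mode == "ErrorifExists":
        # If data already exists, raise an error instead of writing.
        if existing:
            raise FileExistsError("target data already exists")
        return list(incoming)
    if mode == "Update":
        # Update existing rows matched on join keys; insert no new records.
        key = lambda r: tuple(r[k] for k in join_keys)
        updates = {key(r): r for r in incoming}
        return [updates.get(key(r), r) for r in existing]
    raise ValueError(f"unknown save mode: {mode}")
```

Note how Update leaves unmatched incoming rows out entirely, matching the "no new records will be inserted" behavior above.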
Partitioning Required
Select whether to partition data on S3.
Partition Columns
If partitioning is enabled, select the fields on which data will be partitioned.
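Partitioned output is commonly laid out on S3 as column=value folders (the Hive-style convention used by Spark). The sketch below shows how such an object key is composed; this assumes that layout and is not a description of Gathr's exact naming.

```python
def partitioned_key(path, partition_columns, row, file_name):
    """Compose a Hive-style partitioned S3 object key, e.g.
    sales/region=EU/date=2022-07-14/part-0000.csv, from the
    selected partition columns and a row's values."""
    parts = "/".join(f"{col}={row[col]}" for col in partition_columns)
    return f"{path}/{parts}/{file_name}"
```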
Limitations on custom/dynamic folder or file naming
Not supported on streaming data as the output file will be renamed.
Partitioning of the emitted data is not supported.
Option to invoke custom functions in MVEL expressions is not supported.