Amazon S3 Ingestion Target

Amazon S3 stores data as objects within resources called buckets. The S3 target writes objects to the specified Amazon S3 bucket.

Target Configuration

Configure the target parameters explained below.

Connection Name

Connections are service identifiers. Select a connection name from the list if you have previously created and saved connection details for Amazon S3, or create one as explained in the topic - Amazon S3 Connection →

Use the Test Connection option to verify that the connection to the Amazon S3 channel is established successfully.

A success message indicates that the connection is available. If the test fails, edit the connection to resolve the issue before proceeding.


Bucket Name

Buckets are storage units used to store objects, which consist of data and metadata describing the data. Specify the name of the target S3 bucket.


Path

Provide the file or directory path within the target bucket where the processed objects will be emitted.

A path pointing directly to the root folder of a bucket is not supported.

A directory path can be given as shown below:

Example: /sales/2022-07-14

Single-statement MVEL expressions can be used to create custom folders in the bucket.

Example: The expression sales/@{java.time.LocalDate.now()} will create a folder named <current_date> inside the sales directory.
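The effect of the date expression above can be sketched in Python. The `dated_prefix` helper below is purely illustrative (it is not part of the product); it mirrors what the MVEL expression produces, since `java.time.LocalDate` renders as YYYY-MM-DD:

```python
from datetime import date

def dated_prefix(base: str) -> str:
    """Build a folder path ending in the current date, mirroring
    the MVEL expression sales/@{java.time.LocalDate.now()}."""
    # date.isoformat() yields YYYY-MM-DD, the same format that
    # java.time.LocalDate.now() produces in the MVEL expression
    return f"{base}/{date.today().isoformat()}"

print(dated_prefix("sales"))  # e.g. sales/2022-07-14
```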


Custom File Name

Enable this option to create a custom file name in the specified S3 path.


File Name

Provide a custom file name.

Example: an input of sales_@{java.time.LocalDate.now()} will create a file named sales_<current_date>.<output_format>.

Use a unique file name when the same target location is shared across multiple pipeline runs, as in the case of incremental reads; otherwise, each iteration will overwrite the data written by the previous one.
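One common way to keep file names unique across runs is to embed a timestamp. The sketch below is a hypothetical helper, not product code; it assumes a UTC timestamp is precise enough to distinguish runs:

```python
from datetime import datetime, timezone

def unique_file_name(base: str, ext: str) -> str:
    """Append a UTC timestamp so that repeated pipeline runs
    (e.g. incremental reads) do not overwrite each other."""
    # microsecond precision makes collisions between runs unlikely
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%S%f")
    return f"{base}_{stamp}.{ext}"

print(unique_file_name("sales", "csv"))  # e.g. sales_20220714T091533123456.csv
```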


File Format

The output format in which the result will be written.

Delimiter

Select a message field separator for the CSV (Delimited) file format.


Header Included

Option to write a header row as the first row of the data file.


Output Fields

Select the fields in the message that need to be emitted.
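Taken together, the Delimiter, Header Included, and Output Fields options control how each record is rendered. The sketch below illustrates those three knobs with Python's standard csv module; the `write_delimited` helper is an assumption for illustration, not the product's writer:

```python
import csv
import io

def write_delimited(rows, fields, delimiter=",", header=True):
    """Render rows as delimited text: `fields` plays the role of
    Output Fields, `delimiter` of Delimiter, and `header` of the
    Header Included option (illustrative helper only)."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fields,
                            delimiter=delimiter, lineterminator="\n")
    if header:
        writer.writeheader()  # first row carries the field names
    writer.writerows(rows)
    return buf.getvalue()

print(write_delimited([{"id": "1", "name": "a"}], ["id", "name"], "|"))
```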


Output Mode

Output mode specifies how the data is written.


Add Configuration: Additional properties can be added using the Add Configuration link.


More Configurations

Save Mode

Specify the expected behavior of saving data to the target S3 path in the bucket, with the following options:

  • Append: Adds new data to the existing target table, leaving current records untouched.

  • ErrorifExists: When persisting data, if the data already exists, an exception is expected to be thrown.

  • Ignore: When persisting data, if data/table already exists, the save operation is expected to neither save the contents of the data nor change the existing data.

  • Overwrite: Completely replaces all current data in the target table with the new data set.

  • Update: Updates existing rows in the target table based on specified join keys. No new records will be inserted.
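The semantics of these save modes can be sketched against a simple in-memory list standing in for the target table. The `save` helper below is hypothetical and only illustrates the decision logic described above, not the product's implementation:

```python
def save(table: list, new_rows: list, mode: str) -> list:
    """Illustrative semantics of the Save Mode options on an
    in-memory 'table' (hypothetical helper, not product code)."""
    if mode == "Append":
        return table + new_rows               # keep current records, add new
    if mode == "Overwrite":
        return list(new_rows)                 # replace all existing data
    if mode == "Ignore":
        return table if table else list(new_rows)  # no-op if data exists
    if mode == "ErrorifExists":
        if table:
            raise FileExistsError("target already contains data")
        return list(new_rows)
    raise ValueError(f"unknown save mode: {mode}")
```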

When Save Mode is set to Update, additional configuration fields appear:

Update Type

Select a method to manage records between the current and incoming data:

  • Keep latest with Overwrite: Substitutes existing records with new data, adding a column to record the timestamp of the last modification.

  • Latest data with version: Retains the existing record and adds a new version of the latest modified record, incorporating columns to track start date, end date, and deletion status.

Note: The target table should not have a primary key when executing updates with the "Latest data with version" option, as a primary key will cause the update to fail.

Note: Employ the incremental data ingestion method from the source when using the Update method to prevent overwriting the entire dataset.
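The "Latest data with version" behavior resembles slowly-changing-dimension versioning: the current record is closed out and a new version is appended. The sketch below is an assumption for illustration only; the column names (`start_date`, `end_date`) and the `version_update` helper are hypothetical, not the product's actual schema:

```python
from datetime import date

def version_update(existing, incoming, key):
    """Sketch of 'Latest data with version': close the open version
    of a matched record (set its end date) and append the incoming
    row as a new open version. Column names are illustrative."""
    today = date.today().isoformat()
    incoming_keys = {row[key] for row in incoming}
    out = []
    for row in existing:
        if row[key] in incoming_keys and row["end_date"] is None:
            row = {**row, "end_date": today}  # close the current version
        out.append(row)
    for row in incoming:
        # new version starts today and stays open (no end date)
        out.append({**row, "start_date": today, "end_date": None})
    return out
```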

Join Columns

Specify the key columns used to align incoming source data with existing records in the target database.
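How join columns align incoming rows with existing records can be sketched as a key-based replacement: target rows whose join-key values match an incoming row are replaced, and unmatched incoming rows are not inserted (per the Update mode description above). The `update_by_keys` helper is hypothetical; the product's actual merge logic may differ:

```python
def update_by_keys(target, source, keys):
    """Replace target rows whose join-key values match a source row;
    unmatched source rows are NOT inserted (illustrative helper)."""
    # index incoming rows by their join-key tuple for O(1) lookup
    index = {tuple(row[k] for k in keys): row for row in source}
    return [index.get(tuple(row[k] for k in keys), row) for row in target]
```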


If the Save Mode field is set to any option other than Update, proceed by configuring the following fields.

Partitioning Required

Specify whether to partition the data on S3.


Partition Columns

If partitioning is enabled, select the fields on which data will be partitioned.
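Partition columns commonly map to Hive-style key=value folder prefixes on S3; whether this product uses that exact layout is an assumption, and the `partition_path` helper below is purely illustrative:

```python
def partition_path(base, row, partition_cols):
    """Build a Hive-style key=value folder layout from a row's
    partition column values (illustrative assumption, not
    necessarily the product's exact layout)."""
    parts = [f"{c}={row[c]}" for c in partition_cols]
    return "/".join([base.rstrip("/")] + parts)

print(partition_path("sales", {"year": 2022, "month": 7}, ["year", "month"]))
# sales/year=2022/month=7
```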


Limitations on custom/dynamic folder or file naming

  • Not supported for streaming data, as the output file will be renamed.

  • Partitioning of the emitted data is not supported.

  • Option to invoke custom functions in MVEL expressions is not supported.
