GCS Emitter
Gathr provides a GCS (Google Cloud Storage) emitter. The configuration details for the GCS emitter are given below:
| Field | Description | 
|---|---|
| Save as Dataset | Select the checkbox to save the schema as a dataset, and provide the dataset name. | 
| Connection Name | Choose the connection name from the drop-down list to establish the connection. | 
| Override Credentials | Select the checkbox to override the connection's default credentials with user-specific credentials. | 
| Service Account Key File | Upload the GCP service account key file to create the connection. You can test the connection by clicking the TEST CONNECTION button. A sketch of the equivalent Spark-level settings follows the table. | 
| Bucket Name | Provide the name of the GCS bucket. | 
| Path | Provide the sub-directories of the bucket named above to which the data is to be written. | 
| Output Type | Select the output format in which the results will be processed. | 
| Delimiter | Select the message field separator. | 
| Output Fields | Select the fields in the message that need to be part of the output data. | 
| Partitioning Required | Select the checkbox to partition the data. | 
| Partition Columns | Select the fields on which the data will be partitioned. | 
| Save Mode | Save Mode specifies the expected behavior when saving data to the data sink (see the write sketch after this table).<br>ErrorIfExists: if the data already exists, an exception is expected to be thrown.<br>Append: if the data/table already exists, the contents of the data are expected to be appended to the existing data.<br>Overwrite: if the data/table already exists, the existing data is expected to be overwritten by the contents of the data.<br>Ignore: if the data/table already exists, the save operation is expected to neither save the contents of the data nor change the existing data. This is similar to CREATE TABLE IF NOT EXISTS in SQL. | 
| Checkpoint Storage Location | Select the checkpointing storage location. The available options are S3, HDFS, and EFS. | 
| Checkpoint Connections | Select the connection from the drop-down list. Connections are listed corresponding to the selected storage location. | 
| Override Credentials | Select the checkbox to override the checkpoint connection's default credentials. | 
| Username | The name of the user under which the Hadoop service is running. Click the TEST CONNECTION button to test the connection. | 
| Checkpoint Directory | The path where the Spark application stores its checkpointing data. For HDFS and EFS, enter a relative path such as /user/hadoop/checkpointingDir; the system will add a suitable prefix by itself. For S3, enter an absolute path such as s3://BucketName/checkpointingDir. | 
| Time-Based Checkpoint | Select the checkbox to enable a time-based checkpoint on each pipeline run, i.e., on each run the checkpoint location provided above is appended with the current time in milliseconds. | 
| Enable Trigger | A trigger defines how frequently a streaming query should be executed. | 
| Trigger Type | The available options in the drop-down list are:<br>One-Time Micro-Batch<br>Fixed Interval Micro-Batch<br>(See the streaming sketch after this table.) | 
| ADD CONFIGURATION | Optionally, add further Spark configurations as required by clicking the ADD CONFIGURATION button. For example, perform imputation: replace nullValue/emptyValue with the entered value across the data. With nullValue = 123, all null values in the output are replaced with 123 (see the imputation sketch after this table). | 
| ENVIRONMENT PARAMS | Click the + ADD PARAM button to add further parameters as key-value pairs. | 
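
A GCS connection created from an uploaded service account key file corresponds, at the Spark level, to the Hadoop GCS connector settings shown below. This is a minimal sketch, assuming the open-source GCS connector is on the classpath; the key file path is a hypothetical placeholder, and Gathr manages the equivalent settings internally.

```python
from pyspark.sql import SparkSession

# Sketch of the GCS connector settings behind a service-account-key
# connection. The configuration keys come from the open-source GCS
# connector; the key file path below is a hypothetical placeholder.
spark = (
    SparkSession.builder
    .appName("gcs-emitter-sketch")
    .config("spark.hadoop.fs.gs.impl",
            "com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem")
    .config("spark.hadoop.google.cloud.auth.service.account.enable", "true")
    .config("spark.hadoop.google.cloud.auth.service.account.json.keyfile",
            "/path/to/service-account-key.json")
    .getOrCreate()
)
```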
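
The Output Fields, Delimiter, Partition Columns, and Save Mode fields map directly onto a Spark batch write. The sketch below is a minimal illustration, assuming CSV output with a comma delimiter; the bucket, path, and column names are hypothetical placeholders.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Hypothetical sample data standing in for the pipeline's message stream.
df = spark.createDataFrame(
    [(1, "alice", "2024-01-01"), (2, "bob", "2024-01-02")],
    ["id", "name", "event_date"],
)

(
    df.select("id", "name", "event_date")    # Output Fields
      .write
      .mode("append")                  # Save Mode: append / overwrite / ignore / errorifexists
      .partitionBy("event_date")       # Partition Columns (when partitioning is enabled)
      .option("delimiter", ",")        # Delimiter
      .csv("gs://my-bucket/events/")   # Bucket Name + Path, Output Type = CSV
)
```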
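
The checkpoint and trigger fields correspond to Spark Structured Streaming options. Below is a minimal sketch, assuming a fixed-interval trigger; the source, paths, and interval are hypothetical placeholders.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# "rate" is a built-in test source used here only to stand in for a
# real streaming input.
stream_df = spark.readStream.format("rate").load()

query = (
    stream_df.writeStream
    .format("csv")
    .option("path", "gs://my-bucket/events/")                          # Bucket Name + Path
    .option("checkpointLocation", "s3://BucketName/checkpointingDir")  # Checkpoint Directory
    .trigger(processingTime="30 seconds")  # Fixed Interval Micro-Batch; once=True gives One-Time
    .start()
)
query.awaitTermination()
```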
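
The nullValue imputation described under ADD CONFIGURATION has a plain Spark equivalent, sketched below with `DataFrame.na.fill`; the sample data is hypothetical, and how Gathr applies the configuration internally is not shown.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Hypothetical data containing a null to be imputed.
df = spark.createDataFrame([(1, None), (2, 5)], schema="id INT, score INT")

# Replace nulls in numeric columns with 123, mirroring nullValue = 123.
imputed = df.na.fill(123)
imputed.show()
```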