Custom ETL Source

Custom Data Source allows you to read data from any data source.

You can write your own custom code to ingest data from any data source and build it as a custom Data Source. You can use it in your pipelines or share it with other workspace users.

Schema Type

See the topic Provide Schema for ETL Source to learn how to provide schema details for data sources.

After providing schema type details, the next step is to configure the data source.


How to Create Custom Code Jar

Create a jar file of your custom code and upload it in a pipeline or as a registered component utility.

To write custom code for your Custom Data Source, follow these steps:

  1. Download the Sample Component, which is available on the Register Entities > Components page.

    Import the downloaded Sample Component as a Maven project in Eclipse. Ensure that Apache Maven is installed on your machine and that it is added to the PATH.

  2. Implement your custom code and build the project. To create a jar file of your code, use the following command:

    mvn clean install -DskipTests
    

    For a Custom Data Source, add your custom logic to the implemented methods of the classes described below:

High-level abstraction

If you want a high-level abstraction using only Java code, extend BaseSource as shown in the SampleCustomData Source class.

com.yourcompany.component.ss.Data Source.SampleCustomData Source which extends BaseSource

Methods to implement:

  • public void init(Map<String, Object> conf)

  • public List<String> receive()

  • public void cleanup()
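The three methods above can be sketched as follows. This is a minimal illustration, not the Sample Component's actual code: the BaseSource class declared here is a stand-in so the example compiles on its own (in a real project, extend the BaseSource shipped with the Sample Component), and the class InMemoryJsonSource, the "batch.size" key, and the JSON record format are all hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Stand-in for the platform's BaseSource, declared only so this sketch
// compiles by itself. The method signatures are the ones listed above.
abstract class BaseSource {
    public abstract void init(Map<String, Object> conf);
    public abstract List<String> receive();
    public abstract void cleanup();
}

// Hypothetical source that emits a fixed number of JSON strings per call.
class InMemoryJsonSource extends BaseSource {
    private int batchSize;
    private long nextId;

    @Override
    public void init(Map<String, Object> conf) {
        // "batch.size" is an illustrative key, not a documented property.
        Object configured = conf.get("batch.size");
        batchSize = (configured == null) ? 2 : Integer.parseInt(configured.toString());
        nextId = 0;
    }

    @Override
    public List<String> receive() {
        // Build one batch of records; a real source would read from an
        // external system here (assumed: one String per record).
        List<String> records = new ArrayList<>();
        for (int i = 0; i < batchSize; i++) {
            records.add("{\"id\": " + nextId + "}");
            nextId++;
        }
        return records;
    }

    @Override
    public void cleanup() {
        // Nothing to release in this sketch; a real source would close
        // connections, files, or clients opened in init().
        nextId = 0;
    }
}

class InMemoryJsonSourceDemo {
    public static void main(String[] args) {
        Map<String, Object> conf = new HashMap<>();
        conf.put("batch.size", 3);

        BaseSource source = new InMemoryJsonSource();
        source.init(conf);
        System.out.println(source.receive());
        source.cleanup();
    }
}
```

The sketch opens resources in init, produces records in receive, and releases resources in cleanup, mirroring the order of the methods listed above. How often the platform calls receive is not stated here, so the example makes no assumption about it.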

Low-level abstraction

If you want a low-level implementation using the Spark API, extend AbstractData Source as shown in the SampleSparkSourceData Source class.

com.yourcompany.component.ss.Data Source.SampleSparkSourceData Source extends AbstractData Source

Methods to implement:

  • public void init(Map<String, Object> conf)

  • public Dataset<Row> getDataset(SparkSession spark)

  • public void cleanup()

Data Source Configuration

Configure the Custom Data Source using the following fields.

Connection Type

The type of connection to select for the custom component.

Connection Name

Connections are the service identifiers. Select a connection name from the list if you have already created and saved connection details, or create a new connection of the type selected above.

Channel Plugin

The fully qualified name of the custom code class.

Upload the custom code jar using the Upload button on the Register Entities > Components page.

You can use this Custom Data Source in any application.

Type of Source

Select the type of source: Streaming or Batch.


Notes

Optionally, enter notes in the Notes tab and save the configuration.


Click Done to save the configuration.
