Register Entities lets you register custom components, such as custom parsers, data sources, and processors, for use in pipelines.
The following types of entities are available:
Entity | Description |
---|---|
Register Component | Upload a customized jar to create a custom component that can be used in data pipelines. |
Functions | A rich library of pre-defined functions and user-defined functions. |
Variables | Use variables in your pipelines at runtime as per the scope. |
Calendar | Create multiple holiday calendars that can then be used in Workflow. |
Each entity is explained below.
Use Register Component to register a custom component (Channel and Processor) by uploading a customized jar. Those custom components can be used in data pipelines.
The Register Components tab is available under the Register Entities sidebar option.
Download a sample jar from the Data Pipelines page, customize it as required, and upload it on the Register Components page.
Gathr allows you to implement your custom code in the platform to extend its functionality for:
Channel: To read from any source.
Processor: To perform any operation on data-in-motion.
Custom code implementation allows importing custom components and versioning.
You can download a Maven based project that contains all the necessary Gathr dependencies for writing custom code and sample code for reference.
Pre-requisites for custom code development
1. JDK 1.7 or higher
2. Apache Maven 3.x
3. Eclipse or any other IDE
Steps for Custom Code Implementation
Build Custom Code
• Provide all the dependencies required for the custom components in the pom.xml available in the project.
• Build the project using mvn clean install.
• Use jar-with-dependencies.jar for component registration.
Register custom code
The list of custom components is displayed on the Register Components page, and its properties are described below:
Field | Description |
---|---|
Components | The icon of the component is displayed in this column, which symbolizes a Data Source or a Processor. |
Name | Provide a name for the custom component. |
Config | Config link of the component. You can add configuration to a custom component or upload a jar. |
Engine | The supported engine, which is Spark. |
Scope | The component can be used within a Project or across the Workspace. Note: You define the scope of the Component by selecting either Project or Workspace. With Workspace scope, the Component can be used across the Workspace. With Project scope, the Component is visible only in that specific project. |
Actions | Add Config (+), Upload Jar, Delete |
Owner | Indicates whether the custom component was created by a Superuser or a workspace user. |
Version | The version number of the custom component. |
You can perform the following operations on uploaded custom components:
• Change the scope of custom components (i.e. Global/Local).
• Change the icon of custom components.
• Add extra configuration properties.
• Update or delete registered custom components.
Version Support (Versioning) in component registration
Register multiple versions of a registered component and use any version in your pipeline.
Note:
- As shown in the above image, the listing page shows the details of each created Component, including Components, Owner, Parent Project (the project in which the Component is registered), and Scope (Workspace/Project).
- If you use registered components in a pipeline, make sure that all the components registered from a single jar are of the same version. If you have registered a component with a fully qualified name (FQN), that component cannot be registered with another jar in the same workspace.
- If the same jar is uploaded with the same FQN, a new version of that component is created.
Functions enables you to enrich an incoming message with additional data that is not provided by the source.
Field | Description |
---|---|
Function Name | Specify the name of the function. |
Arguments | Lists the argument specifications of the function. |
Scope | The function can be used within a Project or across the Workspace. Note: You define the scope of the Function by selecting either Project or Workspace. With Workspace scope, the Function can be used across the Workspace. With Project scope, the Function is visible only in that specific project. |
Parent Project | Parent project of the function being registered. |
Owner | Name of the user who created the function. |
Date Created | Creation date of the function. |
Date Modified | Last modified date of the function. |
Actions | Options to view more details about the function (Description, Parameters, Returns, Throws, and Example) and to delete the registered function. |
Gathr provides a rich library of system-defined functions as explained in the Functions section.
Variables allow you to use variables in your pipelines at runtime, as per their scope.
To add a variable, click the Create New Variable (+) icon and provide the details as explained below.
Field | Description |
---|---|
Name | Provide a name to the variable. |
Value | Value assigned to the variable (it can be an expression). |
Data Type | Select the Data Type of the variable. The options are: • Number • Decimal • String |
Scope | Select the scope of the variable. The scope types are: • Project: The variable is available within the project. • Pipeline: The variable is available within the selected pipeline. • Global: The variable is available across the application and is accessible across the workspaces and projects where it was created. Note: A Global variable can also be used in the Functions processor. • Workspace: The variable is available within all the topologies of the workspace. |
For example, suppose you create the following variables: Name, Salary, and Average.
Calling the following code in the implementation class then returns all the variables in varMap:
Map<String, ScopeVariable> varMap = (Map<String, ScopeVariable>) configMap.get(svMap);
To use the Name variable that you created, call the following code to get its details.
The returned variable object holds all the details of the variable: Name, Value, Data Type, and Scope.
ScopeVariable variable = varMap.get("Name");
String value = variable.getValue();
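The lookup above can be sketched as a self-contained example. This is only an illustration under stated assumptions: the ScopeVariable class below is a hypothetical stand-in for the class shipped with the Gathr dependencies (only getValue() appears in the snippet above; the constructor and the other getters are assumed), the string key "svMap" is assumed for the configuration map, and the sample value is invented.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the platform's ScopeVariable class (assumption).
final class ScopeVariable {
    private final String name;
    private final String value;
    private final String dataType;
    private final String scope;

    ScopeVariable(String name, String value, String dataType, String scope) {
        this.name = name;
        this.value = value;
        this.dataType = dataType;
        this.scope = scope;
    }

    String getName() { return name; }
    String getValue() { return value; }
    String getDataType() { return dataType; }
    String getScope() { return scope; }
}

final class ScopeVariableLookup {
    // Returns the value of the named scope variable, or null when the
    // variable map or the variable itself is missing.
    @SuppressWarnings("unchecked")
    static String lookupValue(Map<String, Object> configMap, String variableName) {
        Object raw = configMap.get("svMap"); // key name is an assumption
        if (!(raw instanceof Map)) {
            return null;
        }
        Map<String, ScopeVariable> varMap = (Map<String, ScopeVariable>) raw;
        ScopeVariable variable = varMap.get(variableName);
        return variable == null ? null : variable.getValue();
    }

    public static void main(String[] args) {
        // Sample data standing in for what the platform would supply.
        Map<String, ScopeVariable> varMap = new HashMap<>();
        varMap.put("Name", new ScopeVariable("Name", "Alice", "String", "Pipeline"));
        Map<String, Object> configMap = new HashMap<>();
        configMap.put("svMap", varMap);

        System.out.println(lookupValue(configMap, "Name")); // prints "Alice"
    }
}
```

The null checks are a defensive choice for the sketch; inside a real custom component the map would be supplied by the platform rather than built by hand.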
Variable listing page (shown below)
Note: As shown in the above image, the listing page shows the details of each created Variable, including Name, Initial Value, Data Type, Parent Project (the project in which the Variable is created), and Scope (Workspace/Project).
You can add scope variables and then reuse and update them as needed in pipelines and pipeline components.
Scope variable support is available for the components below, listed with the fields in which a scope variable can be populated using @:
Cobol (Data Source) --> copybookPath --> dataPath
HDFS (Data Source) --> File Path
Hive (Data Source) --> Query
JDBC (Data Source) --> Query
GCS (Batch and Streaming) (Data Source) --> File Path
File Writer (Emitter) --> File Path
Formats supported are:
@{Pipeline.filepath} = /user/hdfs
@{Workspace.filepath} = /user/hdfs
@{Global.filepath}/JSON/demo.json = /user/hdfs/JSON/demo.json
@{Pipeline.filepath + '/JSON/demo.json'} = /user/hdfs/JSON/demo.json
@{Workspace.filepath + "/JSON/demo.json"} = /user/hdfs/JSON/demo.json
@{Global.lastdecimal + 4} // will add number = 14.0
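To make the placeholder formats more concrete, here is a minimal sketch of how a simple @{Scope.name} reference could be substituted into a field value. It is not Gathr's resolver: the class name, the method, and the map keyed by "Scope.name" are assumptions made for illustration, and the sketch handles only plain references, not expressions such as concatenation or arithmetic inside the braces.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative only: substitutes plain @{Scope.name} references.
final class ScopePlaceholderSketch {
    // Matches references such as @{Pipeline.filepath}; expressions that
    // contain operators or quotes inside the braces do not match.
    private static final Pattern REFERENCE =
            Pattern.compile("@\\{(\\w+)\\.(\\w+)\\}");

    static String resolve(String text, Map<String, String> variables) {
        Matcher matcher = REFERENCE.matcher(text);
        StringBuffer out = new StringBuffer();
        while (matcher.find()) {
            String key = matcher.group(1) + "." + matcher.group(2);
            String value = variables.get(key);
            // Unknown references are left untouched.
            String replacement = value != null ? value : matcher.group();
            matcher.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        matcher.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> variables = new HashMap<>();
        variables.put("Global.filepath", "/user/hdfs");
        // prints "/user/hdfs/JSON/demo.json"
        System.out.println(resolve("@{Global.filepath}/JSON/demo.json", variables));
    }
}
```

Matcher.quoteReplacement keeps any `$` or `\` characters in a variable value from being interpreted as regex group references during substitution.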
You can create holiday calendars from Register Entities > Calendar > Calendar listing page. Click the + icon to create a calendar.
Field | Description |
---|---|
Name | Name of the calendar. |
Scope | Select Project or Workspace to define the scope of the calendar. Note: With Workspace scope, the Calendar can be used across the Workspace. With Project scope, the Calendar is visible only in that specific project. |
Timezone | Select the timezone for your calendar. |
Date(s) | Select the date(s) to be marked as holidays in your calendar. |
Description | Add a description of the calendar. |
Upload | Upload a text file (.txt) that contains date(s) in the MM-DD-YYYY format. If the file has multiple dates, each entry should be on a new line. |
Note: These calendars can be used in Workflow.
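Before uploading a holiday file, it can help to check that every line follows the MM-DD-YYYY format. The sketch below is one way to do that locally with the standard java.time API. The class and method names are hypothetical, and skipping blank lines is an assumption of the sketch rather than documented platform behavior.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.ResolverStyle;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative only: validates holiday file lines in MM-DD-YYYY format.
final class HolidayFileSketch {
    // STRICT resolution rejects impossible dates such as 02-30-2025.
    private static final DateTimeFormatter MM_DD_YYYY =
            DateTimeFormatter.ofPattern("MM-dd-uuuu")
                    .withResolverStyle(ResolverStyle.STRICT);

    // Parses one date per line; blank lines are skipped (an assumption).
    // Throws DateTimeParseException on the first malformed entry.
    static List<LocalDate> parse(List<String> lines) {
        List<LocalDate> dates = new ArrayList<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue;
            }
            dates.add(LocalDate.parse(trimmed, MM_DD_YYYY));
        }
        return dates;
    }

    public static void main(String[] args) {
        // In practice the lines could come from Files.readAllLines(path).
        List<String> lines = Arrays.asList("01-01-2025", "12-25-2025");
        System.out.println(parse(lines)); // prints [2025-01-01, 2025-12-25]
    }
}
```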
Calendar Listing (shown below):
Note: As shown in the above image, the listing page shows the details of each Calendar, including Name, Dates, Parent Project (the project in which the Calendar is created), and Scope (Workspace/Project).