Solr Emitter
The Solr emitter allows you to store data in Solr indexes. Indexing improves the speed and performance of search queries.
Solr Emitter Configuration
To add a Solr emitter to your pipeline, drag it onto the canvas and connect it to a Data Source or processor. The configuration settings of the Solr emitter are as follows:
Field | Description |
---|---|
Connection Name | All Solr connections are listed here. Select a connection for connecting to Solr. |
Batch Size | Number of records per batch. Specify a batch size to index records in batches. |
Ignore Missing Values | Specifies whether empty or null values of message fields are ignored or persisted in the sink. |
Across Field Search Enabled | Specifies if full text search is to be enabled across all fields. |
Index Number of Shards | Specifies the number of shards to be created in the index store. |
Index Replication Factor | Specifies the number of additional copies of data to be kept across nodes. Should be less than n-1, where n is the number of nodes in the cluster. |
Index Expression | A jsexpression used to evaluate the index name. For example, with 'ns_Name' the index will be created as 'ns_Name'. Use the field alias instead of the field name in the expression when you want to perform field-based partitioning. |
Routing Required | Specifies whether custom dynamic routing is enabled. If enabled, a JSON routing policy needs to be defined. |
ID Generator Type | Specifies how the ID field is generated. The following types of ID generators are available: Key Based: Key Fields: Select the message field to be used as the key. Select: Select all/id/sequence_number/File_id. Note: Add the key 'incremental_fields' with comma-separated column names as values. This works with a key-based UUID. UUID: Universally unique identifier. Custom: In this case, you can write your own logic to create the ID field. For example, if you wish to use a UUID key but want to prefix it with "HSBC", you can write the logic in a Java class. If you select this option, an additional field, "Class Name", is displayed on the user interface, where you need to enter the fully qualified class name of your Java class. You can download the sample project from the "Data Pipeline" landing page and refer to the Java class com.yourcompany.custom.keygen.SampleKeyGenerator to write the custom code. |
Enable TTL | Select this option to set a TTL that limits the lifetime of the data. TTL Type: Provide the TTL type as either Static or Field Value. TTL Value: Provide the TTL value in seconds for the Static TTL type, or an integer field for the Field Value TTL type. |
Emitter Output Fields | Fields of the output message. |
Connection Retries | Number of retries for component connection. Possible values are -1, 0, or a positive number. -1 denotes infinite retries. If Routing Required is true, then: Routing Policy: A JSON defining the custom routing policy. Example: {"1":{"company":{"Google":20.0,"Apple":80.0}}} Here 1 is the timestamp after which the custom routing policy becomes active, 'company' is the field name, the value 'Google' takes 20% of the shards, and the value 'Apple' takes 80% of the shards. |
Delay Between Connection Retries | Defines the retry delay intervals for component connection in milliseconds. |
Enable TTL | Check this option to limit the lifetime of data. |
TTL Type | Options available are: Static and Field Value. |
TTL Value | If TTL Type is Field Value, select a field of integer or long type only; the value of the selected field is used as the TTL value in seconds. If TTL Type is Static, provide the TTL value in seconds. |
Priority | Priority defines the execution order for the emitters. |
Checkpoint Storage Location | Select the checkpointing storage location. Available options are HDFS, S3, and EFS. |
Checkpoint Connections | Select the connection. Connections are listed corresponding to the selected storage location. |
Checkpoint Directory | The path where the Spark application stores the checkpointing data. For HDFS and EFS, enter a relative path like /user/hadoop/checkpointingDir; the system will add a suitable prefix by itself. For S3, enter an absolute path like S3://BucketName/checkpointingDir |
Time-base Checkpoint | Select the checkbox to enable a time-based checkpoint on each pipeline run. |
Output Mode | Output mode to be used while writing the data to the streaming sink. Append: Only the new rows in the streaming data are written to the sink. Complete: All the rows in the streaming data are written to the sink every time there are updates. The Complete output mode applies when an aggregation processor is used. Update: Only the rows that were updated in the streaming data are written to the sink every time there are updates. |
Enable Trigger | Trigger defines how frequently a streaming query should be executed. |
Add Configuration | Lets you add further configuration properties. Note: Index_field and store_field are supported through Add Configuration. |
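
The Routing Policy example in the table above can be read as "from timestamp 1 onward, route on the field 'company', giving 'Google' 20% of the shards and 'Apple' 80%". The following Python sketch only illustrates that reading. The helper names `active_policy` and `shard_shares` are hypothetical and are not part of the product, and the way the emitter actually maps percentages to shards is not described here, so the arithmetic below is an assumption.

```python
import json

# Routing policy in the same shape as the example in the table above.
POLICY_JSON = '{"1": {"company": {"Google": 20.0, "Apple": 80.0}}}'


def active_policy(policy, now):
    """Return the rules whose start timestamp is the latest one <= now, or None.

    Hypothetical helper: assumes each top-level key is a start timestamp.
    """
    started = [(int(ts), rules) for ts, rules in policy.items() if int(ts) <= now]
    if not started:
        return None
    return max(started, key=lambda pair: pair[0])[1]


def shard_shares(weights, num_shards):
    """Convert percentage weights into shard counts (assumed simple proportion)."""
    return {value: num_shards * pct / 100.0 for value, pct in weights.items()}


if __name__ == "__main__":
    policy = json.loads(POLICY_JSON)
    rules = active_policy(policy, now=5)
    # With 10 shards, 'Google' gets 2 shards' worth and 'Apple' gets 8.
    print(shard_shares(rules["company"], num_shards=10))
```

Checking a policy this way before pasting it into the Routing Policy field can catch malformed JSON or percentages that do not match the intended split.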