Hive Ingestion Source
Add a Hive Data Source to your ingestion application. To use it, select a connection and specify the warehouse directory path.
Configuring Hive Data Source
| Field | Description | 
|---|---|
| Connection Name | Connections are the service identifiers. Select the connection from which you want to read data, from the list of available connections. | 
| Override Credentials | Check this option to override the connection credentials with user-specific credentials. | 
| Username | The name of the user under which the Hadoop services are running. This field appears once you check the Override Credentials option. | 
| KeyTab Select Option | Select one of the following options to provide the keytab: KeyTab File or Specify KeyTab File Path. | 
| Query | Provide a Hive-compatible SQL query to be executed by the component (see the example after this table). | 
| Inspect Query | Provide the same query as in the Query field, but with a limit on the record count; it is used only during inspect and schema detection (see the example after this table). | 
| Refresh Table Metadata | Spark caches Parquet table metadata and partition information to improve performance. This option refreshes the table cache so that the latest information is available during inspect, and it helps most when there are multiple update and fetch events within an inspect session. The Refresh Table option also repairs and syncs partition values into the Hive metastore, so the latest values are processed when data is fetched during inspect or run (see the SQL sketch below). | 
| Table Names | Specify one or more table names to be refreshed. | 
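As an illustration of the Query and Inspect Query fields, the sketch below uses a hypothetical table `sales_db.transactions`; the inspect variant is the same query capped with a LIMIT clause:

```sql
-- Query: full Hive-compatible SQL executed by the component
SELECT id, amount, txn_date
FROM sales_db.transactions
WHERE txn_date >= '2024-01-01';

-- Inspect Query: the same query with a record-count limit,
-- used only during inspect and schema detection
SELECT id, amount, txn_date
FROM sales_db.transactions
WHERE txn_date >= '2024-01-01'
LIMIT 100;
```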
After the query runs, the Describe Table output and the corresponding Table Metadata, Partition Information, and Serializer/Deserializer (SerDe) Information are populated.
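The Refresh Table Metadata and Describe Table behavior corresponds roughly to standard Spark/Hive SQL statements. Below is a minimal sketch, reusing the hypothetical table from the example above; it illustrates the underlying statements, not Gathr's exact implementation:

```sql
-- Refresh Spark's cached Parquet metadata and partition information
REFRESH TABLE sales_db.transactions;

-- Repair and sync partition values into the Hive metastore
MSCK REPAIR TABLE sales_db.transactions;

-- Show table metadata, partition information, and SerDe details
DESCRIBE FORMATTED sales_db.transactions;
```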
Make sure that the query you run matches the schema created with Upload Data or Fetch From Source.
Schema
Review the populated schema details. For more information, see Schema Preview →
Click Done to save the configuration.