OpenAI Embeddings
The OpenAI Embeddings Processor in Gathr transforms text into numerical vectors. By converting words and phrases into multi-dimensional vectors, this processor captures semantic relationships, making it easier for your models to understand context and meaning in natural language.
Embeddings measure how related text strings are, which makes them useful for tasks such as:
Search: Find relevant results by ranking them based on their connection to a query string.
Clustering: Group similar text strings together for organized analysis.
Recommendations: Recommend items whose text is related to a given item.
Anomaly Detection: Identify outliers with little relatedness to the rest of the data.
Diversity Measurement: Understand the distribution of similarities within your data.
Classification: Classify text strings based on their most similar label.
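Relatedness between two embeddings is commonly scored with cosine similarity. The sketch below is illustrative only and is not part of Gathr: the function names and the toy three-dimensional vectors are assumptions made for the example, and real embeddings have many more dimensions.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen for this sketch when a vector is all zeros
    return dot / (norm_a * norm_b)


def rank_by_similarity(query_vector, candidates):
    """Sort (name, score) pairs from most to least similar to the query."""
    scored = [(name, cosine_similarity(query_vector, vector))
              for name, vector in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy vectors standing in for real embeddings.
toy_vectors = {
    "refund policy": [0.9, 0.1, 0.0],
    "shipping times": [0.1, 0.9, 0.0],
    "returns and refunds": [0.8, 0.2, 0.1],
}
ranking = rank_by_similarity([1.0, 0.0, 0.0], toy_vectors)
ranked_names = [name for name, _ in ranking]
```

The same score can drive the other tasks in the list: clustering groups vectors with high mutual similarity, and anomaly detection flags vectors whose similarity to everything else is low.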
Processor Configuration
Configure the processor parameters as explained below.
Connection Name
Select a connection from the list if you have already created and saved connection details for OpenAI. Otherwise, create one as explained in the OpenAI Connection topic.
Override Credential
Select the Override Credential checkbox to override the credentials of the selected connection.
Provide details for these parameters when overriding credentials for the selected connection. Refer to the OpenAI Connection topic to learn more.
API Key: The secret key for accessing the endpoint.
Organization: The organization for which the connection should work.
TEST CONNECTION
Once the connection details are provided, click this option to test the connection.
Embedding Model
Gathr supports the 'text-embedding-ada-002' model.
The model converts text into numerical vectors, which support tasks such as search, clustering, recommendations, anomaly detection, diversity measurement, and classification based on similarity.
Input Column
Select the column whose text is to be converted into numerical vectors.
Batch Size
Batch Size determines the number of rows to embed in a single request. The maximum value allowed is 1000.
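To illustrate how a batch size splits rows into requests, here is a minimal sketch. It is not Gathr's implementation: `make_batches` and `MAX_BATCH_SIZE` are hypothetical names, and only the limit of 1000 comes from this page.

```python
MAX_BATCH_SIZE = 1000  # maximum value stated for the Batch Size parameter


def make_batches(rows, batch_size):
    """Split rows into consecutive chunks of at most batch_size items."""
    if not 1 <= batch_size <= MAX_BATCH_SIZE:
        raise ValueError(f"batch_size must be between 1 and {MAX_BATCH_SIZE}")
    return [rows[start:start + batch_size]
            for start in range(0, len(rows), batch_size)]


# 2500 rows with a batch size of 1000 would need three requests.
batches = make_batches([f"row {i}" for i in range(2500)], 1000)
batch_lengths = [len(batch) for batch in batches]
```

Under this sketch, a larger batch size means fewer requests for the same data, while a smaller one keeps each request lighter.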
Output Column
Select a column to assign embeddings or create a new output column. Type a name for the new column and press enter to create it.
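Conceptually, the processor reads text from the input column and writes one vector per row to the output column. The sketch below shows that mapping under stated assumptions: `embed_column` and `fake_embed` are hypothetical helpers, and `fake_embed` is a stand-in that does not call OpenAI or produce real embeddings.

```python
def embed_column(rows, input_column, output_column, embed_fn):
    """Return copies of rows with output_column set to each row's vector."""
    texts = [row[input_column] for row in rows]
    vectors = embed_fn(texts)  # one vector per input text, in the same order
    return [{**row, output_column: vector}
            for row, vector in zip(rows, vectors)]


def fake_embed(texts):
    """Stand-in embedder: [character count, word count] for each text."""
    return [[float(len(text)), float(len(text.split()))] for text in texts]


rows = [
    {"id": 1, "review": "great product"},
    {"id": 2, "review": "late"},
]
result = embed_column(rows, "review", "review_embedding", fake_embed)
```

The original columns are preserved and the new column is added alongside them, which mirrors creating a new output column by typing its name.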
RETRY CONFIGURATION
Enable retries for embedding requests. Customize the number of retries and the pause duration between attempts for improved handling of temporary issues.
Enable Retry
Enable this option to retry the embedding request if it times out or exceeds the rate limit.
Retry Count
Specify the number of retries for resending an embedding request after a timeout or exceeding the rate limit. Increasing retries improves resilience in handling temporary issues.
Retry Delay
Specify the waiting time, in seconds, between each retry attempt for an embedding request. It represents the duration the system should pause before making another attempt.
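The interaction between Retry Count and Retry Delay can be sketched as a loop with a fixed pause between attempts. This is an assumption about the general pattern, not Gathr's actual code: `call_with_retry` and `TransientError` are hypothetical names, and `TransientError` stands in for a timeout or rate-limit failure.

```python
import time


class TransientError(Exception):
    """Stand-in for a timeout or a rate-limit response."""


def call_with_retry(send_request, retry_count, retry_delay, sleep=time.sleep):
    """Call send_request, retrying up to retry_count times on TransientError.

    retry_delay is the pause in seconds before each retry attempt.
    """
    retries_used = 0
    while True:
        try:
            return send_request()
        except TransientError:
            if retries_used >= retry_count:
                raise  # retries exhausted, surface the last failure
            retries_used += 1
            sleep(retry_delay)
```

With a retry count of 3 and a delay of 2 seconds, a request in this sketch is attempted at most four times in total, with roughly six seconds of added waiting in the worst case.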
Add Configuration: Additional properties can be added using this option as key-value pairs.