NVIDIA NIM Processor
NVIDIA NIM (NVIDIA Inference Microservices) is a suite of optimized, containerized AI inference endpoints designed to simplify the deployment and scaling of powerful AI models. Built on top of NVIDIA’s cutting-edge inference stack, NIM enables developers to integrate pre-trained generative AI models into applications quickly and efficiently, with minimal infrastructure overhead.
Configure the processor parameters as explained below.
Model Selection
Under the Model Selection tab of the NVIDIA NIM processor, provide the details below.
Connection Type
Connect to NIM using one of the options below:
NIM Cloud: Use this option if the model is hosted on NIM Cloud.
NIM (Kubernetes): Use this option if the model is deployed on Kubernetes.
Configuration fields for NIM Cloud:
Connection
Select a connection name from the list if you have previously created and saved connection details for NVIDIA NIM, or create one as explained in the topic - NVIDIA NIM Connection →
Configuration fields for NIM (Kubernetes):
Kubernetes Gateway URL
The Kubernetes gateway URL where the model is deployed.
Authentication Type
Select the method used by your environment to authenticate access to Kubernetes.
None: No authentication method is used.
Basic: Authenticate via a username and password.
Token: Authenticate through a token-based system.
For the Basic authentication type:
Username
Enter the username required for basic authentication to access Kubernetes.
Password
Enter the password associated with the provided username.
For the Token authentication type:
Token Id
Enter the unique identifier for the token.
Token
Enter the authentication token associated with the provided Token Id.
Enable SSL
This option is set to False by default.
Set it to True if the resource to be requested is SSL-enabled.
Keystore select option
If SSL is set to True, choose how the SSL-enabled resource should be verified.
Upload either a keystore file or a certificate file, depending on the chosen verification method.
Then provide the Keystore Password or Certificate Alias, as applicable to the type of file uploaded.
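For reference, NIM endpoints (whether on NIM Cloud or a Kubernetes deployment) expose an OpenAI-compatible REST API. The minimal sketch below shows a direct call to such an endpoint; the base URL, API key, and model name are illustrative placeholders, not values supplied by the processor.

```python
# Minimal sketch of a direct call to a NIM endpoint, assuming it exposes the
# standard OpenAI-compatible API. Base URL, API key, and model name are
# illustrative placeholders.
from openai import OpenAI

client = OpenAI(
    base_url="https://integrate.api.nvidia.com/v1",  # or your Kubernetes gateway URL
    api_key="YOUR_NIM_API_KEY",                      # placeholder credential
)

response = client.chat.completions.create(
    model="meta/llama3-8b-instruct",                 # example model name
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
```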
Select Model
Select the model from the drop-down, based on whether it is for text generation or embeddings.
Click Next; based on the model type, you will be redirected to the next screen.
Configuration Tab
Provide the details below in the Configuration tab.
If you selected a text generation model in the Select Model field, the following options are available in the Configuration tab.
Prompt
A prompt is a concise, natural-language instruction or query provided to the NVIDIA processor to guide its actions or responses. In the Prompts section, you have the flexibility to load existing prompts.
Load Existing Prompts
Discover a set of ready-to-use sample prompts that can kickstart your interactions with the NVIDIA processor. Customize prompts to suit your specific needs.
Save Prompts
Store your preferred prompts for future use.
Delete Prompts
Remove prompts that are no longer necessary.
Reset Prompt
To reset the prompt, clear the details in the prompt field, restoring it to its default state.
System
Provide high-level instructions and context for the NVIDIA model, guiding its behavior and setting the overall tone or role it should play in generating responses.
<|endoftext|> is a document separator that the model sees during training, so if a prompt is not specified, the model will generate a response from the beginning of a new document.
The placeholder {some_key} represents a variable that can be replaced with specific column data. You can map this key to a column in the next section using “EXTRACT INPUTS FROM PROMPT”.
User
The user prompt is a specific instruction or question provided by the user to the NVIDIA model, directing it to perform a particular task or provide a response based on the user’s request.
As with the system prompt, <|endoftext|> acts as a document separator, and the {some_key} placeholder represents a variable that can be mapped to a column in the next section using “EXTRACT INPUTS FROM PROMPT”.
Input
The placeholders {__} provided in the prompt can be mapped to columns, so that each placeholder key is replaced with the corresponding column value.
Input from prompt
All placeholders {__} provided in the prompt fields above are extracted here so they can be mapped to input columns.
Input column
Select the column whose value should replace the placeholder key, as illustrated in the sketch below.
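To illustrate how the mapping works, this sketch resolves a prompt containing a {review_text} placeholder against one row of input data; the placeholder and column names are hypothetical.

```python
# Hypothetical illustration of placeholder substitution: each {key} in the
# prompt is replaced with the value of the column mapped to that key.
prompt_template = "Summarize the following review in one sentence: {review_text}"

# One row of input data, with the column mapped to the placeholder key.
row = {"review_text": "The battery lasts all day and the screen is gorgeous."}

resolved_prompt = prompt_template.format(**row)
print(resolved_prompt)
```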
Output
The output can be configured to emit the data generated from the input. Prompts may reference input columns via placeholders, and the response can be emitted either into a single specified column or parsed as JSON and mapped to multiple output columns.
Process Response
Choose how to handle the response. Options:
Assign to Column: Store the entire response in a single column.
Parse as JSON: Map JSON keys to specific output columns.
Json Key in Response
Enter the JSON keys expected in the response and map them to the output column names.
Output Column as JSON
For each JSON key, select or enter the corresponding column name for the data.
Output Column
Specify the column where the entire response will be stored.
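As an illustration of the Parse as JSON option, the sketch below parses a JSON-formatted response and maps its keys to output columns; the key and column names are hypothetical.

```python
import json

# Hypothetical model response containing the JSON keys expected in the output.
raw_response = '{"sentiment": "positive", "score": 0.92}'

# Mapping of JSON keys in the response to output column names.
key_to_column = {"sentiment": "review_sentiment", "score": "sentiment_score"}

parsed = json.loads(raw_response)
output_row = {column: parsed.get(key) for key, column in key_to_column.items()}
print(output_row)  # {'review_sentiment': 'positive', 'sentiment_score': 0.92}
```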
RATE CONTROL
Choose how to utilize the AI model’s services:
Make Concurrent Requests with Token Limit: You can specify the number of simultaneous requests to be made to the AI model, and each request can use up to the number of tokens you provide.
This option is suitable for scenarios where you need larger text input for fewer simultaneous requests.
OR
Rate-Limited Requests: Alternatively, you can make a specified total number of requests within a 60-second window.
This option is useful when you require a high volume of requests within a specified time frame, each potentially processing smaller amounts of text (see the sketch after this list).
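Both strategies are standard client-side throttling patterns. The sketch below outlines a simple fixed-window limiter of the second kind; the limit and the send function are hypothetical, and the processor’s internal implementation may differ.

```python
import time

MAX_REQUESTS_PER_WINDOW = 60  # hypothetical limit
WINDOW_SECONDS = 60

def rate_limited_calls(requests, send):
    """Send requests, pausing whenever the 60-second window is exhausted."""
    window_start, sent_in_window = time.monotonic(), 0
    for request in requests:
        if sent_in_window >= MAX_REQUESTS_PER_WINDOW:
            elapsed = time.monotonic() - window_start
            if elapsed < WINDOW_SECONDS:
                time.sleep(WINDOW_SECONDS - elapsed)  # wait out the current window
            window_start, sent_in_window = time.monotonic(), 0
        send(request)
        sent_in_window += 1
```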
Retry requests
Enabling retry requests allows for automatic resubmission of failed or incomplete requests, enhancing data processing reliability.
No. of Retries
The number of times a request will be automatically retried if the response from the AI model is not in JSON format or does not contain all the expected keys.
When
Select the criteria for retry. Choose “Any output key is missing” if all keys are mandatory; otherwise, select the mandatory keys.
Include previous response
Check this option if the previous incorrect response should be included in the retry request’s prompt messages; otherwise, leave it unchecked.
Additional User Prompt
Enter additional prompt text to be included when retrying the request. The sketch below outlines how these retry options fit together.
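Conceptually, the retry settings combine as follows: a request is retried while the response is not valid JSON or lacks a mandatory key, optionally feeding the previous response and the additional user prompt back into the conversation. All names in this sketch are hypothetical.

```python
import json

def call_with_retries(send, messages, mandatory_keys, retries=3,
                      include_previous=True,
                      extra_prompt="Return valid JSON containing all required keys."):
    """Retry until the response is valid JSON containing every mandatory key."""
    for _ in range(retries + 1):
        response = send(messages)
        try:
            parsed = json.loads(response)
            if all(key in parsed for key in mandatory_keys):
                return parsed  # success
        except json.JSONDecodeError:
            pass
        if include_previous:
            # Feed the previous incorrect response back into the conversation.
            messages = messages + [{"role": "assistant", "content": response}]
        messages = messages + [{"role": "user", "content": extra_prompt}]
    return None  # retries exhausted
```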
If you selected an embedding model in the Select Model field of the Model Selection tab, the following options are available in the Configuration tab.
Embeddings are numerical representations of text that help AI models understand meaning and relationships between words.
Input Column
Specifies the dataframe column containing the input text that will be converted into embeddings. The model processes this text and generates a numerical representation.
Output Column
Defines the column where the generated embeddings will be stored after processing. Ensure this column is unique and correctly formatted for storage.
Batch Size
Determines the number of rows processed in a single batch. A higher batch size may improve processing speed but requires more memory.
Input Type
Choose the format of the input text. Use “Passage” for full text segments stored as embeddings. Use “Query” for search terms compared against passage embeddings.
Encoding Format
Choose the storage format for embeddings. Use “Float” for floating-point vectors. Use “Base64” for encoded representations.
Truncate
Determines how text exceeding the model’s token limit is truncated.
None: No truncation (may cause errors if input exceeds the limit).
START: Truncates from the beginning, keeping the end of the text.
END: Truncates from the end, keeping the start of the text.
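Taken together, these settings correspond to the parameters of an OpenAI-compatible embeddings request, as in the sketch below. The endpoint, model name, and the input_type/truncate fields follow NVIDIA’s published retrieval examples but should be treated as assumptions for your specific deployment.

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://integrate.api.nvidia.com/v1",  # example NIM endpoint
    api_key="YOUR_NIM_API_KEY",                      # placeholder credential
)

texts = ["The battery lasts all day.", "The screen is gorgeous."]  # one batch of input-column values

response = client.embeddings.create(
    model="nvidia/nv-embedqa-e5-v5",      # example embedding model
    input=texts,
    encoding_format="float",              # Float vs. Base64
    extra_body={"input_type": "passage",  # Passage vs. Query
                "truncate": "END"},       # NONE / START / END
)
embeddings = [item.embedding for item in response.data]  # written to the output column
```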
Model Parameters
The parameters described below are configuration settings that govern the behavior and performance of models, influencing how they respond to prompts and generate outputs.
Model
The name of the selected model is displayed.
Temperature
The sampling temperature to be used, between 0 and 1. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic.
Max Token
The maximum number of tokens to generate. The total length of input tokens and generated tokens is limited by the model’s context length.
Note that the total token count (input + output) must stay within the model’s context length. Higher values also increase the total tokens consumed per minute, which can reduce the number of requests you can make, since total token usage is limited by the model’s processing capacity.
Frequency Penalty
Adjust the likelihood of repeating words. Negative values (e.g., -0.2) encourage repetition, while positive values (e.g., 0.2) reduce repeated words by penalizing frequent tokens.
Top P
Top P controls the diversity of generated text by sampling from the top p probability mass of tokens. Lower values (e.g., 0.2) make the output more focused and deterministic, while higher values (e.g., 0.9) increase randomness and diversity.
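For orientation, the sketch below shows where each of these parameters appears in an OpenAI-compatible chat completion request; the endpoint, model name, and values are examples only.

```python
from openai import OpenAI

client = OpenAI(base_url="https://integrate.api.nvidia.com/v1",  # example endpoint
                api_key="YOUR_NIM_API_KEY")                      # placeholder credential

response = client.chat.completions.create(
    model="meta/llama3-8b-instruct",  # the selected model
    messages=[{"role": "system", "content": "You are a concise assistant."},
              {"role": "user", "content": "Explain embeddings in one line."}],
    temperature=0.2,        # lower = more focused and deterministic
    max_tokens=128,         # cap on generated tokens
    frequency_penalty=0.2,  # positive values discourage repetition
    top_p=0.9,              # nucleus sampling cutoff
)
print(response.choices[0].message.content)
```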
Validation
The next step is to provide the details below under the Validation tab.
Validate Output Using
Validate the output according to your needs. Choose the rows you want to validate.
Top 3 Rows: Quickly validate the first three rows of the output. This is good for a rapid overview.
Custom Rows: For more precise validation, you can manually select specific rows to validate. Simply click on the rows you want to include.
Random Rows: Validate a randomly selected set of rows in the output.
Once you’ve made your selection, click the Validate button to initiate the validation process. The processor will perform the validation according to your chosen rows.
Review and Confirm
Thoroughly review the validation results to confirm that they align with the desired outcome. Adjust and revalidate if necessary.
If you identify any errors or inconsistencies, you can go back to the NVIDIA processor’s configuration section and make adjustments as needed.
Once you’re satisfied with the validation results, you can proceed to the next step and save the configurations.