NVIDIA NIM Processor

NVIDIA NIM (NVIDIA Inference Microservices) is a suite of optimized, containerized AI inference endpoints designed to simplify the deployment and scaling of powerful AI models. Built on top of NVIDIA’s cutting-edge inference stack, NIM enables developers to integrate pre-trained generative AI models into applications quickly and efficiently, with minimal infrastructure overhead.

Configure the processor parameters as explained below.


Model Selection

Under the Model Selection tab of the NVIDIA NIM processor, provide the following details.

Connection Type

Connect to NIM using one of the following options:

  • NIM Cloud: Use this option if the model is on NIM Cloud.

  • NIM (Kubernetes): Use this option if the model is deployed on Kubernetes.


Configuration fields for NIM Cloud:

Connection

Select a connection name from the list if you have previously created and saved connection details for NVIDIA NIM, or create one as explained in the topic - NVIDIA NIM Connection →


Configuration fields for NIM (Kubernetes):

Kubernetes Gateway URL

Enter the Kubernetes gateway URL where the model is deployed.

Authentication Type

Select the method used by your environment to authenticate access to Kubernetes.

  • None: No authentication method is used.

  • Basic: Authenticate via a username and password.

  • Token: Authenticate through a token-based system.


For basic authentication type:

Username

Enter the username required for basic authentication to access Kubernetes.

Password

Enter the password associated with the provided username.


For token-based authentication type:

Token Id

Enter the unique identifier for the token.

Token

Enter the authentication token associated with the provided Token Id.


Enable SSL

This option is set to False by default.

Set it to True if the resource to be requested is SSL-enabled.

Keystore select option

If SSL is set to True, choose how the SSL-enabled resource should be verified.

Upload either a keystore file or a certificate file, depending on the chosen verification method.

Then provide the Keystore Password or Certificate Alias, as appropriate for the type of file uploaded.


Select Model

Select the model from the dropdown based on whether it is for text generation or embeddings.


Click Next. Based on the model type, you will be redirected to the next screen.


Configuration Tab

Provide the following details in the Configuration tab.


Upon selecting a text generation model in the Select Model field, the following options will be available in the Configuration tab.

Prompt

A prompt is a concise instruction or query in natural language provided to the Nvidia processor to guide its actions or responses. In the Prompts section, you have the flexibility to Load Existing Prompts.

Load Existing Prompts

Discover a set of ready-to-use sample prompts that can kickstart your interactions with the Nvidia processor. Customize prompts to suit your specific needs.

Save Prompts

Store your preferred prompts for future use.

Delete Prompts

Remove prompts that are no longer necessary.

Reset Prompt

To reset the prompt, clear the details in the prompt field, restoring it to its default state.

System

Provide high-level instructions and context for the Nvidia model, guiding its behavior and setting the overall tone or role it should play in generating responses.

<|endoftext|> is a document separator that the model sees during training, so if a prompt is not specified, the model will generate a response from the beginning of a new document.

The placeholder {some_key} represents a variable that can be replaced with specific column data. You can map this key to a column in the next section using “EXTRACT INPUTS FROM PROMPT”.

User

The user prompt is a specific instruction or question provided by the user to the Nvidia model, directing it to perform a particular task or provide a response based on the user’s request.

As with the System field, <|endoftext|> acts as a document separator seen during training, and the placeholder {some_key} can be mapped to a column in the next section using “EXTRACT INPUTS FROM PROMPT”.


Input

The placeholders {__} provided in the prompt can be mapped to columns so that each placeholder key is replaced with the column’s value.

Input from prompt

All the placeholders {__} provided in the prompt fields above are extracted here so they can be mapped to input columns.

Input column

Select the column whose value will replace the placeholder key.
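As an illustration, placeholder extraction and column mapping can be sketched in Python; the prompt text, keys, and row values below are hypothetical examples, not the processor’s internals:

```python
import re

def extract_placeholders(prompt):
    # Find every {some_key} placeholder in the prompt text.
    return re.findall(r"\{(\w+)\}", prompt)

def fill_prompt(prompt, row):
    # Replace each placeholder with the value from the mapped input column.
    return re.sub(r"\{(\w+)\}", lambda m: str(row[m.group(1)]), prompt)

prompt = "Summarize the review by {customer_name}: {review_text}"
row = {"customer_name": "Alice", "review_text": "Great product."}

print(extract_placeholders(prompt))  # ['customer_name', 'review_text']
print(fill_prompt(prompt, row))      # Summarize the review by Alice: Great product.
```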


Output

The output controls how the response received from the model is emitted. Prompts can reference input columns via placeholders, and the response can be emitted either into a single specified column or parsed as JSON and mapped to multiple columns.

Process Response

Choose how to handle the response. Options:

  • Assign to Column: Store the entire response in a single column.

  • Parse as JSON: Map JSON keys to specific output columns.

Json Key in Response

Enter the JSON keys expected in response and map them to the output column names.

Output Column as JSON

For each JSON key, select or enter the corresponding column name for the data.

Output Column

Specify the column where the entire response will be stored.
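The two response-handling modes can be sketched as follows; the response JSON, keys, and column names are illustrative assumptions:

```python
import json

def process_response(response, mode, key_to_column=None):
    if mode == "assign_to_column":
        # Store the entire response in a single column ("model_output" is illustrative).
        return {"model_output": response}
    if mode == "parse_as_json":
        # Map each expected JSON key in the response to its output column.
        parsed = json.loads(response)
        return {column: parsed[key] for key, column in key_to_column.items()}
    raise ValueError(f"unknown mode: {mode}")

reply = '{"sentiment": "positive", "score": 0.93}'
print(process_response(reply, "assign_to_column"))
print(process_response(reply, "parse_as_json",
                       {"sentiment": "sentiment_col", "score": "score_col"}))
```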


RATE CONTROL

Choose how to utilize the AI model’s services:

Make Concurrent Requests with Token Limit: You can specify the number of simultaneous requests to be made to the AI model, and each request can use up to the number of tokens you provide.

This option is suitable for scenarios where you need larger text input for fewer simultaneous requests.

OR

Rate-Limited Requests: Alternatively, you can make a specified total number of requests within a 60-second window.

This option is useful when you require a high volume of requests within a specified time frame, each potentially processing smaller amounts of text.
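The rate-limited mode can be pictured as a rolling 60-second window; this is an illustrative sketch only, not the processor’s actual implementation:

```python
import time

class WindowRateLimiter:
    """Allow at most `max_requests` within a rolling window (60 s by default)."""

    def __init__(self, max_requests, window_s=60.0, clock=time.monotonic):
        self.max_requests = max_requests
        self.window_s = window_s
        self.clock = clock
        self.timestamps = []

    def try_acquire(self):
        now = self.clock()
        # Forget requests that have aged out of the rolling window.
        self.timestamps = [t for t in self.timestamps if now - t < self.window_s]
        if len(self.timestamps) < self.max_requests:
            self.timestamps.append(now)
            return True
        return False

# Demo with a controllable clock: 2 requests allowed per window.
fake_now = [0.0]
limiter = WindowRateLimiter(2, clock=lambda: fake_now[0])
print(limiter.try_acquire(), limiter.try_acquire(), limiter.try_acquire())  # True True False
fake_now[0] = 61.0  # advance past the window
print(limiter.try_acquire())  # True
```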


Retry requests

Enabling retry requests allows for automatic resubmission of failed or incomplete requests, enhancing data processing reliability.

No. of Retries

The number of times a request will be automatically retried if the response from the AI model is not in JSON format or does not contain all the expected entities.

When

Select the criteria for retry. Choose “Any output key is missing” if all keys are mandatory; otherwise, select only the mandatory keys.

Include previous response

Check this option if the previous incorrect response should be added to the retry request’s prompt; otherwise, leave it unchecked.

Additional User Prompt

Enter prompt text to be considered while retrying the request.
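The retry behavior described above can be sketched roughly as follows; `send` is a hypothetical stand-in for the model call, and the key names and prompt text are illustrative:

```python
def call_with_retries(send, max_retries, required_keys,
                      include_previous=False, extra_prompt=""):
    """Retry until the parsed response contains every required key.

    `send` takes the extra prompt text appended on a retry and returns
    the parsed response dict.
    """
    response = send("")
    for _ in range(max_retries):
        if all(key in response for key in required_keys):
            break
        parts = [extra_prompt] if extra_prompt else []
        if include_previous:
            # Optionally feed the previous incorrect response back to the model.
            parts.append(f"Previous incorrect response: {response}")
        response = send("\n".join(parts))
    return response

calls = []
def fake_send(retry_text):
    calls.append(retry_text)
    # Fail on the first attempt, succeed on the retry.
    return {"sentiment": "positive"} if len(calls) > 1 else {}

result = call_with_retries(fake_send, max_retries=3, required_keys=["sentiment"],
                           include_previous=True,
                           extra_prompt="Return JSON with a sentiment key.")
print(result, len(calls))  # {'sentiment': 'positive'} 2
```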


Upon selecting an embedding model in the Select Model field of the Model Selection tab, the following options will be available in the Configuration tab.

Embeddings are numerical representations of text that help AI models understand meaning and relationships between words.

Input Column

Specifies the column containing the input text from the DataFrame that will be converted into embeddings. The model processes this text and generates a numerical representation.

Output Column

Defines the column where the generated embeddings will be stored after processing. Ensure this column is unique and correctly formatted for storage.

Batch Size

Determines the number of rows processed in a single batch. A higher batch size may improve processing speed but requires more memory.
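Batching amounts to splitting the input rows into fixed-size chunks; a minimal sketch, with an illustrative batch size of 2:

```python
def batches(rows, batch_size):
    # Split the input rows into consecutive batches of at most `batch_size`.
    for start in range(0, len(rows), batch_size):
        yield rows[start:start + batch_size]

texts = ["doc one", "doc two", "doc three", "doc four", "doc five"]
print([len(b) for b in batches(texts, 2)])  # [2, 2, 1]
```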

Input Type

Choose the format of the input text. Use “Passage” for full text segments stored as embeddings. Use “Query” for search terms compared against passage embeddings.

Encoding Format

Choose the storage format for embeddings. Use “Float” for floating-point vectors. Use “Base64” for encoded representations.
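Base64-encoded embeddings are commonly packed as little-endian float32 values; a sketch of decoding them under that assumption:

```python
import base64
import struct

def decode_base64_embedding(encoded):
    # Assumes the common convention: packed little-endian float32 values.
    raw = base64.b64decode(encoded)
    count = len(raw) // 4  # 4 bytes per float32
    return list(struct.unpack(f"<{count}f", raw))

# Round-trip a small example vector (values chosen to be float32-exact).
vector = [0.5, -1.0, 2.0]
encoded = base64.b64encode(struct.pack(f"<{len(vector)}f", *vector)).decode()
print(decode_base64_embedding(encoded))  # [0.5, -1.0, 2.0]
```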

Truncate

Determines how text exceeding the model’s token limit is truncated.

  • None: No truncation (may cause errors if input exceeds the limit).

  • START: Truncates from the beginning, keeping the end of the text.

  • END: Truncates from the end, keeping the start of the text.
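The three truncation modes can be sketched on a token list as follows; the token values and limit are illustrative:

```python
def truncate_tokens(tokens, limit, mode):
    if len(tokens) <= limit:
        return tokens
    if mode == "NONE":
        raise ValueError("input exceeds the model's token limit")
    if mode == "START":
        return tokens[-limit:]  # drop from the beginning, keep the end
    if mode == "END":
        return tokens[:limit]   # drop from the end, keep the start
    raise ValueError(f"unknown mode: {mode}")

tokens = ["the", "quick", "brown", "fox", "jumps"]
print(truncate_tokens(tokens, 3, "START"))  # ['brown', 'fox', 'jumps']
print(truncate_tokens(tokens, 3, "END"))    # ['the', 'quick', 'brown']
```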


Model Parameters

The parameters described below are configuration settings that govern the behavior and performance of models, influencing how they respond to prompts and generate outputs.

Model

The name of the selected model is displayed here.

Temperature

The sampling temperature to be used, between 0 and 1. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic.

Max Token

The maximum number of tokens to generate. The total token count (input + output) is limited by the model’s context length.

Note that this setting also affects the total tokens consumed per minute: higher values reduce the number of requests you can make, as overall token usage is limited by the model’s processing capacity.

Frequency Penalty

Adjust the likelihood of repeating words. Negative values (e.g., -0.2) encourage repetition, while positive values (e.g., 0.2) reduce repeated words by penalizing frequent tokens.

Top P

Top P controls the diversity of generated text by sampling from the top p probability mass of tokens. Lower values (e.g., 0.2) make the output more focused and deterministic, while higher values (e.g., 0.9) increase randomness and diversity.
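NIM text-generation endpoints typically follow an OpenAI-compatible request schema, so these parameters appear in the request payload roughly as below; the model name and values are illustrative assumptions:

```python
import json

# Hypothetical parameter values; the model name is illustrative.
payload = {
    "model": "meta/llama-3.1-8b-instruct",
    "messages": [
        {"role": "system", "content": "You are a concise assistant."},
        {"role": "user", "content": "Summarize NIM in one sentence."},
    ],
    "temperature": 0.2,        # low: focused, deterministic output
    "max_tokens": 256,         # cap on generated tokens
    "frequency_penalty": 0.2,  # positive: discourage repeated words
    "top_p": 0.9,              # sample from the top 90% probability mass
}
print(json.dumps(payload, indent=2))
```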


Validation

The next step is to provide the following details under the Validation tab.

Validate Output Using

Validate the output according to your needs. Choose the rows you want to validate.

  • Top 3 Rows: Quickly validate the first three rows of the output. This is good for a rapid overview.

  • Custom Rows: For more precise validation, you can manually select specific rows to validate. Simply click on the rows you want to include.

  • Random Rows: Comprehensively validate random rows in the output.

Once you’ve made your selection, click the Validate button to initiate the validation process. The processor will perform the validation according to your chosen rows.


Review and Confirm

Thoroughly review the validation results to confirm that they align with the desired outcome, and adjust and revalidate if necessary.

If you identify any errors or inconsistencies, go back to the Nvidia processor’s configuration section and make adjustments as needed.

Once you’re satisfied with the validation results, you can proceed to the next step and save the configurations.