cuML Processor

cuML is a suite of GPU-accelerated machine learning algorithms built on NVIDIA’s RAPIDS AI framework.

It provides familiar scikit-learn-style APIs while leveraging the power of CUDA to achieve significant speedups for large datasets.

cuML includes a wide range of algorithms, such as linear regression, k-means clustering, PCA, and t-SNE, all optimized for execution on NVIDIA GPUs. It is designed to let data scientists and developers scale machine learning workflows efficiently on GPUs.

The cuML processor allows you to perform the following operations:

  • Write custom code for defining transformations on Spark DataFrames.

  • Write custom code for processing input records at runtime.

To get started, use the Code Snippets available in the processor’s configuration.

Click on a desired topic to view its sample code and data set.

Copy the code, close the search topic to return to the processor’s configuration, and paste the snippet as inline Python code.


Processor Configuration

Read about the cuML processor configuration fields in this section.

Overwrite Python Executable

Select to override the default Python executable path.

Python Path

Enter the Python executable path.


Utilize Python Virtual Environment

Enable this option to use a Python virtual environment for the cuML processor.


Environment Details

Select the environment name and version.

Example: Production - v1.

Environment details are visible in the drop-down list once they are created and saved.


Environment Type

Choose whether to use a Python or Micromamba environment for the virtual environment setup.


Python Environment Type

Python Packages

Enter the Python package names separated by newlines, or upload a package list. Options to upload or download a sample package list for reference are available.


Micromamba Environment Type

Python Version for Micromamba Environment

Specify the version of Python to be used in the Micromamba environment. Example: 3.10


Include Micromamba libraries

Enable this option to include Micromamba-specific libraries when setting up the environment.


Package Management

Choose whether to use pip or Micromamba to manage and install packages in the environment.


Python Packages

Enter Python package names separated by newlines or upload a package list.

Download a sample package list for reference.

Code Input

Enter inline code or upload a Python script that contains the processing logic.

  • Inline: Write Python code in the text editor. When this option is selected, one additional field, Python Code, appears.

  • Upload: Upload one or more Python scripts (.py files) and Python packages (.egg/.zip files). You must specify the module name (which should be part of the uploaded files or package) and the name of the function that the cuML processor will call.

    When you select Upload, the UPLOAD FILE option appears. Browse and select the files to be used in the cuML processor.

    One additional field, Import Module, also appears when the Upload option is selected.

Python Code

For the inline input type, write custom Python code directly in the text editor. Use CODE SNIPPETS for quick reference.


Import Module

Specify the name of the module that contains the function to be called by the cuML processor. The drop-down list shows the uploaded files.

The drop-down list shows only .py files. You can also type a module name if it does not appear in the drop-down list.

Function Name

Enter the name of the function defined in the Python code that will be called during execution.
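How the processor loads uploaded code internally is not described here. Purely as an illustration of how a module name plus a function name identifies one callable, the same lookup can be sketched with Python's standard importlib. The helper name resolve_callable is hypothetical and is not part of the product.

```python
import importlib


def resolve_callable(module_name, function_name):
    """Look up function_name inside module_name and return the callable.

    Illustration only: this mirrors the Import Module / Function Name pair,
    not the processor's actual loading mechanism.
    """
    module = importlib.import_module(module_name)  # raises ImportError if missing
    func = getattr(module, function_name)          # raises AttributeError if missing
    if not callable(func):
        raise TypeError(f"{module_name}.{function_name} is not callable")
    return func
```

For example, resolve_callable("json", "dumps") returns the standard library's json.dumps. A misspelled module or function name fails at lookup time, which is why the two fields must match the uploaded code exactly.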


Add Configuration: Enables you to add additional properties.

Use this option to pass configuration parameters to the cuML processor.

Provide the configuration parameters as key-value pairs. They are made available as a dictionary, passed as the second argument to the function named in the Function Name field. That function therefore takes two arguments: (df, config_map)

The first argument is the DataFrame, and the second is a dictionary that contains the configuration parameters as key-value pairs.
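A minimal sketch of a function with this two-argument signature follows. It assumes df is a Spark DataFrame (as the introduction suggests), that the function returns the transformed DataFrame, and that configuration values arrive as strings. The keys "columns" and "max_rows" and the function names are hypothetical examples, not fields defined by the processor.

```python
def parse_config(config_map):
    """Convert key-value configuration pairs into typed settings with defaults.

    "columns" and "max_rows" are hypothetical keys used for illustration.
    """
    raw_columns = config_map.get("columns", "")
    columns = [c.strip() for c in raw_columns.split(",") if c.strip()]
    max_rows = int(config_map.get("max_rows", "0"))
    return columns, max_rows


def process(df, config_map):
    """Function to name in the Function Name field; receives (df, config_map)."""
    columns, max_rows = parse_config(config_map)
    if columns:
        df = df.select(*columns)  # keep only the configured columns
    if max_rows > 0:
        df = df.limit(max_rows)   # cap the number of rows returned
    return df
```

With the pairs columns = "age, income" and max_rows = "1000", this sketch would select the two columns and limit the result to 1000 rows. Reading each key with a default keeps the function usable when a parameter is left out of the configuration.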


Ask AI Assistant

Use the AI assistant feature to simplify the creation of Python code.

It allows you to generate complex Python code effortlessly, using natural language inputs as your guide.

Describe your desired expression in plain, conversational language. The AI assistant interprets your instructions and turns them into functional Python code.

Tailor code to your specific requirements, whether it’s for data transformation, filtering, calculations, or any other processing task.

Note: Press Ctrl + Space to list input columns and Ctrl + Enter to submit your request.

Input Example:

Create a column called inactive_days by calculating the difference between last_login_date and the current date, and return those records whose inactive_days is more than 60 days.
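The assistant's actual output is not reproduced in this document. As a hand-written sketch of code that would satisfy the request above, assuming df is a Spark DataFrame with a date column last_login_date, the logic could look like the following. The function name filter_inactive and the optional key "inactive_threshold_days" are hypothetical.

```python
def filter_inactive(df, config_map):
    """Add inactive_days and keep rows inactive for more than the threshold."""
    # Imported inside the function so the definition loads even where PySpark
    # is only available at run time; a top-of-file import works equally well.
    from pyspark.sql import functions as F

    # Hypothetical optional parameter; defaults to the 60 days in the request.
    threshold = int(config_map.get("inactive_threshold_days", 60))

    with_days = df.withColumn(
        "inactive_days",
        # datediff(end, start) gives the number of days from start to end.
        F.datediff(F.current_date(), F.col("last_login_date")),
    )
    return with_days.filter(F.col("inactive_days") > threshold)
```

Code generated by the assistant may differ in structure, so review it against your column names and types before saving the configuration.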

Notes

Optionally, enter notes in the Notes tab and save the configuration.
