MLflow Models Listing Page
Manage and explore MLflow models associated with your connections in Gathr on the MLflow listing page.
To access this page, navigate through the Projects (main menu) > Models (sub-menu) > MLflow Tab.
Note: For models to display here, ensure at least one MLflow connection has been created in Gathr.
Upon selecting a connection, MLflow models associated with it will display.
You can organize, monitor, and leverage your MLflow models within Gathr’s intuitive interface.
The actions available on the MLflow Models listing page are explained below:
Select a Connection: Choose from your configured connections to view models specific to that connection and manage MLflow models.
Search: Enter model names (case-sensitive) as keywords to quickly find models by name.
Sort Listed Models: Arrange models in ascending or descending order based on Model Names. In the Sort By field, select either ‘ASC’ (ascending) or ‘DEC’ (descending), and then click the ‘Sort’ button to list the models accordingly.
Display Number of Models per Page: Adjust the pagination settings to control how many models are displayed at once.
View Model Versions: Upon selecting a specific MLflow model from the listing, you can access detailed information and actions related to its versions. To know more, see View Model Versions.
Delete Model: Removes an MLflow model from the listing page as well as from the MLflow tracking server.
The details shown on the MLflow Models listing page are:
Model Name: The unique identifier of the MLflow model.
Model Type: Categorization of the model (Example: sklearn, transformers).
Created On: Timestamp indicating when the model was initially created.
Last Modified On: Timestamp of the most recent modification to the model.
Description: Optional user-provided information offering context or details about the model’s purpose and functionality.
View Model Versions
To explore model versions, click View Versions next to any model to access its detailed version listing page.
Here, you can view comprehensive information and perform various actions related to each version.
Deployment Configurations
Deploying an MLflow model as an endpoint enables it to process input data and provide predictions via a network interface.
Deploy your MLflow model onto an orchestration platform such as Kubernetes, either directly from Gathr or outside of it, depending on your preference.
It’s recommended to deploy the models through Gathr as it streamlines the deployment process and subsequent usage in the Model Invoke Processor.
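As a reference for what a deployed endpoint looks like to a client, the sketch below scores a model over HTTP. It assumes an MLflow 2.x scoring server reachable at a hypothetical address (10.80.72.197:30987) and a model that accepts tabular input with made-up column names; adjust the payload to your model's signature.

# Hypothetical endpoint; replace host/port with your model service endpoint.
curl -X POST http://10.80.72.197:30987/invocations \
  -H "Content-Type: application/json" \
  -d '{
        "dataframe_split": {
          "columns": ["feature_1", "feature_2"],
          "data": [[1.2, 3.4], [5.6, 7.8]]
        }
      }'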
Important: To activate Model Deployment Configurations, ensure that the Allow Model Deployment field is enabled in the MLflow connection settings.
Deployment Configuration Parameters:
Cluster DNS
The cluster DNS endpoint or cluster URL where the MLflow model will be deployed.
For information on obtaining Cluster DNS, expand this section.
The Cluster DNS or cluster URL is typically provided by the administrator of the Kubernetes cluster.
Contact the administrator of the Kubernetes cluster to get the DNS name or URL of the cluster.
If using a managed Kubernetes service (e.g., Google Kubernetes Engine, Amazon EKS, Azure Kubernetes Service), you can usually find the cluster endpoint in the cloud provider’s console under the Kubernetes cluster details.
If you have access to the cluster’s kubeconfig file, it will contain the cluster’s API server endpoint, listed under the clusters section of the file.
Format of Cluster DNS:
Fully Qualified Domain Name (FQDN): For example, https://kubernetes.example.com or https://api.cluster-name.k8s.local
IP Address: Sometimes it might be an IP address if DNS is not set up, such as https://192.168.1.100:6443
Cloud-specific URLs: For managed Kubernetes services, the URL might look like https://<cluster-name>.<region>.eks.amazonaws.com for AWS EKS, or https://<cluster-name>.<region>.azmk8s.io for Azure AKS.
Always ensure you have the correct endpoint and that it is accessible from Gathr.
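If you already have command-line access to the cluster, the standard kubectl commands below print the API server endpoint for the current kubeconfig context; they are offered only as a convenient cross-check of the value you enter in Gathr.

# Show the control plane endpoint and core service URLs for the current context
kubectl cluster-info

# Print only the API server URL of the current context's cluster
kubectl config view --minify -o jsonpath='{.clusters[0].cluster.server}'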
Namespace
Provide the cluster namespace where the MLflow model will be deployed, to organize and manage resources.
The namespace must already exist in the Kubernetes cluster.
The Gathr user should have view/read/create access to the namespace and to the model Deployments, Services, ConfigMaps, and HPAs within it.
For steps to Create a Namespace, expand this section.
A namespace in Kubernetes can be created by a user with administrative privileges or by someone who manages the Kubernetes cluster, using the kubectl command-line tool.
Create a namespace to isolate resources using the command below:
kubectl create namespace <namespace-name>
Create a Role with the required permissions in the specified namespace.
Create a YAML file with the configuration below:
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: <namespace-name>
  name: <role-name>
rules:
- apiGroups: [""]
  resources: ["pods", "services", "configmaps", "secrets"]
  verbs: ["get", "list", "watch", "create", "update", "delete"]
- apiGroups: ["apps"]
  resources: ["deployments"]
  verbs: ["get", "list", "watch", "create", "update", "delete"]
- apiGroups: ["autoscaling"]
  resources: ["horizontalpodautoscalers"]
  verbs: ["get", "list", "watch", "create", "update", "delete"]
#- apiGroups: ["*"]
#  resources: ["*"]
#  verbs: ["*"]
Once created, apply the YAML file to the cluster using the command below:
kubectl apply -f <filename>.yaml
Create a Service Account in the specified namespace using the command below:
kubectl create serviceaccount <role-name>-sa --namespace <namespace-name>
Bind the Role to the Service Account using a RoleBinding.
Create a YAML file with the configuration below:
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: <role-name>-rolebinding
  namespace: <namespace-name>
subjects:
- kind: ServiceAccount
  name: <role-name>-sa
  namespace: <namespace-name>
roleRef:
  kind: Role
  name: <role-name>
  apiGroup: rbac.authorization.k8s.io
Once created, apply the YAML file to the cluster using the command below:
kubectl apply -f <filename>.yaml
Create a Service Account token for the specified duration using the command below:
kubectl create token <role-name>-sa -n <namespace-name> --duration=8760h
Create the Kubernetes Config File.
Use the following template to create a custom Kubernetes Config file for restricted authorization:
apiVersion: v1
kind: Config
clusters:
- cluster:
    certificate-authority-data: <certificate-authority-data>
    server: https://kuberneteshost:port
  name: <k8s-cluster-name>
contexts:
- context:
    cluster: <k8s-cluster-name>
    user: <service-account-username>
  name: <service-account-username>@<k8s-cluster-name>
current-context: <service-account-username>@<k8s-cluster-name>
preferences: {}
users:
- name: <service-account-username>
  user:
    token: <service-account-token>
Unlike the earlier YAML files, this is a client configuration file and is not applied to the cluster with kubectl apply. Save it and use it as the Kubernetes Config file for restricted authorization when connecting to the cluster.
Variables to Replace:
<namespace-name>: Namespace name to be created.
<role-name>: Role name to be created.
<certificate-authority-data>: Base64-encoded certificate authority data of the Kubernetes cluster.
<k8s-cluster-name>: Name of your Kubernetes cluster.
<kuberneteshost>:<port>: The URL and port of your Kubernetes API server.
<service-account-username>: The name of the Service Account.
<service-account-token>: The token generated for the Service Account.
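Optionally, you can verify that the Service Account created above has the expected permissions before configuring it in Gathr. The sketch below uses kubectl's built-in authorization check with the same placeholders as the preceding steps.

# Check that the Service Account can create Deployments in the namespace
kubectl auth can-i create deployments --namespace <namespace-name> \
  --as=system:serviceaccount:<namespace-name>:<role-name>-sa

# Check read access to Services and HorizontalPodAutoscalers
kubectl auth can-i list services --namespace <namespace-name> \
  --as=system:serviceaccount:<namespace-name>:<role-name>-sa
kubectl auth can-i get horizontalpodautoscalers --namespace <namespace-name> \
  --as=system:serviceaccount:<namespace-name>:<role-name>-sa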
Model App Name
Provide a name for the MLflow model to uniquely identify it within the cluster.
Required CPU (Cores)
Specify the CPU cores required for running the model image. For example, if you specify 2 cores, the model image will be allocated 2 CPU cores for execution.
Ephemeral Storage (MiB)
Allocate temporary storage space for the model image during execution. This parameter specifies the ephemeral storage (in mebibytes, MiB) required for the MLflow model image.
GPU Required
Select if the model requires GPU during execution.
GPU Type & Core
Please choose from NVIDIA or AMD GPUs depending on your cluster configuration and provide the required cores.
If you are using a GPU, make sure that it is already enabled on the Kubernetes cluster.
Verify GPU Plugin: (At Kubernetes Cluster Level)
Use the command below on the Kubernetes node to check for the device plugin pod.
kubectl get pods -n kube-system | grep nvidia-device-plugin
Expected output:
nvidia-device-plugin-daemonset-fn2qc 1/1 Running 78 (3h7m ago) 97d
If you don’t get this output, you may need to install or configure the appropriate device plugin for your GPU type.
Verify GPU Plugin: (At Kubernetes Node Level)
Use the command below on the Kubernetes node to describe a node and check for GPU resources.
kubectl describe node <node-name> | grep -i gpu
Expected output:
nvidia.com/gpu: 1
If you don’t get this output, you may need to install or configure the appropriate device plugin for your GPU type.
Note: If the GPU has only one core and it is fully utilized by one model, then autoscaling or running multiple models simultaneously on the same GPU may not be feasible. The single core can only handle one task at a time efficiently.
For steps to Install and Configure NVIDIA GPU Driver, expand this section.
Steps for NVIDIA GPU driver installation and configuration in Docker and Kubernetes:
Enable GPU Support in Kubernetes
Once you have configured the NVIDIA GPU driver on all the GPU nodes in your cluster, you can enable GPU support by deploying the following DaemonSet:
kubectl create -f https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/v0.15.0/deployments/static/nvidia-device-plugin.yml
Note: This is a simple static daemonset meant to demonstrate the basic features of the nvidia-device-plugin.
Run Sample GPU Jobs
With the daemonset deployed, NVIDIA GPUs can now be requested by a container using the nvidia.com/gpu resource type:
Below is an example of a sample vector-add Pod that uses a GPU.
$ cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: gpu-pod
spec:
  restartPolicy: Never
  containers:
  - name: cuda-container
    image: nvcr.io/nvidia/k8s/cuda-sample:vectoradd-cuda10.2
    resources:
      limits:
        nvidia.com/gpu: 1 # requesting 1 GPU
  tolerations:
  - key: nvidia.com/gpu
    operator: Exists
    effect: NoSchedule
EOF

$ kubectl logs gpu-pod
[Vector addition of 50000 elements]
Copy input data from the host memory to the CUDA device
CUDA kernel launch with 196 blocks of 256 threads
Copy output data from the CUDA device to the host memory
Test PASSED
Done
WARNING: If you don’t request GPUs when using the device plugin with NVIDIA images, all the GPUs on the machine will be exposed inside your container.
Troubleshooting
You may encounter the errors below while setting up.
- The NVIDIA Device Plugin DaemonSet shows 1/1 Running but is not starting properly; its pod logs are shown below:
I0507 16:48:49.066195 1 main.go:279] Retrieving plugins.
W0507 16:48:49.066224 1 factory.go:31] No valid resources detected, creating a null CDI handler
I0507 16:48:49.066259 1 factory.go:104] Detected non-NVML platform: could not load NVML library: libnvidia-ml.so.1: cannot open shared object file: No such file or directory
I0507 16:48:49.066288 1 factory.go:104] Detected non-Tegra platform: /sys/devices/soc0/family file not found
E0507 16:48:49.066306 1 factory.go:112] Incompatible platform detected
E0507 16:48:49.066311 1 factory.go:113] If this is a GPU node, did you configure the NVIDIA Container Toolkit?
E0507 16:48:49.066315 1 factory.go:114] You can check the prerequisites at: https://github.com/NVIDIA/k8s-device-plugin#prerequisites
E0507 16:48:49.066319 1 factory.go:115] You can learn how to set the runtime at: https://github.com/NVIDIA/k8s-device-plugin#quick-start
E0507 16:48:49.066323 1 factory.go:116] If this is not a GPU node, you should set up a toleration or nodeSelector to only deploy this plugin on GPU nodes
I0507 16:48:49.066327 1 main.go:308] No devices found. Waiting indefinitely.
Solution:
- This occurs when the default container runtime property in the /etc/containerd/config.toml file has not been updated to nvidia. Cross-check the property below.
default_runtime_name = "nvidia"
Once done, restart the containerd process.
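The commands below are one way to cross-check the setting and apply the change; they assume containerd is the container runtime and that the NVIDIA Container Toolkit is already installed on the node.

# Confirm the default runtime is set to nvidia in /etc/containerd/config.toml
grep -n 'default_runtime_name' /etc/containerd/config.toml

# Restart containerd so the change takes effect, then re-check the device plugin pod
sudo systemctl restart containerd
kubectl get pods -n kube-system | grep nvidia-device-plugin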
Required Memory (MiB)
Specify the memory required for running the model image, measured in mebibytes (MiB). For example, if you specify 4096 MiB, the model image will be allocated 4 GiB of memory.
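For context, the CPU, memory, and ephemeral storage values above map onto standard Kubernetes container resource requests and limits. The manifest below is only an illustrative sketch with hypothetical names and values (it is not what Gathr generates), showing how such an allocation is expressed on the cluster.

cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: resource-demo              # hypothetical name, for illustration only
  namespace: <namespace-name>
spec:
  containers:
  - name: model-container
    image: nginx                   # placeholder image; Gathr builds its own model image
    resources:
      requests:
        cpu: "2"                   # Required CPU (Cores)
        memory: 4096Mi             # Required Memory (MiB)
        ephemeral-storage: 2048Mi  # Ephemeral Storage (MiB)
      limits:
        cpu: "2"
        memory: 4096Mi
        ephemeral-storage: 2048Mi
        # nvidia.com/gpu: 1        # uncomment if GPU Required is selected
EOF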
Target Port
The model is deployed onto the cluster on this port.
For ML Models: The model will be deployed on the specified port on the Kubernetes cluster. This port will be directly accessible for consuming the model.
For LLM Models: Instead of being accessed directly, the port should be mapped to the AI Gateway. The AI Gateway acts as an intermediary, managing requests and routing them to the appropriate service or model.
The same port serves different purposes based on the type of model.
The port can range between 30000 and 32000.
Make sure the specified port is available.
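The 30000-32000 range corresponds to Kubernetes NodePort values. The manifest below is a hedged illustration of how a model service is typically exposed on such a port (the names, labels, and internal container port are assumptions; the Service that Gathr actually creates may differ).

cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Service
metadata:
  name: <model-app-name>-svc   # hypothetical name
  namespace: <namespace-name>
spec:
  type: NodePort
  selector:
    app: <model-app-name>      # assumes the model pods carry this label
  ports:
  - port: 8080                 # internal service/container port (assumption)
    targetPort: 8080
    nodePort: 30987            # the Target Port specified in Gathr (30000-32000)
EOF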
Auto-Scaling Required
Select if auto-scaling is required. It allows automatic scale-in/scale-out of PODs within the Kubernetes cluster based on resource utilization.
To understand the Prerequisites for Kubernetes AutoScaling, expand this section.
To enable autoscaling in Kubernetes, it’s essential to have the Metrics Server installed, as it provides the necessary data on CPU and RAM usage for scaling decisions.
Steps to ensure you’re set up for auto-scaling:
Verify the Metrics Server is active by running the command:
kubectl get pod -n kube-system | grep metrics
The expected output should confirm the Metrics Server is running, similar to:
metrics-server-<unique-identifier> 1/1 Running 52 (4h7m ago) 64d
If the Metrics Server is not installed, you’ll need to set it up. Start by creating a file named metricserver.yaml with the configuration details below. (Alternatively, download the latest manifest from the official Kubernetes SIGs metrics-server repository.)
apiVersion: v1
kind: ServiceAccount
metadata:
  labels:
    k8s-app: metrics-server
  name: metrics-server
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  labels:
    k8s-app: metrics-server
    rbac.authorization.k8s.io/aggregate-to-admin: "true"
    rbac.authorization.k8s.io/aggregate-to-edit: "true"
    rbac.authorization.k8s.io/aggregate-to-view: "true"
  name: system:aggregated-metrics-reader
rules:
- apiGroups:
  - metrics.k8s.io
  resources:
  - pods
  - nodes
  verbs:
  - get
  - list
  - watch
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  labels:
    k8s-app: metrics-server
  name: system:metrics-server
rules:
- apiGroups:
  - ""
  resources:
  - nodes/metrics
  verbs:
  - get
- apiGroups:
  - ""
  resources:
  - pods
  - nodes
  verbs:
  - get
  - list
  - watch
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  labels:
    k8s-app: metrics-server
  name: metrics-server-auth-reader
  namespace: kube-system
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: extension-apiserver-authentication-reader
subjects:
- kind: ServiceAccount
  name: metrics-server
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  labels:
    k8s-app: metrics-server
  name: metrics-server:system:auth-delegator
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:auth-delegator
subjects:
- kind: ServiceAccount
  name: metrics-server
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  labels:
    k8s-app: metrics-server
  name: system:metrics-server
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:metrics-server
subjects:
- kind: ServiceAccount
  name: metrics-server
  namespace: kube-system
---
apiVersion: v1
kind: Service
metadata:
  labels:
    k8s-app: metrics-server
  name: metrics-server
  namespace: kube-system
spec:
  ports:
  - name: https
    port: 443
    protocol: TCP
    targetPort: https
  selector:
    k8s-app: metrics-server
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    k8s-app: metrics-server
  name: metrics-server
  namespace: kube-system
spec:
  selector:
    matchLabels:
      k8s-app: metrics-server
  strategy:
    rollingUpdate:
      maxUnavailable: 0
  template:
    metadata:
      labels:
        k8s-app: metrics-server
    spec:
      containers:
      - args:
        - --cert-dir=/tmp
        - --secure-port=10250
        - --kubelet-insecure-tls
        - --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname
        - --kubelet-use-node-status-port
        - --metric-resolution=15s
        image: registry.k8s.io/metrics-server/metrics-server:v0.7.0
        imagePullPolicy: IfNotPresent
        livenessProbe:
          failureThreshold: 3
          httpGet:
            path: /livez
            port: https
            scheme: HTTPS
          periodSeconds: 10
        name: metrics-server
        ports:
        - containerPort: 10250
          name: https
          protocol: TCP
        readinessProbe:
          failureThreshold: 3
          httpGet:
            path: /readyz
            port: https
            scheme: HTTPS
          initialDelaySeconds: 20
          periodSeconds: 10
        resources:
          requests:
            cpu: 100m
            memory: 200Mi
        securityContext:
          allowPrivilegeEscalation: false
          capabilities:
            drop:
            - ALL
          readOnlyRootFilesystem: true
          runAsNonRoot: true
          runAsUser: 1000
          seccompProfile:
            type: RuntimeDefault
        volumeMounts:
        - mountPath: /tmp
          name: tmp-dir
      nodeSelector:
        kubernetes.io/os: linux
      priorityClassName: system-cluster-critical
      serviceAccountName: metrics-server
      volumes:
      - emptyDir: {}
        name: tmp-dir
---
apiVersion: apiregistration.k8s.io/v1
kind: APIService
metadata:
  labels:
    k8s-app: metrics-server
  name: v1beta1.metrics.k8s.io
spec:
  group: metrics.k8s.io
  groupPriorityMinimum: 100
  insecureSkipTLSVerify: true
  service:
    name: metrics-server
    namespace: kube-system
  version: v1beta1
  versionPriority: 100
Once you have your YAML file ready, apply it using the command:
kubectl apply -f metricserver.yaml
For the latest version of the Metrics Server, you can refer to the official Kubernetes SIGs repository.
By following these steps, you’ll have the Metrics Server up and running, ready to support the autoscaling of your Kubernetes workloads based on their resource consumption.
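Once the Metrics Server is running, you can confirm that resource metrics are actually being collected with the standard kubectl top commands; the output depends on your cluster.

# Node-level CPU and memory usage
kubectl top nodes

# Pod-level usage in the namespace where the model will be deployed
kubectl top pods -n <namespace-name>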
Instances (POD Replicas)
Provide minimum and maximum POD replicas to scale-in/scale-out.
Auto-Scale CPU (%)
Please provide CPU percentage threshold for auto-scaling.
Auto-Scale Memory (%)
Please provide memory percentage threshold for auto-scaling.
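For reference, the Instances, Auto-Scale CPU (%), and Auto-Scale Memory (%) settings map onto a standard Kubernetes HorizontalPodAutoscaler. The manifest below is an illustrative sketch only; Gathr creates the actual HPA during deployment, and the names and thresholds here are placeholders.

cat <<EOF | kubectl apply -f -
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: <model-app-name>-hpa       # hypothetical name
  namespace: <namespace-name>
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: <model-app-name>         # the model deployment
  minReplicas: 1                   # Instances (POD Replicas), minimum
  maxReplicas: 3                   # Instances (POD Replicas), maximum
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70     # Auto-Scale CPU (%)
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 80     # Auto-Scale Memory (%)
EOF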
Send Email Notification
Enable this field to send email notifications about model deployment status updates to the specified email addresses.
Email ID
Provide comma-separated email IDs to which the model deployment status updates should be sent.
Deploy Model
Deploy the MLflow model onto the orchestration platform by updating all necessary configuration fields. This action initiates the model’s deployment process.
A successful model deployment usually consists of these states:
The process of deploying a model begins with the submission of a deployment request, followed by the commencement of Docker image creation. This is the first step in a six-stage sequence.
Once the Docker image is successfully created, the system proceeds to generate the necessary Docker secrets. With the secrets in place, the Docker image can then be deployed.
The subsequent step involves setting up the model service, which is essential for the model’s operation.
Finally, the creation of the Model Horizontal Pod Autoscaler (HPA) is initiated, leading up to successful deployment of the model.
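After the final stage completes, you can optionally confirm the resulting Kubernetes objects from outside Gathr using standard kubectl commands; actual resource names depend on your deployment.

# List the objects created for the model in its namespace
kubectl get deployments,services,hpa -n <namespace-name>

# Check that the model pods are Running and Ready
kubectl get pods -n <namespace-name>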
Un-Deploy
To remove the model from the orchestration platform, select this option. It ensures the complete termination of the deployment process.
Resume Deployment
Should the model deployment process encounter a failure, this option allows you to resume deployment from the precise point of interruption on the orchestration platform.
Map Model Version Endpoint with Gateway
After deployment, map the model endpoint with AI Gateway for easy accessibility and integration.
Steps to map a model version endpoint with AI gateway:
Click on Map Model Version’s Endpoint with Gateway to open the mapping screen.
Follow step-by-step instructions to complete mapping.
Step 1: Copy the model service endpoint URL.
The next operation is done outside Gathr:
Access the server where the MLflow AI Gateway is installed and update the config.yaml of the AI Gateway with the appropriate endpoint details based on the model type.
Example of Completion Model:
- name: bart
  endpoint_type: llm/v1/completions
  model:
    provider: mlflow-model-serving
    name: finetuned-bart
    config:
      model_server_url: http://10.80.72.197:30987
Example of Embedding Model:
- name: embedding_all_mpnet_base_v2
  endpoint_type: llm/v1/embeddings
  model:
    provider: mlflow-model-serving
    name: all_mpnet_base_v2_embedding
    config:
      model_server_url: http://10.80.72.197:30675
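After editing config.yaml, the AI Gateway must pick up the new endpoint definition. The command below is a hedged sketch assuming the gateway is run via the MLflow CLI; the exact command varies with your MLflow version and with how the gateway service is managed on that server.

# Start (or restart) the MLflow AI Gateway with the updated configuration
mlflow gateway start --config-path config.yaml --host 0.0.0.0 --port 5000

# On newer MLflow versions, the equivalent command is:
# mlflow deployments start-server --config-path config.yaml --host 0.0.0.0 --port 5000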
Step 2: Proceed to map the AI Gateway Route within Gathr.
The Model Service Endpoint will be pre-filled.
Enter or select the Gateway URL associated with the endpoint updated in Step 1.
The Select Route dropdown will populate based on the Gateway URL. Choose the appropriate route.
Click on Map & Next to save the Gateway URL and selected route combination for the model version.
Step 3: Click Store Model Version Details to store mapping details. The latest saved details can be populated later in the Model Invoke processor.
Store Model Version Details to Populate in Model Invoke Processor
Store model details from the MLflow model versions listing page.
This feature auto-populates the Model Invoke Processor’s configuration values on the Model Selection tab.
Download the Orchestration Platform Deployment YAML File
Obtain the deployment configuration in YAML format to review the deployment details.
Compare Model Versions
Compare different versions of the selected MLflow model to assess parameters, metrics, and changes over time.
View Model Version Details
Dive deeper into the specifics of each version, including metadata, metrics, and various other details.
Delete Version
Delete specific versions of the MLflow model that are no longer needed or have been superseded.
This action deletes the model version from Gathr as well as from the MLflow tracking server.
If you have any feedback on Gathr documentation, please email us!