Apache Spark Compute Setup in Gathr

Gathr supports three Apache Spark cluster managers: Standalone, YARN, and Kubernetes.

Follow the steps below to add an Apache Spark account in Gathr.


Steps to Add Apache Spark Account

Under User Profile, select the Compute Setup tab.

Click the Add New Account option and provide the following details:

Account Type

Select the type of account (Apache Spark) you are registering for.

Name

Provide a unique name for the account to be registered.

Tags

Provide tags for the account. (Optional)

Spark cluster manager

Select the Spark cluster manager. The available options are Standalone, YARN, and Kubernetes.

Spark Home

Provide the Spark home path on the local system.

When YARN is selected, the following options are available:

Spark Python Path

Provide the path where the PySpark libraries are available.

Resource Manager Host

Provide the ResourceManager hostnames used for the Spark-on-YARN deployment.

Resource Manager Webapp Port

Provide the ResourceManager web UI port.

Resource Manager Port

Provide the ResourceManager internal port.

ResourceManager High Availability

Select the checkbox to enable ResourceManager high availability (HA).

ResourceManager HA Logical Names

Provide the logical names of the HA ResourceManagers.

ResourceManager HA Hosts

Provide the hostnames of the HA ResourceManagers.
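
As a rough illustration of how these YARN fields relate to a Hadoop setup, the sketch below maps them onto the standard yarn-site.xml property names. The function name and its parameters are hypothetical, and whether Gathr uses this exact mapping internally is an assumption. The default ports 8088 and 8032 are the Hadoop defaults, not values taken from Gathr.

```python
def yarn_properties(rm_host=None, webapp_port=8088, rm_port=8032,
                    ha_logical_names=None, ha_hosts=None):
    """Map the YARN account fields to standard yarn-site.xml property names.

    Hypothetical helper for illustration only; it is not part of Gathr.
    """
    if ha_logical_names:
        # HA mode: one host per logical ResourceManager name.
        if not ha_hosts or len(ha_hosts) != len(ha_logical_names):
            raise ValueError("each HA logical name needs exactly one host")
        props = {
            "yarn.resourcemanager.ha.enabled": "true",
            "yarn.resourcemanager.ha.rm-ids": ",".join(ha_logical_names),
        }
        for rm_id, host in zip(ha_logical_names, ha_hosts):
            props[f"yarn.resourcemanager.hostname.{rm_id}"] = host
            props[f"yarn.resourcemanager.webapp.address.{rm_id}"] = f"{host}:{webapp_port}"
            props[f"yarn.resourcemanager.address.{rm_id}"] = f"{host}:{rm_port}"
        return props

    # Single ResourceManager: host plus the web UI and internal ports.
    if not rm_host:
        raise ValueError("a ResourceManager host is required without HA")
    return {
        "yarn.resourcemanager.hostname": rm_host,
        "yarn.resourcemanager.webapp.address": f"{rm_host}:{webapp_port}",
        "yarn.resourcemanager.address": f"{rm_host}:{rm_port}",
    }
```

Calling yarn_properties(ha_logical_names=["rm1", "rm2"], ha_hosts=["host-a", "host-b"]) would produce one set of hostname and address properties per logical name, which is why the two HA fields need the same number of entries.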

When Standalone is selected, the following options are available:

Spark Master URL

Specify the Spark Master URL, e.g., spark://host:port.

Spark UI Host

Enter the hostname or IP where the Spark UI is accessible.

Spark UI Port

Enter the port number for accessing the Spark UI.

Spark Python Path

Provide the path where the PySpark libraries are available.

Enable Spark Authentication

Select the checkbox to enable Spark authentication.

Authentication Secret

Specify the shared secret for Spark authentication to secure communication between components.
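
The Standalone fields above correspond closely to a few well-known Spark properties. The sketch below checks that a master URL has the spark://host:port shape and, when a secret is supplied, adds the spark.authenticate and spark.authenticate.secret properties. The function name is hypothetical, the mapping to what Gathr submits is an assumption, and the check only handles a single master (not a comma-separated HA list).

```python
from urllib.parse import urlparse


def standalone_conf(master_url, auth_secret=None):
    """Validate a Standalone master URL and build matching Spark properties.

    Hypothetical helper for illustration only; it is not part of Gathr.
    """
    parsed = urlparse(master_url)
    # Accessing .port raises ValueError itself if the port is not numeric.
    if parsed.scheme != "spark" or not parsed.hostname or parsed.port is None:
        raise ValueError(f"expected spark://host:port, got {master_url!r}")

    conf = {"spark.master": master_url}
    if auth_secret is not None:
        # Shared-secret authentication between Spark components.
        conf["spark.authenticate"] = "true"
        conf["spark.authenticate.secret"] = auth_secret
    return conf
```

For example, standalone_conf("spark://10.0.0.5:7077", auth_secret="...") returns the master URL together with the two authentication properties, while a URL without a port is rejected.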

When Kubernetes is selected, the following options are available:

Spark Master URL

Specify the Spark Master URL. For Kubernetes, Spark expects the form k8s://https://host:port.

Namespace

Provide the namespace to use in the Kubernetes cluster.

Image Name

Provide the Spark base image used to launch Spark pods on Kubernetes.

Service Account Name

Provide the service account name used to submit Gathr pipelines.

Image Pull Secrets

Provide the image pull secrets used to pull the Spark image.

API Token

Provide the Kubernetes API token for Spark.

Spark Python Path

Provide the path where the PySpark libraries are available.
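
To show how the Kubernetes fields fit together, the sketch below turns them into spark-submit arguments using Spark's documented spark.kubernetes.* properties. The function and parameter names are hypothetical, and it is an assumption that Gathr passes the fields along in this way. Note that Spark itself expects Kubernetes master URLs to carry the k8s:// prefix.

```python
def kubernetes_submit_args(master_url, namespace, image, service_account,
                           pull_secrets=(), api_token=None):
    """Build spark-submit arguments from the Kubernetes account fields.

    Hypothetical helper for illustration only; it is not part of Gathr.
    """
    conf = {
        "spark.kubernetes.namespace": namespace,
        "spark.kubernetes.container.image": image,
        "spark.kubernetes.authenticate.driver.serviceAccountName": service_account,
    }
    if pull_secrets:
        # Spark takes the pull secrets as one comma-separated value.
        conf["spark.kubernetes.container.image.pullSecrets"] = ",".join(pull_secrets)
    if api_token:
        # OAuth token used when authenticating against the Kubernetes API server.
        conf["spark.kubernetes.authenticate.submission.oauthToken"] = api_token

    args = ["spark-submit", "--master", master_url]
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    return args
```

The optional fields (Image Pull Secrets and API Token) only add a --conf entry when a value is supplied, mirroring how they can be left empty in the form.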

Provide details for the following SSL options:

Enabled SSL

If selected, either upload the keystore/certificate file or provide the keystore file path in the respective configuration fields.

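Keystore Select Option

Select one of the options available from the drop-down list: upload the keystore file or specify the keystore file path.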

Keystore Path

Provide the keystore path.

Key Password

This password protects a specific private key inside the keystore.

Keystore Password

Provide the keystore password.

Protocol

Select the SSL protocol, for example SSL, SSL_PLAINTEXT, or SSL_SASL.

Store Type

Select the keystore type, for example JKS, PKCS12, or another format supported by your setup.

TrustStore Select Option

Select one of the options available from the drop-down list: Upload Trust Store File (SSL in JKS format) or Specify Trust Store File Path.

TrustStore Path

Provide the TrustStore path.

Truststore Password

Provide the truststore password.
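
The SSL fields above resemble Spark's own spark.ssl.* settings. The sketch below assumes a one-to-one mapping onto those properties. That mapping is an assumption for illustration, not confirmed Gathr behavior, and the function and parameter names are hypothetical. It also shows the dependency stated above: once SSL is enabled, a keystore must be supplied.

```python
def ssl_conf(enabled, keystore_path=None, key_password=None,
             keystore_password=None, protocol=None, store_type=None,
             truststore_path=None, truststore_password=None):
    """Map the SSL form fields to Spark's spark.ssl.* property names.

    Hypothetical helper for illustration only; it is not part of Gathr.
    """
    if not enabled:
        return {"spark.ssl.enabled": "false"}
    if not keystore_path or not keystore_password:
        raise ValueError("a keystore path and password are required when SSL is enabled")

    candidates = {
        "spark.ssl.enabled": "true",
        "spark.ssl.keyStore": keystore_path,
        "spark.ssl.keyPassword": key_password,
        "spark.ssl.keyStorePassword": keystore_password,
        "spark.ssl.protocol": protocol,
        "spark.ssl.keyStoreType": store_type,
        "spark.ssl.trustStore": truststore_path,
        "spark.ssl.trustStorePassword": truststore_password,
    }
    # Leave out any optional field that was not filled in.
    return {key: value for key, value in candidates.items() if value is not None}
```

Keeping the passwords as plain function arguments is only for brevity here; in practice they would come from a secret store rather than source code.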

The following options are available when YARN is selected:

Yarn Kerberos Enable

Select the checkbox to enable Kerberos for YARN.

HDFS Connection

Select the HDFS connection.

Gathr JAAS Login Configuration File Path

Provide the path to the JAAS login configuration file used for YARN Kerberos authentication.

Kerberos Sections

Select the Kerberos module (JAAS section) to use for authentication.

Kerberos Configuration File Override

Select the checkbox to enable file override in the Kerberos configuration.
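
A JAAS login configuration file groups its login modules into named sections, and the Kerberos Sections field presumably refers to those names. The sketch below lists the section names found in such a file. The sample content, keytab path, and principal are hypothetical, and the parser is deliberately simple: it does not handle comments inside the file.

```python
import re

# A section starts with a name followed by an opening brace, e.g. "Client {".
JAAS_SECTION = re.compile(r"^\s*([A-Za-z_][\w.-]*)\s*\{", re.MULTILINE)


def jaas_sections(text):
    """Return the section names of a JAAS login configuration, in file order."""
    return JAAS_SECTION.findall(text)


# Hypothetical file content for illustration.
SAMPLE_JAAS = """
Client {
  com.sun.security.auth.module.Krb5LoginModule required
  useKeyTab=true
  keyTab="/etc/security/keytabs/gathr.keytab"
  principal="gathr@EXAMPLE.COM";
};

KafkaClient {
  com.sun.security.auth.module.Krb5LoginModule required
  useTicketCache=true;
};
"""

print(jaas_sections(SAMPLE_JAAS))
```

Running this against the file named in the Gathr JAAS Login Configuration File Path field is a quick way to see which section names are available to choose from.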
