Apache Spark Compute Setup in Gathr
- Steps to Add Apache Spark Account
- Account Type
- Name
- Tags
- Spark cluster manager
- Spark Home
- Spark Python Path
- Resource Manager Host
- Resource Manager Webapp Port
- Resource Manager Port
- ResourceManager High Availability
- ResourceManager HA Logical Names
- ResourceManager HA Hosts
- Spark Master URL
- Spark UI Host
- Spark UI Port
- Spark Python Path
- Enable Spark Authentication
- Authentication Secret
- Spark Master URL
- Namespace
- Image Name
- Service Account Name
- Image Pull Secrets
- API Token
- Spark Python path
- Enabled SSL
- Keystore Select Option
- Keystore Path
- Key Password
- Keystore Password
- Protocol
- Store Type
- TrustStore Select Option
- TrustStore Path
- Truststore Password
- Yarn Kerberos Enable
- HDFS Connection
- Gathr JAAS Login Configuration File Path
- Kerberos Sections
- Kerberos Configuration File Override
In Gathr, Apache Spark supports three types of cluster managers: Standalone, YARN, and Kubernetes.
Follow the steps below to add an Apache Spark account in Gathr.
Steps to Add Apache Spark Account
Under User Profile, select the Compute Setup tab.
Click the Add New Account option and provide the following details:
Account Type
Select Apache Spark as the type of account you are registering.
Name
Provide a unique name for the account to be registered.
Tags
Provide tags for the account. (Optional)
Spark cluster manager
Select the Spark cluster manager. The available options are Standalone, YARN, and Kubernetes.
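For orientation, in standard Apache Spark each cluster manager corresponds to a different form of master URL. The sketch below assumes Gathr hands the selection to Spark in this standard form; `master_url` and the sample hosts are hypothetical and not part of Gathr.

```python
from typing import Optional


def master_url(cluster_manager: str, host: Optional[str] = None,
               port: Optional[int] = None) -> str:
    """Return a Spark master URL for the chosen cluster manager (hypothetical helper)."""
    manager = cluster_manager.strip().lower()
    if manager == "yarn":
        # YARN takes no host or port here; Spark finds the ResourceManager
        # through the Hadoop/YARN configuration instead.
        return "yarn"
    if host is None or port is None:
        raise ValueError(f"{cluster_manager} requires a host and a port")
    if manager == "standalone":
        # A standalone master listens on port 7077 unless configured otherwise.
        return f"spark://{host}:{port}"
    if manager == "kubernetes":
        # Spark on Kubernetes targets the Kubernetes API server.
        return f"k8s://https://{host}:{port}"
    raise ValueError(f"unsupported cluster manager: {cluster_manager}")


print(master_url("Yarn"))                                     # yarn
print(master_url("Standalone", "spark-master", 7077))         # spark://spark-master:7077
print(master_url("Kubernetes", "k8s-api.example.com", 6443))  # k8s://https://k8s-api.example.com:6443
```

Matching on a lower-cased name keeps the helper tolerant of the mixed "Yarn"/"YARN" spelling seen in the UI labels.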
Spark Home
Provide the Spark home path (the Spark installation directory) on the local system.
When YARN is selected, the following options are available:
Spark Python Path
Provide the path where the PySpark libraries are available.
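In a standard Spark binary distribution, the PySpark libraries live under `$SPARK_HOME/python`, with the bundled py4j archive in `$SPARK_HOME/python/lib/py4j-*-src.zip`. A minimal sketch, assuming that layout, that derives the candidate entries from the Spark home; `pyspark_paths` is a hypothetical helper, and whether Gathr expects a single directory or a full PYTHONPATH value is not stated here.

```python
import glob
import os
from typing import List


def pyspark_paths(spark_home: str) -> List[str]:
    """Return PYTHONPATH entries for the PySpark bundled with a Spark install."""
    python_dir = os.path.join(spark_home, "python")
    # py4j ships as a versioned zip, e.g. py4j-<version>-src.zip; the version varies.
    py4j_zips = sorted(glob.glob(os.path.join(python_dir, "lib", "py4j-*-src.zip")))
    return [python_dir] + py4j_zips


if __name__ == "__main__":
    # /opt/spark is only an illustrative fallback when SPARK_HOME is unset.
    home = os.environ.get("SPARK_HOME", "/opt/spark")
    print(os.pathsep.join(pyspark_paths(home)))
```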
Resource Manager Host
Provide the ResourceManager hostname(s) used for the Spark on YARN deployment.
Resource Manager Webapp Port
Provide the ResourceManager web UI port (8088 by default in YARN).
Resource Manager Port
Provide the ResourceManager internal (RPC) port (8032 by default in YARN).
ResourceManager High Availability
Select the checkbox if ResourceManager high availability (HA) is enabled on the cluster.
ResourceManager HA Logical Names
Provide the logical names of the HA ResourceManagers.
ResourceManager HA Hosts
Provide the hostnames of the HA ResourceManagers.
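Spark on YARN reads these values from the Hadoop/YARN configuration, and any Hadoop property can also be passed to Spark with the `spark.hadoop.` prefix. The sketch below shows one plausible mapping of the fields above onto standard YARN properties (`yarn.resourcemanager.address`, `yarn.resourcemanager.webapp.address`, `yarn.resourcemanager.ha.enabled`, `yarn.resourcemanager.ha.rm-ids`, and their per-ID variants). It is an assumption about how such fields could be used, not Gathr's internal implementation; `yarn_rm_conf` and the sample hosts are hypothetical.

```python
from typing import Dict, Sequence

PREFIX = "spark.hadoop.yarn.resourcemanager."


def yarn_rm_conf(rm_host: str, webapp_port: int, rm_port: int,
                 ha_enabled: bool = False,
                 ha_logical_names: Sequence[str] = (),
                 ha_hosts: Sequence[str] = ()) -> Dict[str, str]:
    """Translate ResourceManager fields into spark.hadoop.* YARN properties (sketch)."""
    if not ha_enabled:
        return {
            PREFIX + "address": f"{rm_host}:{rm_port}",
            PREFIX + "webapp.address": f"{rm_host}:{webapp_port}",
        }
    if not ha_logical_names or len(ha_logical_names) != len(ha_hosts):
        raise ValueError("each HA logical name needs exactly one host")
    conf = {
        PREFIX + "ha.enabled": "true",
        # The logical names (rm-ids) form a comma-separated list, e.g. "rm1,rm2".
        PREFIX + "ha.rm-ids": ",".join(ha_logical_names),
    }
    for rm_id, host in zip(ha_logical_names, ha_hosts):
        # Per-ResourceManager settings carry the logical name as a suffix.
        conf[PREFIX + f"address.{rm_id}"] = f"{host}:{rm_port}"
        conf[PREFIX + f"webapp.address.{rm_id}"] = f"{host}:{webapp_port}"
    return conf


ha = yarn_rm_conf("", 8088, 8032, True, ("rm1", "rm2"),
                  ("rm-a.example.com", "rm-b.example.com"))
for key, value in ha.items():
    print(f"{key}={value}")
```

Pairing each logical name with exactly one host mirrors how the HA Logical Names and HA Hosts fields are described above.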
When Standalone is selected, the following options are available:
Spark Master URL
Specify the Spark Master URL, e.g., spark://host:port (a standalone master listens on port 7077 by default).
Spark UI Host
Enter the hostname or IP where the Spark UI is accessible.
Spark UI Port
Enter the port number for accessing the Spark UI.
Spark Python Path
Provide the path where the PySpark libraries are available.
Enable Spark Authentication
Select the checkbox to enable Spark authentication.
Authentication Secret
Specify the shared secret for Spark authentication to secure communication between components.
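In standard Spark, authentication is switched on with `spark.authenticate`, and outside YARN and Kubernetes the shared secret is supplied through `spark.authenticate.secret`, which must be the same on every node. A minimal sketch, assuming the Standalone fields translate directly to these properties; `standalone_conf` and the sample values are hypothetical.

```python
from typing import Dict


def standalone_conf(master_url: str, auth_enabled: bool = False,
                    auth_secret: str = "") -> Dict[str, str]:
    """Build Spark properties for a standalone master with optional authentication."""
    if not master_url.startswith("spark://"):
        raise ValueError("a standalone master URL must start with spark://")
    conf = {"spark.master": master_url}
    if auth_enabled:
        if not auth_secret:
            raise ValueError("Spark authentication needs a shared secret")
        conf["spark.authenticate"] = "true"
        # Outside YARN/Kubernetes, every node must be configured with this same secret.
        conf["spark.authenticate.secret"] = auth_secret
    return conf


settings = standalone_conf("spark://spark-master:7077", True, "change-me")
print(settings["spark.authenticate"])  # true
```

Failing fast on an empty secret avoids registering an account whose authentication could never succeed.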
When Kubernetes is selected, the following options are available:
Spark Master URL
Specify the Spark Master URL, which for Kubernetes points to the Kubernetes API server, e.g., k8s://https://host:port
Namespace
Provide the Kubernetes namespace to use in the cluster.
Image Name
Provide the Spark base image used to launch the Spark pods on Kubernetes.
Service Account Name
Provide the service account name that will be used to submit Gathr pipelines.
Image Pull Secrets
Provide the image pull secrets used to pull the Spark image from a private registry.
API Token
Provide the Kubernetes API token for Spark.
Spark Python path
Provide the path where the PySpark libraries are available.
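Standard Spark on Kubernetes exposes settings that line up with these fields: `spark.kubernetes.namespace`, `spark.kubernetes.container.image`, `spark.kubernetes.authenticate.driver.serviceAccountName`, `spark.kubernetes.container.image.pullSecrets`, and `spark.kubernetes.authenticate.submission.oauthToken`. The sketch assumes Gathr maps its fields onto these properties, which the article does not confirm; `kubernetes_conf`, the registry, and the account names are hypothetical.

```python
from typing import Dict, Sequence


def kubernetes_conf(master: str, namespace: str, image: str,
                    service_account: str, pull_secrets: Sequence[str] = (),
                    api_token: str = "") -> Dict[str, str]:
    """Map the Kubernetes fields onto standard spark.kubernetes.* properties (sketch)."""
    if not master.startswith("k8s://"):
        raise ValueError("Spark on Kubernetes expects a k8s:// master URL")
    conf = {
        "spark.master": master,
        "spark.kubernetes.namespace": namespace,
        "spark.kubernetes.container.image": image,
        # Service account the driver pod uses to create executor pods.
        "spark.kubernetes.authenticate.driver.serviceAccountName": service_account,
    }
    if pull_secrets:
        # Several pull secrets are given as one comma-separated value.
        conf["spark.kubernetes.container.image.pullSecrets"] = ",".join(pull_secrets)
    if api_token:
        # Token used to authenticate against the API server when submitting.
        conf["spark.kubernetes.authenticate.submission.oauthToken"] = api_token
    return conf


settings = kubernetes_conf("k8s://https://k8s-api.example.com:6443", "gathr",
                           "registry.example.com/spark:latest", "spark-submitter",
                           pull_secrets=["regcred"])
for key in sorted(settings):
    print(key, "=", settings[key])
```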
Provide details for the following options:
Enabled SSL
If enabled, either upload the keystore/certificate file or provide the keystore file path in the respective configuration fields.
Keystore Select Option
Select whether to upload the keystore file or specify the keystore file path.
Keystore Path
Provide the keystore path.
Key Password
Provide the key password, which protects a specific private key inside the keystore.
Keystore Password
Provide the keystore password.
Protocol
Select the SSL protocol. Example: SSL, SSL_PLAINTEXT or SSL_SASL
Store Type
Select the keystore type. Example: JKS, PKCS12, or other formats supported by your setup.
TrustStore Select Option
Select one of the options from the drop-down list: Upload Trust Store File (SSL in JKS format) or Specify Trust Store File Path.
TrustStore Path
Provide the TrustStore path.
Truststore Password
Provide the truststore password.
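Standard Spark configures SSL through the `spark.ssl.*` family of properties. The sketch below assumes the keystore and truststore fields above correspond to those properties; it covers only the keystore and truststore fields, and `ssl_conf`, `redacted`, and the sample paths are hypothetical. The `redacted` helper masks password values so the resulting settings can be logged without exposing secrets.

```python
from typing import Dict


def ssl_conf(keystore_path: str, keystore_password: str, key_password: str,
             store_type: str, truststore_path: str = "",
             truststore_password: str = "") -> Dict[str, str]:
    """Map keystore/truststore fields onto Spark's spark.ssl.* properties (sketch)."""
    if not keystore_path:
        raise ValueError("a keystore path is required when SSL is enabled")
    conf = {
        "spark.ssl.enabled": "true",
        "spark.ssl.keyStore": keystore_path,
        "spark.ssl.keyStorePassword": keystore_password,
        # Protects the private key entry inside the keystore.
        "spark.ssl.keyPassword": key_password,
        # For example "JKS" or "PKCS12".
        "spark.ssl.keyStoreType": store_type,
    }
    if truststore_path:
        conf["spark.ssl.trustStore"] = truststore_path
        conf["spark.ssl.trustStorePassword"] = truststore_password
    return conf


def redacted(conf: Dict[str, str]) -> Dict[str, str]:
    """Mask password values so the settings can be logged safely."""
    return {k: ("****" if k.endswith("Password") else v) for k, v in conf.items()}


settings = ssl_conf("/opt/gathr/ssl/keystore.jks", "ks-pass", "key-pass", "JKS",
                    "/opt/gathr/ssl/truststore.jks", "ts-pass")
print(redacted(settings)["spark.ssl.keyPassword"])  # ****
```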
The following options are available when YARN is selected:
Yarn Kerberos Enable
Select the checkbox to enable Kerberos authentication for YARN.
HDFS Connection
Select the HDFS connection.
Gathr JAAS Login Configuration File Path
Provide the path of the JAAS login configuration file used for YARN Kerberos authentication.
Kerberos Sections
Select the Kerberos module for authentication.
Kerberos Configuration File Override
Select the checkbox to override the Kerberos configuration file.
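A JAAS login configuration file is organized into named sections (login contexts), each listing one or more login modules; the Kerberos Sections field presumably refers to these. The sketch below, assuming a keytab-based `com.sun.security.auth.module.Krb5LoginModule` entry, renders a section and lists the section names found in a file's text. `krb5_section`, `jaas_sections`, the keytab path, and the principal are hypothetical.

```python
import re
from typing import List

# A section header is a name followed by "{" at the start of a line.
SECTION_RE = re.compile(r"^\s*([A-Za-z0-9_.-]+)\s*\{", re.MULTILINE)


def jaas_sections(jaas_text: str) -> List[str]:
    """Return the section (login context) names defined in a JAAS config."""
    return SECTION_RE.findall(jaas_text)


def krb5_section(name: str, keytab: str, principal: str) -> str:
    """Render a keytab-based Krb5LoginModule section; the entry ends with ';'."""
    return (
        f"{name} {{\n"
        "  com.sun.security.auth.module.Krb5LoginModule required\n"
        "  useKeyTab=true\n"
        f'  keyTab="{keytab}"\n'
        "  storeKey=true\n"
        f'  principal="{principal}";\n'
        "};\n"
    )


jaas = krb5_section("Client", "/etc/security/keytabs/gathr.keytab", "gathr@EXAMPLE.COM")
jaas += krb5_section("KafkaClient", "/etc/security/keytabs/gathr.keytab", "gathr@EXAMPLE.COM")
print(jaas_sections(jaas))  # ['Client', 'KafkaClient']
```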