Manage EMR Clusters
- Create Cluster
- Configuration to Create Cluster
- Cluster Scaling and Provisioning and Concurrency
- Software Configuration
- Tags
- Master Nodes
- Core Nodes
- Task Nodes
- SSH
- Bootstrap Actions
- Create Cluster Template
- Cluster(s) Listing page
Gathr uses Amazon EMR services to manage EMR clusters from within the application.
On the Cluster Management page, superusers and workspace users can create and manage Amazon EMR clusters.
You can Create a Cluster or Create a Cluster Template.
Create Cluster
From the main menu navigate to the Settings > Advanced > Cluster Management page.

EMR clusters become available under Cluster Management once the AWS EMR account has been added in Compute Setup.
To create a cluster, click the Create New Cluster option.

Configuration to Create Cluster
Provide the configuration details to create a cluster as mentioned below.

Name & Type
Provide a unique name for the cluster. The cluster type field is selected automatically.
Network
Select the VPC and subnet in which to launch the cluster. Gathr must be accessible from this network.
Security
Select a security group that has the required access to communicate with Gathr. Select a security configuration to configure data encryption, Kerberos, and S3 authorization.
IAM Roles
Select the IAM role to attach to EC2 instances in the EMR pipeline cluster. Select the job flow role to attach to EC2 instances in the EMR pipeline cluster. Select the IAM role used to auto scale the EMR cluster.
Root EBS Volume
Provide the master EBS volume for the cluster. The EBS volume for core and task nodes is the same as the master EBS volume.
Cluster Scaling and Provisioning and Concurrency
EMR Managed Scaling
Select this option to allow EMR to automatically adjust the number of EC2 instances in core and task nodes based on workload.
Minimum Units
Provide the minimum number of core or task units allowed in a cluster. Minimum value is 1.
Maximum Units
Provide the maximum number of core or task units allowed in a cluster. Minimum value is 1.
Maximum On-Demand Limit
Provide the maximum number of core or task units allowed for the On-Demand market type in a cluster. If this parameter is not specified, it defaults to the Maximum Units value. Minimum value is 0.
Maximum Core Units
Provide the maximum number of core nodes allowed in a cluster. If this parameter is not specified, it defaults to the Maximum Units value. Minimum value is 1.
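The defaulting rules above can be sketched in a few lines of Python. This is a minimal illustration, not Gathr's implementation: the function name is hypothetical, the dictionary keys follow the ComputeLimits structure of the Amazon EMR managed scaling API, and the check that the minimum does not exceed the maximum is an assumption not stated on this page.

```python
def managed_scaling_limits(min_units, max_units, max_on_demand=None, max_core=None):
    """Validate the form values and return an EMR-style ComputeLimits dict."""
    if min_units < 1 or max_units < 1:
        raise ValueError("Minimum Units and Maximum Units must be at least 1")
    if min_units > max_units:
        # Assumption: the minimum may not exceed the maximum.
        raise ValueError("Minimum Units cannot exceed Maximum Units")
    # Unset limits fall back to the Maximum Units value.
    if max_on_demand is None:
        max_on_demand = max_units
    if max_core is None:
        max_core = max_units
    if max_on_demand < 0:
        raise ValueError("Maximum On-Demand Limit must be at least 0")
    if max_core < 1:
        raise ValueError("Maximum Core Units must be at least 1")
    return {
        "UnitType": "Instances",
        "MinimumCapacityUnits": min_units,
        "MaximumCapacityUnits": max_units,
        "MaximumOnDemandCapacityUnits": max_on_demand,
        "MaximumCoreCapacityUnits": max_core,
    }


limits = managed_scaling_limits(min_units=2, max_units=10)
print(limits)
```

With only the minimum and maximum supplied, both optional limits resolve to 10.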
Auto Termination
Select this option to terminate the cluster automatically. Once the cluster becomes idle, it terminates after the specified duration, which can range from one minute to 24 hours.
Steps Concurrency
Check this option to enable running multiple steps concurrently. Once the last step completes, the cluster will enter a waiting state.
Maximum Steps
The maximum number of steps that can run at a time. Enter a value between 2 and 256.
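As a rough sketch of the limits described for Auto Termination and Maximum Steps, the helpers below validate both values. The function names are hypothetical; the `IdleTimeout` (in seconds) and `StepConcurrencyLevel` keys mirror the Amazon EMR API, and mapping the Gathr fields onto them is an assumption.

```python
def idle_timeout_seconds(hours=0, minutes=0):
    """Convert the idle duration to seconds, enforcing 1 minute to 24 hours."""
    total_minutes = hours * 60 + minutes
    if not 1 <= total_minutes <= 24 * 60:
        raise ValueError("idle duration must be between 1 minute and 24 hours")
    return total_minutes * 60


def step_concurrency(max_steps):
    """Enforce the 2 to 256 range given for Maximum Steps."""
    if not 2 <= max_steps <= 256:
        raise ValueError("Maximum Steps must be between 2 and 256")
    return max_steps


settings = {
    "AutoTerminationPolicy": {"IdleTimeout": idle_timeout_seconds(hours=1, minutes=30)},
    "StepConcurrencyLevel": step_concurrency(4),
}
print(settings)
```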
Software Configuration
Release
Select the EMR release version, e.g., emr-7.1.0.
Software Configs
Select software configuration. You can choose the configuration options by clicking the checkboxes against them.
Custom AMI Id
Select or provide the ID of a custom Amazon Linux AMI for the chosen cluster.
Enter Configuration
Provide configuration for any additional YARN properties to apply to the cluster.
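For illustration, additional YARN properties are commonly expressed in the Amazon EMR configuration format: a list of objects with a `Classification` and a `Properties` map. The sketch below builds such a list in Python. Whether the Enter Configuration field expects exactly this JSON is an assumption, and the helper name and property values are examples only.

```python
import json


def yarn_site_configuration(properties):
    """Wrap YARN properties in an EMR-style 'yarn-site' classification."""
    return [
        {
            "Classification": "yarn-site",
            # EMR configuration values are strings, so numbers are converted.
            "Properties": {str(key): str(value) for key, value in properties.items()},
        }
    ]


config = yarn_site_configuration(
    {
        "yarn.scheduler.maximum-allocation-mb": 8192,
        "yarn.nodemanager.vmem-check-enabled": "false",
    }
)
print(json.dumps(config, indent=2))
```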
Tags
ADD TAG
Add custom tags to the EMR cluster. Provide each tag as a key-value pair.
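A small sketch of how key-value pairs could be turned into the `Key`/`Value` list that the Amazon EMR API uses for tags. The helper name and the sample tags are hypothetical, and rejecting empty keys is an assumption added for the example.

```python
def to_emr_tags(pairs):
    """Convert a dict of tag keys and values into an EMR-style tag list."""
    tags = []
    for key, value in pairs.items():
        if not key.strip():
            raise ValueError("tag key must not be empty")
        tags.append({"Key": key, "Value": str(value)})
    return tags


tags = to_emr_tags({"team": "data-platform", "environment": "dev"})
print(tags)
```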
Master Nodes
Instance Type
Select the instance type for the master node. The specification of the selected type is shown, for example: 30.5 GB memory, 4 vCores, EBS only.
Instance Count
Provide instance count for the master node. To launch a cluster in HA mode, the instance count value should be 3.
Volume Type
Provide volume type for the master node.
Volumes per Instance
Provide the number of EBS volumes for the master node.
EBS Volume
Provide the EBS volume size for the master node. The size should be between 1 and 16384 GiB.
IOPS
Provide IOPS per volume for the master node.
Node Type
Provide EC2 instance type.
Spot Bid Price
Bid price for Spot Instances.
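The node fields above map naturally onto an instance group definition. The following sketch assembles one in the shape used by the Amazon EMR API (`InstanceRole`, `EbsConfiguration`, `BidPrice`, and so on) and enforces the 1 to 16384 GiB size range. The function name, the instance type, and the idea that a bid price implies the Spot market are assumptions made for illustration.

```python
def instance_group(role, instance_type, count, volume_type, size_gib,
                   volumes_per_instance=1, iops=None, spot_bid_price=None):
    """Build an EMR-style instance group dict from the node form fields."""
    if not 1 <= size_gib <= 16384:
        raise ValueError("EBS volume size must be between 1 and 16384 GiB")
    if count < 1 or volumes_per_instance < 1:
        raise ValueError("instance count and volumes per instance must be at least 1")

    volume = {"VolumeType": volume_type, "SizeInGB": size_gib}
    if iops is not None:
        volume["Iops"] = iops

    group = {
        "Name": f"{role.lower()}-group",
        "InstanceRole": role,  # MASTER, CORE, or TASK
        "InstanceType": instance_type,
        "InstanceCount": count,
        "Market": "ON_DEMAND",
        "EbsConfiguration": {
            "EbsBlockDeviceConfigs": [
                {"VolumeSpecification": volume, "VolumesPerInstance": volumes_per_instance}
            ]
        },
    }
    if spot_bid_price is not None:
        # Assumption: supplying a bid price selects the Spot market.
        group["Market"] = "SPOT"
        group["BidPrice"] = str(spot_bid_price)
    return group


# Three master instances, matching the HA instance count mentioned above.
master = instance_group("MASTER", "m5.xlarge", 3, "gp2", 100)
print(master)
```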
Core Nodes
Instance Type
Select the instance type for the core node. The specification of the selected type is shown, for example: 30.5 GB memory, 4 vCores, EBS only.
Instance Count
Provide instance count for the core node.
Volume Type
Provide volume type for the core node.
Volumes per Instance
Provide the number of EBS volumes for the core node.
EBS Volume
Provide the EBS volume size for the core node. The size should be between 1 and 16384 GiB.
IOPS
Provide IOPS per volume for the core node.
Node Type
Provide EC2 instance type.
Spot Bid Price
Bid price for Spot Instances.
Enable Autoscaling
Select the checkbox to enable auto scaling.

Provide details as explained below.
Minimum Nodes
Provide the minimum number of nodes for auto scaling.
Maximum Nodes
Provide the maximum number of nodes for auto scaling.
Provide values for Scale Out Rules and Scale In Rules. You can also add further rules.
Scale Out Rules
ADD RULE
Click to add additional scale out rules.
Rule Name
Provide name of the rule.
Add
Provide the number of EC2 instances to be added each time the autoscaling rule is triggered.
if
Choose the AWS CloudWatch metric that should be used to trigger autoscaling.
is
Enter the threshold value and condition for the CloudWatch metric selected above.
for
Enter the number of consecutive five-minute periods over which the metric data will be compared to the threshold. Autoscaling will be triggered if the condition is met for each consecutive period.
Cooldown period
The time to wait after a scaling activity completes before the next scaling activity can start.
Scale In Rules
ADD RULE
Click to add additional scale in rules.
Rule Name
Provide a name for the scale in rule.
Terminate
Provide the number of EC2 instances to be terminated each time the autoscaling rule is triggered.
if
Choose the AWS CloudWatch metric that should be used to trigger autoscaling.
is
Enter the threshold value and condition for the CloudWatch metric selected above.
for
Enter the number of consecutive five-minute periods over which the metric data will be compared to the threshold. Autoscaling will be triggered if the condition is met for each consecutive period.
Cooldown period
The time to wait after a scaling activity completes before the next scaling activity can start.
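The rule fields (Add or Terminate, if, is, for, Cooldown period) correspond closely to the structure of an Amazon EMR automatic scaling policy. The sketch below builds one scale out rule and one scale in rule in that structure. The helper name, the rule names, the metric, and all thresholds are illustrative assumptions, not values taken from Gathr.

```python
def scaling_rule(name, adjustment, metric, comparison, threshold, periods, cooldown_seconds):
    """Build one EMR-style scaling rule; a negative adjustment terminates instances."""
    return {
        "Name": name,
        "Action": {
            "SimpleScalingPolicyConfiguration": {
                "AdjustmentType": "CHANGE_IN_CAPACITY",
                "ScalingAdjustment": adjustment,  # "Add" / "Terminate"
                "CoolDown": cooldown_seconds,     # "Cooldown period"
            }
        },
        "Trigger": {
            "CloudWatchAlarmDefinition": {
                "MetricName": metric,              # "if"
                "ComparisonOperator": comparison,  # "is" (condition)
                "Threshold": threshold,            # "is" (value)
                "EvaluationPeriods": periods,      # "for"
                "Period": 300,                     # five-minute periods
                "Namespace": "AWS/ElasticMapReduce",
                "Statistic": "AVERAGE",
                "Unit": "PERCENT",
            }
        },
    }


policy = {
    "Constraints": {"MinCapacity": 2, "MaxCapacity": 10},  # Minimum / Maximum Nodes
    "Rules": [
        scaling_rule("scale-out-on-low-memory", 2,
                     "YARNMemoryAvailablePercentage", "LESS_THAN", 15.0, 1, 300),
        scaling_rule("scale-in-on-high-memory", -1,
                     "YARNMemoryAvailablePercentage", "GREATER_THAN", 75.0, 3, 300),
    ],
}
print(policy["Rules"][0]["Name"], policy["Rules"][1]["Name"])
```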
Task Nodes
Instance Type
Select the instance type for the task node. The specification of the selected type is shown, for example: 30.5 GB memory, 4 vCores, EBS only.
Instance Count
Provide instance count for the task node.
Volume Type
Provide volume type for the task node.
Volumes per Instance
Provide the number of EBS volumes for the task node.
EBS Volume
Provide the EBS volume size for the task node. The size should be between 1 and 16384 GiB.
IOPS
Provide IOPS per volume for the task node.
Node Type
Provide EC2 instance type.
Spot Bid Price
Bid price for Spot Instances.
Enable Autoscaling
Select the checkbox to enable auto scaling.
Minimum Nodes
Provide the minimum number of nodes for auto scaling.
Maximum Nodes
Provide the maximum number of nodes for auto scaling.
Provide values for Scale Out Rules and Scale In Rules. You can also add further rules.
Scale Out Rules
ADD RULE
Click to add additional scale out rules.
Rule Name
Provide name of the rule.
Add
Provide the number of EC2 instances to be added each time the autoscaling rule is triggered.
if
Choose the AWS CloudWatch metric that should be used to trigger autoscaling.
is
Enter the threshold value and condition for the CloudWatch metric selected above.
for
Enter the number of consecutive five-minute periods over which the metric data will be compared to the threshold. Autoscaling will be triggered if the condition is met for each consecutive period.
Cooldown period
The time to wait after a scaling activity completes before the next scaling activity can start.
Scale In Rules
ADD RULE
Click to add additional scale in rules.
Rule Name
Provide a name for the scale in rule.
Terminate
Provide the number of EC2 instances to be terminated each time the autoscaling rule is triggered.
if
Choose the AWS CloudWatch metric that should be used to trigger autoscaling.
is
Enter the threshold value and condition for the CloudWatch metric selected above.
for
Enter the number of consecutive five-minute periods over which the metric data will be compared to the threshold. Autoscaling will be triggered if the condition is met for each consecutive period.
Cooldown period
The time to wait after a scaling activity completes before the next scaling activity can start.
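To make the "for" and "Cooldown period" semantics concrete, here is a simplified simulation in Python. It is a conceptual sketch only and does not reflect how CloudWatch or EMR evaluate alarms internally; the function names and sample values are hypothetical.

```python
def rule_triggered(samples, threshold, periods, greater_than=True):
    """Return True if the last `periods` five-minute samples all breach the threshold."""
    if len(samples) < periods:
        return False
    recent = samples[-periods:]
    if greater_than:
        return all(value > threshold for value in recent)
    return all(value < threshold for value in recent)


def cooldown_elapsed(now_seconds, last_activity_end_seconds, cooldown_seconds):
    """Return True once the cooldown period since the last scaling activity has passed."""
    return now_seconds - last_activity_end_seconds >= cooldown_seconds


# Memory-available percentage sampled every five minutes (illustrative values).
memory_available = [40.0, 12.0, 10.0, 9.0]

# "if memory available is less than 15 for 3 periods"
print(rule_triggered(memory_available, 15.0, 3, greater_than=False))
# Only 200 seconds have passed since the last activity ended, cooldown is 300.
print(cooldown_elapsed(now_seconds=1200, last_activity_end_seconds=1000, cooldown_seconds=300))
```

In this example the rule condition holds for three consecutive periods, but scaling would still wait because the cooldown has not yet elapsed.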
SSH
EC2 Key Pair name
Select the key pair (.pem file) used to SSH into the cluster.
Bootstrap Actions
S3 Path
Provide the S3 path, e.g., s3a://bucket/path
Click CREATE CLUSTER once the details are provided.
Create Cluster Template
Gathr lets you save cluster configuration details as a template. You can then use the template to create one or more clusters.

Click the Create Cluster Template option and provide the details described in Configuration to Create Cluster.
Cluster(s) Listing page
The Cluster Management listing page lists all the created templates and clusters.

The listing page displays the below details.
Logo of the Amazon EMR services used to manage EMR cluster(s) from the Gathr application.
Account Name
The account name provided while adding the account in Compute Setup. This is the account that was linked to the project. For details, see Steps to link a custom compute environment to a project.
Filter By
Option to filter the created clusters/templates by Name, ID, Cluster Type, Template Type, and Status.
Search Bar
Option to search the created cluster/template.
Favourite
Option to mark specific cluster as favourite.
Reset
Option to reset all the filter options to reload the list.
Sort By
Option to Sort the listed clusters/templates by Name, Status, Creation Time and Updation Time.
Reload List Data
You can reload/refresh the EMR cluster listing by clicking the Reload List Data option available on the Cluster List View page.
Save User Preferences
Select a few filters and a Sort By option, then click the Save User Preferences button. The selected preferences are saved and remain visible in the user's interface for future use.
Import Cluster from EMR
Clusters created in the EMR console can be imported into Gathr as a cluster or as a cluster template.
If you have created a cluster in the EMR console and want to use it in Gathr to run pipelines, click the Import Cluster from EMR option. The clusters created in EMR are then shown in the Gathr UI, where you can register the cluster in Gathr.
Select the EMR Cluster ID and click Import as Cluster Template or Import as Cluster.
Listed Cluster/Template
The listed cluster/template has the below details.
Name
Name of the cluster.
Status
Current status of the cluster, for example RUNNING, STOPPED, SAVED, or DELETED.
Pipelines on Cluster
The existing pipelines on the cluster.
Cluster Type
The type of cluster used i.e., Long Running or Job cluster.
Launch Time
Cluster launch time. Example: 2023-10-12 06:12:21 UTC
Duration
Running duration of the cluster. Example: 2 Hours 42 Minutes.
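As a worked example of how Launch Time and Duration relate, the sketch below parses a launch time in the format shown above and formats the elapsed time in the same style as the Duration example. The function name and the chosen "current" time are assumptions for illustration.

```python
from datetime import datetime, timezone


def running_duration(launch_time, now):
    """Format the time elapsed since launch as 'H Hours M Minutes'."""
    launched = datetime.strptime(launch_time, "%Y-%m-%d %H:%M:%S UTC")
    launched = launched.replace(tzinfo=timezone.utc)
    total_minutes = int((now - launched).total_seconds() // 60)
    hours, minutes = divmod(total_minutes, 60)
    return f"{hours} Hours {minutes} Minutes"


current = datetime(2023, 10, 12, 8, 54, 21, tzinfo=timezone.utc)
print(running_duration("2023-10-12 06:12:21 UTC", current))
```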
Start or Stop Cluster
You can start or stop a created cluster by clicking the Start/Stop option available on the listed cluster.
Options available on the Ellipses are explained below.
The below image shows the options available for the listed Cluster.

The below image shows the options available for the listed Template.

Refresh
Option to get the latest status of the cluster.
View
Option to get the detailed information of the clusters.
Details
Details of cluster including Account, Cluster Type, VPC, Log URL are available.
Basic Configuration
Under this tab the basic configuration details of cluster are provided which includes the following:
- Cluster ID
- Subnet ID
- Security Group
- Security Configuration
- EMR Service Role
- EC2 Instance Profile Role
- Auto Scaling Role
- Custom AMI ID
- Root EBS Volume
- Duration
- Auto-termination
- Steps Concurrency
- Maximum Steps
Software Configuration
Under this tab the software configuration details of cluster are provided which includes the following:
- Release
- Configuration
Master Nodes Attributes
Under this tab the Master Nodes Attributes of cluster are provided which includes the following:
- Instance Type
- Instance Count
- Volume Type
- EBS Volume
- IOPS
- Volumes Per Instance
- Node Type
- Spot Bid Price
Core Nodes Attributes
Under this tab the Core Nodes Attributes of cluster are provided which includes the following:
- Instance Type
- Instance Count
- Volume Type
- EBS Volume
- IOPS
- Volumes Per Instance
- Node Type
- Spot Bid Price
- Enable Autoscaling
Task Nodes Attributes
Under this tab the Task Nodes Attributes of cluster are provided which includes the following:
- Instance Type
- Instance Count
- Volume Type
- EBS Volume
- IOPS
- Volumes Per Instance
- Node Type
- Spot Bid Price
- Enable Autoscaling
Tags
Customized tags can be added for the EMR cluster. Provide value and Action(s) for tags.
SSH
Under this tab, the SSH details of the cluster are provided, which include the following:
- SSH Key Name
- EC2 Key Pair name: The key pair (.pem file) used to SSH into the cluster.
Bootstrap Actions
Under this tab, the Bootstrap Actions of the cluster are provided, which include the following:
- S3 Path: Option to provide the S3 path for bootstrap script locations. You can provide multiple script locations separated by semicolons.
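A semicolon-separated S3 Path value can be split into individual bootstrap actions along these lines. The output uses the `ScriptBootstrapAction` shape from the Amazon EMR API. The function name, the generated action names, the sample bucket paths, and the accepted `s3://` and `s3a://` prefixes are assumptions for this sketch.

```python
def bootstrap_actions(s3_paths):
    """Split a semicolon-separated list of script paths into EMR-style bootstrap actions."""
    actions = []
    for raw_path in s3_paths.split(";"):
        path = raw_path.strip()
        if not path:
            continue  # ignore empty entries such as a trailing semicolon
        if not path.startswith(("s3://", "s3a://")):
            raise ValueError(f"not an S3 path: {path}")
        actions.append(
            {
                "Name": f"bootstrap-{len(actions) + 1}",
                "ScriptBootstrapAction": {"Path": path, "Args": []},
            }
        )
    return actions


actions = bootstrap_actions("s3a://bucket/path/setup.sh; s3a://bucket/path/agents.sh;")
for action in actions:
    print(action["Name"], action["ScriptBootstrapAction"]["Path"])
```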
Application
This tab shows details of the applications that use the specific cluster.
The details include Project Name, Application Type (Advanced ETL, Ingestion, Data Assets, Data Validations), Application Name, and No. of Applications.
Edit
You can edit a cluster/template by clicking the Edit option.
Clone
You can clone a cluster by clicking the Clone option.
Delete
You can delete a cluster by clicking the Delete option under the Action tab.
If you want to delete a running cluster that has no pipelines configured, you have two options:
- Delete from EMR: the cluster is deleted from EMR but remains in the Gathr database, so the same cluster can be started again later.
- Delete from both EMR and Gathr: the cluster is removed from both.
Mark as Favourite
Option to mark a specific cluster as favourite. Once a cluster is marked as favourite, you can unmark it.
Log URL
Option to redirect the user to the log URL page of the EMR bucket.