Scala Processor
Scala is a general-purpose programming language built on the Java virtual machine.
It can interact with data stored in a distributed manner and can be used to process and analyze big data.
The Scala processor can be used to write custom code in Scala.
Please email Gathr Support to enable the Scala processor.
Several code snippets are available in the application; they are explained in this topic to get you started with the Scala processor.
Processor Configuration
Configure the processor parameters as explained below.
Package Name
Name of the package for the Scala code class.
Class Name
Name of the Class for the Scala code.
Imports
Import statements for the Scala code.
Input Source
Input source on which the Scala code operates.
Scala Code
Scala code that performs operations on the input JSON RDD object.
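Taken together, these fields assemble into a compiled Scala class. The exact wrapper Gathr generates is not documented here; the sketch below is only an illustration, with hypothetical values for Package Name, Class Name, Imports, and Scala Code (the method name is also illustrative), of how the pieces fit together.

package com.example.custom // Package Name (hypothetical)

// Imports
import org.apache.spark.sql.Dataset
import org.apache.spark.sql.functions._

// Class Name (hypothetical)
class MyScalaProcessor {
  // Scala Code: operates on the incoming dataset and returns the result
  def process(dataset: Dataset[_]) = dataset.withColumn("processed", lit(true))
}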
Ask AI Assistant
Use the AI assistant feature to simplify the creation of Scala queries.
It allows you to generate complex Scala queries effortlessly, using natural language inputs as your guide.
Describe your desired expression in plain, conversational language. The AI assistant will understand your instructions and transform them into working Scala code.
Tailor queries to your specific requirements, whether it’s for data transformation, filtering, calculations, or any other processing task.
Note: Press Ctrl + Space to list input columns and Ctrl + Enter to submit your request.
Input Example:
Select those records whose last_login_date is less than 60 days from current_date.
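For a request like this, the assistant might produce a filter along the following lines (an illustration only; the generated code depends on your input schema and the wording of your request):

import org.apache.spark.sql.functions._
// Keep records whose last_login_date falls within the last 60 days
dataset.filter(datediff(current_date(), col("last_login_date")) < 60)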
Jar Upload
A jar file is generated when you build your Scala code.
Here, you can upload third-party jars so that their APIs can be used in the Scala code by adding the corresponding import statements.
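For example, assuming a hypothetical utility jar that provides a class com.example.text.Normalizer (named here only for illustration), you could reference it as follows:

// In the Imports field (hypothetical class from the uploaded jar):
import com.example.text.Normalizer
import org.apache.spark.sql.functions.{col, udf}

// In the Scala Code field:
val normalize = udf((s: String) => Normalizer.normalize(s))
dataset.withColumn("Surname_Normalized", normalize(col("Surname")))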
Notes
Optionally, enter notes in the Notes tab and save the configuration.
Code Snippets
Described below are some sample Scala code use cases:
Add a new column with a constant value to an existing dataframe
Description
This script demonstrates how to add a column with a constant value.
In the sample code given below, a column Constant_Column with value Constant_Value is added to an existing dataframe.
Sample Code
// Obtain the SparkSession from the incoming dataset
val sparkSession = dataset.sparkSession
import sparkSession.implicits._
import org.apache.spark.sql.functions._

// lit() wraps a literal value so it can be used as a column expression
dataset.withColumn("Constant_Column", lit("Constant_Value"))
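lit is not limited to strings; any supported literal type works. For instance, a numeric flag column (the column name Is_Active is hypothetical) could be added the same way:

// A numeric constant works the same way as a string constant
dataset.withColumn("Is_Active", lit(1))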
Example
Input Data Set
| CustomerId | Surname | CreditScore | Geography | Age | Balance | EstSalary |
|---|---|---|---|---|---|---|
| 15633059 | Fanucci | 413 | France | 34 | 0 | 6534.18 |
| 15604348 | Allard | 710 | Spain | 22 | 0 | 99645.04 |
| 15693683 | Yuille | 814 | Germany | 29 | 97086.4 | 197276.13 |
| 15738721 | Graham | 773 | Spain | 41 | 102827.44 | 64595.25 |
Output Data Set
| CustomerId | Surname | CreditScore | Geography | Age | Balance | EstSalary | Constant_Column |
|---|---|---|---|---|---|---|---|
| 15633059 | Fanucci | 413 | France | 34 | 0 | 6534.18 | Constant_Value |
| 15604348 | Allard | 710 | Spain | 22 | 0 | 99645.04 | Constant_Value |
| 15693683 | Yuille | 814 | Germany | 29 | 97086.4 | 197276.13 | Constant_Value |
| 15738721 | Graham | 773 | Spain | 41 | 102827.44 | 64595.25 | Constant_Value |
Add a new column with random values to an existing dataframe
Description
This script demonstrates how to add a column with random values.
Here, a column Random_Column is added with random values in the range [0, 1), as shown in the example output below.
Sample Code
// Obtain the SparkSession from the incoming dataset
val sparkSession = dataset.sparkSession
import sparkSession.implicits._
import org.apache.spark.sql.functions.rand

// rand() generates a uniformly distributed random double in [0, 1) per row
dataset.withColumn("Random_Column", rand())
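If you need reproducible values across runs, rand also accepts a seed:

// A fixed seed makes the output reproducible for the same data and partitioning
dataset.withColumn("Random_Column", rand(42))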
Example
Input Data Set
| CustomerId | Surname | CreditScore | Geography | Age | Balance | EstSalary |
|---|---|---|---|---|---|---|
| 15633059 | Fanucci | 413 | France | 34 | 0 | 6534.18 |
| 15604348 | Allard | 710 | Spain | 22 | 0 | 99645.04 |
| 15693683 | Yuille | 814 | Germany | 29 | 97086.4 | 197276.13 |
| 15738721 | Graham | 773 | Spain | 41 | 102827.44 | 64595.25 |
Output Data Set
| CustomerId | Surname | CreditScore | Geography | Age | Balance | EstSalary | Random_Column |
|---|---|---|---|---|---|---|---|
| 15633059 | Fanucci | 413 | France | 34 | 0 | 6534.18 | 0.0241309661 |
| 15604348 | Allard | 710 | Spain | 22 | 0 | 99645.04 | 0.5138384557 |
| 15693683 | Yuille | 814 | Germany | 29 | 97086.4 | 197276.13 | 0.2652246569 |
| 15738721 | Graham | 773 | Spain | 41 | 102827.44 | 64595.25 | 0.8454138247 |
Add a new column using an expression with existing columns
Description
This script demonstrates how to add a new column derived from existing columns.
Here, a column Transformed_Column is added by multiplying column EstimatedSalary by column Tenure.
Sample Code
// Obtain the SparkSession from the incoming dataset
val sparkSession = dataset.sparkSession
import sparkSession.implicits._
import org.apache.spark.sql.functions._

// Multiply the two columns row by row to derive the new column
dataset.withColumn("Transformed_Column", col("EstimatedSalary") * col("Tenure"))
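The same transformation can also be written as a SQL expression string with expr, which can be convenient for longer formulas:

import org.apache.spark.sql.functions.expr
// Equivalent to col("EstimatedSalary") * col("Tenure")
dataset.withColumn("Transformed_Column", expr("EstimatedSalary * Tenure"))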
Example
Input Data Set
| CustomerId | Surname | CrScore | Age | Tenure | Balance | EstimatedSalary |
|---|---|---|---|---|---|---|
| 15633059 | Fanucci | 413 | 34 | 9 | 0 | 6534.18 |
| 15604348 | Allard | 710 | 22 | 8 | 0 | 99645.04 |
| 15693683 | Yuille | 814 | 29 | 8 | 97086.4 | 197276.13 |
| 15738721 | Graham | 773 | 41 | 9 | 102827.44 | 64595.25 |
Output Data Set
| CustomerId | Surname | CrScore | Age | Tenure | Balance | EstimatedSalary | Transformed_Column |
|---|---|---|---|---|---|---|---|
| 15633059 | Fanucci | 413 | 34 | 9 | 0 | 6534.18 | 58807.62 |
| 15604348 | Allard | 710 | 22 | 8 | 0 | 99645.04 | 797160.32 |
| 15693683 | Yuille | 814 | 29 | 8 | 97086.4 | 197276.13 | 1578209.04 |
| 15738721 | Graham | 773 | 41 | 9 | 102827.44 | 64595.25 | 581357.25 |
Transform an existing column
Description
This script demonstrates how to transform an existing column in place.
Here, the values of column Balance are rounded off and converted to integers.
Sample Code
// Obtain the SparkSession from the incoming dataset
val sparkSession = dataset.sparkSession
import sparkSession.implicits._
import org.apache.spark.sql.functions._

// Reusing the existing column name replaces the original values in place
dataset.withColumn("Balance", round(col("Balance")).cast("int"))
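round also accepts a scale argument if you want to keep some decimal places instead of converting to an integer; for example, rounding to two decimals:

// Round to two decimal places instead of casting to an integer
dataset.withColumn("Balance", round(col("Balance"), 2))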
Example
Input Data Set
| CustomerId | Surname | CreditScore | Geography | Gender | Age | Tenure | Balance |
|---|---|---|---|---|---|---|---|
| 15633059 | Fanucci | 413 | France | Male | 34 | 9 | 0 |
| 15604348 | Allard | 710 | Spain | Male | 22 | 8 | 0 |
| 15693683 | Yuille | 814 | Germany | Male | 29 | 8 | 97086.4 |
| 15738721 | Graham | 773 | Spain | Male | 41 | 9 | 102827.44 |
Output Data Set
| CustomerId | Surname | CreditScore | Geography | Gender | Age | Tenure | Balance |
|---|---|---|---|---|---|---|---|
| 15633059 | Fanucci | 413 | France | Male | 34 | 9 | 0 |
| 15604348 | Allard | 710 | Spain | Male | 22 | 8 | 0 |
| 15693683 | Yuille | 814 | Germany | Male | 29 | 8 | 97086 |
| 15738721 | Graham | 773 | Spain | Male | 41 | 9 | 102827 |
Filter data on the basis of a condition
Description
This script demonstrates how to filter data on the basis of a condition.
Here, customers with Age greater than 30 are selected.
Sample Code
// Obtain the SparkSession from the incoming dataset
val sparkSession = dataset.sparkSession
import sparkSession.implicits._

// The $ syntax (enabled by the implicits import) refers to a column by name
dataset.filter($"Age" > 30)
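Conditions can be combined with the usual Boolean operators. For example, to select customers older than 30 who are located in Spain (column names taken from the example below):

// && combines predicates; === is Spark's column equality operator
dataset.filter($"Age" > 30 && $"Geography" === "Spain")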
Example
Input Data Set
| CustomerId | Surname | CreditScore | Geography | Gender | Age |
|---|---|---|---|---|---|
| 15633059 | Fanucci | 413 | France | Male | 34 |
| 15604348 | Allard | 710 | Spain | Male | 22 |
| 15693683 | Yuille | 814 | Germany | Male | 29 |
| 15738721 | Graham | 773 | Spain | Male | 41 |
Output Data Set
| CustomerId | Surname | CreditScore | Geography | Gender | Age |
|---|---|---|---|---|---|
| 15633059 | Fanucci | 413 | France | Male | 34 |
| 15738721 | Graham | 773 | Spain | Male | 41 |