Functions

Gathr provides different kind of functions like Date, Lookup, String, Math, Array and Miscellaneous functions. Using these functions, you can directly apply computations on the dataset columns and get desired result.

The functions takes columns and/or literals as arguments.

In order to pass column “A” of a dataset “D” as a function argument, pass for example

@{column.D.A}

There are some exceptions where you need to pass the column name directly.

Following are the functions that take the column name directly.

expr(A)

rename_column(A)

In order to pass a “string literal” as argument to function pass it in double quotes for example

"IST"

In order to pass a “number literal” to a function, pass it as it is.

For example, 2 or 1.345

If we have a dataset named schemaName having a string/date/timestamp column date holding the value 2009-03-01 for a particular row, then

add_months@{column.schemaName.date},2) will return 2009-05-01

Note:

All the examples presented in this guide have column values instead of the entire expression i.e. @{column.schemaName.columnName}

add_months@{column.schemaName.date},2) will return 2009-05-01

add_months is a function, @{column.schemaName.date} is an expression and 2009-03-01 is the value of the expression.

add_months(@{column.schemaName.date},2) will be shown as:

add_months(2009-03-01,2)

A detailed description of these functions is given below:

Date Functions

Enables to perform calculations involving dates.

add_months

Calculates the new date by adding the given number of months. 

Input Parameters: Column arg0, int arg1.

arg0 (required) – The string/date/timestamp column to which months are to be added.

arg1 (required)– Number of months to be added.

Output Type

Date Column

Configuration Parameters

No

Returns

Returns the date that is arg1 after arg0.

Throws

Application Exception

Use Case:

add_month

Example

add_months("2009-03-01",2) will return "2009-05-01"

current_date

Computes the current date as a date column.

Input Parameters: There are no function arguments.

Output Type

Date Column

Configuration Parameters

No

Returns

Returns the current date as date column.

Throws

Application Exception

Use Case:

current_Date

Example

current_date() will return the current date.

If used on 2018-06-15 will return 2018-06-15

current_timestamp

Computes the current timestamp as a timestamp column.

Input Parameters: There are no function arguments.

Output Type

Timestamp Column.

Configuration Parameters

No.

Returns

Returns the current timestamp as a timestamp column.

Throws

Application Exception

Use Case:

timestamp

Example

current timestamp() will return the current timestamp.

date_add

Adds the specified number of months in ‘arg1’ to a date time value in ‘arg0’.

Input Parameters: Column arg0, int arg1

arg0 (required) – The string/date/timestamp column to which days to be added.

arg1 (required)– Number of days to be added.

Output Type

Date Column.

Configuration Parameters

No.

Returns

Returns the date that is arg0 after arg1.

Throws

Application Exception

Use Case:

date_add

Example

date_add("2009-03-01",2) will return "2009-03-03".

get_json_object

Extracts a json object from path.

Input Parameters: Column arg0, int arg1

• arg0:The json txt column

• arg1: The path

Output Type

Date Column

Configuration Parameters

No

Returns

Returns the extracted json object.

Throws

Application Exception

Example

get_json_object('{"a":"b"}', '$.a') will return b

months_between

If timestamp1 is later than timestamp2, then the result is positive. If timestamp1 and timestamp2 are on the same day of month, or both are the last day of month, time of day will be ignored. Otherwise, the difference is calculated based on 31 days per month, and rounded to 8 digits unless roundOff=false.

Input Parameters: Column arg0, int arg1

• arg0:timestamp1 column.

• arg1:timestamp2 column.

• arg2:roundoff boolean

Output Type

Date Column

Configuration Parameters

No

Returns

Returns the months difference

Throws

Application Exception

Example

months_between('1997-02-28 10:30:00', '1996-10-30', false) will return 3.9495967741935485

to_utc_timestamp

Given a timestamp like '2017-07-14 02:40:00.0', interprets it as a time in the given time zone, and renders that time as a timestamp in UTC.

Input Parameters:

arg0:The timestamp column.

arg1:The timezone column. 

Output Type

Date Column

Configuration Parameters

No

Returns

Returns the timestamp.

Throws

Application Exception

Example

to_utc_timestamp('2016-08-31', 'Asia/Seoul') will return 2016-08-30 15:00:00

date_trunc

Returns timestamp ts truncated to the unit specified by the format model fmt. fmt should be one of ["YEAR" "YYYY" "YY" "MON" "MONTH" "MM" "DAY" "DD" "HOUR" "MINUTE" "SECOND" "WEEK" "QUARTER"]

Input Parameters:

arg0: The FMT format.

arg1: The ts timestamp

Configuration Parameters

NO

Returns

Returns timestamp ts truncated to the unit specified by the format model fmt.

Throws

ApplicationException

Example

date_trunc('YEAR', '2015-03-05T09:32:05.359') will return 2015-01-01 00:00:00

dayofweek

Returns the day of the week for date/timestamp (1 = Sunday).

Input Parameters:

arg0: The date column.

Configuration Parameters

NO

Returns

Returns the day of the week for date/timestamp

Throws

ApplicationException

Example:

Dayofweek('2009-07-30') will return 5

to_date

Parses the date_str expression with the fmt expression to a date. Returns null with invalid input. By default, it follows casting rules to a date if the fmt is omitted.

Input Parameters:

arg0: str date_str column column.

arg1:the format string.

Configuration Parameters

NO

Returns

Returns the formatted date.

Throws

ApplicationException

Example:

to_date('2016-12-31', 'yyyy-MM-dd') will return 2016-12-31

to_timestamp

Parses the timestamp expression with the fmt expression to a timestamp. Returns null with invalid input. By default, it follows casting rules to a timestamp if the fmt is omitted.

Input Parameters:

arg0:The timestamp column.

arg1:The format string.

Configuration Parameters

NO

Returns

Returns the formated timestamp.

Throws

ApplicationException

Example:

to_timestamp('2016-12-31', 'yyyy-MM-dd') will return 2016-12-31 00:00:00

date_format

Converts a date/timestamp/string column 'arg0' to a value of string in the format specified by the date format given by the 'arg1'.

Input Parameters: Column arg0, String arg1

arg0 (required)The date/timestamp/string column to be converted.

arg1 (required)– The format to which the date/timestamp/string to be converted.

Output Type

String Column

Configuration Parameters

No

Returns

The formatted date as string.

Throws

Application Exception

Use Case:

date_format

Example

date_format("2009-03-01","MM-dd-yyyy") will return "03-01-2009".

date_sub

Computes the date that is 'arg1' days before 'arg0'.

Input Parameters: Column arg0, int arg1

arg0 (required)The string/date/timestamp column from which days to be subtracted.

arg1 (required)– Number of days to be subtracted.

Output Type

Date Column

Configuration Parameters

No

Returns

The formatted date as string.

Throws

Application Exception

Use Case:

date_suub

Example

date_format("2009-03-01","MM-dd-yyyy") will return "03-01-2009".

datediff

Calculates the difference between the dates 'arg1' and 'arg0' date and returns the number of days. 

Input Parameters: Column arg0, Column arg1

arg0 (required)The end string/date/timestamp column.

arg1 (required)– The start string/date/timestamp column.

Output Type

Integer Column

Configuration Parameters

No

Returns

The number of days from 'arg1' date to 'arg0' date.

Throws

Application Exception

Use Case:

datediff

Example

datediff("2009-03-01","2009-02-27") will return 2.

dayofyear

Extracts the day of the year as an integer from a given date/timestamp/string.

Input Parameters: Column arg0

arg0 (required)The date/timestamp/string from which the day of year to be extracted.

Output Type

Integer Column

Configuration Parameters

No

Returns

The extracted day as an integer.

Throws

Application Exception

Use Case:

dayofyear

Example

dayofyear("2017-12-15") will return 349

unix_timestamp

Convert time string with given pattern to Unix time stamp (in seconds), returns null if fails.

https://docs.oracle.com/javase/tutorial/i18n/format/simpleDateFormat.html

Input Parameters: Column arg0,Object arg1

arg0 (required)The string/date/timestamp column to be converted.

arg1 (required)– The string/date/timestamp format.

Output Type

Long Column

Configuration Parameters

No

Returns

The converted Unix time stamp

Throws

Application Exception

Use Case:

unix_timestamp

Example

unix_timestamp("2017-12-15 11:56","yyyy-MM-dd hh:mm") will return 1513339008.

default_unix_timestamp

Gets current Unix timestamp in seconds.

Input Parameters: There are no function arguments.

Output Type

Long Column

Configuration Parameters

No

Returns

The current Unix time stamp

Throws

Application Exception

Use Case:

default_unix

Example

default_unix_timestamp() will return current time in long format.

from_utc_timestamp

Given a timestamp, which corresponds to a certain time of day in UTC, returns another timestamp that corresponds to the same time of day in the given timezone.

Input Parameters: Column arg0, Object arg1

arg0 (required) – The string/date/timestamp column in UTC.

arg1 (required) – The timezone to which the timestamp has to converted.

Output Type

Timestamp Column

Configuration Parameters

No

Returns

The timestamp in the given timezone.

Throws

Application Exception

Use Case:

from-utc

Example

from_utc_timestamp("2017-12-15 11:40",IST) will return "2017-12-15 17:10"

hour

Extracts the hours as an integer from a given date/timestamp/string.

Input Parameters: Column arg0

arg0 (required)The date/timestamp/string column from which the hours to be extracted..

Output Type

Integer Column

Configuration Parameters

No

Returns

Returns the extracted hours as an integer.

Throws

Application Exception

Use Case:

hour

Example

date_format("2009-03-01","MM-dd-yyyy") will return "03-01-2009".

last_day

The date/timestamp/string column, returns the last day of the month which the given date belongs to.

Input Parameters: Column arg0

arg0 (required)The date/timestamp/string column from which last day of month to be extracted...

Output Type

Date Column

Configuration Parameters

No

Returns

Returns the computed last day.

Throws

Application Exception

Use Case:

last_day

Example

last_day("2017-12-15") will return "2017-12-31".

Lookup Functions

These functions help to look up data beyond the application flow.

lookupHBase

It is used to fetch the records from Hbase table based on the following input parameters.

Input parameters: String tableName,String rowID, String columnFamily, int resultRow, String resultColumnName.

tableName (required): Hbase Table name with name space as string type.

rowID (required): String representation of unique key for record.

columnFamily (required): Column family name is in string format.

resultRow (required): Result row number.

resultColumnName (required): Result column name.

Configuration Parameters: Below are the configuration parameters of lookupHBase function.

ConnectionName (required): Hbase cluster connection name.

RowidCaching (required): Caching will be applied on the basis of row id.

Output Type

Column

Returns

Returns the value for the given table, rowId, columnFamily, result row number, result column name.

Throws

Application Exception

.

Use Case:

lookup_output

Example:

lookupHBase("tableName","rowID","columnFamily",1,"age") will return value of age from row 1 of result for given rowId and columnFamily.

lookupRDBMS

It is used to fetch the record from RDBMS table. The application supports database lookup for Postgres, MySQL and Oracle.

Input Parameters: String sqlQuery, int resultRow, String resultColumnName

sqlQuery(required) – string type, select sql query specific to database.

resultRow (required): Result row number.

resultColumnName (required): Result column name.

Configuration Parameters:

ConnectionName(required)– Complete jdbc URL specific to database.

FunctionReturnType (required)– Return type of the function based on the result column being queried.

Output Type

Column

Returns

Returns the value for the given query, row number and column name.

Throws

Application Exception

Use Case:

lookup_RDBMS

DB2LookupRDBMS

Example:

lookupRDBMS(“select * from tableName",1,"age") will return value of age for row 1 of query result.

lookupES

Returns the result with respect to query from ElasticSearch index, outputs the given column for given row.

Input Parameters: String stringQuery, int resultRow, String resultColumnName

stringQuery(required) – string Elastic search query.

resultRow (required): Result row number

resultColumnName (required): Result column name.

Configuration Parameters:

ConnectionName(required)– Elastic search connection name.

indexName(required)- Elastic search index name.

indexType(required)- Elastic search index type.

functionReturnType (required)– Return type of the function based on the result column being queried.

.

Output Type

Column

Returns

Return the result of Elastic search query

with respect to row number and column name.

Throws

Application Exception

Use Case: 

lookup_ES

Example:

lookupES("{\'query\':{\'match_all\':{}}}",1,"age") will return row 1 and column named age from search result of ES index.

lookupWS

This method is used to get the response of web service. Gathr supports REST based web service.

Input Parameters: There is no input parameter.

Configuration Parameters:

endpoint(required): The end point of URL.

methodType(required): Web service method like GET, POST, etc.

Content-type(required): The content type like application/JSON, etc.

wsparams(required): The web service parameters.

.

Output Type

Column

Returns

Return the response of web service

Throws

Application Exception

Use Case:

look_es

lookupSOLR

Returns the result, with respect to query, from Solr Index.

Input Parameters: String queryString, int resultRow, String resultColumnName

queryString (required): Solr search query.

resultRow (required): Result row number.

resultColumnName (required): Result column name.

Configuration Parameters:

connectionName(required)- Solr connection name.

indexName(required)- Solr index name.

Output Type

Column

Returns

Return the result of solr search query

with respect to index name, row number and column name.

Throws

Application Exception

Use Case:

look_up_solr

Example:

lookupSOLR(''*:*'',1,”age”) will return you search result from Solr index for row 1 and column age of query result.

lookupHDFSPATH

Returns the boolean result, with respect to path exits or not, from HDFS.

Input Parameters: 

hdfsPath (required) – Path on HDFS.

Configuration Parameters: 

ConnectionName(required)– HDFS Connection name.

FunctionReturnType (required)– Return type of the function based on the result column being queried.

Output Type

Boolean Column

Returns

Returns Boolean as result from HDFS.

Throws

Application Exception

Use Case:

lookup_HDFS

Example:

lookupHDFSPATH(''hdfsPath'') will return whether the path exists or not on HDFS.

lookupHBase64

Returns the Base64 encoded result, with respect to row and column-family passed as argument to the function, from HBase table, outputs the given column for given row.

Input parameters: String tableName,String rowID, String columnFamily, int resultRow, String resultColumnName

•    tableName (required): Hbase Table name with name space as string type.

• rowID (required): String representation of unique key for record.

• columnFamily (required): Column family name is in string format.

• resultRow (required): Result row number.

• resultColumnName (required): Result column name.

Configuration Parameters: Below are the configuration parameters of lookupHBase function.

ConnectionName(required)– Hbase Connection name.

RowIdCaching (required)– Caching will be applied on the basis of row id.

.

Output Type

Column

Returns

Returns the value for the given table, rowId, columnFamily result row number, result column name.

Throws

Application Exception

Use Case:

lookUp_hbase

Example:

lookupHBase64("tableName","rowID","columnFamily",1,"age") will return value of age from row 1 of result for given rowId and columnFamily.

String Functions

Ascii

Computes the numeric value of the first character of the string column, and returns the result as an int column.

Input Parameters: Column arg0

arg0(required) - The string column for which the first character's numeric value to be calculated.

Output Type

String Column

Configuration Parameters

No

Returns

The ASCII value as an int column.

Throws

Application Exception

Use Case:

usecae_asci

Example:

ascii(“An apple”) will return 65

base64

Converts the argument from a binary bin to a base 64 string.

Input Parameters:

arg0: The Column to be converted to base64 string.

Output Type

String Column

Configuration Parameters

No

Returns

Returns the base64 string.

Throws

Application Exception

Example:

base64('Spark SQL') will return U3BhcmsgU1FM

bin

Returns the string representation of the long value expr represented in binary.

Input Parameters:

arg0: The numerical Column to be converted to represented in binary.

Output Type

String Column

Configuration Parameters

No

Returns

Returns the binary representation.

Throws

Application Exception

Example:

bin(13) will return 1101

decode

Decodes the first argument using the second argument character set.

Input Parameters:

arg0:Column to be decoded.

arg1: The charset

Output Type

String Column

Configuration Parameters

No

Returns

Returns the decoded column.

Throws

Application Exception

Example:

decode(encode('abc', 'utf-8'), 'utf-8') will return abc

encode

Encodes the first argument using the second argument character set.

Input Parameters:

arg0:Column to be encoded.

arg1: The charset

Output Type

String Column

Configuration Parameters

No

Returns

Returns the encoded column.

Throws

Application Exception

Example:

encode('abc', 'utf-8') will return abc

itrim

Removes the leading string contains the characters from the trim string.

Input Parameters:

arg0:the trim string characters to trim, the default value is a single space.

arg1: a string expression.

Output Type

String Column

Configuration Parameters

No

Returns

Returns the trimed string.

Throws

Application Exception

Example:

ltrim('Sp','SsparkSQLS')will return ArkSQLS

soundex

Returns Soundex code of the string.

Input Parameters:

arg0:String column

Output Type

String Column

Configuration Parameters

No

Returns

Returns Soundex code of the string.

Throws

Application Exception

Example:

soundex('Miller') will return M460

split

Splits str around occurrences that match regex

Input Parameters:

arg0: str string column.

arg1:the regex string

Output Type

String Column

Configuration Parameters

No

Returns

Returns the spilts.

Throws

Application Exception

.

Example:

split('oneAtwoBthreeC', '[ABC]')will return ["one","two","three",""]

trim

Remove the leading and trailing trimStr characters from str

Input Parameters:

arg0:The trimStr String column.

arg1:The str string

Output Type

String Column

Configuration Parameters

No

Returns

Returns the trimed string.

Throws

Application Exception

Example:

trim('SL', 'SsparkSQLS') will return parkSQ

unbase64

Converts the argument from a base 64 string str to a binary.

Input Parameters:

arg0:The base 64 String column

Output Type

String Column

Configuration Parameters

No

Returns

Returns the unbase64 of string.

Throws

Application Exception

Example:

unbase64('U3BhcmsgU1FM') will return Spark SQL

Concat

Concatenates multiple input string columns together into a single string column.

Input Parameters: Column... arg0

arg0 (required) - The String columns to be concatenated.

Output Type

String Column

Configuration Parameters

No

Returns

The concatenated string as a single string column.

Throws

Application Exception

Use Case:

concat_output

Example:

concat("format","string") will return "formatstring"

Concat_ws

Concatenates multiple input string columns together into a single string column, using the given separator.

Input Parameters: Object arg0, Column... arg1

• arg0(required) - The separator to be used.   

arg1 (required) - The String columns to be concatenated.

Output Type

Column

Configuration Parameters

No

Returns

The concatenated strings using the given separator as a single string column

Throws

Application Exception

Example:

concat_ws("-","format","string") will return "format-string".

Expr

Parses the expression string into the column that it represents.

Input Parameters: Object arg0

arg0 (required) - The expression string to be parsed

Output Type

Column

Configuration Parameters

No

Returns

The column return by the expression evaluation.

Throws

Application Exception

Use Case:

exp

Example:

expr("colA * 3") will return 6 if the value of column 'colA' is 2 for particular row.

Format_string

Formats the arguments in printf-style and returns the result as a string column.

Input Parameters: Object arg0, Column arg1

arg0 (required) - The printf style format.

arg1 (required) - The columns to be formatted.

Output Type

String Column

Configuration Parameters

No

Returns

The formatted arguments as a string column.

Throws

Application Exception

format_string_usercasse

Example:

format_string("the %s jumped over the %s, %d times","cow","moon",2) will return "the cow jumped over the moon 2 times".

Initcap

Computes a new string column by converting the first letter of each word to uppercase.

Input Parameters: Column arg0

• arg0 (required) - The input string column

Output Type

String Column

Configuration Parameters

No

Returns

The converted string column.

Throws

ApplicationException

Use Case:

initicap

Example:

initcap("apple") will return "Apple")

Instr

Locate the position of the first occurrence of given substring in the given string column.

Input Parameters: Column arg0, Object arg1

arg0 (required) - The string column in which the location to be determined.

arg1 (required) - The substring for which the position to be determined.

Output Type

Integer Column

Configuration Parameters

No

Returns

Returns the computed last day.

Throws

Application Exception

instr

Example:

instr("apple","le") will return 4

Length

Computes the length of a given string or binary column.

Input Parameters: Column arg0

arg0 (required) - The string column for which the length to be determined.

Output Type

Integer Column

Configuration Parameters

No

Returns

The computed length.

Use Case:

lenegth

Example:

length("apple") will return 5

Levenshtein

Computes the Levenshtein distance of the two given string columns.

Input Parameters: Column arg0, Column arg1

• arg0 (required) - The first string column from which the Levenshtein distance from the second string column to be determined.

• arg1 (required) - The second string column

Output Type

Integer Column

Configuration Parameters

No

Returns

The computed Levenshtein distance.

Throws

Application Exception

Use Case:

leven

Example:

levenshtein("kitten", "sitting") will return 3

Locate

Locate the position of the first occurrence of given substring in a string column, after the given position.

Input Parameters: Object arg0, Column arg1, int arg2

arg0 (required) - The String for which the location to be determined.

arg1 (required) - The String column in which the location to be determined.

arg2 (required) - The position after which the location to be determined

Output Type

Integer Column

Configuration Parameters

No

Returns

The position of the first occurrence of substring.

Throws

Application Exception

Use Case:

locate

Example:

locate("apple","An apple",1) will return 3

Lower

Converts a string column to lower case.

Input Parameters: Column arg0

• arg0 (required) - The string column to be converted to lower case

Output Type

String Column

Configuration Parameters

No

Returns

String in lower case.

Throws

Application Exception

Use Case:

lower

Example:

lower("Apple") will return "apple"

Lpad

Left-pad the string column with the given string, to a given length.

Input Parameters: Column arg0, int arg1, Object arg2

• arg0 (required) - The string column to be left-padded.

• arg1 (required) - The length for the padding.

• arg2 (required) - The string to use for left-pad.

Output Type

String Column

Configuration Parameters

No

Returns

The Left-padded string.

Throws

Application Exception

Use Case:

lpad

Example:

lpad("SQL Tutorial", 20, "ABC") will return "ABCABCABSQL Tutorial"

Ltrim

Trim the spaces from left end for the specified string column.

Input Parameters: Column arg0

arg0 (required) - The string column from which left spaces to be trimmed.

Output Type

String Column

Configuration Parameters

No

Returns

The trimmed string column.

Use Case:

ltrim

Example:

ltrim(“apple”) will return “apple”

Regexp_extract

Extracts a specific group matched by a Java regex, from the specified string column.

Arguments: Column arg0, Object arg1, int arg2

arg0 (required) - The string column from which the group to be extracted.

arg1 (required) - The string specifying the regex.

arg2 (required) - The regex group id.

Output Type

String Column

Configuration Parameters

No

Returns

The extracted group.

Throws

Application Exception

Use Case:

regexextract

Example:

regexp_extract("foothebar","foo(.*?)(bar)", 2) will return "bar"

Regexp_replace

Replace all substrings of the specified string column that match regexp with the given replacement.

Input Parameters: Column arg0, Object arg1, Object arg2

arg0 (required) - The string column from which substrings to be replaced.

arg1 (required) - The pattern to be used.

arg2 (required) - The replacement.

Output Type

String Column

Configuration Parameters

No

Returns

The string column after replacement of substrings that match regexp.

Throws

Application Exception

Use Case:

replace

Example:

regexp_replace("foobar", "oo|ar", "") will return "fb"

Rename_column

Rename the given column.

Input Parameters: Object arg0

arg0 (required) - The column name to be renamed.

Output Type

Column

Configuration Parameters

No

Returns

The renamed column.

Throws

Application Exception

Use Case:

rename

Example:

rename_column("age") will rename "age" to given output column name.

Repeat

Repeats each value in 'arg0' column 'arg1' times.

Input Parameters: Column arg0,int arg1

arg0 (required) - A column needs to be repeated

arg1 (required) - Integer value representing no of times arg0 is to be repeated.

Output Type

Column

Configuration Parameters

No

Returns

A repeated value.

Throws

Application Exception

Use Case:

repeat

Example:

repeat("str",2) will return 'strstr'

Reverse

Reverses the string column and returns it as a new string column.

Input Parameters: Column arg0

arg0 (required) - The string column to be reversed.

Output Type

String Column

Configuration Parameters

No

Returns 

The reversed string column.

Throws

Application Exception

Use Case:

reverset

Example:

reverse("apple") will return "elppa"

Rpad

Right-pad the string column with the given string, to a given length.

Input Parameters: Column arg0, int arg1, Object arg2

• arg0 (required) - The string column to be right-padded.

• arg1 (required) - The length for the padding.

• arg2 (required) - The string to use for right-pad.

Output Type

StringColumn

Configuration Parameters

No

Returns

The Right-padded string.

Throws

Application Exception

Use Case:

rpad

Example:

rpad("SQL Tutorial", 20, "ABC") will return "SQL TutorialABCABCAB"

Rtrim

Trim the spaces from right end for the specified string column.

Input Parameters: Column arg0

arg0 (required) - The string column from which right spaces to be trimmed.

Output Type

String Column

Configuration Parameters

No

Returns

The trimmed string column.

Use Case:

rtrim

Example:

rtrim("apple ") will return "apple"

Substring

Substring starts at given position and is of given length when given argument is String type or returns the slice of byte array that starts at given position in byte and is of given length when given argument is Binary type.

Input Parameters: Column arg0, int arg1, int arg2

• arg0 (required) - The String column from which substring to be extracted.

• arg1 (required) - The start position for the substring.

• arg2 (required) - The end position for the substring.

Output Type

String Column

Configuration Parameters

No

Returns

Returns the computed last day.

Use Case:

substring

Example:

substring("foo bar",4,6) will return "bar"

Substring_index

Computes the substring from given string column before given count occurrences of the given delimiter.

Input Parameters: Column arg0, Object arg1, int arg2

• arg0 (required) - The String column from which substring to be extracted.

• arg1 (required) - The delimiter.

• arg2 (required) - The count occurrences for the delimiter.

Output Type

Column

Configuration Parameters

No

Returns

The result substring.

Use Case:

substrign _index

Example:

substring_index("www.xyz.com",".",2) will return www.xyz

Translate

Translate any character in the given string by a given character in given replaceString.

Input Parameters: Column arg0, Object arg1, Object arg2

arg0 (required) - The string column in which the translation to be done.

arg1 (required) - The matching character.

arg2 (required) - The replacement character.

Output Type

String Column

Configuration Parameters

No

Returns

The translated string.

Example:

translate("The foo bar","f","t") will return "The too bar"

Trim

Trim the spaces from both ends for the specified string column.

Input Parameters: Column arg0

arg0 (required) - The string column from which spaces to be trimmed.

Output Type

String Column

Configuration Parameters

No

Returns

The trimmed string value.

Throws

Application Exception

Use Case:

trim

Example:

trim(" An apple ") will return "An apple"

Upper

Converts a string column to upper case.

Input Parameters: Column arg0

arg0 (required) - The string column to be converted to upper case.

Output Type

String Column

Configuration Parameters

No

Returns

The converted string column.

Use Case:

uppercolumn

Example:

upper("aPPle") will return "APPLE"

Math Function

Abs

Computes the absolute value.

Input Parameters: Column arg0

arg0 (required) - The column for which absolute value to be calculated.

Output Type

Number Column

Configuration Parameters

No

Returns

The computed absolute value.

Throws

Application Exception

Use Case:

abs

Example:

abs(77.76) will return 77.76

Acos

Computes the cosine inverse of the given value; the returned angle is in the range 0.0 through pi.

Input Parameters: Column arg0

arg0 (required) - The column for which cosine inverse to be calculated.

Output Type

Number Column

Configuration Parameters

No

Returns

The computed cosine inverse in the range 0.0 through pi.

Throws

Application Exception

Use Case:

acos

Example:

acos(0.45) will return 1.104031001096478

Asin

Computes the sine inverse of the given value; the returned angle is in the range -pi/2 through pi/2.

Input Parameters: Column arg0

arg0 (required) - The column for which sine inverse to be calculated.

Output Type

Number Column

Configuration Parameters

No

Returns

The computed sine inverse in the range -pi/2 through pi/2.

Throws

Application Exception

Use Case:

asin

Example:

asin(0.45) will return 0.4667653256984187

Atan

Computes the tangent inverse of the given value.

Input Parameters: Column arg0

arg0 (required) - The column for which tangent inverse to be calculated.

Output Type

Number Column

Configuration Parameters

No

Returns

The computed the tangent inverse.

Throws

Application Exception

Example:

atan(0.45) will return 0.42285391621948626

Atan2

Computes the angle theta from the conversion of rectangular coordinates (arg0, arg1) to polar coordinates (arg1, theta).

Input Parameters: Column arg0, Column arg1

• arg0 (required) - The column for the x rectangular coordinate.

• arg1 (required) - The column for the y rectangular coordinate.

Output Type

Number Column

Configuration Parameters

No

Returns

The computed angle theta.

Throws

Application Exception

Use Case:

atan_2

Example:

atan2(12, 71.21) will return 1.403849169952035

Atan2_left_arg_double

Computes the angle theta from the conversion of rectangular coordinates (arg0, arg1) to polar coordinates (arg1, theta).

Input Parameters: double arg0, Column arg1

• arg0 (required) - The x rectangular coordinate.

• arg1 (required) - The column for the y rectangular coordinate.

Output Type

Number Column

Configuration Parameters

No

Returns

The computed angle theta.

Throws

Application Exception

Use Case:

atan_21

Example:

atan2_left_arg_double(12, 71.21) will return 1.403849169952035

Atan2_right_arg_double

Computes the angle theta from the conversion of rectangular coordinates (arg0, arg1) to polar coordinates (arg1, theta).

Input Parameters: Column arg0, double arg1

• arg0 (required) - The column for the x rectangular coordinate.

• arg1 (required) - The y rectangular coordinate.

Output Type

Number Column

Configuration Parameters

No

Returns

The computed angle theta.

Throws

Application Exception

Use Case:

right_arg

Example:

atan2_right_arg_double(12, 71.21) will return 1.403849169952035

Bround

Computes the value of the column arg0 rounded to 0 decimal places with HALF_EVEN round mode.

Input Parameters: Column arg0

arg0 (required) - The column for which value rounded to 0 decimal places with HALF_EVEN round mode to be calculated

Output Type

Number Column

Configuration Parameters

No

Returns

The computed value

Throws

Application Exception

Use Case:

brond

Example:

bround(71.21) will return 71.0

Bround_with_scale

Round the value of arg0 to arg1 decimal places with HALF_EVEN round mode if arg1 is greater than or equal to 0 or at integral part when arg1 is less than 0.

Input Parameters: Column arg0, int arg1

• arg0 (required) - The column for which value rounded to 0 decimal places with HALF_EVEN round mode to be calculated.

• arg1 (required) - The scale to use

Output Type

Number Column

Configuration Parameters

No

Returns

The computed value.

Throws

Application Exception

Use Case:

bround_scale

Example:

bround_with_scale(71.21, 1) will return 71.2

Cbrt

Computes the cube-root of the given column.

Input Parameters: Column arg0

arg0 (required) - The column for which cube-root to be calculated.

Output Type

Number Column

Configuration Parameters

No

Returns

The computed cube-root.

Throws

Application Exception

Use Case:

cbrt1

Example:

cbrt(80.89) will return 4.324789202233814

Ceil

Computes the ceiling of the given column.

Input Parameters: Column arg0

• arg0 (required) - The column for which ceiling to be calculated.

Output Type

Number Column

Configuration Parameters

No

Returns

The computed ceiling.

Throws

Application Exception

Example:

ceil(77.76) will return 4.2682720044742055

Conv

Converts a number column from a given base to another.

Input Parameters: Column arg0,Column args1,Column args2

• arg0 (required) - A number/String number column

• arg1 (required) - Integer column of base from which a number is to be converted

• arg2 (required) - Integer column of base to which a number is to be converted.

Output Type

String Column

Configuration Parameters

No

Returns

The string in the target base.

Throws

Application Exception

Use Case:

conv

Example:

conv(258,10,2) will return 100000010

Cos

Computes the cosine of the given value.

Input Parameters: Column arg0

• arg0 (required) - The column for which cosine to be calculated.

Output Type

Number Column

Configuration Parameters

No

Returns

The computed cosine.

Throws

Application Exception

Use Case:

cos

Example:

cos(76.56) will return 0.3977126102073901

Cosh

Computes the hyperbolic cosine of the given value.

Input Parameters: Column arg0

• arg0 (required) -The column for which hyperbolic cosine to be calculated.

Output Type

Number Column

Configuration Parameters

No

Returns

The computed hyperbolic cosine.

Throws

Application Exception

Use Case:

cosh-2

Example:

cos(76.56) will return 0.3977126102073901

Degrees

Converts an angle measured in radians to an approximately equivalent angle measured in degrees.

Input Parameters: Column arg0

• arg0 (required) -The column for which the equivalent angle measured in degrees to be calculated.

Output Type

Number Column

Configuration Parameters

No

Returns

The converted angle measured in degrees.

Throws

Application Exception

Use Case:

degrees

Example:

degrees(71.21) will return 4080.0324066707394

Exp

Computes the exponential of the given value.

Input Parameters: Column arg0

• arg0 (required) -The column for which exponential to be calculated.

Output Type

Number Column

Configuration Parameters

No

Returns

The computed exponential.

Throws

Application Exception

Use Case:

expr_1

Example:

exp(0.78) will return 2.18147220308578

Expm1

Computes the exponential of the given value minus one.

Input Parameters: Column arg0

• arg0 (required) -The column for which exponential minus one to be calculated.

Output Type

Number Column

Configuration Parameters

No

Returns

The computed value.

Throws

Application Exception

Use Case:

expm1

Example:

expm1(0.23) will return 0.2586000151807663

Factorial

Computes the factorial of the given value.

Input Parameters: Column arg0

• arg0 (required) -The column for which factorial to be calculated.

Output Type

Number Column

Configuration Parameters

No

Returns

The computed factorial.

Throws

Application Exception

Use Case:

factorial

Example:

factorial(11) will return 39916800

Floor

Computes the floor of the given column.

Input Parameters: Column arg0

• arg0 (required) -The column for which floor to be calculated.

Output Type

Number Column

Configuration Parameters

No

Returns

Returns the computed floor.

Throws

Application Exception

Use Case:

floor

Example:

floor(71.21) will return 71

Format_number

Formats numeric column arg0 to a format like '#,###,###.##', rounded to arg1 decimal places, and returns the result as a string column.

Input Parameters: Column arg0, int arg1

• arg0 (required) -The column to be formated.

• arg1 (required) -The integer specifying the decimal places to be used for rounding.

Output Type

Number Column

Configuration Parameters

No

Returns

Returns the formatted result as a string column.

Throws

Application Exception

Example:

format_number(7120.12, 1) will return 7,120.1

Greatest

It gives the greatest value of the list of values. This function takes at least 2 parameters.

Input Parameters: Column arg0,Column args1,...

• arg0 (required) -A column from the schema

• arg1 (required) -A column from the schema

Output Type

Number Column

Configuration Parameters

No

Returns

The greatest column

Throws

Application Exception

Use Case:

gretaest

Example:

greatest(258,259) will return 259

Hex

If the argument is an INT, hex returns the number as a STRING in hexadecimal format. Otherwise if the number is a STRING, it converts each character into its hexadecimal representation and returns the resulting STRING.

Input Parameters: Column arg0

• arg0 (required) -A int/string column

Output Type

String Column

Configuration Parameters

No

Returns

A string value.

Throws

Application Exception

Use Case:

hex

Example:

hex(258) will return 102

Hypot

Computes sqrt(arg0^2^ + arg1^2^) without intermediate overflow or underflow.

Input Parameters: Column arg0, Column arg1

• arg0 (required) -Will be used while computing sqrt.

• arg1 (required) -Will be used while computing sqrt.

Output Type

Number Column

Configuration Parameters

No

Returns

Returns the computed sqrt(arg0^2^ + arg1^2^).

Throws

Application Exception

Use Case:

hypot

Example:

hypot(71.21, 10.5) will return 71.97995533209642

Hypot_left_arg_double

Computes sqrt(arg0^2^ + arg1^2^) without intermediate overflow or underflow.

Input Parameters: double arg0, Column arg1

• arg0 (required) -Will be used while computing sqrt.

• arg1 (required) -Will be used while computing sqrt.

Output Type

Number Column

Configuration Parameters

No

Returns

Returns the computed sqrt(arg0^2^ + arg1^2^).

Throws

Application Exception

Use Case:

hypo-left

Example:

hypot_left_arg_double(71.21, 10.5) will return 71.97995533209642

Hypot_right_arg_double

Computes sqrt(arg0^2^ + arg1^2^) without intermediate overflow or underflow.

Input Parameters: Column arg0, double arg1

• arg0 (required) -Will be used while computing sqrt.

• arg1 (required) -Will be used while computing sqrt.

Output Type

Number Column

Configuration Parameters

No

Returns

Returns the computed sqrt(arg0^2^ + arg1^2^).

Throws

Application Exception

Use Case:

hypo_right

Example:

hypot_right_arg_double(71.21, 10.5) will return 71.97995533209642

Least

It gives the least value of the list of values. This function takes at least 2 parameters.

Input Parameters: Column arg0,Column args1,...

• arg0 (required) - A column from the schema.

• arg1 (required) - A column from the schema.

Output Type

Number Column

Configuration Parameters

No

Returns

Returns the least column

Throws

Application Exception

Use Case:

least

Example:

least(258,259) will return 259

Log

Computes the natural logarithm of the given value.

Input Parameters: Column arg0

• arg0 (required) - The column for which natural logarithm to be calculated.

Output Type

Number Column

Configuration Parameters

No

Returns

Returns the computed natural logarithm.

Throws

Application Exception

Use Case:

log

Example:

log(20) will return 2.995732273553991

Log1p

Computes the natural logarithm of the given value plus one.

Input Parameters: Column arg0

• arg0 (required) - The column for which natural logarithm plus one to be calculated.

Output Type

Number Column

Configuration Parameters

No

Returns

Returns the computed natural logarithm plus one.

Throws

Application Exception

Example:

log1p(20) will return 3.044522437723423

Log_with_base

Computes the first argument-base logarithm of the second argument.

Input Parameters: double arg0, Column arg1

• arg0 (required) - The base to used.

• arg1 (required) - The column for which logarithm to be calculated.

Output Type

Number Column

Configuration Parameters

No

Returns

Returns the computed algorithm.

Throws

Application Exception

Use Case:

log_with_base

Example:

log_with_base(10, 20) will return 1.301029995663981

Negate

Computes the Unary minus, i.e. -(arg0)

Input Parameters: Column arg0

• arg0 (required) - The column for which unary minus to be calculated.

Output Type

Number Column

Configuration Parameters

No

Returns

Returns the computed Unary minus, i.e. -(arg0).

Throws

Application Exception

Example:

negate(20) will return -20

Pmod

Computes the positive value of arg0 mod arg1.

Input Parameters: Column arg0, Column arg1

• arg0 (required) - The dividend.

• arg1 (required) - The divisor.

Output Type

Number Column

Configuration Parameters

No

Returns

Returns the computed positive value of arg0 mod arg1.

Throws

Application Exception

Use Case:

pmod

Example:

pmod(19, 0.78) will return 0.2800007

Pow

Computes the value of the first argument raised to the power of the second argument.

Input Parameters: Column arg0, Column arg1

• arg0 (required) - The base.

• arg1 (required) - The exponent.

Output Type

Number Column

Configuration Parameters

No

Returns

Returns the computed value

Throws

Application Exception

Use Case:

pow

Example:

pow(20, 2) will return 400

Pow_left_arg_double

Computes the value of the first argument raised to the power of the second argument.

 Input Parameters: double arg0, Column arg1

• arg0 (required) - The base.

• arg1 (required) - The exponent

Output Type

Number Column

Configuration Parameters

No

Returns

Returns the compute value

Throws

Application Exception

Use Case:

pow_left_double

Example:

pow_left_arg_double(20, 2) will return 400

Pow_right_arg_double

Computes the value of the first argument raised to the power of the second argument.

Input Parameters: Column arg0, double arg1

• arg0 (required) - The base.

• arg1 (required) - The exponent

Output Type

Column

Configuration Parameters

No

Returns

Returns the computed value.

Throws

Application Exception

Use Case:

pow_right

Example:

pow_right_arg_double(20, 2) will return 400

Radians

Converts an angle measured in degrees to an approximately equivalent angle measured in radians.

Input Parameters: Column arg0

• arg0 (required) - The column for which equivalent angle measured in radians to be calculated.

Output Type

Number Column

Configuration Parameters

No

Returns

Returns the converted angle measured in radians.

Throws

Application Exception

Use Case:

radians

Example:

radians(20) will return 0.3490658503988659

Rand

Generate a random column with independent and identically distributed (i.i.d.) samples from 0.0 to 1.0.

Input Parameters: No input arguments.

Output Type

Number Column

Configuration Parameters

No

Returns

Returns the generated column.

Throws

Application Exception

Example:

rand() will return a new column with independent and identically distributed (i.i.d.) samples from 0.0 to 1.0.

Rand_with_seed

Generate a random column with independent and identically distributed (i.i.d.) samples from 0.0 to 1.0.

Input Parameters: long arg0

• arg0 (required) - The seed

Output Type

Number Column

Configuration Parameters

No

Returns

Returns the generated column.

Throws

Application Exception

Use Case:

rand_with_seed

Example:

rand_with_seed(2) will return a new column with independent and identically distributed (i.i.d.) samples from 0.0 to 1.0.

Randn

Generate a column with independent and identically distributed (i.i.d.) samples from the standard normal distribution.

Input Parameters: No input arguments.

Output Type

Number Column

Configuration Parameters

No

Returns

Returns the generated column.

Throws

Application Exception

Use Case:

randn

Example:

randn() will return a new column with independent and identically distributed (i.i.d.) samples from the standard normal distribution.

Randn_with_seed

Generate a column with independent and identically distributed (i.i.d.) samples from the standard normal distribution.

 Input Parameters: long arg0

• arg0 (required) - The seed

Output Type

Number Column

Configuration Parameters

No

Returns

Returns the generated column.

Throws

Application Exception

Use Case:

rand-with_seed

Example:

randn_with_seed(2) will return a new column with independent and identically distributed (i.i.d.) samples from the standard normal distribution.

Rint

Computes the double value that is closest in value to the argument and is equal to a mathematical integer.

Input Parameters: Column arg0

• arg0 (required) - The column for which double value to be calculated.

Output Type

Number Column

Configuration Parameters

No

Returns

Returns the computed last day.

Throws

Application Exception

Use Case:

rint2

Example:

rint(80.89) will return 81.0

Round

Computes the value of the column arg0 rounded to 0 decimal places.

Input Parameters: Column arg0

arg0 (required) - The column for which value rounded to 0 decimal places to be calculated.

Output Type

Number Column

Configuration Parameters

No

Returns

Returns the computed value.

Throws

Application Exception

Use Case:

round_column

Example:

round(80.89) will return 81.0

Round_with_scale

Computes the value of arg0 to scale decimal places if arg1 is greater than or equal to 0 or at integral part when arg1 is less than 0.

 Input Parameters: Column arg0, int arg1

• arg0 (required) - The column for which value rounded to 0 decimal places to be calculated.

• arg0 (required) - The seed

Output Type

Number Column

Configuration Parameters

No

Returns

Returns the computed value.

Throws

Application Exception

Use Case:

round-with_scale

Example:

round_with_scale(80.89,1) will return 80.9

Signum

Computes the signum of the given value.

Input Parameters: Column arg0

• arg0 (required) - The column for which signum to be calculated.

Output Type

Number Column

Configuration Parameters

No

Returns

Returns the computed signum.

Throws

Application Exception

Example:

signum(20) will return 1.0

Sin

Computes the sine of the given value.

Input Parameters: Column arg0

• arg0 (required) - The column for which sine to be calculated.

Output Type

Number Column

Configuration Parameters

No

Returns

Returns the computed sine

Throws

Application Exception

Example:

sin(20) will return 0.9129452507276277

Sinh

Computes the hyperbolic sine of the given value.

Input Parameters: Column arg0

• arg0 (required) - The column for which sine to be calculated.

Output Type

Number Column

Configuration Parameters

No

Returns

Returns the computed hyperbolic sine.

Throws

Application Exception

Example:

sinh(20) will return 2.4258259770489514E8

SqlSqrt

Computes the square root of the specified float value.

Input Parameters: Column arg0

• arg0 (required) - The column for which square root to be calculated.

Output Type

Number Column

Configuration Parameters

No

Returns

Returns the computed square root.

Throws

Application Exception

Use Case:

sqlrt

Example:

sqlSqrt(20) will return 4.47213595499958

Tan

Computes the tangent of the given value.

Input Parameters: Column arg0

arg0 (required) - The column for which tangent to be calculated.

Output Type

Number Column

Configuration Parameters

No

Returns

Returns the computed tangent

Throws

Application Exception

Use Case:

tan

Example:

tan(20) will return 2.237160944224742

Tanh

Computes the hyperbolic tangent of the given value.

Input Parameters: Column arg0

• arg0 (required) - The column for which hyperbolic tangent to be calculated.

Output Type

Number Column

Configuration Parameters

No

Returns

Returns the computed hyperbolic tangent.

Throws

Application Exception

Example:

tanh(20) will return 1.0

ToDegrees

Computes the angle measured in radians to an approximately equivalent angle measured in degrees.

Input Parameters: Column arg0

• arg0 (required) - The column for which degree to be calculated

Output Type

Number Column

Configuration Parameters

No

Returns

Returns the angle measured in degrees.

Throws

Application Exception

Use Case:

todgereess

Example:

toDegrees(3.14159) will return 180

ToRadians

Computes the angle measured in degrees to an approximately equivalent angle measured in radians.

Input Parameters: Column arg0

• arg0 (required) - A column for which radians to be calculated.

Output Type

Number Column

Configuration Parameters

No

Returns

Returns the angle measured in radians.

Throws

Application Exception

Example:

toRadians(180) will return 3.14159

Array Functions

Array

Returns an array with the given elements.

Input Parameters:

arg0: The given columns to create array column.

Configuration Parameters

No

Returns

Returns an array with the given elements.

Throws

Application Exception

Example:

array(1, 2, 3) will return [1,2,3]

Array_union

Returns an array of the elements in the union of array1 and array2, without duplicates.

Input Parameters:

arg0: The first array column.

arg1: The second array column.

Configuration Parameters

No

Returns

Returns an array of the elements in the union of array1 and array2, without duplicates.

Throws

Application Exception

Example:

array_union(array(1, 2, 3), array(1, 3, 5)) will return [1,2,3,5]

array_distinct

Removes duplicate values from the array.

Input Parameters:

arg0: The given array column.

Configuration Parameters

No

Returns

Returns the array with duplicate values removed.

Throws

Application Exception

Example:

array_distinct(array(1, 2, 3, null, 3)) will return [1,2,3,null]

array_except

Returns an array of the elements in array1 but not in array2, without duplicates.

Input Parameters:

arg0: First array column.

arg1: Second array column.

Configuration Parameters

No

Returns

Returns an array of the elements in array1 but not in array2, without duplicates.

Throws

Application Exception

Example:

array_except(array(1, 2, 3), array(1, 3, 5)) will return [2]

array_intersect

Performs intersection of array1 and array2, without duplicates.

Input Parameters:

arg0: First array column.

arg1: Second array column.

Configuration Parameters

No

Returns

Returns an array of the elements in the intersection of array1 and array2, without duplicates.

Throws

Application Exception

Example:

array_intersect(array(1, 2, 3), array(1, 3, 5)) will return [1,3]

array_join

Concatenates the elements of the given array using the delimiter and an optional string to replace nulls. If no value is set for nullReplacement, any null value is filtered.

Input Parameters:

arg0: array column.

arg1: delimiter.

arg2: nullReplacement.

Configuration Parameters

No

Returns

Returns the concatenated array.

Throws

Application Exception

Example:

array_join(array('hello', null ,'world'), ' ', ',') will return hello , world

array_max

Returns the maximum value in the array. NULL elements are skipped.

Input Parameters:

arg0: The array column.

Configuration Parameters

No

Returns

Returns the maximum value in the array. NULL elements are skipped.

Throws

Application Exception

Example:

array_max(array(1, 20, null, 3)) will return 20

array_min

Returns the minimum value in the array. NULL elements are skipped.

Input Parameters:

arg0: The array column.

Configuration Parameters

No

Returns

Returns the minimum value in the array. NULL elements are skipped.

Throws

Application Exception

Example:

array_min(array(1, 20, null, 3)) will return 1

array_position

Returns the (1-based) index of the first element of the array as long.

Input Parameters:

arg0: The array column.

arg1: The position.

Configuration Parameters

No

Returns

Returns the (1-based) index of the first element of the array as long.

Throws

Application Exception

Example:

array_position(array(3, 2, 1), 1) will return 3

array_remove

Remove all elements that equal to element from array.

Input Parameters:

arg0: The array column.

arg1: The position.

Configuration Parameters

No

Returns

Returns the array with elements removed.

Throws

Application Exception

Example:

array_remove(array(1, 2, 3, null, 3), 3) will return [1,2,null]

array_repeat

Returns the array containing element count times.

Input Parameters:

arg0: The array column.

arg1: The count

Configuration Parameters

No

Returns

Returns the array containing element count times.

Throws

Application Exception

Example:

array_repeat('123', 2) will return ["123","123"]

array_sort

Sorts the input array in ascending order. The elements of the input array must be order-able. Null elements will be placed at the end of the returned array.

Input Parameters:

arg0: The array column.

Configuration Parameters

No

Returns

Returns the sorted array.

Throws

Application Exception

Example:

array_sort(array('b', 'd', null, 'c', 'a')) will return ["a","b","c","d",null]

array_union

Returns an array of the elements in the union of array1 and array2, without duplicates.

Input Parameters:

arg0: The first array column.

arg1: The second array column.

Configuration Parameters

No

Returns

Returns an array of the elements in the union of array1 and array2, without duplicates.

Throws

Application Exception

Example:

array_union(array(1, 2, 3), array(1, 3, 5)) will return [1,2,3,5]

array_overlap

Returns true if a1 contains at least a non-null element present also in a2. If the arrays have no common element and they are both non-empty and either of them contains a null element null is returned, false otherwise.

Input Parameters:

arg0: The first array column.

arg1: The second array column.

Configuration Parameters

No

Returns

Returns true or false.

Throws

Application Exception

Example:

arrays_overlap(array(1, 2, 3), array(3, 4, 5)) will return true.

array_zip

Returns a merged array of structs in which the N-th struct contains all N-th values of input arrays.

Input Parameters:

arg0: The Columns to be zipped.

Configuration Parameters

No

Returns

Returns a merged array of structs in which the N-th struct contains all N-th values of input arrays.

Throws

Application Exception

Example:

arrays_zip(array(1, 2, 3), array(2, 3, 4)) Will return [{"0":1,"1":2},{"0":2,"1":3},{"0":3,"1":4}]

Array_contains

Returns TRUE if the array contains value.

Input Parameters: Column arg0, Object arg1

• arg0 (required) - An array column.

• arg1 (required) - A value to be checked.

Output Type

Boolean Column

Configuration Parameters

No

Returns

A boolean true/false

Throws

Application Exception

Use Case:

array_contains

Example:

array_contains(["black","red"] ,"red") will return true

explode

Separates the elements of array expr into multiple rows, or the elements of map expr into multiple rows and columns.

Input Parameters:

arg0: The expr column.

Configuration Parameters

No

Returns

Returns the exploded column.

Throws

Application Exception

Example:

explode(array(10, 20)) will return 10, 20 in a new column.

explode_outer

Separates the elements of array expr into multiple rows, or the elements of map expr into multiple rows and columns.

Input Parameters:

arg0: The expr column.

Configuration Parameters

No

Returns

Returns the exploded column.

Throws

Application Exception

Example:

explode_outer(array(10, 20)) will return 10, 20.

Miscellaneous Functions

BitwiseNOT

Computes bitwise NOT of the given integer.

Input Parameters: Column arg0

• arg0 (required) - An integer or a double column.

Output Type

Integer Column

Configuration Parameters

No

Returns

The one's complement of the integer

Throws

Application Exception

Use Case:

bitwise

Example:

bitwiseNOT(7) will return 8

Coalesce

Returns the first non-null argument if exists. Otherwise, null.

Input Parameters:

arg0:columns representing expressions.

Output Type

Integer Column

Configuration Parameters

No

Returns

Returns the first non-null argument if exists. Otherwise, null.

Throws

Application Exception

Example:

coalesce(NULL, 1, NULL) will return 1

from_json

Returns a struct value with the given jsonStr and schema.

Input Parameters:

arg0: The Json string column.

arg1: The schema column.

arg2: The properties map.

Configuration Parameters

No

Returns

Returns the struct value.

Throws

Application Exception

Example:

from_json('{"a":1, "b":0.8}', 'a INT, b DOUBLE') will return {"a":1, "b":0.8} and from_json('{"time":"26/08/2015"}', 'time Timestamp', map('timestampFormat', 'dd/MM/yyyy'))will return {"time":"2015-08-26 00:00:00.0"}

hash

Returns a hash value of the arguments.

Input Parameters:

arg0:The columns for which hash to be calculated.

Configuration Parameters

No

Returns

Returns a hash value of the arguments.

Throws

Application Exception

Example:

hash('Spark', array(123), 2) will return -1321691492

input_file_name

Returns the name of the file being read, or empty string if not available.

Configuration Parameters

No

Returns

Returns the name of the file being read, or empty string if not available.

Throws

Application Exception

Example:

input_file_name() - will return the name of the file being read

map

Creates a map with the given key/value pairs.

Input Parameters:

arg0:The columns for key and value.

Configuration Parameters

No

Returns

Returns the map.

Throws

Application Exception

Example:

map(1.0, '2', 3.0, '4') will return {1.0:"2",3.0:"4"}

map_concat

Returns the union of all the given maps

Input Parameters:

arg0:The map columns.

Configuration Parameters

No

Returns

Returns the union of all the given maps.

Throws

Application Exception

Example:

map_concat(map(1, 'a', 2, 'b'), map(2, 'c', 3, 'd')) will return {1:"a",2:"c",3:"d"}

map_from_arrays

Creates a map with a pair of the given key/value arrays. All elements in keys should not be null.

Input Parameters:

arg0:Array of keys.

arg1:Array of values.

Configuration Parameters

No

Returns

Returns the map.

Throws

Application Exception

Example:

map_from_arrays(array(1.0, 3.0), array('2', '4')) will return {1.0:"2",3.0:"4"}

map_from_entries

Returns a map created from the given array of entries.

Input Parameters:

arg0:Array of entries.

Configuration Parameters

No

Returns

Returns the map.

Throws

Application Exception

Example:

map_from_entries(array(struct(1, 'a'), struct(2, 'b'))) will return {1:"a",2:"b"}

map_keys

Returns an unordered array containing the keys of the map.

Input Parameters:

arg0:Map column.

Configuration Parameters

No

Returns

Returns the array.

Throws

Application Exception

Example:

map_keys(map(1, 'a', 2, 'b')) will return [1,2]

map_values

Returns an unordered array containing the values of the map.

Input Parameters:

arg0:Map column.

Configuration Parameters

No

Returns

Returns the array.

Throws

Application Exception

Example:

map_values(map(1, 'a', 2, 'b')) will return ["a","b"]

nanvl

R Returns expr1 if it's not NaN, or expr2 otherwise.

Input Parameters:

arg0:expr1 column.

arg1:expr2 column.

Configuration Parameters

No

Returns

Returns expr1 if it's not NaN, or expr2 otherwise.

Throws

Application Exception

Example:

nanvl(cast('NaN' as double), 123) will return 123.0

not

Perform logical not of given column.

Input Parameters:

arg0:Given boolean column

Configuration Parameters

No

Returns

Returns logical not of given column.

Throws

Application Exception

Example:

not(false) will return true

schema_of_json

Returns schema in the DDL format of JSON string.

Input Parameters:

arg0:Given json string column

Configuration Parameters

No

Returns

Returns schema of the json.

Throws

Application Exception

Example:

schema_of_json('[{"col":0}]') will return array>

sequence

Generates an array of elements from start to stop (inclusive), incrementing by step. The type of the returned elements is the same as the type of argument expressions.Supported types are: byte, short, integer, long, date, timestamp.The start and stop expressions must resolve to the same type. If start and stop expressions resolve to the 'date' or 'timestamp' type then the step expression must resolve to the 'interval' type, otherwise to the same type as the start and stop expressions.

Input Parameters:

arg0:start - an expression. The start of the range.

arg1:stop - an expression. The end the range (inclusive).

arg2:step - an optional expression. The step of the range. By default step is 1 if start is less than or equal to stop, otherwise -1. For the temporal sequences it's 1 day and -1 day respectively. If start is greater than stop then the step must be negative, and vice versa.

Configuration Parameters

No

Returns

Returns the sequence

Throws

Application Exception

Example:

sequence(to_date('2018-01-01'), to_date('2018-03-01'), interval 1 month) will return [2018-01-01,2018-02-01,2018-03-01]

size

Returns the size of an array or a map. The function returns -1 if its input is null and spark.sql.legacy.sizeOfNull is set to true. If spark.sql.legacy.sizeOfNull is set to false, the function returns null for null input. By default, the spark.sql.legacy.sizeOfNull parameter is set to true.

Input Parameters:

arg0: array or map column.

Configuration Parameters

No

Returns

Returns the size

Throws

Application Exception

Example:

size(array('b', 'd', 'c', 'a')) will return 4

struct

Creates a struct with the given field values.

Input Parameters:

arg0:columns using which the struct will be created.

Configuration Parameters

No

Returns

Returns the struct column

Throws

Application Exception

to_json

Returns a JSON string with a given struct value.

Input Parameters:

arg0:struct column.

arg1:additional options map.

Configuration Parameters

No

Returns

Returns a JSON string with a given struct value.

Throws

Application Exception

Example:

to_json(named_struct('time', to_timestamp('2015-08-26', 'yyyy-MM-dd')), map('timestampFormat', 'dd/MM/yyyy')) will return {"time":"26/08/2015"}

unhex

Converts hexadecimal expr to binary.

Input Parameters:

arg0:The hexadecimal column.

Configuration Parameters

No

Returns

Returns the binary.

Throws

Application Exception

Example:

decode(unhex('537061726B2053514C'), 'UTF-8') will return Spark SQL

Crc32

Computes a cyclic redundancy check value for string.

Input Parameters: Column arg0

• arg0 (required) - A string column.

Output Type

Integer Column

Configuration Parameters

No

Returns

Bigint value.

Throws

Application Exception

Use Case:

cr32

Example:

crc32("ABC") will return 2743272264

Isnan

Returns true if the input is Not a Number.

Input Parameters: Column arg0

• arg0 (required) - A column whose values needs to be checked.

Output Type

Boolean Column

Configuration Parameters

No

Returns

A boolean true if the value is not a number.

Throws

Application Exception

Use Case:

isnan

Example:

isnan("abc") will return true

Isnull

Returns true if “a” is NULL and false otherwise.

Input Parameters: Column arg0

• arg0 (required) - A column whose values needs to be checked.

Output Type

Boolean Column

Configuration Parameters

No

Returns

A boolean true if the value is null

Throws

Application Exception

Use Case:

isnull

Example:

isnull("abc") will return false

Lit

Creates a Column of literal value.

Input Parameters: Object arg0

• arg0 (required) - A constant value.

Output Type

Column

Configuration Parameters

No

Returns

A column with the constant value

Throws

Application Exception

Use Case:

lit

Example:

lit(1) will return a column having 1 in all the rows of the column.

Md5

Calculates an MD5 128-bit checksum for the string.

Input Parameters: Column arg0

• arg0 (required) - A string column

Output Type

String Column

Configuration Parameters

No

Returns

The value is returned as a string of 32 hex digits, or NULL if the argument was NULL.

Throws

Application Exception

Use Case:

Example:

md5('ABC') will give you an output '902fbdd2b1df0c4f70b4a5d23525e932'.

Monotonically_increasing_id

A column expression that generates monotonically increasing 64-bit integers.

Input Parameters: No input arguments.

Output Type

Integer Column

Configuration Parameters

No

Returns

Monotonically increasing integers.

Throws

Application Exception

Example:

monotonically_increasing_id() will return rows as 0,1,2...

Sha1

Calculates the SHA-1 digest for string and returns the value as a hex string

Input Parameters: Column arg0

• arg0 (requird) - A string column

Output Type

String Column

Configuration Parameters

No

Returns

A hex string.

Throws

Application Exception

Use Case:

Example:

sha1("ABC") will return '3c01bdbb26f358bab27f267924aa2c9a03fcfdb8'

Sha2

Calculates the SHA-2 family of hash functions of a string value and returns the value as a hex string. NumBits controls the number of bits in the message digest.

Input Parameters: Column arg0,int numBits

• arg0 (required) - A string column

• arg1 (required) - one of 224, 256, 384, or 512.

Output Type

String Column

Configuration Parameters

No

Returns

A hex String.

Throws

Application Exception

Use Case:

Example:

sha2("Sam",256) will return '4ecde249d747d51d8..'

ShiftLeft

Bitwise left shift, Shifts a b positions to the left.

Input Parameters: Column arg0,int numBits

• arg0 (required) - A number column

• arg1 (required) - An integer column

Output Type

Number Column

Configuration Parameters

No

Returns

If the given value is a long value, this function will return a long value else it will return an integer value

Throws

Application Exception

Use Case:

Example:

shiftLeft(258,2) will return 1032

ShiftRight

Bitwise right shift, Shifts a b positions to the right.

Input Parameters: Column arg0,int numBits

• arg0 (required) - A number column

• arg1 (required) - An integer column

Output Type

Number Column

Configuration Parameters

No

Returns

If the given value is a long value, this function will return a long value else it will return an integer value

Throws

Application Exception

Use Case:

shiftright

Example:

shiftRight(258,2) will return 64

ShiftRightUnsigned

Bitwise unsigned right shift, Shifts a b positions to the right.

Input Parameters: Column arg0,int numBits

• arg0 (required) - A number column

• arg1 (required) - An integer column.

Output Type

Number Column

Configuration Parameters

No

Returns

If the given value is a long value, this function will return a long value else it will return an integer value

Throws

Application Exception

Use Case:

shift_right

Example:

shiftRightUnsigned(258,2) will return 64