Administration

This topic presents the side navigation panel, referred to as the main menu, and its features, which can be used to perform several administrative tasks in StreamAnalytix. An illustration of the main menu is given below, and the tasks that can be performed with these features are explained further in detail.

Note: The main menu is only displayed for the Superuser (System Admin) login.

Manage Workspaces

StreamAnalytix provides multi-tenancy support through Workspaces.

A superuser can create multiple workspaces and, based on the user authentication and authorization configuration settings, add users to a workspace.

One user can be mapped to multiple workspaces and can even be assigned multiple roles based on customized requirements.

Create a Workspace

A superuser can create any number of workspaces.

To begin, go to Manage Workspaces and click on the Create New Workspace option.

Enter the details in the Create Workspace form as described in the table below:

Property

Description

Workspace Name

Enter a unique name for your workspace.

Version Control

The user can select any one of the below options:

StreamAnalytix Metastore, Bitbucket, GitHub or Gitlab.

Note: If the user selects the Bitbucket, GitHub or Gitlab option, then the user has an option to connect to Git either by providing credentials (i.e., Git username and password) or by uploading a Git SSH key.

The user can FETCH WORKSPACE after validating the credentials.

When user authentication and authorization is configured to be controlled with the StreamAnalytix Metastore option, the superuser will be prompted to fill in the necessary values for the below fields:

User

The superuser can select one of the two options provided:

- Create new user

- Existing user

If the user selects the Existing user option, then the existing user(s) who should have access to this new workspace can be selected from the Assign users option.

The user can optionally configure Artifactory by providing Artifactory URL, Username and Password.

Click Create to finish the action.

If the user selects the Create new user option, then continue with the subsequent configuration.

Username

Enter a valid username. It will be used to login.

Email Id

Enter an email id. It will be used for any communication with the user.

Password

Enter a Password.

Confirm Password

Re-type the password.

Language

Choose the language: English (US) or German (DE).

Multiple Workspace

Optionally, assign the new user to multiple existing workspaces in StreamAnalytix.

Configure Artifactory

The user can configure Artifactory by selecting the checkbox. Provide the Artifactory URL, Username and Password.

When user authentication and authorization is configured to be controlled with the LDAP/Active Directory option, the superuser will be prompted to fill in the necessary values for the below fields:

LDAP User Role Mapping

Note: Multiple group mappings can be provided for each role.

Workspace Admin

It is a mandatory role.

Note: At least one LDAP group is needed for mapping the Workspace Admin role.

Add Roles

Along with the mandatory Workspace Admin role, the Add Roles option can be used to add the below optional roles in Workspaces.

The user groups that are existing in the LDAP/AD directory can be added in the Mapped LDAP Group field.

Data Analyst - I

Data Analyst - II

Read-Only

Any other custom role that is created by the superuser will also get listed here.

Validate

This option validates the group IDs that the user mentions in the Mapped LDAP Group field. Upon successful validation, the message “LDAP groups validated successfully.” is displayed. If any of the mentioned group IDs do not exist in LDAP, the user is notified with a corresponding message.

Configure Artifactory

The user can configure Artifactory by selecting the checkbox. Provide the Artifactory URL, Username and Password.


Click on the Create option to save the changes; the new Workspace will be listed on the Manage Workspaces page.

Edit Workspace

How to assign Spark Cores: You can assign Spark cores to a workspace after the workspace is created. Click the Spark Cores field and fill in the number of cores required for the workspace. If the field is left blank, there is no limit on the usage of Spark cores.


To edit a workspace, go to the Workspaces page and click on the Edit button.


Built-in Metastore

The user can edit the below parameters of an existing workspace:

Version Control

Assign Users

Git User Name and Password (User can FETCH WORKSPACE)

Configure Artifactory by specifying Artifactory URL, User Name and Password.

Click UPDATE to save changes.

With LDAP/Active Directory

The user can edit all the parameters of an existing workspace which were provided during the Workspace creation except the Workspace Name.

Click UPDATE to save changes.

Enter Workspace

To enter a workspace, click on the enter icon.


Once the user enters into a Workspace, similar components will appear on the Workspace landing page as explained earlier in the Getting Started topic.

To know more about the Workspace menu, see Projects, Manage Workspace Connections, Register Container Image and Manage Users (Workspace Menu).

Navigate Between Workspaces

After entering a workspace, click the workspace icon in the upper right corner of the page to view a drop-down list of workspaces. Choose from the list the name of the workspace you wish to enter.

Note: There is no provision to delete any workspace.

Manage Data Pipeline

With the Manage Data Pipeline feature, the superuser can view a consolidated Pipeline execution status for all the Workspaces. The Superuser can either search and select particular Workspace(s) or select all to view the consolidated list of all the pipelines in StreamAnalytix.

Clicking on any of the pipelines opens the Actions tab, which has the summary of that pipeline. The Superuser can start, download or view the history of any pipeline with the help of this Actions tab.


Manage Setup

Refer to the Installation topic, Setup section.

This section defines the properties of Cluster Configuration, StreamAnalytix Settings, Database, Messaging Queue, Elasticsearch, Cassandra and Version Control.

Manage Configuration

The Configuration page enables configuration of StreamAnalytix properties.

Update Configuration

Each sub-category contains configuration in key-value pairs. You can update multiple property values in a single operation.

Update the values that you want, then scroll down to the bottom and click on the Save button.

You will be notified with a successful update message as shown below:


Search Properties

Performs a search operation to find a property key or property value. You can search by using partial words of key labels, key names or key values.

The above figure shows the matching configuration values and count for the searched keyword “url”.

View Key Description

Hover the mouse over a property label to see a box with the fully qualified name of the key, and click on the i button for its description.


Copy Key Name

Copy the fully qualified name of a property key by clicking on the key's label as shown below.

The key name will be copied to the clipboard.

StreamAnalytix configuration settings are divided into various categories and sub-categories according to the component and technology.

Web Studio

Configuration properties related to the application server, i.e., the StreamAnalytix web studio. This category is further divided into various sub-categories.

Platform

Field

Description

StreamAnalytix UI Host

The IP address of StreamAnalytix.

StreamAnalytix Installation Directory

The installation directory of the StreamAnalytix.

StreamAnalytix Web URL

The URL of StreamAnalytix web studio.

StreamAnalytix UI Port

The UI port of StreamAnalytix.

LogMonitoring UI Host

The host address of LogMonitoring.

LogMonitoring UI Port

The port of LogMonitoring.

Messaging Type

Specifies the Message Queuing System that the application uses internally for messaging. Possible value is RABBITMQ (for RabbitMQ).

StreamAnalytix Monitoring Reporters Supported

The monitoring reporter types, as a comma-separated list. Possible values are graphite, console and logger.

Metric Server

Monitoring Metric Server (Graphite or Ambari).


RDBMS

Field

Description

Password

The database password.

Driver Class

The database driver class name.

Connection URL

The connection URL for the database.

User

The database username.

Database Dialect

The type of database on which StreamAnalytix database is created. Possible values are MySQL, PostgreSQL, Oracle.


Zookeeper

Field

Description

Host List

The comma separated list of <IP>:<PORT> of all nodes in zookeeper cluster where configuration will be stored.
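
For illustration only, assuming a three-node ZooKeeper ensemble listening on the default port 2181, the Host List value might look like the following (the IP addresses are placeholders):

192.168.1.101:2181,192.168.1.102:2181,192.168.1.103:2181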


Indexing

Field

Description

Indexer Type

The default indexer type. For e.g. - Solr or ElasticSearch.

Index Default Replication Factor

Number of additional copies of data to be saved.

Enable Index Default is Batch

Default value for the Batch parameter of indexing.

Index Default Batch Size

Default batch size for the indexing store.

Enable Index Default Across Field Search

Enables search without specifying column names; takes extra space and time.

Index Default Number of Shards

Number of shards to be created in index store.

Index Default Routing Required

The default value for the Routing parameter of indexing.

Indexer Default Source

The default value for the Source parameter of indexing.

Index Retries

The number of retries for indexing.

Index Retries Interval(in ms)

The retries interval for the indexing when ingestion fails.

Indexer time to live in seconds

Indexed data older than mentioned time in seconds from current time will not be fetched.


Persistence

Field

Description

Persistence Store

The default persistence type. For e.g. - Hbase, Cassandra.

Persistence Default Is batch Enable

Defines if by default batching should be enabled in persistence.

Persistence Default Batch Size

The batch size for the persistence store.

Persistence Default Compression

The default compression type for the persistence store.


Security

Field

Description

User Authentication Source

This property specifies which authentication source, either the StreamAnalytix database or an LDAP server, must be used to match or bind the user's credentials during login to the application. If configured with LDAP, the user who is trying to log in to the application should exist in the LDAP server.

User Authorization Source

Specifies the user's authorization mechanism; accordingly, the user will be assigned the appropriate role in the StreamAnalytix web studio. Possible values are LDAP and DB. Default value is DB.

Superuser (Seed User) Authentication Source

This property specifies which authentication source, either the StreamAnalytix database or an LDAP server, must be used to match or bind the superuser's credentials during login to the application. If configured with LDAP, the user who is trying to log in to the application should exist in the LDAP server.


RT Dashboard

Field

Description

SuperAdmin Password

The super admin password (Required to access the Dashboard UI).

ReportClient path

The path of ReportClient.properties required to connect with Report Engine.

Connection Name

The connection name created for StreamAnalytix in Dashboard.

Organization ID

The name of organization for StreamAnalytix in Intellicus.

SuperAdmin User ID

The Dashboard superuser username to access Intellicus via the UI.

SuperAdmin Organization

The Dashboard superuser organization name, required to access Intellicus via UI.

StreamAnalytix URL

The dashboard web admin URL, used for showing Dashboard UI from within StreamAnalytix admin.


Databricks

Field

Description

Databricks Enabled

To enable Databricks on this environment.

Databricks Instance URL

Databricks Instance URL to connect to the Databricks account and access it over REST calls.

Databricks Authentication Token

Databricks Access token provided here will be associated with superuser account for StreamAnalytix. It will be saved as encrypted text.

Databricks DBFS Upload Jar Path

DBFS Path for the StreamAnalytix specific jars and files.

Maximum Polling Time (in minutes)

Maximum Polling Time.

Polling Interval

Polling Interval (in seconds).

Databricks Mediator Service URL

This is StreamAnalytix web service URL for Databricks.


EMR

Field

Description

Jar Upload Path

S3 Path for the StreamAnalytix specific jars and files.

Log URI

S3 Path for creating Logs for EMR Cluster launched by StreamAnalytix.

EMR Mediator Service URL

This is StreamAnalytix webservice URL for EMR.

AWS Key

AWS Access key to be associated with the StreamAnalytix superuser account.

AWS Secret Key

AWS Secret key to be associated with the superuser account for StreamAnalytix. It will be saved as encrypted text.

AWS Region

The region that the AWS EMR is to be launched in.

EMR Enabled

To enable EMR on this environment.


Processing Engine

Configuration properties related to application processing engines come under this category. This category is further divided into two sub-categories.

Spark

Field

Description

Spark Livy URL

Livy web server URL through which StreamAnalytix submits pipelines on Spark.

Spark Home

The spark installation directory.

Spark Master URL

It is the Spark Master URL, e.g., spark://host1:7077.

Spark cluster manager

Defines spark cluster manager i.e. standalone or yarn.

Spark Job Server Log Directory

Directory path where pipeline logs will be generated when using Spark Job server.

Spark UI Port

It is the port on which the spark master UI is running.

spark.history.server

The history server URL.

Spark Hadoop is HDP

If your environment is HDP, set it to true; otherwise set it to false. Also used for setting the proxy user.

Resource Manager Host

The resource manager hostname used for spark yarn deployment.

Resource Manager Webapp Port

Yarn Resource Manager UI Port.

Resource Manager Port

The resource manager port used for spark-yarn deployment.

ResourceManager High Availability

Enables Resource Manager’s High Availability.

ResourceManager HA Logical Names

ResourceManager High Availability Logical IDs defined at HA configuration.

ResourceManager HA Hosts

ResourceManager High Availability host names defined at HA configuration.

ResourceManager HA ZK Address

ResourceManager High Availability ZooKeeper-Quorum's address which is defined for HA configuration.

Spark Job Submit Mode

Submit mode of Spark pipeline using Job-Server.

Spark UI Host

Host name of the Spark Master.

Job Server Spark Home

The spark installation directory with which Job Server is configured.

Job Server URL

The host URL of Job Server.

Spark REST Host and Port

Spark REST host name and port, e.g., Host1:6066.

Spark Python Path

This environment variable is used to augment the default search path for Python module files. Directories and individual zip files containing pure Python modules can be added to this path. StreamAnalytix uses this variable to find PySpark modules usually located at $SPARK_HOME/python/lib.
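
As a hedged illustration, assuming Spark is installed under /opt/spark, the Spark Python Path value might look like the following (the py4j archive name varies by Spark distribution, so adjust it to match your installation):

/opt/spark/python:/opt/spark/python/lib/py4j-0.10.7-src.zip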


Messaging Queue

Configuration properties related to messaging brokers come under this category. This category is further divided into sub-categories.

RabbitMQ

Field

Description

Password

RabbitMQ Password to create connection.

Port

Port number of RabbitMQ.

RabbitMQ STOMP URL

RabbitMQ stomp URL.

Host List

IP address of the machine where RabbitMQ is running.

RabbitMQ Virtual Host

The RabbitMQ virtual hosts.

User

Username of RabbitMQ to create connection.

RabbitMQ Web URL

Web URL of RabbitMQ.


Kafka

Field

Description

Kafka Metadata Broker List

The list of comma separated IP:port of Kafka brokers.

Kafka Zookeeper Server List

The list of comma separated IP:port of zookeeper for creating Kafka topic from StreamAnalytix UI.

Kafka Topic Administration

When set to true, it specifies that within the application a Kafka connection has permission to create topics in Kafka.


NoSQL

Configuration properties related to NoSQL databases come under this category. This category is further divided into two sub-categories:

HBase

Field

Description

HBase Zookeeper Host

The zookeeper host names used for HBase cluster.

HBase Zookeeper Port

The zookeeper port for HBase cluster.

HBase Client Retry Number

The number of retries for the HBase client.

HBase Zookeeper Parent Node

Parent node in zookeeper for HBase service metadata.

HBase Zookeeper Recovery Retry

The no. of times to retry the connection to HBase zookeeper.

system-config.hbase.table.administration

When set to true, it specifies that within the application an HBase Default connection has permission to create tables and namespaces in HBase.


Cassandra

Field

Description

Cassandra Host List

Addresses of servers where Cassandra is running.

Cassandra User

Username for Cassandra data store authentication.

Cassandra Password

Password for Cassandra data store authentication.

Cassandra Thrift Client Retry Count

The number of retries the Cassandra client will make to establish a connection with the server.

Cassandra Thrift Client Delay Between Retries (in ms)

The time (in ms) after which the Cassandra client retries to make a connection to server.

Cassandra Keyspace Replication Factor

Defines how many copies of the data will be present in the cluster.

Cassandra Keyspace Replication Strategy

Strategy determines the nodes where replicas are placed. Simple Strategy places the first replica on a node determined by the partitioner. Additional replicas are placed on the next nodes clockwise in the ring without considering topology.

Cassandra Connection Retry Count

Cassandra connection retry count.


Indexing Store

Configuration properties related to search engines come under this category. This category is further divided into two sub-categories:

ElasticSearch

Field

Description

Enable Authentication

Select the check box, if Elasticsearch authentication is enabled.

Elasticsearch Cluster Name

Name of the Elasticsearch cluster.

Elasticsearch Connection URL

The http connection URL for Elasticsearch.

Connection Timeout in secs

ElasticSearch connection timeout in seconds.

Elasticsearch Embedded Data Directory

The data directory for running embedded Elasticsearch.

Elasticsearch Embedded Enable data

Defines whether to store data on disk or in memory (true for disk, false for memory).

Elasticsearch Embedded Enable HTTP

Defines whether the HTTP connection is enabled for embedded Elasticsearch.

Elasticsearch Embedded Enable local

The value of this field should be true.

Elasticsearch Embedded Node Name

The node name of the embedded Elasticsearch node.

Elasticsearch HTTP Connection URL

The http connection URL for Elasticsearch.

Elasticsearch HTTP Port

The port on which Elasticsearch REST URI is hosted.

Keystore Password

Elasticsearch keystore password.

Keystore Path

Elasticsearch keystore file (.p12) path.

Request Timeout in secs

Request Retry Timeout for ElasticSearch connection in seconds.

Enable Security

If security is enabled on Elasticsearch, set this to true.

Socket Timeout in secs

Socket Timeout for ElasticSearch connection in seconds

Enable SSL

Select the checkbox if SSL is enabled on Elasticsearch

Username

Elasticsearch authentication username.


Solr

Field

Description

Solr Zookeeper Hosts

The Zookeeper hosts for the Solr server.

Solr Configuration Version

Solr version number to create the zookeeper config node path for solr data.


Metrics Store

Configuration properties related to metric servers come under this category. This category is further divided into various sub-categories.

Graphite

Field

Description

Port

Port number of Graphite.

Host

IP address of the machine where Graphite is running.

UI Port

UI port number of Graphite.


Ambari

Field

Description

Metric Collector Port

Ambari Metric Collector port.

Metric Collector Host

Hostname where Ambari Metric Collector is running.


Hadoop

Configuration properties related to Hadoop come under this category. This category is further divided into various sub-categories.

Hive

Field

Description

Hive Meta Store URI

Defines the hive metastore URI.

Hive Server2 JDBC URL

The JDBC URL for the HiveServer2 connection.

Hive Server2 Password

Defines the HiveServer2 password. In case no password is required, pass it as empty ("").

Hive Warehouse Dir

Defines the warehouse directory path of Hive server.


HDFS

Field

Description

Hadoop Enable HA

Whether the Hadoop cluster is HA enabled or not.

File System URI

The file system URI. For e.g. - hdfs://hostname:port, hdfs://nameservice, file://, maprfs://clustername

Hadoop User

The name of the user through which the Hadoop service is running.

Hadoop DFS Name Services

The name service id of Hadoop HA cluster.

Hadoop Namenode 1 Details

The RPC Address of namenode1.

Hadoop Namenode 2 Details

The RPC Address of namenode2.


Others

Miscellaneous configuration properties can be defined in the Others tab. This tab is further divided into various sub-categories as explained below:

LDAP

Field

Description

User Name

Provide the user name against which the LDAP configuration is validated.

LDAP Connection URL

Provide the URL of the LDAP server, a string that encapsulates the address and port of a directory server. For e.g., ldap://host:port

User Distinguished Name

A unique name which is used to find the user in LDAP Server.

Password

Password against which the user will be authenticated in LDAP Server.

User Search Base

Defines the part of the directory tree under which DN searches will be performed.

Group Search Base

Defines the part of the directory tree under which group searches will be performed.

Group Search Filter

The filter which is used to search for group membership. The default is member={0}, corresponding to the groupOfMembers LDAP class. In this case, the substituted parameter is the full distinguished name of the user. The parameter {1} can be used if you want to filter on the login name.

User Search Filter

The filter which will be used to search DN within the User Search Base defined above.

Group Name Search Filter

This filter is used to search for the additional user groups mentioned here in the LDAP directory.

Admin Group Name

LDAP group name which maps to application's Admin role.

Developer Group Name

LDAP group name which maps to application's Developer role.

Devops Group Name

LDAP group name which maps to application's Devops role.

Tier-II Group Name

LDAP group name which maps to application's Tier-II role.


Note: Superuser can use the TEST CONFIGURATION option to validate configuration parameters for the LDAP connections.

-   Valid LDAP configuration will provide the user with a success message stating that configuration validation is successful.

-   Invalid LDAP configuration will provide the user with an error message stating exactly the configuration parameter details that are incorrect or missing.
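
For illustration only, a typical set of values for these LDAP properties might look like the following; the host, base DNs and group names are assumptions and must be replaced with values from your own directory:

LDAP Connection URL: ldap://ldap.example.com:389
User Search Base: ou=users,dc=example,dc=com
User Search Filter: (uid={0})
Group Search Base: ou=groups,dc=example,dc=com
Group Search Filter: (member={0})
Admin Group Name: sax_admins
Developer Group Name: sax_developers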

Configuring LDAP

Authentication is the process of identifying a user’s identity by obtaining credentials and using them to verify identity.

Post authentication, user must gain authorization for performing certain tasks.

The authorization process determines the access control list (ACL) to manage user access.

It is a process of applying policies to determine the tasks or services permitted to a user.

StreamAnalytix provides three ways for user authentication and authorization:

1. Use a Database for both authentication and authorization.

2. Use an LDAP server for both authentication and authorization.

3. Use an LDAP server for authentication and a Database for authorization.

Configuration

In StreamAnalytix, the configuration for user authentication and authorization is under Web Studio.

# User Authentication and Authorization source

authentication.source: db (or ldap)

authorization.source: db (or ldap)


Possible values are db and ldap, for the built-in system database and an LDAP Active Directory server, respectively. The default value is db.

User Authentication

This property specifies the authentication source (database or LDAP).

StreamAnalytix supports two types of authentication systems:

Own built-in database: Enables administrator to create and save user details in the system database.

If the property authentication.source is db, the user who is trying to log in to the application should exist in the database. If not, only a Superuser or a Developer can add them as a new user.

LDAP: Configure the system to integrate with an Active Directory server for user management.

If the property authentication.source is ldap, make sure the user exists in the LDAP directory.

In this case, a user with the role Developer is pre-created to leverage the multi-tenancy support provided by the platform. The LDAP server directly authenticates Dev-Ops and Tier-II users.

User Authorization

This property specifies the authorization source (database or LDAP) to map users with their role(s).

StreamAnalytix supports two types of authorization systems:

Own built-in database: If the property authorization.source is db, two cases follow:

Case I: authentication.source is db.

In this case, the user who is trying to log in to the application should exist in the database with any of the four roles. During the authorization process, the user role is fetched from the database and assigned to the user's security context.

Case II: authentication.source is ldap. In this case, the user should exist in the LDAP directory (in order to perform user authentication) as well as in the database (in order to retrieve the user role).

LDAP: If the property authorization.source is ldap, it is mandatory to configure authentication.source also with LDAP.

In this case, the user role is assigned based on the LDAP group's common name (cn).

If authorization is done via LDAP, the user needs to specify the mapping of LDAP group names to the application's user roles on the configuration page.

This specifies which LDAP group maps to which application role.

Configure four types of group names inside Configuration > LDAP.

Admin Group Name: LDAP group name which maps to application's Admin role.

Developer Group Name: LDAP group name which maps to application's Developer role.

DevOps Group Name: LDAP group name that maps to application's DevOps role.

Tier-II Group Name: LDAP group name that maps to application's Tier-II role.


Below is the screenshot of group names of LDAP server:


You can also import the required LDAP-Group vs. StreamAnalytix-Role mapping into the database prior to login by using the sample script as shown below:

Query Example (with MySQL):

INSERT INTO company_sax_rolemappings (`company_role`, `sax_role`) VALUES ('PROD_DEV_USER', 'ROLE_ADVANCED_USER'), ('PROD_OPS_USER', 'ROLE_NORMAL_USER');


At the time of LDAP authorization, the group's common name (cn), in which the authenticated user exists, will be searched and retrieved.

Then the group name is mapped to a StreamAnalytix role with the help of the table data (shown above). This evaluated role is finally assigned to the user.

Limitations:

There are a few constraints for Manage Users tab’s visibility on the UI if both authentication and authorization use LDAP.

The Manage Users tab is not visible to the Superuser, since DevOps and Tier-II users do not need to be managed explicitly; rather, they are managed by the LDAP directory itself.

In contrast, for Developer users, the Manage Users tab is visible since Developer user details need to be stored and managed in the database for multi-tenancy support.

Couchbase

Field

Description

Max Pool Size

The Couchbase Max Pool Size.

Default Bucket Memory Size

The memory size of default bucket in Couchbase.

Password

The Couchbase password.

Default Bucket Replica No

The Couchbase default bucket replication number.

Host Port

The port no. of Couchbase.

Host Name

The host on which the Couchbase is running.

HTTP URL

The Couchbase http URL.

Bucket List

The Couchbase bucket list.

Polling timeout

The polling timeout of Couchbase.

Polling sleeptime

The sleep time between each polling.

User Name

The username of the Couchbase user.


Kerberos

Field

Description

Hadoop NameNode Kerberos Principal

Service principal of name node.

Kerberos Configuration File Override

Set to true if you want the keytab_login.conf file to be (re)created for every running pipeline when Kerberos security is enabled.

Hadoop Core Site Location

This property should be used when trying to connect to HDFS from two different realms. It signifies the path of the Hadoop core-site.xml containing rules for cross-realm communication.

Hbase Master Kerberos Principal

Service principal of HBase master.

ResourceManager Kerberos Principal

Service principal of resource manager

Hbase Regionserver Kerberos Principal

Service principal of region server.

Hive Metastore Kerberos principal

Service principal of Hive metastore.

HiveServer2 Kerberos Principal

Service principal of hive server 2


Configuring Kerberos

You can add extra Java options for any Spark Superuser pipeline in the following way:

Log in as a Superuser, click on Data Pipeline, and edit any pipeline.

Kafka

HDFS

HBASE

SOLR

Zookeeper

Configure Kerberos

Once Kerberos is enabled, go to Superuser UI > Configuration > Environment > Kerberos to configure Kerberos.


Configure Kerberos in Components

Go to Superuser UI > Connections and edit the component connection settings as explained below:

HBase, HDFS


Field

Description

Key Tab Select Option

A keytab is a file containing pairs of Kerberos principals and encrypted keys. You can use a keytab to authenticate to various remote systems. This field has two options:

Specify Keytab File Path: Path where the keytab file is stored.

Upload Keytab File: Upload the keytab file from your local file system.

Specify Keytab File Path

If the option selected is Specify Keytab File Path, the system will display the field KeyTab File Path, where you specify the keytab file location.

Upload Keytab File

If the option selected is Upload Keytab File, the system will display the field Upload Keytab File, which enables you to upload the keytab file.


By default, Kerberos security is configured for these components: Solr, Kafka and Zookeeper. No manual configuration is required.

Note: For Solr, Kafka and Zookeeper, Security is configured by providing principals and keytab paths in keytab_login.conf. This file then needs to be placed in StreamAnalytix/conf/common/kerberos and StreamAnalytix/conf/thirdpartylib folders.
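
A minimal sketch of one section of keytab_login.conf, written in standard JAAS syntax; the section name, keytab path and principal below are illustrative assumptions and must be replaced with the values from your own Kerberos setup:

KafkaClient {
    com.sun.security.auth.module.Krb5LoginModule required
    useKeyTab=true
    storeKey=true
    keyTab="/etc/security/keytabs/kafka.service.keytab"  // assumed keytab path
    principal="kafka/host.example.com@EXAMPLE.COM";      // assumed principal
};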

Jupyter

Field

Description

jupyter.hdfs.port

HDFS Http port.

jupyter.hdfs.dir

HDFS location where uploaded data will be saved.

jupyter.dir

Location where notebooks will be created.

jupyter.notebook.service.port

Port on which Auto create Notebook service is running.

jupyter.hdfs.connection.name

HDFS connection name used to connect to HDFS (from the StreamAnalytix connection tab).

jupyter.url

URL containing the IP address and port where Jupyter services are running.


Cloudera

Field

Description

Navigator URL

The Cloudera Navigator URL.

Navigator API Version

The Cloudera Navigator API version used.

Navigator Admin User

The Cloudera navigator Admin user.

Navigator User Password

The Cloudera navigator Admin user password.

Autocommit Enabled

Specifies if the auto-commit of entities is required.


Airflow

Field

Description

Airflow Server Token Name

It is the key that is used to authenticate a request. It should be the same as the value given in the Plugin Installation > Authentication section for the property ‘sax_request_http_token_name’.

Airflow Server Token Required

Check if the token is required.

Airflow Server Token Value

HTTP token to authenticate a request. It should be the same as the value given in the Plugin Installation > Authentication section for the property ‘sax_request_http_token_value’.

Airflow Server URL

Airflow URL to connect to Airflow.


Artifactory

Field

Description

Artifactory URL

Provide the Artifactory URL for package management.

User Name

Provide a unique user name.

Password

Enter the Password.


Cluster Templates

Cluster Templates configuration allows a user to edit the Memory and Cores allocation for the predefined Small, Medium, and Large cluster templates.

This is useful in Notebook Environment configuration while selecting or defining a value for the Template parameter, i.e., Small, Medium, Large, or Custom. For more details, see the Notebook Environment topic.

Default

All default or shared configuration properties come under this category. This category is further divided into various sub-categories.

Platform

Field

Description

Application Logging Level

The logging level to be used for StreamAnalytix logs.

StreamAnalytix HTTPs Enabled

Whether the StreamAnalytix application supports the HTTPS protocol or not.

Spark HTTPs Enabled

Whether the Spark server supports the HTTPS protocol or not.

Test Connection Time Out

Timeout for test connection (in ms).

Java Temp Directory

The temp directory location.

StreamAnalytix Reporting Period

The period at which StreamAnalytix monitoring metrics are reported.

View Data Enabled

Whether to enable View Data link in application or not.

TraceMessage Compression

The type of compression used on emitted TraceMessage from any component.

Message Compression

The type of compression used on emitted object from any component.

Enable StreamAnalytix Monitoring Flag

Flag to tell if monitoring is enabled or not.

CEP Type

Defines the name of the CEP engine used. The only possible value is esper as of now.

Enable Esper HA Global

To enable or disable HA.

CepHA Wait Interval

The wait interval of primary CEP task node.

StreamAnalytix Scheduler Interval

The topology stopped alert scheduler's time interval in seconds.

Enable StreamAnalytix Scheduler

Flag to enable or disable the topology stopped alert.

StreamAnalytix Session Timeout

The timeout for a login session in StreamAnalytix.

Enable dashboard

Defines whether the dashboard is enabled or disabled.

Enable Log Agent

Defines if Agent Configuration option should be visible on StreamAnalytix GUI or not.

Enable Storm Error Search

Enable showing pipeline Application Errors tab using LogMonitoring search page.

StreamAnalytix Pipeline Error Search Tenant Token

Tenant token for Pipeline Error Search.

StreamAnalytix Storm Error Search Index Expression

Pipeline application error index expression (time based js expression to create indexes in ES or Solr, that is used during retrieval also).

Kafka Spout Connection Retry Sleep Time

Time between consecutive Kafka spout connection retries.

Cluster Manager Home URL

The URL of StreamAnalytix Cluster Manager

StreamAnalytix Pipeline Log Location

StreamAnalytix Pipeline Log Location.

HDFS Location for Pipeline Jars

HDFS Location for Pipeline Jars.

Scheduler Table Prefix

Prefix for the names of tables related to storing the scheduler's state.

Scheduler Thread Pool Class

Class used to implement thread pool for the scheduler.

Scheduler Thread Pool Thread Count

This count can be any positive integer, although only numbers between 1 and 100 are practical.

This is the number of threads that are available for concurrent execution of jobs.

If only a few jobs run a few times a day, then 1 thread is plenty. However, if there are many jobs, with most of them running every minute, then you probably want a thread count of 50 or 100 (this depends on the nature of the jobs performed and the available resources).

Scheduler Datasource Max Connections

The maximum number of connections that the scheduler datasource can create in its pool of connections.

Scheduler Misfire Threshold Time

Milliseconds the scheduler will tolerate a trigger to pass its next-fire-time by, before being considered misfired.

HDP Version

Version of HDP ecosystem.

CDH Version

Version of CDH ecosystem.

Audit Targets

Defines the audit logging implementation to be used in the application. Default is file.

Enable Audit

Defines the value (true/false) for enabling audit in application.

Persistence Encryption Key

Specifies the encryption key used to encrypt data in persistence.

Ambari HTTPs Enabled

Whether the Ambari server supports the HTTPS protocol or not.

Graphite HTTPs Enabled

Whether the Graphite server supports the HTTPS protocol or not.

Elastic Search HTTPs Enabled

Whether the Elasticsearch engine supports the HTTPS protocol or not.

SQL Query Execution Log File Path

File location for logging StreamAnalytix SQL query execution statistics.

SQL Query Execution Threshold Time (in ms)

Defines the maximum execution time for SQL queries (in ms), after which the event will be logged.

Lineage Persistence Store

The data store that will be used by data lineage feature.

Aspectjweaver jar location

The absolute path of the aspectjweaver jar required for pipeline inspection or data lineage.

Is Apache Environment

Default value is false. For all Apache environments, set it to true.


Zookeeper

Field

Description

Zookeeper Retry Count

Zookeeper connection retry count.

Zookeeper Retry Delay Interval

Defines the retry interval for the zookeeper connection.

Zookeeper Session Timeout

Zookeeper's session timeout time.


Spark

Field

Description

Model Registration Validation Timeout (in seconds)

The time, in seconds, after which the MLlib, ML or H2O model registration and validation process will fail if the process is not complete.

Spark Fetch Schema Timeout(in seconds)

The time, in seconds, after which the fetch schema process of a registered table will fail if the process is not complete.

Spark Failover Scheduler Period (in ms)

Regular intervals to run scheduler tasks. Only applicable for testing connection of Data Sources in running pipeline.

Spark Failover Scheduler Delay (in ms)

Delay after which a scheduler task can run once it is ready. Only applicable for testing connection of Data Sources in running pipeline.

Refresh Superuser Pipelines and Connections

Whether to refresh Superuser Pipelines and Default Connections in the database when the web studio restarts.

SparkErrorSearchPipeline Index Expression

Pipeline application error index expression (time based js expression to create indexes in ES or Solr, that is used during retrieval).

Enable Spark Error Search

Enabled to index and search spark pipeline error in LogMonitoring.

Register Model Minimum Memory

Minimum memory required for web studio to register tables, MLlib, ML or H2O models. Example -Xms512m.

Register Model Maximum Memory

Maximum memory required for web studio to register tables, MLlib, ML or H2O models. Example -Xmx2048m.

H2O Jar Location

Local file system's directory location at which H2O model jar will be placed after model registration.

H2O Model HDFS Jar Location

HDFS path location at which H2O model jar will be placed after model registration.

Spark Monitoring Scheduler Delay(in ms)

Specifies the Spark monitoring scheduler delay in milliseconds.

Spark Monitoring Scheduler Period(in ms)

Specifies the Spark monitoring scheduler period in milliseconds.

Spark Monitoring Enable

Specifies the flag to enable the spark monitoring.

Spark Executor Java Agent Config

Spark Executor Java Agent configuration to monitor executor process, the command includes jar path, configuration file path and Name of the process.

Spark JVM Monitoring Enable

Specifies the flag to enable the spark monitoring.

Spark Version

By default the version is set to 2.3.

Note: Set the Spark version to 2.2 for HDP 2.6.3.

Livy Supported JARs Location

HDFS location where livy related jar file and application streaming jar file have been kept.

Livy Session Driver Memory

Minimum memory that will be allocated to driver while creating livy session.

Livy Session Driver Vcores

Minimum virtual cores that will be allocated to driver while creating Livy session.

Livy Session Executor Memory

Minimum memory that will be allocated to each executor while creating a Livy session.

Livy Session Executor Vcores

Minimum virtual cores that will be allocated to executor while creating Livy session.

Livy Session Executor Instances

Minimum executor instances that will be allocated while creating a Livy session.

Livy Custom Jar HDFS Path

The fully qualified HDFS path where the uploaded custom jar is kept while creating a pipeline.

Livy Data Fetch Timeout

The query time interval, in seconds, for fetching data during data inspection.

isMonitoringGraphsEnabled

Whether monitoring graph is enabled or not.

ES query monitoring index name

This property specifies the index of the default ES connection in which monitoring data is stored.

Scheduler period for ES monitoring purging

The time interval (in seconds) at which the purging scheduler is invoked to check whether the above index is eligible for purging (Tomcat restart required).

Rotation policy of ES monitoring graph

It can have two values: daily or weekly. If daily, the index is rotated daily, so only a single day's data is stored in each index; if weekly, a week's data is stored in each index.

Purging duration of ES monitoring index

The duration after which the index will be deleted. Default is 604800 seconds, meaning the index will be deleted after 1 week (Tomcat restart required).

Enable purging scheduler for ES Graph monitoring

This flag determines whether the index is purged. Purging will not take place if the flag is disabled. It requires a restart of the Tomcat server.


Monitoring

Field

Description

Enable Monitoring Graphs

By checking this option, the batch monitoring statistics of the created pipeline can be captured.


RabbitMQ

Field

Description

RabbitMQ Max Retries

Defines maximum number of retries for the RabbitMQ connection.

RabbitMQ Retry Delay Interval

Defines the retry delay intervals for RabbitMQ connection.

RabbitMQ Session Timeout

Defines session timeout for the RabbitMQ connection.

Real-time Alerts Exchange Name

Defines the RabbitMQ exchange name for real time alert data.


Kafka

Field

Description

Kafka Message Fetch Size Bytes

The number of bytes of messages to attempt to fetch for each topic-partition in each fetch request.

Kafka Producer Type

Defines whether Kafka produces data in async or sync mode.

Kafka Zookeeper Session Timeout(in ms)

The Kafka Zookeeper Connection timeout.

Kafka Producer Serializer Class

The class name of the Kafka producer key serializer used.

Kafka Producer Partitioner Class

The class name of the Kafka producer partitioner used.

Kafka Key Serializer Class

The class name of the Kafka producer serializer used.

Kafka 0.9 Producer Serializer Class

The class name of the Kafka 0.9 producer key serializer used.

Kafka 0.9 Producer Partitioner Class

The class name of the Kafka 0.9 producer partitioner used.

Kafka 0.9 Key Serializer Class

The class name of the Kafka 0.9 producer serializer used.

Kafka Producer Batch Size

The batch size of data produced at Kafka from log agent.

Kafka Producer Topic Metadata Refresh Interval(in ms)

The metadata refresh time taken by Kafka when there is a failure.

Kafka Producer Retry Backoff(in ms)

The amount of time that the Kafka producer waits before refreshing the metadata.

Kafka Producer Message Send Max Retry Count

The number of times the producer will automatically retry a failed send request.

Kafka Producer Request Required Acks

The acknowledgment of when a produce request is considered completed.


Security

Field

Description

Kerberos Sections

Section names in keytab_login.conf for which keytabs must be extracted from pipeline if krb.config.override is set to true.

Hadoop Security Enabled

Set to true if Hadoop in use is secured with Kerberos Authentication.

Kafka Security Enabled

Set to true if Kafka in use is secured with Kerberos Authentication.

Solr Security Enabled

Set to true if Solr in use is secured with Kerberos Authentication.

Keytab login conf file Path

Specify path for keytab_login.conf file.


CloudTrial

Field

Description

Cloud Trial

The flag for Cloud Trial. Possible values are True/False.

Cloud Trial Max Datausage Monitoring Size (in bytes)

The maximum data usage limit for cloud trial.

Cloud Trial Day Data Usage Monitoring Size (in bytes)

The maximum data usage for FTP User.

Cloud Trial Data Usage Monitoring From Time

The time from where to enable the data usage monitoring.

Cloud Trial Workers Limit

The maximum number of workers for FTP user.

FTP Service URL

The URL of FTP service to create the FTP directory for logged in user (required only for cloud trial).

FTP Disk Usage Limit

The disk usage limit for FTP users.

FTP Base Path

The base path for the FTP location.


Monitoring

Enable Monitoring Graphs

Set to True to enable Monitoring and to view monitoring graphs.

QueryServer Monitoring Flag

Defines the flag value (true/false) for enabling the query monitoring.

QueryServer Monitoring Reporters Supported

Defines the comma-separated list of appenders where metrics will be published. Valid values are graphite, console, logger.

QueryServer Metrics Conversion Rate Unit

Specifies the unit of rates for calculating the queryserver metrics.

QueryServer Metrics Duration Rate Unit

Specifies the unit of duration for the queryserver metrics.

QueryServer Metrics Report Duration

Time period after which query server metrics should be published.

Query Retries

Specifies the number of retries to make a query in indexing.

Query Retry Interval (in ms)

Defines query retry interval in milliseconds.

Error Search Scroll Size

Number of records to fetch in each page scroll. Default value is 10.

Error Search Scroll Expiry Time (in secs)

Time after which search results will expire. Default value is 300 seconds.

Index Name Prefix

Prefix to use for error search system index creation. The prefix will be used to evaluate exact index name with partitioning. Default value is sax_error_.

Index number of shards

Number of shards to create in the error search index. Default value is 5.

Index Replication Factor

Number of replica copies to maintain for each index shard. Default value is 0.

Index Scheduler Frequency (in secs)

Interval (in secs) after which scheduler will collect error data and index in index store.

Index Partitioning Duration (in hours)

Time duration after which a new index will be created using partitioning. Default value is 24 hours.

Data Retention Time (in days)

Time duration for retaining old data. Data above this threshold will be deleted by scheduler. Default value is 60 days.


Audit

Field

Description

Default Value

Enable Event Auditing

Defines the value for enabling events auditing in the application.

true

Events Collection Frequency (in secs)

Time interval (in seconds) in which batch of captured events will be processed for indexing.

10

Events Search Scroll size

Number of records to fetch in each page scroll on result table.

100

Events Search Scroll Expiry (in secs)

Time duration (in seconds) for search scroll window to expire.

300

Events Index Name Prefix

Prefix string for the events index name. The prefix will be used to evaluate the exact target index name during the data partitioning process.

sax_audit_

Events Index Number of Shards

Number of shards to create for events index.

5

Events Index Replication Factor

Number of replica copies to maintain for each index shard.

0

Index Partitioning Duration (in hours)

Time duration (in hours) after which a new index will be created for events data. A partition number will be calculated based on this property. This calculated partition number, prefixed with the Events Index Name Prefix value, forms the target index name.

24

Events Retention Time (in days)

Retention time (in days) of data after which it will be auto deleted.

60

Events Indexing Retries

Number of retries to index events data before sending it to a WAL file.

5

Events Indexing Retries Interval (in milliseconds)

It defines the retries interval (in milliseconds) to perform subsequent retries.

3000


Query Server

Field

Description

QueryServer Monitoring Flag

The flag value (true/false) for enabling the query monitoring.

QueryServer Monitoring Reporters Supported

The comma-separated list of appenders where metrics will be published. Valid values are graphite, console, logger.

QueryServer Metrics Conversion Rate Unit

Specifies the unit of rates for calculating the queryserver metrics.

QueryServer Metrics Duration Rate Unit

Specifies the unit of duration for the queryserver metrics.

QueryServer Metrics Report Duration

Time after which query server metrics should be published.

QueryServer Metrics Report Duration Unit

The units for reporting query server metrics.

Query Retries

The number of retries to make a query in indexing.

Query Retry Interval (in ms)

Defines query retry interval in milliseconds.


Others

Field

Description

Audit Targets

Defines the audit logging implementation to be used in the application. Default is file.

ActiveMQ Connection Timeout(in ms)

Defines the ActiveMQ connection timeout interval in ms.

MQTT Max Retries

Max retries of MQTT server.

MQTT Retry Delay Interval

Retry interval, in milliseconds, for MQTT retry mechanism.

JMS Max Retries

Max retries of JMS server.

JMS Retry Delay Interval

Retry interval, in milliseconds, for JMS retry mechanism.

Metrics Conversion Rate Unit

Specifies the unit of rates for calculating the queryserver metrics.

Metrics Duration Rate Unit

Specifies the unit of duration for the metrics.

Metrics Report Duration

Specifies the duration at interval of which reporting of metrics will be done.

Metrics Report Duration Unit

Specifies the unit of the duration at which queryserver metrics will be reported.

StreamAnalytix Default Tenant Token

Token of user for HTTP calls to LogMonitoring for adding/modifying system info.

LogMonitoring Dashboard Interval(in min)

Log monitoring application refresh interval.

Logmonitoring Supervisors Servers

Servers dedicated to run LogMonitoring pipeline.

Export Search Raw Field

Comma separated fields to export LogMonitoring search result.

Elasticsearch Keystore download path prefix

Elasticsearch keystore download path prefix in case of uploading keystore.

Tail Logs Server Port

Listening port number where the tail command will listen for incoming streams of logs; default is 9001.

Tail Logs Max Buffer Size

Maximum number of lines that can be stored in the browser; default is 1000.

sax.datasets.profile.frequency.distribution.count.limit

Defines the number of distinct values to be shown in the frequency distribution graph of a column in a Dataset.

sax.datasets.profile.generator.json.template

Template of the Spark job used to generate the profile of a Dataset. Default: common/templates/DatasetProfileGenerator.json.

Pipeline error notification enabled URL

A mail will be sent containing the App URL or optional MQ error search

Pipeline error notification email Id’s

If the application gets killed by YARN, then too an email will be sent.

Load IDW functions on Inspect And Pipeline Run

The user can select this option to load the IDW functions on inspect and pipeline runtime.

Impersonation User Editable

The user can select this option to edit the user impersonation.

Superuser connections allowed

This option allows the Superuser to control the visibility of default and newly created Superuser Connections at any other user's workspace level.

It is checked by default. If unchecked, it will allow all the users to only view the connections created by them in their respective workspaces.

Metering Retention Period (days)

This option is specific to Cloud environments. It will not impact any performance or functionality in Non-Cloud environments.

H2O Authentication Enabled

If this parameter is checked, then the user will be provided with additional fields to enter the user credentials while creating a Notebook Environment for the H2O cluster.


Manage Connections

Connections allow StreamAnalytix to connect to services like ElasticSearch, JDBC, Kafka, RabbitMQ and many more. A user can create connections to various services and store them in the StreamAnalytix application. These connections can then be used while configuring the services in various features of StreamAnalytix that require these service connection details, e.g., Dataset, Models, Application, Pipeline, Data Validation and so on.

Manage Superuser Connections

To navigate to the Superuser Connections page, the user can click on the Connections feature which is available in the StreamAnalytix main menu.

The default connections are available out-of-the-box once you install the application. All the default connections except RabbitMQ are editable.

The user can use these connections or create new connections.


Create Connections

A superuser can create new connections using the Connections tab. To add a new connection, follow the below steps:

1. Login as a Superuser.

2. Go to Connections page.

3. Click on Add Connection.

4. Select the component from the drop-down list for which you wish to create a connection.

Note: For applicable component types that require authentication, the configuration field values related to connection authentication are optional for creating connections.

The superuser has a choice to either provide authentication parameter values to test and establish connections, or such values can be left blank to create a connection template with only the mandatory configuration values.

When such a connection template is utilized for any dataset, model or pipeline feature, the user is provided with an option to override the connection authentication.

Example: As shown in the below image, for the pipeline feature, when the Override Credential option is check-marked, the Username and Password fields get displayed. After providing the necessary credentials, the user can do a Test Connection to validate the correctness of the credentials provided.


To know more about each component’s configuration details, see below connection component types.

AWS IoT

For creating an AWS IoT connection, select AWS IoT from the Component Type drop-down list and provide connection details as explained below.

Field

Description

Component Type

Shows all the available connections. Select AWS IoT Component type from the list.

Connection Name

Name of the connection. For example, AWSIoT.

AWS KeyId

This is the AWS Key i.e. the credential to connect to AWS console.

Secret Access Key

This is AWS secret access key, to access AWS services.

Client EndPoint

AWS IoT Client End Point, which is unique for IoT.

Role ARN

User role ARN. It is used to create rules.

Region

AWS Region.

Connection ClientId

Any stream name.


Azure Blob

For creating an Azure Blob connection, select Azure Blob from the Component Type drop-down list and provide connection details as explained below:

Field

Description

Component Type

Shows available connections.

Connection Name

Name of the connection. For example, AzureBlob.

Azure Blob Connection String

Azure Blob connection String.


Cassandra

For creating a Cassandra connection, select Cassandra from the Component Type drop-down list and provide connection details as explained below:

Field

Description

Component Type

Shows available connections.

Connection Name

Name of the connection. For example, Cassandra

Hosts

Hosts name and ports of Cassandra machine (comma separated).

Test Connection Retries

Number of retries for component connection.

Authentication Enabled

Enable if authentication is enabled in Cassandra.

Username

Username for Cassandra datastore authentication

Password

Password for Cassandra datastore authentication

Test Connection

After entering all the details, click on the Test Connection button.

If the credentials are correct, a successful connection message is generated. If you enter wrong credentials or the server is down, you will get a Connection unavailable message.


Cosmos

For creating a Cosmos connection, select Cosmos from the Component Type drop-down list and provide connection details as explained below:

Field

Description

Component Type

Shows available connections.

Connection Name

Name of the connection.

Cosmos Endpoint URI

End point URI of the Azure Cosmos DB account.

Key

Azure Cosmos DB Key

Consistency Level

Consistency level in Azure Cosmos DB.

Options are:

- Strong

- Bounded Staleness

- Session

- Eventual

- Consistent Prefix

TEST CONNECTION

After entering all the details, click on the Test Connection button.

If credentials are correct, a successful connection message is generated.

If you enter wrong credentials or the server is down, you will get a Connection unavailable message.
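
For reference, an Azure Cosmos DB endpoint URI typically follows the format below (the account name is a placeholder):

https://&lt;cosmos-account-name&gt;.documents.azure.com:443/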


Couchbase

For creating a Couchbase connection, select Couchbase from the Component Type drop-down list and provide connection details as explained below:

Field

Description

Component Type

Shows available connections.

Connection Name

Name of the connection.

Hosts

Host names and ports of the Couchbase machines (comma separated).

Username

Username for Couchbase datastore authentication.

Password

Password for Couchbase datastore authentication.

TEST CONNECTION

After entering all the details, click on the Test Connection button.

If credentials are correct, a successful connection message is generated.

If you enter wrong credentials or the server is down, you will get a Connection unavailable message.


DBFS

For creating a DBFS connection, select DBFS from the Component Type drop-down list and provide other details required for creating the connection.

Field

Description

Component Type

Shows all the available connections. Select DBFS Component type from the list.

Connection Name

Name of the connection to be created.

Directory Path

Enter DBFS Parent Path for checkpointing.

Test Connection

After entering all the details, click on the Test Connection button.

If credentials are correct, a successful connection message is generated.

If you enter wrong credentials or the server is down, you will get a Connection unavailable message.


EFS

For creating an EFS connection, select EFS from the Component Type drop-down list and provide connection details as explained below.

Field

Description

Component Type

Shows available connections. Select EFS Component type from the list.

Connection Name

Name of the connection to be created.

Directory Path

Enter the mounted EFS root directory.


ElasticSearch

For creating an Elasticsearch connection, select Elasticsearch from the Component Type drop-down list and provide connection details as explained below.

Field

Description

Component Type

Shows available connections. Select Elasticsearch Component type from the list.

Connection Name

Name of the connection. For example, Elasticsearch

Hosts

Hosts name and ports of Elasticsearch machine.

httpPort

Port number where Elasticsearch is running.

Cluster Name

The name of the cluster to which Elasticsearch will connect.

Connection Timeout secs

Maximum time taken by the client to connect to the Elasticsearch server (unit seconds).

Socket Timeout secs

If continuous incoming data does not arrive for the specified period of time, a socket timeout occurs (unit seconds).

Request Retry Timeout secs

Sets the maximum timeout in case of multiple retries of same request.

Enable Security

Enable X-Pack security plugin for Elasticsearch authentication.

Enable SSL

If security is enabled on Elasticsearch, set this to true.

Keystore select option


Specify keystore path: Specify the Elasticsearch keystore file (.p12) path.

Upload keystore path: Upload the Elasticsearch keystore file (.p12) path.

Keystore file path

Mention the Elasticsearch keystore file (.p12) path.

Keystore password

Keystore password.

Enable Authentication

Select the checkbox if Authentication is enabled on Elasticsearch.

Username

Elasticsearch authentication username.

Password

Elasticsearch authentication password.
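
For reference, an illustrative set of values is shown below (host names and cluster name are placeholders; 9200 is the default Elasticsearch HTTP port):

Hosts: es-node1.example.com,es-node2.example.com

httpPort: 9200

Cluster Name: my-es-cluster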


Hbase

For creating an HBase connection, select HBase from the Component Type drop-down list and provide connection details as explained below.

Field

Description

Component Type

Shows all different types of available connections. Select HBase Component type from the list.

Connection Name

Name of the connection. For example, HBase

HDFS User

HDFS user name.

zK Host

Zookeeper host name for Hbase cluster.

zK Port

Zookeeper port for Hbase cluster.

Client Retries Number

Number of retries for the Hbase Client. For example, 2.

zk Recovery Retry

Number of retries to reconnect to HBase Zookeeper.

zk Parent Node

Parent node in Zookeeper for hbase service metadata.

Table Administration

Enable this if you want to create table in Hbase.

TEST CONNECTION

After entering all the details, click on the Test Connection button. If the credentials provided are correct and the services are up and running, the user will get the message: ‘Connection is available’. If the user enters wrong credentials or the server is down, then upon clicking the Test Connection button the ‘Connection unavailable’ message will be displayed.
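
For reference, an illustrative set of values is shown below (host names are placeholders; 2181 is the default Zookeeper client port and /hbase is a commonly used default parent node):

zK Host: zk1.example.com,zk2.example.com,zk3.example.com

zK Port: 2181

zk Parent Node: /hbase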


HDFS

For creating an HDFS connection, select HDFS from the Component Type drop-down list and provide connection details as explained below:

Field

Description

Component Type

Shows all different types of available connections. Select HDFS Component type from the list.

Connection Name

Name of the connection. For example, HDFS.

File System URI

File System URI of the machine where HDFS is installed.

Username

The name of the user through which the Hadoop services are running.

HA Enabled

Specifies whether the Hadoop cluster is HA enabled or not.

Name Node1 Name

NameNode1 identifier/label.

Name Node1 RPC Address

RPC Address of the Name Node1.

Name Node2 Name

NameNode2 identifier/label.

Name Services

Name service id of Hadoop cluster.

Test Connection

After entering all the details, click on the Test Connection button. If the credentials provided are correct and the services are up and running, the user will get the message: ‘Connection is available’. If the user enters wrong credentials or the server is down, the ‘Connection unavailable’ message will be displayed.
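
For reference, an illustrative set of values for an HA-enabled cluster is shown below (the name service, node labels and hosts are placeholders; 8020 is a commonly used NameNode RPC port):

File System URI: hdfs://mycluster

Name Services: mycluster

Name Node1 Name: nn1

Name Node1 RPC Address: namenode1.example.com:8020

Name Node2 Name: nn2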


HIVE Emitter

For creating a HIVE Emitter connection, select HIVE Emitter from the Component Type drop-down list and provide connection details as explained below:

Field

Description

Component Type

Shows all the available connections. Select HIVE Emitter Component type from the list.

Connection Name

Name of the connection.

MetaStore URI

Thrift URI to connect to HIVE Metastore service.

Hive Server2 URL

Defines the HiveServer2 JDBC URL, i.e., jdbc:hive2://&lt;host&gt;:&lt;port&gt;/default;principal=&lt;principal&gt;;ssl=&lt;true/false&gt;;sslTrustStore=&lt;truststorepath&gt;;trustStorePassword=&lt;pass&gt;;sslKeyStore=&lt;keystorepath&gt;;keyStorePassword=&lt;pass&gt;;transportMode=&lt;http/binary&gt;;httpPath=cliservice


Please view the note below the table.

HiveServer2 Password

Password for HiveServer2 JDBC connection. If there is no password, leave it blank.

FileSystem URI

HDFS File System URI

UserName

HDFS User name authorized to access the services.

HA Enabled

Check this option, if Name Node of HDFS is HA enabled.

KeyTab Select Option

Select one of the below options for providing the Kerberos keytab:

Specify keytab file path: Enables you to specify the keytab file path.

Upload key tab file: Enables you to upload the keytab file for authentication.

The fields displayed depend on the option selected.

Specify keytab file path

If the option selected is Specify Keytab File Path, system will display the field KeyTab File Path where you will specify the keytab file location.

Upload key tab file

If the option selected is Upload Keytab File, system will display the field Upload Keytab File that will enable you to upload the keytab file.

Hive Zookeeper Quorum**

Zookeeper hosts used by LLAP. For example: host1:2181;host2:2181;host3:2181

Hive Daemon Service Hosts**

Application name for LLAP service. For example @llap0

Hive Warehouse Load Staging Directory**

Temp directory for batch writes to Hive, e.g., /tmp.

Table Administration

Enable this if you want to create table in Hive Emitter.

Create

Click on the Create button to create the connection.

Test Connection

After entering all the details, click on the Test Connection button.

If the credentials are correct, a successful connection message is generated. If the user enters wrong credentials or the server is down, a ‘Connection unavailable’ message will be displayed.


Note:

** Properties marked with these two asterisks are present only in the HDP 3.1.0 environment.

The value of Hive Server2 URL will be the value of HiveServer2 Interactive JDBC URL (given in the screenshot). In the HDP 3.1.0 deployment, this is an additional property:

HiveServer2 Interactive JDBC URL: The value is as mentioned below:

admin-Hive-interactive

JDBC

For creating a JDBC connection, select JDBC from the Component Type drop-down list and provide connection details as explained below:

Field

Description

Component Type

Shows all different types of available connections. Select JDBC Component type from the list.

Connection Name

Name of the connection.

Database Type

Type of database in which data needs to be dumped. The databases available are: MySQL, POSTGRESQL, ORACLE, MSSQL, Custom, and IBM DB2.

Database Name

The name of the database to be used.

Host

The host of the machine where database is deployed.

Port

The port of the machine where the database is deployed.

UserName

The username of the database.

Password

The password of the database.

Create

Click on the Create button to create the connection.

Test Connection

After entering all the details, click on the Test Connection button.

If credentials are correct, a successful connection message is generated.


If you enter wrong credentials or the server is down, you will get a Connection unavailable message.


Note: The JDBC driver jar must be in the class path while running a pipeline with the JDBC emitter or while testing a JDBC connection.
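
For reference, an illustrative set of values for a MySQL connection is shown below (host, database and port are placeholders; 3306 is the default MySQL port):

Database Type: MySQL

Database Name: sales

Host: mysql01.example.com

Port: 3306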

Kafka

For creating a Kafka connection, select Kafka from the Component Type drop-down list and provide connection details as explained below.

Field

Description

Component Type

Shows all the available connections. Select Kafka Component type from the list.

Connection Name

Name of the connection to be created.

zK Hosts

Defines the comma-separated list of Zookeeper IP:port entries used for creating Kafka topics from the StreamAnalytix UI.

Kafka Brokers

List of Kafka nodes against which connections need to be created.

Enable Topic Administration

Enabling topic administration will allow creation, updating and deletion of a topic.

Enable SSL

Select the check box if connection is to be created with SSL enabled Kafka.

Enable Truststore

Specify Truststore file path or upload Truststore file (in JKS Format).


If option selected is Specify Truststore Path, you will view two additional fields: Truststore Path and Truststore Password.


If option selected is Upload Truststore Path, you will view two additional fields: Upload Truststore File and Truststore Password.

Upload Truststore File

Upload Truststore file by clicking on UPLOAD FILE button.

Truststore Path

Location of the Truststore file.

Truststore Password

Password of the Truststore file.

Enable Authentication

Select the checkbox if client authentication is enabled.

If selected, you will view four additional fields: Keystore Select Option, Upload Keystore File, Keystore Password and Password.

Keystore Select Option

Either specify keystore file path or upload keystore file (in JKS Format)

Upload Keystore File

Enables to upload keystore file by clicking on UPLOAD FILE button.

Keystore File Path

Location of the keystore file.

Keystore Password

Keystore password for the Keystore file.

Confluent Kafka

Check the option if Kafka is Confluent. Confluent is a data streaming platform based on Apache Kafka.

Schema Registry URL

Provide schema registry URL. For example:


“schema.registry.url", "http://localhost:8081"

Enable Kerberos

You can use a Kerberised and a non-Kerberised Kafka in the same pipeline. While creating a connection for Kafka, you will be able to select whether the Kafka channel will be Kerberised or not.


Check this box if you want to enable a Kerberised Kafka.

SASL Mechanism

When you select a Kerberos-enabled Kafka connection, StreamAnalytix applies the SASL mechanism to enable client authentication.


Kafka brokers support client authentication via the SASL framework.


SASL authentication can be enabled along with SSL encryption.


The most common Kerberos-supported SASL mechanism is GSSAPI.

GSSAPI is used for Kerberos V5 authentication and offers a data-security layer on the data exchanged.

Security Protocol

The security protocol defines the authentication and encryption mechanism that is applied to the connection between the client and the Kafka brokers.


The types of protocols that can be selected are:


No Security: PLAINTEXT (Default)

KERBEROS ENABLED: SASL_PLAINTEXT

KERBEROS + SSL Enabled: SASL_SSL



Apache Kafka brokers support client authentication via SASL. SASL authentication can be enabled concurrently with SSL encryption (SSL client authentication will be disabled).

SASL JAAS Configuration

Kafka uses Java Authentication and Authorization Service (JAAS) for SASL configuration.

Provide JAAS configurations (Kafka client stanza).


Example: (Refer all the values from Kafka client stanza)


com.sun.security.auth.module.Krb5LoginModule required useKeyTab=true principal="<principal>" keyTab="<keytab_location>" storeKey=true serviceName="<kafka_service_name>" debug=true;

Password

Password of the private key in the keystore file.

Create

Click on the Create button to create the connection.

Test Connection

After entering all the details, click on the Test Connection button.

If credentials are correct, a successful connection message is generated.


If you enter wrong credentials or the server is down, you will get a Connection unavailable message.
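
For reference, the zK Hosts and Kafka Brokers values are typically comma-separated host:port lists, where 2181 and 9092 are the default Zookeeper and Kafka ports respectively (illustrative example with placeholder host names):

zK Hosts: zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181

Kafka Brokers: broker1.example.com:9092,broker2.example.com:9092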


Kinesis

For creating a Kinesis connection, select Kinesis from the Component Type drop-down list and provide other details required for creating the connection.

Field

Description

Component Type

Shows all the available connections. Select Kinesis component type from the list.

Access Key Id

AWS account access key ID.

Secret Key

AWS account secret key.

Create

Click on the Create button to create the connection.

Test Connection

After entering all the details, click on the Test Connection button.

If credentials are correct, a successful connection message is generated.


If you enter wrong credentials or the server is down, you will get a Connection unavailable message.


KUDU

For creating a KUDU connection, select KUDU from the Component Type drop-down list and provide other details required for creating the connection.

Field

Description

Component Type

Shows all the available connections. Select KUDU component type from the list.

Connection Name

Name of the connection to be created.

HOSTS

IP address and port of the machine where KUDU is running.

Test Connection

After entering all the details, click on the Test Connection button.

If credentials are correct, a successful connection message is generated.


If you enter wrong credentials or the server is down, you will get a Connection unavailable message.


MQTT

For creating an MQTT connection, select MQTT from the Component Type drop-down list and provide other details required for creating the connection.

Field

Description

Component Type

Shows all different types of available connections.

Connection Name

Name of the connection to be created.

Host

IP address of the machine where the MQTT broker is running.

Port

Port of the machine where the MQTT broker is running.

Test Connection

After entering all the details, click on the Test Connection button.

If credentials are correct, a successful connection message is generated.


If you enter wrong credentials or the server is down, you will get a Connection unavailable message.


OpenJMS

For creating an OpenJMS connection, select OpenJMS from the Component Type drop-down list and provide other details required for creating the connection.

Field

Description

Component Type

Shows all the available connections.

Connection Name

Name of the connection to be created.

Connection Factory

A connection factory is an object that a JMS client uses to create a connection with OpenJMS.

Host

IP address of the machine where OpenJMS is running.

Port

Port of the machine where OpenJMS is running.

Test Connection

After entering all the details, click on the Test Connection button.

If credentials are correct, a successful connection message is generated.


If you enter wrong credentials or the server is down, you will get a Connection unavailable message.


RabbitMQ

For creating a RabbitMQ connection, select RabbitMQ from the Component Type drop-down list and provide connection details as explained below:

Field

Description

Component Type

Shows all the available connections. Select RMQ Component type from the list.

Connection Name

Name of the connection to be created. For example, RabbitMQ.

Host

IP address and port of the machine where RabbitMQ is running.

UserName

Username of RabbitMQ to create connection.

Password

Password of RabbitMQ to create connection.

Create

Click on the create button to create the connection.

Test Connection

After entering all the details, click on the Test Connection button. If the credentials provided are correct and the services are up and running, you will get the message Connection is available.

If you enter wrong credentials or the server is down and you click on Test Connection, you will get the message Connection unavailable.


RDS

For creating an RDS connection, select RDS from the Component Type drop-down list and provide other details required for creating the connection.

Field

Description

Component Type

Shows all the available connections. Select RDS Component type from the list.

Connection Name

Name of the connection to be created. For example, RDS.

Database Type

Database Type (MySQL, PostgreSQL, Oracle, MSSQL, Amazon Aurora-MySQL, Amazon Aurora-PostgreSQL).

Enable SSL

Select the checkbox if SSL is enabled on RDS.

Keystore Select Option

Specify keystore path: Specify the keystore file (.p12) path.

Upload keystore path: Upload the keystore file (.p12).

Upload Keystore File

Enables to upload keystore file by clicking on UPLOAD FILE button.

Database Name

Name of the database.

Host

Hostname of RDS.

Port

Port number of RDS.

Username

Username of the account.

Password

Password of the account.

Create

Click on the Create button to create the connection.

Test Connection

After entering all the details, click on the Test Connection button.

If credentials are correct, a successful connection message is generated.

If you enter wrong credentials or the server is down, you will get a Connection unavailable message.


RedShift

For creating a RedShift connection, select RedShift from the Component Type drop-down list and provide other details required for creating the connection.

Field

Description

Component Type

Shows all different types of available connections. Select RedShift Component type from the list.

Connection Name

Name of the connection to be created.

Driver Type

Define Driver version for Redshift.

Database Name

Name of the Database to be used.

Host

Hostname of the machine where Redshift cluster is running.

Port

Port number on which redshift cluster is listening.

UserName

Redshift Username.

Password

Redshift Password.

Test Connection

After entering all the details, click on the Test Connection button.

If credentials are correct, a successful connection message is generated.


If you enter wrong credentials or the server is down, you will get a Connection unavailable message.
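
For reference, an illustrative set of values is shown below (the cluster endpoint and database name are placeholders; 5439 is the default Redshift port):

Host: my-cluster.abc123xyz.us-east-1.redshift.amazonaws.com

Port: 5439

Database Name: dev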


S3

For creating an S3 connection, select S3 from the Component Type drop-down list and provide other details required for creating the connection.

Field

Description

Component Type

Shows all the available connections. Select S3 Component type from the list.

Connection Name

Name of the connection to be created. For example, S3.

AWS KeyId

S3 account access key.

Secret Access Key

S3 account secret key.

Create

Click on the Create button to create the connection.

Test Connection

After entering all the details, click on the Test Connection button.

If credentials are correct, a successful connection message is generated.


If you enter wrong credentials or the server is down, you will get a Connection unavailable message.


Salesforce

For creating a Salesforce connection, select Salesforce from the Component Type drop-down list and provide other details required for creating the connection.

Field

Description

Component Type

Shows all the available connections. Select Salesforce Component type from the list.

Connection Name

Name of the connection to be created.

Username

Username of Salesforce to create connection.

Password

Password of Salesforce to create connection.

securityToken

Security token is a case-sensitive alphanumeric key that is used in combination with a password to access Salesforce via API.

Test Connection

After entering all the details, click on the Test Connection button.

If the credentials are correct, a successful connection message is generated.

If you enter wrong credentials or the server is down, you will get a Connection unavailable message.


Socket

For creating a Socket connection, select Socket from the Component Type drop-down list and provide connections details as explained below.

Field

Description

Component Type

Shows all the available connections. Select Socket component type from the list.

Connection Name

Name of the connection to be created.

Host

IP address of the machine where Socket is running.

Port

Port of the machine where Socket is running.

Create

Click on the Create button to create the connection.

Test Connection

After entering all the details, click on the Test Connection button.

If the credentials are correct, a successful connection message is generated.

If you enter wrong credentials or the server is down, you will get a Connection unavailable message.


SFTP

For creating an SFTP connection, select SFTP from the Component Type drop-down list and provide other details required for creating the connection.

Note: The user can create the connection using the following options:

- Host and user name

- Host, user name and password

- Host, user name and pem file (to be uploaded)

- Host, user name and password-protected pem file

Field

Description

Component Type

Shows all the available connections. Select SFTP Component type from the list.

Connection Name

Name of the connection to be created. For example, SFTP.

Host

Provide the host for connection.

Port

Mention the port.

User Name

Specify the user name.

pem Enabled

Upload the pem file.

Password

Specify the password to create the connection.


Snowflake

For creating a Snowflake connection, select Snowflake from the Component Type drop-down list and provide other details required for creating the connection:

Field

Description

Component Type

Shows different types of connections available. Select Snowflake.

Connection Name

Name of the connection to be created. For example, Snowflake.

Connection URL

Provide the driver URL to be used. E.g.:

jdbc:snowflake://impetuspartner.us-east-1.snowflakecomputing.com

Database Name

Mention the database name to access the data.

User Name

Specify the database username.

Password

Specify the database user password.

KeyStore Select Option

Upload the keystore file or specify the keystore path.

Upload Keystore File

The user will be required to upload the keystore file if this option is selected.


Solr

For creating a Solr connection, select Solr from the Component Type drop-down list and provide connection details as explained below.

Field

Description

Component Type

Shows all different types of available connections. Select SOLR component type from the list.

Connection Name

Name of the connection to be created.

zkHost

The Zookeeper host for the Solr server.

Create

Click on the Create button to create the connection.

Test Connection

After entering all the details, click on the Test Connection button.

If the credentials are correct, a successful connection message is generated.

If you enter wrong credentials or the server is down, you will get a Connection unavailable message.


SQS

For creating an SQS connection, select SQS from the Component Type drop-down list and provide other details required for creating the connection.

Field

Description

Component Type

Shows all the available connections. Select SQS Component type from the list.

Connection Name

Name of the connection to be created.

AWS Key id

AWS account access key.

Secret Access Key

AWS account secret key.

Region

AWS Region.

Test Connection

After entering all the details, click on the Test Connection button.

If the credentials are correct, a successful connection message is generated.

If you enter wrong credentials or the server is down, you will get a Connection unavailable message.


Tibco

For creating a Tibco connection, select Tibco from the Component Type drop-down list and provide connections details as explained below.

Field

Description

Component Type

Shows all the available connections. Select Tibco Component type from the list.

Connection Name

Name of the connection to be created.

Host

Host of the Tibco broker.

Port

Port of the Tibco broker.

Create

Click on the Create button to create the connection.

Test Connection

After entering all the details, click on the Test Connection button.

If the credentials are correct, a successful connection message is generated.

If you enter wrong credentials or the server is down, you will get a Connection unavailable message.


Twitter

For creating a Twitter connection, select Twitter from the Component Type drop-down list and provide connection details as explained below.

Field

Description

Component Type

Shows all the available connections. Select Twitter Component type from the list.

Connection Name

Name of the connection to be created.

Consumer Key

Twitter User consumer key.

Consumer Secret Key

Twitter User consumer secret key. Consumer keys are the identity of an application owner.

Access Token

Application owners need to generate Authorization-access tokens that are used to call the Twitter API without sharing their passwords.

This is a twitter user access token.

Access Token Secret

This is a twitter user access token secret


Vertica

For creating a Vertica connection, select Vertica from the Component Type drop-down list and provide connection details as explained below.

Field

Description

Component Type

Shows all different types of available connections. Select VERTICA Component type from the list.

Connection Name

Name of the connection. For example, Vertica.

Database Name

Database name to access data.

Host

Database host name or IP address.

Port

Database port number.

Username

The username of database user.

Password

The password of database user.
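
For reference, an illustrative set of values is shown below (host and database name are placeholders; 5433 is the default Vertica port):

Database Name: VMart

Host: vertica01.example.com

Port: 5433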


5. After entering all the details, click on the Test Connection button. If all the connection component parameters are correct and the services are up and running, the user will get the message, “Connection is available”.

If the user credentials are incorrect or server is down, “Connection unavailable” message is displayed.

6. Once the user clicks on the CREATE button, the particular connection gets listed in the Connections page.

Note: The connections created in a Workspace are also listed here and they can be identified by the workspace and owner name. All the connections that are created by the Superuser and the default connections will have their workspace and owner name as superuser.

To know more about managing connections at a workspace level, see Manage Workspace Connections.

Auto Update Connection

On updating a default connection, its respective configuration also gets updated.

Auto Update Configuration

The reverse of auto update connection, auto update configuration, is also possible.

If you update any component’s configuration property from the Configuration page, then the component’s default connection will also be auto updated.

For example: Updating RabbitMQ host URL configuration will auto update RabbitMQ Default connection.

auto_config

Manage Workspace Connections

To navigate to the Workspace Connections page, the user can click on the Connections feature which is available in the workspace menu.

The users with privilege to create connections can create new connections at the workspace level in the same manner as it is explained in the manage superuser connections topic. To know more, see Manage Superuser Connections.

Note:

- Unique names must be used to create new connections inside a Workspace for similar component types. User will get notified in the UI if the specified connection name already exists.

- The visibility of default connections and the connections created by Superuser at any Workspace level is controlled by the Superuser.

- The connections created in a Workspace can be differentiated by the Workspace and Owner name in the list. The superuser created connections will appear in the list with the Workspace and Owner name as Superuser.

- Connections listed in a workspace can be used to configure features like Datasets, Pipelines, Applications, Data Validations, Import Export Entities & Register Entities inside a Project. While using the connections for above listed features, the superuser connections can be differentiated from other workspace created connections by a suffix, “global” which is given after the connection name.

PipelineConnection

- Connections will not be visible and cannot be consumed outside of the workspace in which they are created.

Register Cluster

The user can register a desired cluster by utilizing the Register Cluster option. It can be done either by uploading a valid embedded certificate within a config file, or by uploading config file and certificates separately during registration process. The cluster once registered can be utilized across all workspaces while configuring a sandbox.

Currently, only Kubernetes clusters can be registered on StreamAnalytix.

Registered Clusters Listing

On the Cluster Configuration listing page the existing clusters will be listed.

Field Name

Description

Name

Name of the registered cluster.

Up Since

Timestamp information about the cluster since the time it is up.

CPU/Memory

The consumed CPU/Memory will be listed here.

Actions

The user can Edit/Unregister the registered cluster(s) information.


Steps to Register Cluster

The user can register a cluster by clicking at the top right + icon.

Configure the cluster by providing the following details:

Field Name

Description

Type

Select the type of cluster to be registered, i.e., Kubernetes.

Name

Provide a unique name of cluster to be registered.

Ingress URL

Provide the ingress URL with accurate server host and port details.

Upload Certificate as

The user has an option to upload a valid cluster configuration file having certificate data (like certificate-authority-data, server, client-certificate-data, client-key-data) definition.

Else, upload configuration and certificates separately.

These are further explained below:

Config with embedded certificate: The user can upload K8 config files with all the certificates embedded in a single zip file.

Upload Config (zip): Kubernetes configuration zip file with all the certificates embedded.

Upload certificate files: The user can upload configuration and certificates separately.

Provide the API Server URL.

Upload the below files:

- Authority Certificate: Upload the API server client authority certificate.

- Client Certificate: Upload the client certificate for API server authentication.

- Client Key: Upload the client key for server API authentication.


The user can TEST the cluster configuration and SAVE.

Upon successful registration, the registered cluster will get added in the listing page.

Register Container Image

The option to register Container Images within StreamAnalytix is provided in the main menu as well as the workspace menu.

When a user registers a container image, it will be visible as a drop-down option in the sandbox configuration page inside a project. These container images (sandbox) can be launched on the preferred container (for example, Kubernetes) to access the desired integrated development environments (examples: Jupyter Lab, Visual Studio Code, Custom and Default) of the user’s choice on the sandbox.

The Default IDE option will only be visible when the Register Container Image option is accessed by the superuser via the main menu.

The container images that are registered from the main menu by the superuser can be utilized across all workspaces, whereas the container images that are registered from the workspace menu remain private to the specific workspace where they are registered.

Registered Container Images Listing

The container images that are registered will appear on the Registered Images page.

The information and actions displayed for the listed Container Images are explained below:

Field Name

Description

Image Name

Name of the container image registered.

Container Image URI

URI registered on container registry and accessible to the cluster.

Tags

Custom tags added during container image registration.

Actions

The user can Edit/Unregister the registered container image(s).


Steps to Register Container Image

The user can register a container image by clicking at the top right + icon.

Configure the container image by providing the following details:

Field Name

Description

Image Name

Provide a unique name of container to be registered.

Description

Provide container image description.

Tags

Custom tags for the container image can be added.

Image URI

URI registered on container registry and accessible to the cluster must be provided.

IDE

The IDE options that are selected here will be available during Sandbox configuration.

Upload Docker File

An option to upload the docker file for user’s reference.

Upload YAML [Zip]

A zip file containing supported YAMLs for the image can be uploaded.

Once the YAML file is uploaded, the user can view details of the uploaded file by clicking on the View YAML File icon.

Note: To know more about the YAML files upload, see Upload YAML Example given after the table.


Upload YAML Example

Consider the below points for YAML file upload:

•   Upload file with .zip extension.

•   It should directly contain the valid YAML files.

•   Use below expressions to populate YAML fields at runtime during sandbox configuration:

"@{<kind>:<field path>}" - The expression used to refer the specified field from any other YAML file.

Example: In "@{deployment:metadata.name}" expression, the first part "deployment" is kind (i.e., type of YAML) and the next part "metadata.name" is the field that is supposed to be fetched from the specified YAML type.

${value:"<default-value>",label:"<field label>"} - The expression used to display a dynamic field label along with a default value, which is editable.

Example:

${value:"sandbox-<<UUID>>",label:"Enter Sandbox Name"}

Field label will be: Enter Sandbox Name and default value will be: sandbox-A123.

"<<UUID>>" - This expression is used to generate a unique ID for a specific field.

Example:

- name: BASE_PATH
  value: "/<<UUID>>"

In the above YAML configuration snippet, the BASE_PATH will always have a unique value generated via the "/<<UUID>>" expression.
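
To illustrate how these expressions can be combined, below is a minimal, purely illustrative fragment (the kind, field names and values are placeholders and are not taken from a shipped template). It names a service after the metadata.name field of the deployment YAML in the same zip and exposes an editable sandbox label:

apiVersion: v1
kind: Service
metadata:
  name: "@{deployment:metadata.name}"
  labels:
    sandbox: ${value:"sandbox-<<UUID>>",label:"Enter Sandbox Name"}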

Click REGISTER to complete the process. The registered image will appear in the listing page.

Audit Trail

Audit Trail captures and presents all important activities and events in the platform for auditing.

Interaction events include pipeline creation, pipeline inspection, license uploads, test-cases execution, configuration updates, connection updates, notebook creation, model validation and all other interactions possible within StreamAnalytix.

ATHomePage2

Audit Trail provides the following features to search, view and filter user interaction events in graphical and tabular formats.

Event Search

There are two modes of searching an event, Basic and Advanced.

Basic Search

Events can be searched by providing the required parameters in the filter menu on top of the Audit Trail page. The search results are returned from all the entities, i.e., across all workspaces, all types of operations and so on.

In a basic search, the following are the options by which you can perform a search.

- Time Range search

- Time Duration search

- Full Text Search

- Keyword Search

Note: Time Range and Time Duration search are also available in Advanced Search.

Different filter operations are available which are listed below.

Time Range Search

Provide time intervals by setting Start Date Time and End Date Time filters to get those event interactions which were performed in specified time range.

The default value is 12 hours before the current system time.

TimeRangeSearch

Click on the Set button for it to reflect the selected date time.

TimeRangeSearch2

Duration Based Search

Select the Duration option for defining time intervals. Provide the duration as an integer value with the desired time unit. The default duration value is 12 and the unit is hours.

durationBasedSearch

Possible units are minutes, hours, days and weeks.

durationBasedSearch2

Full Text Search

To search events based on a keyword or pattern, use the Full text search filter option.

Use a wildcard (*) to create a pattern or provide an exact value. The system will search events by matching all field values of the record.

FullTextSearch

Keyword Search

To perform a search on any field value of the event record, use a colon-based pattern.

For example, interactionBy:John*, where interactionBy is one of the field names of the event record, which specifies the user name of the person who performed that event, and John* is the value of the field interactionBy.

keywordsearch

Possible field names which can be used to perform Keyword search are as follows:

Field Name

Description

timestamp

Epoch time, in milliseconds, when the event was performed, e.g., 1560369000

entityType

Entity type on which action was performed, e.g., pipeline, user, work-space, inspect_session, etc.

entityName

Name of the entity on which event was performed, e.g., pipeline name, workflow name, user name etc.

operationName

The type of action performed on entity, e.g., create, update, delete, list, access, share, revoke, etc.

description

Descriptive message about the event interaction.

interactionBy

User who caused the interaction event.

tenantId

The tenant id in which event occurred.

tenantName

The tenant name associated with tenant id.
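
For example (illustrative values), entityType:pipeline matches events performed on pipelines, and operationName:delete matches delete operations. Wildcards can be combined with any field name in the same way, for example, entityName:Sanity*.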


Advanced Search

In contrast to the basic Full Text Search, you can perform an advanced search where you select the list of entities and operations on which you want to search event interactions.

AdvanceSearch

Possible entities and operation types will be listed in the Entity and Operation drop-down filters respectively.

AdvanceSearch2

Filter event interactions based on workspace names. The events that occurred in the specified workspace will be shown. This filter operation is visible in the superuser workspace only.

AdvanceSearch3

Visualizing Audit Results

Time-Series Count of Events

The time-series graph represents the aggregated count of events that occurred within the given time range. The counts are shown on the time-series graph in fixed time intervals. Each interval is represented by a graph bar.

Time intervals are calculated based on the given time range values in the search query. The bigger the given time range, the bigger the time interval.

Example: A 12-hour input time range will give event counts at every 30-minute interval, while a 1-hour input time range will give event counts at every 1-minute interval.

timeseriescount

Graph Panning

It allows you to zoom in on a specific area of the graph, which drills down the graph further and shows the zoomed selected area. A new search request will be placed with the zoomed time range boundaries.

graphsniffing

After panning and zooming the results, the graph looks as shown below:

graphsniffing2

Processing Search Results

Perform following operations on the search results:

Infinite Scroll

Whenever you scroll down in the Result table, the next batch of 100 records is fetched and appended to the result table. You can change the default fetch size of 100 from the Audit Configuration page.

The scroll has a defined expiry time after which the scroll window expires. The default scroll expiry time is 5 minutes.

On every subsequent scroll and new search request, the scroll expiry time will get reset.

InfiniteScroll

Sorting of Events

You can sort results based on the field values of events. A new search request will be placed on each sort action, and the top 100 (default fetch size) sorted results will be shown out of the total matched hits.

Sortingofevents

Pipeline Audit Trail

This functionality shows the event activities performed on a pipeline.

PipelineAuditTrail

Event counts are represented by circles on the time-series graph.

PipelineAuditTrail2

Event interactions will be auto-deleted after the configured retention time.

Audit Table Glossary

The common terms displayed in the search result table are explained in the table below.

Field

Description

Activity Time

Event time when the event was performed. To see the exact timestamp of the event, expand the result row.

Description

Brief description about the event.

Operation   

Operation name which the user has performed. It might be Create, Delete, Update, etc.

Entity Name

Name of the entity on which the event has been performed. For example, a pipeline created with the name ‘SanityRun’ will be mapped as the entity name.

Workspace

Name of the workspace under which the event was performed (a single user can work on multiple workspaces).

User

Name of the user who performed the operation.


Configurations

The user can configure the event audit mechanism as per the requirement.

Refer to the Administration Audit tab for configuration details.

Manage Users and Roles

StreamAnalytix users are the authorized consumers of the application having one or more roles assigned with certain privileges, to execute a group of tasks.

The Manage Users option is provided in the main menu and the workspace menu.

Only the superuser has control over user and role management features that are available in the main menu, whereas both the admin user and the superuser can manage users and roles in the workspace menu.

The other workspace users can only view the role(s) and associated privileges assigned to them in the Manage Users option of the workspace menu.

There are several tabs in the Manage Users feature which are explained in the subsequent topics.

Manage Users (Main Menu)

The tabs that are available for Manage Users option in the main menu are described below.

LDAP

The LDAP tab will only appear when user authentication and authorization is controlled by LDAP or Active Directory in the StreamAnalytix configuration options.

Listing Page

The superuser can assign global or custom created roles to the existing LDAP groups as per the requirement.

The information and actions displayed for the listed LDAP groups are explained below:

Field

Description

Global LDAP Groups

The global LDAP groups that are created by the superuser from the main menu cannot be deleted in the workspace listing of the LDAP tab.

Add LDAP Roles

The Add LDAP Roles option is given at the top right side of the page. It can be used to add new entries for the LDAP group listings

StreamAnalytix Role

A drop-down option having list of all the global and custom roles. The desired role for the LDAP group must be selected. Example: Workspace Admin, Data Analyst I, Data Analyst II, Read Only, and so on.

LDAP Group

Search or enter the LDAP group(s) name that must be assigned with the selected StreamAnalytix Role.

Action

Option to remove the role assignment done for any specific LDAP group.


Once the role assignment for required LDAP groups is done, click on the VALIDATE option to cross-check the LDAP group name and SAVE to register the changes.

Roles

This tab contains the list of the out-of-the-box global roles and the custom roles that are created using the New Role option.

The out-of-the-box Global roles are:

- System Admin

- Workspace Admin

- Data Analyst - I

- Data Analyst - II

- Read Only

Listing Page

The information and actions displayed on the Roles listing page are explained below:

Field

Description

Roles

Name of the roles that are existing in StreamAnalytix.

Type

Type of role. Global if created from main menu and Custom if created from the workspace menu.

Privileges

The number of privileges that are assigned to the particular role.

The assigned privileges can be viewed by clicking on the value displayed.

Users

The number of users that are assigned with the particular role.

The list of users assigned with the particular role can be viewed by clicking on the value displayed.

Actions

Options to edit or delete the listed role(s).

Note: The out-of-the-box roles cannot be deleted.


Add Roles

The superuser can create new roles using the Add New Role option given on the top right side of the Roles tab.

The configuration details for creation of a new role are described in the table given below:

Field

Description

New Role

Role Name

Unique name of the new role to be created.

Load Privilege by Users

Privilege(s) of the existing user(s) selected will instantly load in the Set Privilege section. For any privilege conflicts, load preference is given to Deny > Allow > Revoke.

Load Privilege by Roles

Privilege(s) of the existing role(s) selected will instantly load in the Set Privilege section. For any privilege conflicts, load preference is given to Deny > Allow > Revoke.

Description

Optional description can be provided.

Set Privilege

Select and set the role privilege(s), or instantly load privilege(s) of the existing role(s) and user(s) with configuration options explained above. Possible options to set privileges are:

Revoke: Set by default. User(s) assigned with this role will not be able to perform the revoked action(s).

Allow: User(s) assigned with this role will be able to perform the allowed action(s) unrestricted.

Deny: User(s) assigned with this role will not be able to perform the denied action(s).

Revoke and Allow can be changed to any other privilege as required (by providing direct privilege) when the existing roles are later mapped to user(s). But, Deny privilege(s) strictly remain unchanged.

Workspace

If the role privileges are customized in the Set Privilege section, then the privilege settings for features: Summary, Audit, Project, Register Image and Connections can be done in the Workspace section.

Project

If the role privileges are customized in the Set Privilege section, then the privilege settings for features: Models, Register Entities, Data Validation, Workflow, Version Control, Import Export Entities, Sandbox, Pipeline, Processor Group, Notebook Environment, Dataset and Applications can be done in the Project section.


Once the required privilege assignment is done for the new role, click on the CREATE option to register the role in StreamAnalytix.

Users

This tab contains the list of the users that are registered with StreamAnalytix.

The options available on this tab will be different for LDAP configured user management. The description clearly states the options that will be only visible when StreamAnalytix Metastore configuration is used for user management.

Listing Page

The information and actions displayed on the Users listing page are explained below:

Field

Description

Name

Name of the existing user.

Workspace Name

Name of the workspace for which the user is assigned.

Email Id

The registered email id of the user.

Note: Only applicable for StreamAnalytix Metastore configuration.

Assigned Roles

Type of role that is assigned to the user.

Actions

The actions that can be performed are:

- Enable/disable user

- Edit user details

- Edit assigned privileges

- Delete user

Note: Only applicable for StreamAnalytix Metastore configuration.


Add Users

Note: Add users option is only applicable for StreamAnalytix Metastore configuration user management.

The superuser can create new users using the New User option given on the top right side of the Users tab.

The configuration details for creation of a new user are described in the table given below:

Field

Description

New User

The new user configuration section is divided in two parts, namely, Add Roles and Add Users. The configuration options for both parts are explained below.

Add Roles

This is an optional part.

If the new user has to be assigned with any of the existing roles, it is possible to do so with Add Roles option.

Select Roles

Privilege(s) of the role(s) selected will instantly load in the Set Privilege section. For any privilege conflicts, load preference is given to Deny > Allow > Revoke.

Deny privilege(s) strictly remain unchanged, whereas any additional privilege changes will appear as ‘Direct Privileges’.

Set Privilege

Revoke: Set by default. User(s) assigned with this role will not be able to perform the revoked action(s).

Allow: User(s) assigned with this role will be able to perform the allowed action(s) unrestricted.

Deny: User(s) assigned with this role will not be able to perform the denied action(s).

Workspace

The role privileges for features: Summary, Audit, Project, Register Image and Connections will be visible in the Workspace section.

Project

The role privileges for features: Models, Register Entities, Data Validation, Workflow, Version Control, Import Export Entities, Sandbox, Pipeline, Processor Group, Notebook Environment, Dataset and Applications will be visible in the Project section.

Add User

Users selected will get assigned with the privilege(s), if configured in the Add Roles page.

User Name

Unique name for the new user must be provided.

Email Id

Email Id of the new user must be provided.

Password

Password for the new user must be provided.

Confirm Password

Re-type the password.

Language

Choose the language, from English (US) and German (DE).

Configure Artifactory

The user can configure artifactory by checkmarking the checkbox. Provide Artifactory URL, Username and Password.


Once the required configuration is done for the new user, click on the CREATE option to register the user in StreamAnalytix.

Manage Users (Workspace Menu)

The tabs that are available for Manage Users option in the workspace menu are described below.

My Roles

The users can verify their assigned privileges on the My Roles tab.

The information displayed on the My Roles tab are explained below:

Field

Description

Roles

Privileges assigned to the user for the role(s) selected from this list can be viewed in the Privilege section.

Privilege

Revoked: You will not be able to perform the revoked action(s). Request your System Administrator to allow any revoked privilege(s) that you may need.

Allowed: You will be able to perform the allowed action(s) unrestricted.

Denied: You will not be able to perform the denied action(s).

Workspace

The role privileges for features: Summary, Audit, Project, Register Image and Connections will be visible in the Workspace section.

Project

The role privileges for features: Models, Register Entities, Data Validation, Workflow, Version Control, Import Export Entities, Sandbox, Pipeline, Processor Group, Notebook Environment, Dataset and Applications will be visible in the Project section.


LDAP

The LDAP tab on the workspace listing is divided in two parts, Global LDAP Groups and Custom LDAP Groups.

The Global LDAP Groups will contain the list of LDAP groups and the roles assigned by the superuser. They cannot be deleted from the Actions column.

The Custom LDAP Groups will contain the list of LDAP groups and roles assigned at the Workspace level. They can be deleted from the Actions column.

The steps and configuration to add custom LDAP groups and roles is same as mentioned in the table for Global LDAP groups which is given in Manage Users (Main Menu) topic.

Roles

The Roles tab contains the list of Global roles (created by superuser through the main menu) and custom roles (added through the workspace menu).

The Global roles listed on this tab cannot be deleted from the Actions column.

The steps and configuration to add roles is same as mentioned in the Manage Users (Main Menu) topic.

Users

The Users tab contains the list of users.

The options and functionality are same as described in the Users section of the Manage Users (Main Menu) topic.

Groups

A number of users that are registered in StreamAnalytix can be combined in a group to manage assignment of privileges for the entire group from Groups tab.

Listing Page

The information and actions displayed on the Groups listing page are explained below:

Field

Description

Name

Name of the group created.

Members

Total count of the users in the group.

Roles

Type of roles assigned to the group.

Actions

Options to edit or delete the group.


Add Groups

The groups can be created using the New Group option given on the top right side of the Groups tab.

The configuration details for creation of a new group are described in the table given below:

Field

Description

New Group

The New Group section is divided in two parts, Users and Set Privileges.

Each of them is explained in detail below.

Users

Any number of users that exist in StreamAnalytix can be registered using the Add Users to Group option.

Either the users can be added from the given list or a CSV file having the list of users can be uploaded and validated.

Set Privileges

Select Roles

Privilege(s) of the role(s) selected will instantly load in the Set Privilege section.

For any privilege conflicts, load preference is given to Deny > Allow > Revoke.

Deny privilege(s) strictly remain unchanged, whereas any additional privilege changes appear as ‘Direct Privileges’ on the listing page.

Set Privileges as Role

For the privilege(s) that are selected and set, a new role can be created using this option.

If user clicks on SET PRIVILEGES AS ROLES option, a new window will pop up with options to provide the role name and description and thereby register the role.

Set Privilege

Select and set the role privilege(s), or instantly load privilege(s) of the existing role(s) and user(s) with configuration options explained above. Possible options to set privileges are:

Revoke: Set by default. User(s) assigned with this role will not be able to perform the revoked action(s).

Allow: User(s) assigned with this role will be able to perform the allowed action(s) unrestricted.

Deny: User(s) assigned with this role will not be able to perform the denied action(s).

Revoke and Allow can be changed to any other privilege as required (by providing direct privilege) when the existing roles are later mapped to user(s). But, Deny privilege(s) strictly remain unchanged.

Workspace

If the role privileges are customized in the Set Privilege section, then the privilege settings for features: Summary, Audit, Project, Register Image and Connections can be done in the Workspace section.

Project

If the role privileges are customized in the Set Privilege section, then the privilege settings for features: Models, Register Entities, Data Validation, Workflow, Version Control, Import Export Entities, Sandbox, Pipeline, Processor Group, Notebook Environment, Dataset and Applications can be done in the Project section.


Template

To change the outline of any existing connection, component or pipeline, a developer has to manually edit the JSON files residing in the /conf/common/template directory of the StreamAnalytix bundle. Templates allow you to update these from the UI. You can create many versions of them and switch to any desired version at any point. The changes in the outline of that component will be reflected immediately.

Components

The Components tab allows you to edit the JSON and view the type of component for the Spark engine.

component

Field

Description

Component Name

The name of the component.

Type

Type of component, e.g., Processor, Data Source, Emitter or Analytics.

Action

Edit the JSON or Version.


When you edit any component, Version, Date Modified and Comments added are viewable.

component_edit_pic

Connections

The Connections tab allows you to edit the JSON and create as many versions as required.

comm

Field

Description

Connection Name

The name of the connection.

Action

Edit the JSON or Version


When you edit any Connection, Version, Date Modified and Comments added are viewable.

activemq

Help

This option redirects the user to the StreamAnalytix support portal.

StreamAnalytix Support