Core Concepts

Explore the essential terminology and concepts you’ll encounter while using Gathr. These core concepts will help you navigate the platform more effectively and make the most of its capabilities.


Account & User Management

Account

A Gathr account is your organization’s entry point to the platform, created during the sign-up process. Your account contains all your projects, applications, connections, and user access settings.

Management: Access your account settings through the User Profile option in the main menu.

User

A user is an individual with access to Gathr who can create and manage applications based on their assigned permissions. Each user has unique credentials and can be assigned specific roles within the platform.

Organization Administrator

The organization administrator is the primary user who initially signs up for Gathr. This role has full administrative privileges, including:

  • Managing user access and permissions

  • Setting up compute environments

  • Configuring billing information

  • Sharing projects with other users

Subscription Levels

Gathr offers different subscription tiers to meet various business needs:

  • 14-Day Free Trial: Explore Gathr’s features before committing

  • Gathr Advanced Plan: For small to medium teams with standard data processing needs

  • Gathr Business Plan: For enterprises with advanced requirements, including custom compute environments

Your current subscription level is displayed in the main menu.

Credits

Credits are the currency used within Gathr to run applications. Different operations consume varying amounts of credits based on:

  • Application type (Ingestion, CDC, ETL)

  • Compute resources used

  • Runtime duration

  • Data volume processed

Credits are allocated based on your subscription plan.


Landing Page

Upon signing in, the landing page offers quick access to frequently used features.

The sidebar menu is your primary navigation tool, providing access to:

  • Projects

  • Applications

  • Connections

  • User settings

  • Documentation and support


Platform Architecture

Engine

Gathr offers two processing engines to power your data applications:

  1. Gathr Engine: The default engine integrated within the Gathr platform

    • Optimized for Gathr applications

    • Managed by Gathr with automatic scaling

    • No additional setup required

  2. Databricks Engine: Integration with Azure Databricks

    • Leverage existing Databricks clusters

    • Access to Databricks-specific features

    • Requires compute environment setup

You can view the engine status in the main menu. A connected engine is required to run Gathr applications.

If your engine shows as disconnected, contact Gathr support at saas-support@gathr.ai.

Cluster

Clusters are the compute resources that execute your applications. Options include:

  • Free Tier Cluster: Available during the 14-day trial

  • Managed Clusters: Available in sizes ranging from Extra Small to Large, plus a GPU option, depending on the subscription plan:

    • Extra Small: Consumes 1 credit per minute

    • Small: Consumes 2 credits per minute

    • Medium: Consumes 4 credits per minute

    • Large: Consumes 8 credits per minute

    • GPU (Powered by NVIDIA RAPIDS): Consumes 10 credits per minute

  • Custom Clusters: On the Business Plan, connect your own compute resources

Cluster selection affects performance, scalability, and credit consumption.
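
As a rough illustration of how the per-minute rates above translate into a run's cost, the following sketch multiplies the listed rate by the runtime. It assumes consumption scales linearly with runtime and ignores the other factors that affect credit usage (application type, data volume). The estimate_credits helper is hypothetical and is not part of Gathr.

```python
# Credits per minute for each managed cluster size, as listed above.
CREDITS_PER_MINUTE = {
    "Extra Small": 1,
    "Small": 2,
    "Medium": 4,
    "Large": 8,
    "GPU": 10,
}


def estimate_credits(cluster_size: str, runtime_minutes: float) -> float:
    """Estimate credits as rate x runtime (assumption: linear consumption)."""
    if runtime_minutes < 0:
        raise ValueError("runtime_minutes must be non-negative")
    return CREDITS_PER_MINUTE[cluster_size] * runtime_minutes


if __name__ == "__main__":
    # A 45-minute run on a Medium cluster: 4 credits/min * 45 min.
    print(estimate_credits("Medium", 45))
```

Under these assumptions, the same 45-minute run on a Large cluster would consume twice as many credits, so it only pays off if the larger cluster shortens the runtime by more than half.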

Projects

Projects are organizational containers that help you:

  • Group related applications

  • Manage access control

  • Organize connections and resources

  • Facilitate pipeline promotion across environments

Each user starts with a default project and can create additional projects as needed.


Application Types

Data Ingestion Applications

Data Ingestion applications move data from sources to targets with minimal transformation. Key features include:

  • Batch Ingestion: Process data in defined intervals

  • Incremental Loading: Capture only new or changed data
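
Incremental loading is commonly implemented with a "watermark": the highest change timestamp seen so far. The sketch below shows the general idea in plain Python. The updated_at field and the incremental_load function are illustrative assumptions, not Gathr's actual mechanism.

```python
from datetime import datetime


def incremental_load(rows, last_watermark):
    """Return rows changed after last_watermark, plus the new watermark."""
    new_rows = [r for r in rows if r["updated_at"] > last_watermark]
    # If nothing changed, the watermark stays where it was.
    new_watermark = max(
        (r["updated_at"] for r in new_rows), default=last_watermark
    )
    return new_rows, new_watermark


if __name__ == "__main__":
    source = [
        {"id": 1, "updated_at": datetime(2024, 1, 1)},
        {"id": 2, "updated_at": datetime(2024, 1, 5)},
    ]
    changed, watermark = incremental_load(source, datetime(2024, 1, 2))
    print([r["id"] for r in changed], watermark)
```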

Change Data Capture (CDC) Applications

CDC applications track and replicate changes from database sources, enabling:

  • Real-time data replication

  • Database synchronization
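
Conceptually, change data capture replays a stream of insert, update, and delete events against a replica so that it stays in sync with the source. The following minimal sketch models the replica as a dictionary. The event shape (op, key, row) is an assumption chosen for illustration and does not reflect Gathr's internal format.

```python
def apply_change(replica: dict, event: dict) -> None:
    """Apply one change event to an in-memory replica keyed by primary key."""
    op, key = event["op"], event["key"]
    if op in ("insert", "update"):
        replica[key] = event["row"]
    elif op == "delete":
        # Deleting a key that is already absent is treated as a no-op.
        replica.pop(key, None)
    else:
        raise ValueError(f"unknown operation: {op}")


if __name__ == "__main__":
    replica = {}
    events = [
        {"op": "insert", "key": 1, "row": {"name": "Ada"}},
        {"op": "update", "key": 1, "row": {"name": "Ada L."}},
        {"op": "insert", "key": 2, "row": {"name": "Bob"}},
        {"op": "delete", "key": 2},
    ]
    for event in events:
        apply_change(replica, event)
    print(replica)
```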

ETL Applications

ETL (Extract, Transform, Load) applications provide comprehensive data processing with:

  • Complex transformations

  • Data quality checks

  • Business logic implementation

  • Data enrichment capabilities

  • Schema evolution to handle changes to data structure

Workflows

Workflows are reusable process templates that:

  • Orchestrate multiple applications

  • Define execution sequences

  • Handle dependencies

  • Enable complex data pipelines
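
One way to picture how a workflow turns dependencies into an execution sequence is a topological sort: each application runs only after everything it depends on has finished. The sketch below uses Python's standard graphlib module (Python 3.9+). The application names are made up, and this is not how Gathr defines workflows internally.

```python
from graphlib import TopologicalSorter


def execution_order(dependencies: dict) -> list:
    """Return an order in which every application follows its dependencies.

    `dependencies` maps an application name to the set of applications
    that must complete before it can start.
    """
    return list(TopologicalSorter(dependencies).static_order())


if __name__ == "__main__":
    workflow = {
        "etl_orders": {"ingest_orders"},
        "report": {"etl_orders", "cdc_customers"},
    }
    print(execution_order(workflow))
```

If the dependencies contain a cycle, TopologicalSorter raises graphlib.CycleError, which mirrors the fact that a workflow with circular dependencies cannot be scheduled.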


Application Components

Connections

Connections are authenticated links to external systems, enabling Gathr to:

  • Read from data sources

  • Write to data targets

  • Access cloud services and resources

Connections must be configured and tested before use in applications.

Data Sources

A data source is any data origin that Gathr can read from, including:

  • Databases (SQL, NoSQL)

  • Cloud storage (S3, Azure Blob, GCS)

  • APIs and web services

  • Streaming platforms (Kafka, Kinesis)

  • File systems

Transformations

Transformations are operations that modify data between source and target, including:

  • Data type conversions

  • Filtering and aggregation

  • Joining multiple datasets

  • Enrichment and data quality checks

  • Custom business logic
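
To make a few of these operations concrete, the sketch below combines a data type conversion, a filter, and an aggregation in plain Python. The field names (region, amount) and the total_by_region function are hypothetical. In Gathr these steps would be configured as transformations rather than hand-written.

```python
from collections import defaultdict


def total_by_region(rows, min_amount=0.0):
    """Sum amounts per region, skipping rows below min_amount."""
    totals = defaultdict(float)
    for row in rows:
        amount = float(row["amount"])        # data type conversion
        if amount >= min_amount:             # filtering
            totals[row["region"]] += amount  # aggregation
    return dict(totals)


if __name__ == "__main__":
    rows = [
        {"region": "EU", "amount": "10.5"},
        {"region": "EU", "amount": "2"},
        {"region": "US", "amount": "7"},
    ]
    print(total_by_region(rows, min_amount=5))
```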

Functions

Gathr provides various function types to manipulate data:

  • Date Functions: Format and manipulate date/time values

  • String Functions: Text processing and pattern matching

  • Math Functions: Numerical calculations and statistics

  • Array Functions: Process lists and collections

  • Lookup Functions: Reference data from other sources

  • Miscellaneous Functions: Special-purpose utilities

Targets

A target is the destination where processed data is delivered, such as:

  • Data warehouses (Snowflake, Redshift)

  • Data lakes (S3, ADLS)

  • Databases (PostgreSQL, MySQL)

  • Analytics platforms

  • Visualization tools


Generative AI Capabilities

Gathr integrates advanced AI capabilities throughout the platform to enhance productivity and innovation:

Gathr IQ

Gathr IQ is an AI assistant that helps automatically design ETL applications using natural language inputs:

  • Automatically generate transformation tasks

  • Translate business requirements into technical implementations

  • Accelerate application development

  • Reduce technical complexity

AI-Powered Processors

Gathr offers several processors with AI assistance for code generation:

  • Expression Evaluator: Generate SparkSQL functions using natural language

  • Expression Filter: Create filtering conditions with plain English

  • Python Processor: Auto-generate Python code for transformations

  • Scala Processor: Produce Scala code from natural language descriptions

  • SQL Processor: Convert English queries into SQL statements

Vector Database Integration

Gathr supports vector databases for AI-powered similarity search and retrieval:

  • Redis: For real-time vector similarity search

  • Milvus: For large-scale vector data management

  • Pinecone: For high-performance similarity lookups
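
Vector similarity search ranks stored embeddings by how close they are to a query embedding. A common closeness measure is cosine similarity. The sketch below does a brute-force version in plain Python to show the idea. It does not use the Redis, Milvus, or Pinecone clients, and the function names are illustrative only.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_a * norm_b)


def top_k(query, vectors, k=1):
    """Return the names of the k stored vectors most similar to query."""
    ranked = sorted(
        vectors.items(),
        key=lambda item: cosine_similarity(query, item[1]),
        reverse=True,
    )
    return [name for name, _ in ranked[:k]]


if __name__ == "__main__":
    store = {"a": [1, 0], "b": [0, 1], "c": [1, 1]}
    print(top_k([1, 0.1], store, k=2))
```

Dedicated vector databases avoid this linear scan by using approximate nearest-neighbor indexes, which is what makes them practical at large scale.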

MLflow Integration

Connect your MLflow instance to Gathr to:

  • Access registered ML models

  • Incorporate models into data pipelines

  • Perform inference within Gathr applications

  • Manage the ML lifecycle


Data Intelligence

Data Intelligence enables natural language querying of your data sources:

Natural Language Data Querying

Ask questions about your data in plain English:

  • “How many customers were acquired last month?”

  • “What’s the revenue trend by region for Q1?”

  • “Which products have the highest profit margin?”

Insight Types

Data Intelligence provides multiple insight formats:

  • Descriptive: Textual answers to business queries

  • Graphical: Visual representations of data insights

  • SQL Queries: Visibility into the generated SQL

Customize analytical behavior with:

  • Business rules

  • Domain-specific knowledge

  • Custom recommendations

  • Analysis guidelines

Metadata Generation

Generate metadata with AI assistance for:

  • Table descriptions

  • Column definitions

  • Relationship explanations

  • Usage examples


Data Assets Management

Gathr provides tools to manage and govern your data assets:

Data Asset Registry

A central repository for:

  • Discovering available data

  • Understanding data structure

  • Tracking data lineage

  • Managing metadata

Versioning

Track changes to data assets with:

  • Version history

  • Change tracking

Metadata Management

Maintain comprehensive metadata including:

  • Business definitions

  • Technical specifications

  • Data quality metrics

  • Usage statistics

Data Lineage

Track data flow through your systems:

  • Source-to-target mapping

  • Transformation history


Security & Governance

Gathr implements robust security and governance features:

Access Control

Manage permissions at multiple levels:

  • Organization-level controls

  • Project-level permissions

  • Application-specific access

  • Resource-based policies

Sharing Models

Share resources with other users:

  • Project sharing

  • Application sharing

  • Connection sharing

  • Data asset sharing

Connection Security

Secure your data connections with:

  • Credential management

  • Encryption

  • Role-based access

  • Audit logging

Audit Trails

Track activities within the platform:

  • User actions

  • System operations

  • Resource modifications

  • Access attempts


Operational Concepts

Billing

Billing refers to the invoicing process for your Gathr subscription, based on:

  • Your subscription plan (Advanced or Business)

  • Credit consumption

  • Additional services or support

Metering

Metering tracks your platform usage patterns, showing:

  • Credit utilization over time

  • Application runtime statistics

  • Resource consumption trends

This information helps optimize your Gathr implementation and manage costs.

Support & Help

Gathr offers multiple support channels:

  • Email Support: saas-support@gathr.ai

  • Live Chat: Available through the Help option in the interface

  • Documentation: Comprehensive guides and tutorials

