Core Concepts
Explore the essential terminology and concepts you’ll encounter while using Gathr. These core concepts will help you navigate the platform more effectively and make the most of its capabilities.
Account & User Management
Account
A Gathr account is your organization’s entry point to the platform, created during the sign-up process. Your account contains all your projects, applications, connections, and user access settings.
Management: Access your account settings through the User Profile option in the main menu.
User
A user is an individual with access to Gathr who can create and manage applications based on their assigned permissions. Each user has unique credentials and can be assigned specific roles within the platform.
Organization Administrator
The organization administrator is the primary user who initially signs up for Gathr. This role has full administrative privileges, including:
Managing user access and permissions
Setting up compute environments
Configuring billing information
Sharing projects with other users
Subscription Levels
Gathr offers different subscription tiers to meet various business needs:
14-Day Free Trial: Explore Gathr’s features before committing
Gathr Advanced Plan: For small to medium teams with standard data processing needs
Gathr Business Plan: For enterprises with advanced requirements, including custom compute environments
Your current subscription level is displayed in the main menu.
Credits
Credits are the currency used within Gathr to run applications. Different operations consume varying amounts of credits based on:
Application type (Ingestion, CDC, ETL)
Compute resources used
Runtime duration
Data volume processed
Credits are allocated based on your subscription plan.
Navigation & Interface
Landing Page
Upon signing in, the landing page offers quick access to frequently used features.
Main Menu
The sidebar menu is your primary navigation tool, providing access to:
Projects
Applications
Connections
User settings
Documentation and support
Platform Architecture
Engine
Gathr offers two processing engines to power your data applications:
Gathr Engine: The default engine integrated within the Gathr platform
Optimized for Gathr applications
Managed by Gathr with automatic scaling
No additional setup required
Databricks Engine: Integration with Azure Databricks
Leverage existing Databricks clusters
Access to Databricks-specific features
Requires compute environment setup
You can view the engine status in the main menu. A connected engine is required to run Gathr applications.
If your engine shows as disconnected, contact Gathr support at saas-support@gathr.ai.
Cluster
Clusters are the compute resources that execute your applications. Options include:
Free Tier Cluster: Available during the 14-day trial
Managed Clusters: Available in sizes from Extra Small to Large, plus a GPU option, depending on the subscription plan:
Extra Small: Consumes 1 credit per minute
Small: Consumes 2 credits per minute
Medium: Consumes 4 credits per minute
Large: Consumes 8 credits per minute
GPU (Powered by NVIDIA RAPIDS): Consumes 10 credits per minute
Custom Clusters: On the Business Plan, connect your own compute resources
Cluster selection affects performance, scalability, and credit consumption.
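To make the per-minute rates above concrete, here is a back-of-the-envelope estimator for the cluster component of a run's cost. It is a sketch under stated assumptions: credits accrue linearly at the listed rate, and partial minutes are rounded up (the rounding rule is an assumption, not documented behavior). The helper name `estimate_credits` is hypothetical, and the sketch ignores the other factors listed under Credits, such as application type and data volume.

```python
import math

# Per-minute credit rates for managed cluster sizes, as listed above.
CREDITS_PER_MINUTE = {
    "Extra Small": 1,
    "Small": 2,
    "Medium": 4,
    "Large": 8,
    "GPU": 10,
}


def estimate_credits(cluster_size: str, runtime_minutes: float) -> int:
    """Estimate cluster credits for one run (hypothetical helper).

    Assumes linear consumption and that partial minutes round up;
    actual Gathr billing granularity may differ.
    """
    rate = CREDITS_PER_MINUTE[cluster_size]
    return rate * math.ceil(runtime_minutes)


# A 45-minute job on a Medium cluster versus a Large cluster.
print(estimate_credits("Medium", 45))  # 180
print(estimate_credits("Large", 45))   # 360
```

A larger cluster only saves credits if it shortens the runtime by more than its rate increases, which is why cluster selection is a cost decision as well as a performance one.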
Projects
Projects are organizational containers that help you:
Group related applications
Manage access control
Organize connections and resources
Facilitate pipeline promotion across environments
Each user starts with a default project and can create additional projects as needed.
Application Types
Data Ingestion Applications
Data Ingestion applications move data from sources to targets with minimal transformation. Key features include:
Batch Ingestion: Process data at defined intervals
Incremental Loading: Capture only new or changed data
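One common way to implement incremental loading is a watermark: remember the highest value of a monotonically increasing column (such as an `updated_at` timestamp) and, on each run, read only rows beyond it. The following is a minimal in-memory sketch of that technique, not Gathr's internal mechanism; the function name, the `updated_at` column, and the sample rows are hypothetical.

```python
from datetime import datetime
from typing import Iterable


def incremental_load(rows: Iterable[dict], watermark: datetime, column: str = "updated_at"):
    """Return rows newer than `watermark` and the advanced watermark.

    Hypothetical sketch: assumes `column` increases whenever a row is
    inserted or changed, so anything at or below the watermark was
    already loaded by a previous run.
    """
    new_rows = [r for r in rows if r[column] > watermark]
    new_watermark = max((r[column] for r in new_rows), default=watermark)
    return new_rows, new_watermark


source = [
    {"id": 1, "updated_at": datetime(2024, 1, 1)},
    {"id": 2, "updated_at": datetime(2024, 1, 5)},
    {"id": 3, "updated_at": datetime(2024, 1, 9)},
]

# First run: every row after the initial watermark is loaded.
batch, wm = incremental_load(source, datetime(2023, 12, 31))
# Second run with no source changes: nothing new is picked up.
batch2, wm2 = incremental_load(source, wm)
print(len(batch), wm, len(batch2))
```

The strict `>` comparison avoids reloading the boundary row, but a row that arrives late with a timestamp equal to the watermark would be skipped; production pipelines often use `>=` combined with deduplication instead.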
Change Data Capture (CDC) Applications
CDC applications track and replicate changes from database sources, enabling:
Real-time data replication
Database synchronization
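Conceptually, CDC replication replays an ordered stream of insert, update, and delete events onto a copy of the source table. The sketch below shows that replay step against an in-memory replica keyed by primary key. The event shape (`op`, `key`, `data`) is a hypothetical simplification loosely modeled on common CDC formats, not the format Gathr uses.

```python
from typing import Any, Dict, List


def apply_changes(target: Dict[Any, dict], events: List[dict]) -> Dict[Any, dict]:
    """Replay change events, in order, onto a keyed replica.

    Hypothetical event shape: {"op": "insert" | "update" | "delete",
    "key": <primary key>, "data": {...}}.
    """
    for event in events:
        op, key = event["op"], event["key"]
        if op == "insert":
            target[key] = dict(event["data"])
        elif op == "update":
            # Merge only the changed columns into the existing row.
            target.setdefault(key, {}).update(event["data"])
        elif op == "delete":
            target.pop(key, None)
        else:
            raise ValueError(f"unknown operation: {op}")
    return target


replica = {1: {"name": "Ada", "tier": "gold"}}
changes = [
    {"op": "update", "key": 1, "data": {"tier": "platinum"}},
    {"op": "insert", "key": 2, "data": {"name": "Grace", "tier": "silver"}},
    {"op": "delete", "key": 1},
]
print(apply_changes(replica, changes))
# {2: {'name': 'Grace', 'tier': 'silver'}}
```

Event order matters: applying the same three events in a different order would leave a different replica, which is why CDC pipelines preserve source commit order.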
ETL Applications
ETL (Extract, Transform, Load) applications provide comprehensive data processing with:
Complex transformations
Data quality checks
Business logic implementation
Data enrichment capabilities
Schema evolution to handle changes to data structure
Workflows
Workflows are reusable process templates that:
Orchestrate multiple applications
Define execution sequences
Handle dependencies
Enable complex data pipelines
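Handling dependencies and execution sequences amounts to ordering a directed acyclic graph of applications. The sketch below uses Python's standard `graphlib.TopologicalSorter` (Python 3.9+) to group a hypothetical workflow into stages, where every application in a stage can run in parallel once the previous stages finish. The application names are illustrative, and this is not how Gathr's orchestrator is implemented.

```python
from graphlib import TopologicalSorter

# Hypothetical workflow: each application maps to the set of
# applications that must finish before it can start.
workflow = {
    "ingest_orders": set(),
    "ingest_customers": set(),
    "cdc_inventory": set(),
    "etl_sales_mart": {"ingest_orders", "ingest_customers"},
    "etl_stock_report": {"cdc_inventory", "etl_sales_mart"},
}

sorter = TopologicalSorter(workflow)
sorter.prepare()  # raises graphlib.CycleError if dependencies are circular

stages = []
while sorter.is_active():
    ready = sorted(sorter.get_ready())  # applications that can run in parallel
    stages.append(ready)
    sorter.done(*ready)  # mark the whole stage as finished

for number, stage in enumerate(stages, start=1):
    print(f"stage {number}: {stage}")
```

Running it prints three stages: the three source applications first, then `etl_sales_mart`, then `etl_stock_report`, because the report depends on the sales mart.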
Application Components
Connections
Connections are authenticated links to external systems, enabling Gathr to:
Read from data sources
Write to data targets
Access cloud services and resources
Connections must be configured and tested before use in applications.
Data Sources
A data source is any data origin that Gathr can read from, including:
Databases (SQL, NoSQL)
Cloud storage (S3, Azure Blob, GCS)
APIs and web services
Streaming platforms (Kafka, Kinesis)
File systems
Transformations
Transformations are operations that modify data between source and target, including:
Data type conversions
Filtering and aggregation
Joining multiple datasets
Enrichment and data quality checks
Custom business logic
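In Gathr these operations are configured as processors rather than written by hand. The plain-Python sketch below is only meant to show what each kind of transformation does to a small, hypothetical dataset: a type conversion, a filter, a lookup join for enrichment, and an aggregation.

```python
from collections import defaultdict

# Hypothetical source records; amounts arrive as strings.
orders = [
    {"order_id": 1, "customer_id": "c1", "amount": "120.50", "status": "paid"},
    {"order_id": 2, "customer_id": "c2", "amount": "80.00", "status": "cancelled"},
    {"order_id": 3, "customer_id": "c1", "amount": "45.25", "status": "paid"},
    {"order_id": 4, "customer_id": "c3", "amount": "300.00", "status": "paid"},
]
customer_region = {"c1": "EMEA", "c2": "APAC", "c3": "AMER"}

# 1. Data type conversion: parse amounts into floats.
typed = [{**o, "amount": float(o["amount"])} for o in orders]

# 2. Filtering: keep only paid orders.
paid = [o for o in typed if o["status"] == "paid"]

# 3. Join / enrichment: attach each customer's region
#    (left-join semantics: unmatched customers get "UNKNOWN").
enriched = [{**o, "region": customer_region.get(o["customer_id"], "UNKNOWN")} for o in paid]

# 4. Aggregation: total revenue per region.
revenue = defaultdict(float)
for o in enriched:
    revenue[o["region"]] += o["amount"]

print(dict(revenue))  # {'EMEA': 165.75, 'AMER': 300.0}
```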
Functions
Gathr provides various function types to manipulate data:
Date Functions: Format and manipulate date/time values
String Functions: Text processing and pattern matching
Math Functions: Numerical calculations and statistics
Array Functions: Process lists and collections
Lookup Functions: Reference data from other sources
Miscellaneous Functions: Special-purpose utilities
Targets
A target is the destination where processed data is delivered, such as:
Data warehouses (Snowflake, Redshift)
Data lakes (S3, ADLS)
Databases (PostgreSQL, MySQL)
Analytics platforms
Visualization tools
Generative AI Capabilities
Gathr integrates advanced AI capabilities throughout the platform to enhance productivity and innovation:
Gathr IQ
Gathr IQ is an AI assistant that helps you design ETL applications automatically from natural language input:
Automatically generate transformation tasks
Translate business requirements into technical implementations
Accelerate application development
Reduce technical complexity
AI-Powered Processors
Gathr offers several processors with AI assistance for code generation:
Expression Evaluator: Generate SparkSQL functions using natural language
Expression Filter: Create filtering conditions with plain English
Python Processor: Auto-generate Python code for transformations
Scala Processor: Produce Scala code from natural language descriptions
SQL Processor: Convert English queries into SQL statements
Vector Database Integration
Gathr supports vector databases for AI-powered similarity search and retrieval:
Redis: For real-time vector similarity search
Milvus: For large-scale vector data management
Pinecone: For high-performance similarity lookups
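Vector databases answer the question "which stored vectors are closest to this query vector?" The exact, brute-force version of that search is shown below using cosine similarity. Redis, Milvus, and Pinecone typically use approximate indexes to do this at scale, so treat this as an illustration of the idea rather than how those systems work internally. The three-dimensional vectors and document IDs are toy, hypothetical values, not real embeddings.

```python
import math
from typing import Dict, List, Tuple


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query: List[float], index: Dict[str, List[float]], k: int = 2) -> List[Tuple[str, float]]:
    """Score every stored vector and return the k most similar."""
    scored = [(doc_id, cosine(query, vec)) for doc_id, vec in index.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


# Hypothetical embeddings keyed by document ID.
index = {
    "invoice_policy": [0.9, 0.1, 0.0],
    "refund_policy": [0.8, 0.3, 0.1],
    "holiday_calendar": [0.0, 0.2, 0.9],
}
query = [1.0, 0.2, 0.0]
print([doc for doc, _ in top_k(query, index)])  # ['invoice_policy', 'refund_policy']
```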
MLflow Integration
Connect your MLflow instance to Gathr to:
Access registered ML models
Incorporate models into data pipelines
Perform inference within Gathr applications
Manage the ML lifecycle
Data Intelligence
Data Intelligence enables natural language querying of your data sources:
Natural Language Data Querying
Ask questions about your data in plain English:
“How many customers were acquired last month?”
“What’s the revenue trend by region for Q1?”
“Which products have the highest profit margin?”
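To give a sense of the kind of SQL a question like "How many customers were acquired last month?" might translate into, here is one plausible query run against a toy SQLite table. The `customers` table, its `signup_date` column, and the query itself are hypothetical illustrations, not output captured from Gathr; "last month" is computed relative to a fixed reference date so the result is reproducible.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, signup_date TEXT)")
conn.executemany(
    "INSERT INTO customers (signup_date) VALUES (?)",
    [("2024-02-27",), ("2024-03-03",), ("2024-03-18",), ("2024-03-31",), ("2024-04-02",)],
)

# "Last month" = from the first day of the previous month up to,
# but not including, the first day of the current month.
query = """
SELECT COUNT(*) FROM customers
WHERE signup_date >= date(:today, 'start of month', '-1 month')
  AND signup_date <  date(:today, 'start of month')
"""
(count,) = conn.execute(query, {"today": "2024-04-15"}).fetchone()
print(count)  # 3
```

Being able to inspect the generated SQL matters because a phrase like "last month" can be interpreted in more than one way (calendar month versus the last 30 days).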
Insight Types
Data Intelligence provides multiple insight formats:
Descriptive: Textual answers to business queries
Graphical: Visual representations of data insights
SQL Queries: Visibility into the generated SQL
Customize analytical behavior with:
Business rules
Domain-specific knowledge
Custom recommendations
Analysis guidelines
Metadata Generation
AI-assisted metadata generation for:
Table descriptions
Column definitions
Relationship explanations
Usage examples
Data Assets Management
Gathr provides tools to manage and govern your data assets:
Data Asset Registry
A central repository for:
Discovering available data
Understanding data structure
Tracking data lineage
Managing metadata
Versioning
Track changes to data assets with:
Version history
Change tracking
Metadata Management
Maintain comprehensive metadata including:
Business definitions
Technical specifications
Data quality metrics
Usage statistics
Data Lineage
Track data flow through your systems:
Source-to-target mapping
Transformation history
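Source-to-target lineage can be modeled as a graph in which each asset points to the assets it was derived from; tracing an asset's full upstream history is then a graph traversal. The breadth-first sketch below illustrates that idea with hypothetical asset names; it does not reflect how Gathr stores lineage.

```python
from collections import deque
from typing import Dict, List, Set

# Hypothetical lineage edges: target asset -> source assets it was built from.
lineage: Dict[str, List[str]] = {
    "mart.revenue_by_region": ["staging.orders_enriched"],
    "staging.orders_enriched": ["raw.orders", "raw.customers"],
    "raw.orders": ["postgres.public.orders"],
    "raw.customers": ["s3.crm_export.customers"],
}


def upstream(asset: str) -> Set[str]:
    """Return every asset that `asset` directly or indirectly depends on."""
    seen: Set[str] = set()
    queue = deque(lineage.get(asset, []))
    while queue:
        source = queue.popleft()
        if source not in seen:
            seen.add(source)
            queue.extend(lineage.get(source, []))
    return seen


print(sorted(upstream("mart.revenue_by_region")))
```

Reversing the edges gives the complementary downstream view, which answers the impact-analysis question "what breaks if this source changes?"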
Security & Governance
Gathr implements robust security and governance features:
Access Control
Manage permissions at multiple levels:
Organization-level controls
Project-level permissions
Application-specific access
Resource-based policies
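Multi-level permissions need a rule for which level wins when a user has grants at more than one. This page does not describe Gathr's actual precedence rules, so the sketch below assumes one common convention: check the most specific scope (application) first, then the project, then the organization. The grant table, roles, and function name are all hypothetical.

```python
from typing import Dict, Optional, Tuple

# Hypothetical grants: (level, resource id) -> {user: role}.
Scope = Tuple[str, str]
grants: Dict[Scope, Dict[str, str]] = {
    ("organization", "acme"): {"alice": "admin"},
    ("project", "sales"): {"bob": "editor", "carol": "viewer"},
    ("application", "sales/etl_revenue"): {"carol": "editor"},
}


def effective_role(user: str, org: str, project: str, application: str) -> Optional[str]:
    """Resolve a user's role, most specific scope first (assumed convention)."""
    for scope in (("application", application), ("project", project), ("organization", org)):
        role = grants.get(scope, {}).get(user)
        if role is not None:
            return role
    return None  # no grant at any level: access denied


print(effective_role("carol", "acme", "sales", "sales/etl_revenue"))  # editor
print(effective_role("carol", "acme", "sales", "sales/other_app"))    # viewer
```

Another common convention grants the highest privilege found at any level; which one applies changes whether an application-level grant can restrict a broader project-level one.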
Sharing Models
Share resources with other users:
Project sharing
Application sharing
Connection sharing
Data asset sharing
Connection Security
Secure your data connections with:
Credential management
Encryption
Role-based access
Audit logging
Audit Trails
Track activities within the platform:
User actions
System operations
Resource modifications
Access attempts
Operational Concepts
Billing
Billing refers to the invoicing process for your Gathr subscription, based on:
Your subscription plan (Advanced or Business)
Credit consumption
Additional services or support
Metering
Metering tracks your platform usage patterns, showing:
Credit utilization over time
Application runtime statistics
Resource consumption trends
This information helps optimize your Gathr implementation and manage costs.
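As an illustration of the kind of rollup metering enables, the sketch below aggregates hypothetical per-run credit records into daily totals and identifies the application consuming the most credits. The record layout and values are invented for the example; Gathr's metering view presents this information in its own format.

```python
from collections import Counter
from datetime import date

# Hypothetical run records: (run date, application, credits consumed).
runs = [
    (date(2024, 3, 1), "ingest_orders", 30),
    (date(2024, 3, 1), "etl_sales_mart", 120),
    (date(2024, 3, 2), "ingest_orders", 28),
    (date(2024, 3, 2), "etl_sales_mart", 180),
]

daily = Counter()    # credit utilization over time
per_app = Counter()  # consumption by application
for run_date, app, credits in runs:
    daily[run_date] += credits
    per_app[app] += credits

for day in sorted(daily):
    print(day.isoformat(), daily[day])
print("top consumer:", per_app.most_common(1)[0])
```

A rising daily total driven by a single application, as with `etl_sales_mart` here, is a typical signal to revisit that application's cluster size or schedule.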
Support & Help
Gathr offers multiple support channels:
Email Support: saas-support@gathr.ai
Live Chat: Available through the Help option in the interface
Documentation: Comprehensive guides and tutorials