Azure SQL Ingestion Source
The Azure SQL data source reads objects from Azure SQL and SQL Server databases.
Data Source Configuration
Fetch From Source/Upload Data File
When designing the application, you can either fetch sample data from the Azure SQL source by providing the data source connection details, or upload a sample data file in one of the supported formats to view the schema details during application design.
If Upload Data File is selected to fetch sample data, provide the following details.
File Format
Select the format (file type) of the sample file.
The Gathr-supported file formats for the Azure SQL data source are CSV, JSON, TEXT, Parquet, and ORC.
For the CSV file format, also select the delimiter used in the file.
Header Included
Enable this option to read the first row as a header if your Azure SQL data is in CSV format.
Upload
Upload a sample file that matches the file format selected above.
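For reference, here is a minimal sketch of how a delimited sample with a header row is typically parsed; the file path, delimiter, and Spark session are illustrative assumptions, not Gathr's internal implementation:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("sample-preview").getOrCreate()

# Hypothetical sample file; "header" corresponds to the Header Included
# option and "sep" to the delimiter selected for the CSV format.
sample_df = (
    spark.read
    .option("header", "true")       # treat the first row as column names
    .option("sep", ",")             # delimiter chosen for the CSV file
    .option("inferSchema", "true")  # derive column types from the data
    .csv("/tmp/azure_sql_sample.csv")
)
sample_df.printSchema()  # schema details shown during application design
```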
If Fetch From Source is selected, continue configuring the data source.
Connection Name
Connections are service identifiers. Select a connection name from the list if you have previously created and saved connection details for Azure SQL, or create one as explained in the topic Azure SQL Connection →
Use the Test Connection option to ensure that the connection with the Azure SQL channel is established successfully.
A success message confirms that the connection is available. If the test connection fails, edit the connection to resolve the issue before proceeding.
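Behind the scenes, an Azure SQL connection typically resolves to a standard SQL Server JDBC URL. The following is a minimal sketch of that URL shape plus a lightweight connectivity check; the server, database, and credentials are placeholders:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("connection-test").getOrCreate()

# Placeholder values; substitute your own Azure SQL server, database,
# and credentials.
jdbc_url = (
    "jdbc:sqlserver://myserver.database.windows.net:1433;"
    "database=mydb;encrypt=true;trustServerCertificate=false;loginTimeout=30"
)
props = {
    "user": "myuser",
    "password": "mypassword",
    "driver": "com.microsoft.sqlserver.jdbc.SQLServerDriver",
}

# Reading a trivial one-row query is a simple way to verify the connection.
spark.read.jdbc(jdbc_url, table="(SELECT 1 AS ok) t", properties=props).show()
```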
Schema Name
Name of the source schema whose tables will be listed.
Table Name
Name of the source table whose metadata you want to view.
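In Spark JDBC terms, the schema and table names usually combine into one qualified identifier. A minimal sketch, reusing the jdbc_url and props placeholders from the connection sketch above (the schema name dbo and table name orders are illustrative):

```python
# "dbtable" is the qualified <schema>.<table> identifier.
df = (
    spark.read.format("jdbc")
    .option("url", jdbc_url)
    .option("dbtable", "dbo.orders")
    .option("user", props["user"])
    .option("password", props["password"])
    .option("driver", props["driver"])
    .load()
)
df.printSchema()  # metadata of the selected source table
```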
Add Configuration: Additional properties can be added using this option as key-value pairs.
More Configurations
Query
A Hive-compatible SQL query to be executed by the component.
Design Time Query
A query used to fetch a limited number of records during application design. It is used only for schema detection and data reload.
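To illustrate the difference between the two queries: a runtime query reads the records the component should process, while a design-time query fetches only a handful of rows so that schema detection stays fast. Both queries below are hypothetical; on SQL Server, TOP is the usual way to limit rows. The sketch reuses the jdbc_url and props placeholders from above:

```python
# Hypothetical runtime query executed by the component.
query = "SELECT id, amount, updated_at FROM dbo.orders WHERE amount > 0"

# Hypothetical design-time variant: same shape, but only a few rows.
design_time_query = "SELECT TOP 100 id, amount, updated_at FROM dbo.orders"

preview_df = (
    spark.read.format("jdbc")
    .option("url", jdbc_url)
    .option("query", design_time_query)  # Spark's JDBC "query" option
    .option("user", props["user"])
    .option("password", props["password"])
    .option("driver", props["driver"])
    .load()
)
preview_df.show()
```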
Enable Query Partitioning
Enables parallel reading of data from the table. It is disabled by default.
When this checkbox is enabled, the table read is partitioned and the following additional fields are displayed:
No. of Partitions
Specifies the number of parallel threads to be invoked to partition the table while reading the data.
Partition on Column
The column used to partition the data. It must be a numeric column, on which Spark performs partitioning to read data in parallel.
Lower Bound
Value of the lower bound for the partitioning column. This value helps decide the partition boundaries; the entire dataset is distributed into multiple chunks based on it.
Upper Bound
Value of the upper bound for the partitioning column. Together with the lower bound, it determines the partition boundaries across which the dataset is split.
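These four fields map directly onto Spark's partitioned JDBC read. In the sketch below (illustrative column and bound values, reusing the placeholders from above), each of the N partitions issues its own query over a stride of roughly (upperBound - lowerBound) / numPartitions. Note that rows outside the bounds are still read; they simply land in the first or last partition:

```python
# Illustrative values: partition on a numeric "id" column whose values
# are known to range roughly from 1 to 1,000,000.
partitioned_df = spark.read.jdbc(
    url=jdbc_url,
    table="dbo.orders",
    column="id",           # Partition on Column (must be numeric)
    lowerBound=1,          # Lower Bound
    upperBound=1_000_000,  # Upper Bound
    numPartitions=8,       # No. of Partitions -> 8 parallel JDBC reads
    properties=props,
)
# Spark generates 8 WHERE clauses over id, approximately:
#   id < 125000, 125000 <= id < 250000, ..., id >= 875000
print(partitioned_df.rdd.getNumPartitions())  # 8
```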
If Enable Query Partitioning is disabled, proceed by updating the following field.
Fetch Size
The fetch size determines the number of rows fetched from the database per round trip. The default value is 1000.
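Fetch size corresponds to the JDBC fetchsize option, which controls how many rows the driver buffers per network round trip; larger values mean fewer round trips at the cost of more memory. A minimal sketch with the documented default, again reusing the placeholders from above:

```python
df = (
    spark.read.format("jdbc")
    .option("url", jdbc_url)
    .option("dbtable", "dbo.orders")
    .option("fetchsize", "1000")  # rows fetched per round trip
    .option("user", props["user"])
    .option("password", props["password"])
    .option("driver", props["driver"])
    .load()
)
```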
Schema
Check the populated schema details. For more details, see Schema Preview →
Advanced Configuration
Optionally, you can enable incremental read. For more details, see Azure SQL Incremental Configuration →