Data Quality

👉

All users can access Data Quality details for insights, but only data asset owners can take actions like profiling or deleting.

Actions Available

The following actions can be performed from each tab of the View Data Asset page, in addition to the listing page.

Edit Data Asset Name: Modify the name of the data asset to better suit your needs.

Additional Options: Access a range of actions including deletion, utilization in Ingestion or ETL Applications, marking as a favorite, and configuring the data asset.

Start Profiling: Initiate data profiling to gain insights into your data’s characteristics and quality.

Back to Data Assets Listing: Return to the list of all data assets for an overview of your data.

Data Quality

The data quality of the source is measured to assess the accuracy, completeness, consistency, and overall reliability of the data asset.

👉

A profile run must be completed before the data quality of a data asset can be calculated.

If the data quality is not available for a data asset, the following message is shown:
“Data Quality is not available for this Data Asset. Do a profile run (use the play button at the top-right section) to calculate the overall data quality.”
If a new version of a data asset is created but has not yet been profiled, the data quality of the earlier profiled version is displayed.
To get the data quality of the latest version, do a profile run on that version.

💡

The data quality can vary based on factors like data source, collection methods, and intended use.

The overall data quality score is divided into the following bands:

Poor: Falls between 0-25% of the overall data quality score. A poor data asset cannot be trusted due to inaccuracies, inconsistencies, or a lack of credibility.

Average: Falls between 25-50% of the overall data quality score. An average data asset is insufficient in terms of quality, quantity, or relevance and lacks the necessary attributes to support effective analysis.

Fair: Falls between 50-75% of the overall data quality score. A fair data asset meets acceptable standards of accuracy and is free from major errors and inconsistencies.

Good: Falls between 75-90% of the overall data quality score. A good data asset is accurate and can be trusted for analysis or decision-making.

Excellent: Falls between 90-100% of the overall data quality score. An excellent data asset is of exceptionally high quality and stands out for its reliability.
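
The mapping from score to band can be sketched in a few lines of Python. This is only an illustration, not Gathr's implementation: the function name quality_band is hypothetical, and because the documented ranges share their boundary values (25, 50, 75, 90), the sketch assumes that a boundary value belongs to the higher band.

```python
def quality_band(score: float) -> str:
    """Map an overall data quality score (0-100) to a band name."""
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    # Assumption: a boundary value (25, 50, 75, 90) falls in the higher band.
    if score < 25:
        return "Poor"
    if score < 50:
        return "Average"
    if score < 75:
        return "Fair"
    if score < 90:
        return "Good"
    return "Excellent"


print(quality_band(62.0))
```

Under this assumption a score of 62 is reported as Fair, and a score of exactly 75 as Good.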

After the latest profile run of a data asset, the percentage change in data quality is shown. The score can go up, go down, or remain unchanged relative to the previous percentage.
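
As a rough illustration of how such a change could be derived from two consecutive profile runs, the sketch below compares the previous and latest scores. The function name quality_change is hypothetical, and the sketch assumes the change is a simple difference in percentage points; the text does not state how Gathr computes it.

```python
def quality_change(previous: float, latest: float) -> tuple[str, float]:
    """Return the direction and size of the change between two scores."""
    # Assumption: the change is the difference in percentage points.
    delta = round(latest - previous, 2)
    if delta > 0:
        return ("up", delta)
    if delta < 0:
        return ("down", -delta)
    return ("unchanged", 0.0)


print(quality_change(78.0, 82.5))
```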

Data Completeness

A comprehensive source data analysis is conducted to ensure a reliable single source of truth.

👉

A profile run must be completed before the data completeness of a data asset can be calculated.

If the data completeness is not available for a data asset, the following message is shown:
“Data Completeness is not available for this Data Asset. Do a profile run (use the play button at the top-right section) to calculate the data completeness.”
If a new version of a data asset is created but has not yet been profiled, the data completeness of the earlier profiled version is displayed.
To get the data completeness of the latest version, do a profile run on that version.

💡

The data completeness can vary based on factors like data source, collection methods, and intended use.

Data completeness is expressed as a percentage and measured based on the following factors:

Accuracy: Indicates the proportion of accurate versus inaccurate data (including redundant and null rows).

Uniqueness: Determines how much of the data is unique versus duplicated.

Completeness: Calculates the proportion of complete versus incomplete data (including null rows and empty strings).
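
To make two of these factors concrete, the following sketch computes a uniqueness and a completeness percentage over a small set of rows. It is an illustration under stated assumptions, not Gathr's formula: the function names are hypothetical, a row counts as incomplete if any value is null or an empty string, and uniqueness is taken as distinct rows over total rows. Accuracy is left out because it depends on validation rules that are not described here, as is the way the factors are combined.

```python
from typing import Optional, Sequence

Row = Sequence[Optional[str]]


def completeness_pct(rows: Sequence[Row]) -> float:
    """Percentage of rows with no null values and no empty strings."""
    if not rows:
        return 0.0
    complete = sum(
        1 for row in rows if all(v is not None and v != "" for v in row)
    )
    return 100.0 * complete / len(rows)


def uniqueness_pct(rows: Sequence[Row]) -> float:
    """Percentage of distinct rows relative to the total row count."""
    if not rows:
        return 0.0
    distinct = {tuple(row) for row in rows}
    return 100.0 * len(distinct) / len(rows)


# Hypothetical sample data: one duplicate row, one null, one empty string.
sample = [("a", "1"), ("a", "1"), ("b", None), ("c", "")]
print(completeness_pct(sample), uniqueness_pct(sample))
```

For the sample rows, two of four rows are complete and three of four are distinct.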

Profile

The profile section displays the assigned cluster and data asset scheduling details.

Configure Profiling

Select the data asset version on which profiling should run, and configure the deployment settings on Gathr.

Schedule Profiling

Scheduling profile runs enables you to automate the data asset profiling at a required frequency, reducing the need for manual intervention.

👉

The Profile Scheduling option will not appear for File Upload and Sample sources.

When you click Profile Scheduling, you can set the frequency of the profile runs. Once a schedule is set, the UN-SCHEDULE and RESCHEDULE buttons become available to manage it.

Profile History

The profile history is shown as a table with the following details of the data asset profile:

Field Name	Description
Version	Version number of the data asset.
Status	The current state of the data asset.
Start Time	The timestamp when the data asset profile run started.
End Time	The timestamp when the data asset profile run stopped.
Number of Columns	Number of columns in the data asset.
Number of Records	Number of records in the data asset.
Last Profile Run	The date and time when the last profile run completed successfully.
Credit Points Used	Total credit points consumed for the data asset profiling.
Cluster Type	The details of the cluster assigned to the data asset for the profile run.
Action	Option to view the data asset’s profiling results.

View Run Profile

The Profile Run window shows statistical insights for each variable, such as Avg, Min, Max, and Percentile.
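
For readers who want to reproduce comparable statistics outside the product, here is a minimal sketch using Python's standard statistics module. The function name summarize is hypothetical, and the choice of quartiles (25th, 50th, 75th percentile) and of the default interpolation method is an assumption; the percentiles and method Gathr reports are not specified in this section.

```python
import statistics
from typing import Sequence


def summarize(values: Sequence[float]) -> dict[str, float]:
    """Compute basic profile statistics for one numeric variable."""
    ordered = sorted(values)
    # quantiles(n=4) returns the three quartile cut points
    # (needs at least two data points).
    p25, p50, p75 = statistics.quantiles(ordered, n=4)
    return {
        "avg": statistics.mean(ordered),
        "min": ordered[0],
        "max": ordered[-1],
        "p25": p25,
        "p50": p50,
        "p75": p75,
    }


print(summarize([1, 2, 3, 4, 5]))
```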

You can also click the Frequency Distribution Details label to see the frequency distribution for each variable.

Frequency Distribution Details:

The frequency distribution of an attribute/field is the count of each individual value of that field across the whole data asset.

For Numeric type fields, it is shown in terms of counts only.

For String/Date/Timestamp fields, you can view the frequencies/counts along with their percentages.
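
The idea of a frequency distribution with counts and percentages can be sketched as follows. This is an illustrative example only: the function name frequency_distribution and the sample values are hypothetical and do not reflect how Gathr computes or stores the distribution.

```python
from collections import Counter
from typing import Hashable, Sequence


def frequency_distribution(
    values: Sequence[Hashable],
) -> list[tuple[Hashable, int, float]]:
    """Return (value, count, percentage) tuples, most frequent first."""
    counts = Counter(values)
    total = len(values)
    return [
        (value, count, 100.0 * count / total)
        for value, count in counts.most_common()
    ]


# Hypothetical String field with four records.
for value, count, pct in frequency_distribution(["US", "IN", "US", "UK"]):
    print(value, count, f"{pct:.1f}%")
```

For a Numeric field, the same function can be used while ignoring the percentage column, which matches the counts-only view described above.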

The Frequency Distribution Graph is generated for every variable in the data asset.

You can filter or sort the variables whose data profile you want to see.
