As more businesses come to rely on their data products, producing reliable, consistent, and high-quality data outputs has become critical for many teams.
You can use Databand to track, alert on, and investigate problems in data quality, integrity, and access. Databand provides this visibility by collecting usage and profiling information about your datasets, and by letting you define custom data quality metrics that are sent to Databand's tracking system.
When you integrate Databand with your pipelines, Databand can automatically gather metadata about the datasets in use and store that information for analysis. Examples include:
- Data schema info and previews from DataFrames, SQL queries, and file types like CSV or Parquet
- Data distributions, profiles, and histograms
- User access information (who is running processes on a given data set or file)
All automatically extracted metadata can be granularly defined and throttled according to your data privacy requirements.
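To illustrate the kind of schema and profiling metadata described above, here is a minimal sketch using only the Python standard library. The sample CSV data and the profile structure are hypothetical; in practice, Databand extracts this information automatically from the DataFrames, SQL queries, and files your pipelines touch.

```python
import csv
import io

# Hypothetical in-memory CSV standing in for a real dataset file.
raw = io.StringIO("user_id,age,country\n1,34,US\n2,,DE\n3,29,\n")

rows = list(csv.DictReader(raw))
columns = rows[0].keys()

# A simple schema preview: column names plus per-column null counts,
# analogous to the schema info and data profiles described above.
profile = {
    col: {
        "non_null": sum(1 for r in rows if r[col]),
        "nulls": sum(1 for r in rows if not r[col]),
    }
    for col in columns
}
print(profile)
```

A real integration would collect richer statistics (distributions, histograms, previews), but the principle is the same: the tracker derives dataset metadata as a side effect of normal pipeline execution.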
Every organization's data is different, so you are also free to define custom metrics about your datasets through Databand's API. These can be used in addition to, or instead of, automatically extracted metadata. Examples of user-generated metrics include specialized outlier checks or data completeness scores.
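As an illustration, a data completeness score of the kind mentioned above might be computed like this. The sample records and the scoring function are hypothetical; only the resulting value would be reported to Databand through its metric API.

```python
# Hypothetical records; a real pipeline would pull these from a dataset.
records = [
    {"order_id": 1, "amount": 25.0, "shipped_at": "2021-06-01"},
    {"order_id": 2, "amount": None, "shipped_at": "2021-06-02"},
    {"order_id": 3, "amount": 18.5, "shipped_at": None},
]

def completeness_score(records):
    """Fraction of non-null field values across all records."""
    total = sum(len(r) for r in records)
    filled = sum(1 for r in records for v in r.values() if v is not None)
    return filled / total

score = completeness_score(records)
# With the Databand SDK installed, this value could then be logged as a
# custom metric so it appears alongside the automatically extracted metadata.
print(round(score, 2))
```

Because the metric is just a number you compute in your own code, any domain-specific check (outlier rates, referential integrity, freshness) can be tracked the same way.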
It's easy to lose track of who is working with which data files. This is a problem for the obvious governance reasons, but also for data quality, when you need to know how issues in data cascade across an organization. As you integrate Databand, the product can track which files or tables are being processed and attribute them to specific pipelines and pipeline owners.