Let's check out the data shown in the standard histogram view.
A histogram is a visual representation of the data distribution in a specific column within a data target (such as a DataFrame or SQL table).
The horizontal axis shows a scale divided into equal-width bins between the column's minimum and maximum values. Every data point in the column falls into one of these bins. The labels on the horizontal axis span the range from the minimum to the maximum value.
The vertical axis shows the number of values that fall into each bin.
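To make the two axes concrete, here is a minimal sketch of equal-width binning in plain Python. It is an illustration only, not Databand's implementation: the function name `equal_width_histogram` and the `num_bins` parameter are hypothetical, and the choice to place the maximum value in the last bin is an assumption.

```python
def equal_width_histogram(values, num_bins=10):
    """Return (bin_edges, counts) for equal-width bins between min and max.

    Hypothetical helper for illustration; not part of any Databand API.
    """
    if not values:
        raise ValueError("values must not be empty")
    lo, hi = min(values), max(values)
    if lo == hi:
        # All values are identical, so a single bin holds everything.
        return [lo, hi], [len(values)]
    width = (hi - lo) / num_bins
    # Horizontal axis: num_bins + 1 edges from the minimum to the maximum.
    edges = [lo + i * width for i in range(num_bins)] + [hi]
    # Vertical axis: how many values fall into each bin.
    counts = [0] * num_bins
    for v in values:
        # Assumption: the maximum value is counted in the last bin.
        idx = min(int((v - lo) / width), num_bins - 1)
        counts[idx] += 1
    return edges, counts


edges, counts = equal_width_histogram([1, 2, 2, 3, 5, 8, 9, 10], num_bins=3)
```

With three bins over the range 1 to 10, each bin is 3 units wide, and the counts always sum to the number of input values.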
The histogram view also provides the following data profiling stats:
- Minimum and maximum values
- Total count of all analyzed values
- The mean value
- Standard deviation value
- Distinct values count
- Total count of all non-null values
- Total count of all null values
- 25, 50, and 75 percentile ranks
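The statistics in this list can be approximated for any column with a short, standard-library-only sketch. This is not how Databand computes them: the `profile_column` and `percentile` helpers are hypothetical, nulls are modeled as `None`, the total count is assumed to include nulls, the standard deviation is assumed to be the sample standard deviation, and the percentile ranks are a parameter whose default is an assumption.

```python
import statistics


def percentile(sorted_vals, p):
    """Percentile by linear interpolation between the closest ranks."""
    k = (len(sorted_vals) - 1) * p / 100
    lower = int(k)
    upper = min(lower + 1, len(sorted_vals) - 1)
    return sorted_vals[lower] + (sorted_vals[upper] - sorted_vals[lower]) * (k - lower)


def profile_column(values, ranks=(25, 50, 75)):
    """Compute basic profiling stats for a numeric column (hypothetical helper)."""
    non_null = sorted(v for v in values if v is not None)
    return {
        "min": non_null[0],
        "max": non_null[-1],
        "count": len(values),            # assumption: nulls are included
        "mean": statistics.mean(non_null),
        "std": statistics.stdev(non_null),  # assumption: sample std deviation
        "distinct": len(set(non_null)),
        "non_null": len(non_null),
        "null": len(values) - len(non_null),
        "percentiles": {p: percentile(non_null, p) for p in ranks},
    }


stats = profile_column([1, 2, 2, 3, None, 5, 8, None, 9, 10])
```

Comparing values like these against what the histogram view reports can be a quick sanity check, keeping in mind that the counting and interpolation conventions above are assumptions.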
For histograms with categorical data, you can also switch from the default sorting by frequency to alphabetical sorting.
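The two sort orders for categorical data can be sketched as follows. The `categorical_counts` helper and its `sort_by` parameter are hypothetical, and breaking frequency ties alphabetically is an assumption rather than documented behavior.

```python
from collections import Counter


def categorical_counts(values, sort_by="frequency"):
    """Count categories and return (category, count) pairs in the chosen order."""
    counts = Counter(values)
    if sort_by == "frequency":
        # Most frequent first; ties broken alphabetically (assumption).
        return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    if sort_by == "alphabet":
        return sorted(counts.items())
    raise ValueError(f"unknown sort order: {sort_by!r}")


colors = ["red", "blue", "red", "green", "blue", "red"]
by_frequency = categorical_counts(colors)
by_alphabet = categorical_counts(colors, sort_by="alphabet")
```

Both orders contain the same bars; only their position on the horizontal axis changes.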
You can also visualize specific histogram values via the Task Metrics screens:
- In the Databand UI, open a pipeline run.
- From the pipeline, select a task that you want to preview.
- To view all histograms available for the selected task, perform one of the following steps:
  - Click the Histograms button on the top panel.
  - From the Inputs or Outputs section, select a dataset that you want to preview, and then click the Histograms icon.