Advanced Metrics (Tracking SQL)

Tracking your SQL workflows with DBND

This guide will showcase how DBND can be used for tracking important data metrics, whether they're discovered with SQL or any other external engine. In this guide, you'll extract and compute data metrics with SQL and log them with DBND.

Before beginning with this guide, you will first need to install dbnd and SQLAlchemy.

pip install dbnd SQLAlchemy

The database used will be a local PostgreSQL database containing the following data:

PostgreSQL DataPostgreSQL Data

PostgreSQL Data

The table columns are:

  • index - The indices of PostgreSQL table.
  • invoice_ID - Unique ID generated for each transaction.
  • transaction_amt - Amount exchanged in the transaction.
  • transaction_time - POSIX timestamp of transaction.
  • transaction_fulfilled - Whether the transaction has been fulfilled.

Logging Metrics from SQL Queries

  • First, you need to import the requirements and create a SQLAlchemy engine to execute our queries. * Next, you will find the number of transactions (in transaction_amt column) that are greater than or equal to a specific amount, for example, 5000.
  • After executing our query, you can log this data metric using DBND's log_metric function. You will also log the mean average of the transaction amounts greater than or equal to 5000.

DBND's log_metric function is versatile with the logged data type. It can be used to track collections and complex data types. For example, you can also log all the rows where transaction_amt is greater than or equal to 5000 and the shape of the Postgres table as a tuple.

# Python 3.6.8
from sqlalchemy import create_engine
from dbnd import log_metric

conn_string = 'postgresql+psycopg2://dbnd_postgres:[email protected]/dbnd_postgres'
pg_db = create_engine(conn_string)

minimum_amount = 5000

# logging count of rows where transaction_amt >= 5000
num_large_transactions_query = f"""SELECT COUNT(transaction_amt) FROM transactions_data WHERE transaction_amt >= {minimum_amount};"""
results = pg_db.execute(num_large_transactions_query) 
num_large_transactions = results.fetchone()[0]
log_metric("Number of large transactions(>= 5000)", num_large_transactions)

# logging average of rowsw where transaction_amt >= 5000
avg_large_transactions_query = f"""SELECT AVG(transaction_amt) FROM transactions_data WHERE transaction_amt >= {minimum_amount};"""
results = pg_db.execute(avg_large_transactions_query)
avg_large_transactions = results.fetchone()[0]
log_metric("mean large transactions", avg_large_transactions) 

# logging all rows where transaction_amt >= 5000
large_transactions_query = f"""SELECT * FROM transactions_data WHERE transaction_amt >= {minimum_amount};"""
results = pg_db.execute(large_transactions_query)
large_transactions = [row for row in results]
log_metric("large transactions rows", large_transactions) 

# logging table shape  
num_rows_query = f"""SELECT COUNT (*) FROM transactions_data""" 
results = pg_db.execute(num_rows_query)
num_rows = results.fetchone()[0]
num_cols_query = f"""SELECT COUNT(column_name) FROM INFORMATION_SCHEMA.COLUMNS WHERE table_name = 'transactions_data';"""
results = pg_db.execute(num_cols_query)
num_cols = results.fetchone()[0]

log_metric("table shape", (num_rows, num_cols))
SELECT COUNT(transaction_amt) FROM transactions_data WHERE transaction_amt >= 5000;
SELECT AVG(transaction_amt) FROM transactions_data WHERE transaction_amt >= 5000;
SELECT * FROM transactions_data WHERE transaction_amt >= 5000;
SELECT COUNT (*) FROM transactions_data;
SELECT COUNT(column_name) FROM INFORMATION_SCHEMA.COLUMNS WHERE table_name = 'transactions_data';

Executing the Scripts

🚧

Make sure that DBND Tracking is enabled

Tracking Python

Use Python to execute the script normally:

python tracking_sql_data.py

Output from tracking SQL

The output for the first script will have all the metrics logged. The output will be similar to the output below:

==================== 
= Running Databand!
 TRACKERS   : ['console']
 TASKS      : total=1

==================== 

INFO  tracking_sql_sql_based.py__6592c67f67 - Metric logged: Number of large transactions(>= 5000)=59
INFO  tracking_sql_sql_based.py__6592c67f67 - Metric logged: mean large transactions=9234.756779661016
INFO  tracking_sql_sql_based.py__6592c67f67 - Metric logged: large transactions rows=[(0, 'b6589fc6ab', 8605.19, 1609268915.350321, 0), (4, '1b64538924', 9895.93, 1608532450.936673, 1), (5, 'ac3478d69a', 9100.93, 1607337571.496427, 0), (6, 'c1dfd96eea', 11966.93, 1609333775.865714, 1), (8, 'fe5dbbcea5', 5712.27, 1607005440.311549, 0), (11, '17ba079149', 6821.97, 1608246364.187467, 1), (14, 'fa35e19212', 8744.28, 1608435414.527918, 1), (15, 'f1abd67035', 7348.49, 1609123983.843929, 1), (17, '0716d9708d', 7938.02, 1608793310.466077, 1), (18, '9e6a55b6b4', 10866.32, 1608067565.615092, 1), (20, '91032ad7bb', 10818.29, 1607411956.84893, 0), (21, '472b07b9fc', 9386.1, 1608287414.649073, 1), (24, '4d134bc072', 12831.9, 1607928988.334441, 1), (28, '0a57cb53ba', 11372.59, 1608144011.832888, 0), (29, '7719a1c782', 8130.35, 1607860202.245157, 0), (30, '22d200f867', 10835.7, 1606800489.896977, 0), (31, '632667547e', 9806.29, 1609162913.412595, 0), (32, 'cb4e5208b4', 7232.6, 1608872018.325546, 0), (33, 'b6692ea5df', 13043.42, 1607761880.822167, 0), (34, 'f1f836cb4e', 5135.19, 1607032350.512745, 0), (35, '972a67c481', 7442.91, 1608224422.998188, 1), (36, 'fc074d5013', 11131.99, 1608771202.485815, 0), (37, 'cb7a1d775e', 8327.12, 1607595284.81299, 0), (38, '5b384ce32d', 11561.62, 1609132168.081323, 1), (39, 'ca3512f4df', 7779.57, 1609154086.87106, 1), (40, 'af3e133428', 9471.65, 1607002903.51992, 1), (48, '64e095fe76', 8555.41, 1607686702.130967, 1), (50, 'e1822db470', 5024.09, 1606970278.722451, 0), (53, 'c5b76da3e6', 12558.46, 1608704912.885338, 0), (54, '80e28a51cb', 8719.69, 1608255831.737304, 1), (55, '8effee409c', 8202.24, 1607035064.815207, 0), (57, '9109c85a45', 9798.24, 1607206455.324834, 1), (59, '5a5b0f9b7d', 5147.44, 1608714640.934448, 1), (60, 'e6c3dd6304', 13261.65, 1607282094.854161, 0), (61, '6c1e671f9a', 8612.81, 1608434415.391349, 1), (62, '511a418e72', 7508.2, 1607662631.676266, 0), (63, 'a17554a0d2', 9206.2, 1606915475.870678, 1), (64, 'c66c65175f', 11310.84, 1606845154.938982, 1), (65, '2a45938070', 10421.68, 1607445722.816128, 0), (71, 'd02560dd9d', 12641.66, 1608294347.759727, 0), (72, 'c097638f92', 11756.62, 1606864214.230547, 1), (74, '1f1362ea41', 8818.16, 1608196663.67103, 1), (75, '450ddec8dd', 5362.6, 1608197118.847317, 1), (76, 'd54ad009d1', 12264.43, 1607285101.379916, 1), (77, 'd321d6f7cc', 6203.45, 1607690904.747362, 1), (80, 'b888b29826', 7566.97, 1608234017.086775, 1), (82, '76546f9a64', 7875.78, 1607349955.2563, 0), (83, '7d7116e23e', 12041.98, 1608822886.869622, 1), (84, 'be461a0cd1', 5412.72, 1608642507.837087, 1), (86, '3c26dffc8a', 13368.24, 1606997304.321356, 0), (87, 'e62d7f1eb4', 6877.45, 1607159997.837802, 1), (91, '4cd66dfabb', 8445.84, 1607950499.615994, 0), (92, '8ee51caaa2', 10635.54, 1607525326.751903, 1), (93, '08a35293e0', 5715.43, 1608266291.093912, 1), (95, '8e63fd3e77', 5176.22, 1608375642.515072, 1), (96, '6fb84aed32', 13349.4, 1607067348.342952, 1), (97, '812ed4562d', 7137.98, 1607894900.155571, 0), (98, '31bd9b9f5f', 13016.32, 1608229964.756917, 0), (99, '9a79be611e', 11549.29, 1608299479.108404, 0)]
INFO  tracking_sql_sql_based.py__6592c67f67 - Metric logged: table shape=(100, 5)
INFO  tracking_sql_sql_based.py__6592c67f67 - Task tracking_sql_sql_based.py has been completed!

==================== 
= Your run has been successfully executed!
 TRACKERS   : ['console']

Did this page help you?