[deprecated] Custom Metrics using SQL queries
Tracking your SQL workflows with DBND
This guide will showcase how DBND can be used for tracking important data metrics, whether they're discovered with SQL or any other external engine. In this guide, you'll extract and compute data metrics with SQL and log them with DBND.
Before beginning with this guide, you will first need to install dbnd
and SQLAlchemy
.
pip install dbnd SQLAlchemy
The database used will be a local PostgreSQL database containing the following data:
PostgreSQL Data
The table columns are:
index
- The indices of PostgreSQL table.invoice_ID
- Unique ID generated for each transaction.transaction_amt
- Amount exchanged in the transaction.transaction_time
- POSIX timestamp of transaction.transaction_fulfilled
- Whether the transaction has been fulfilled.
Logging Metrics from SQL Queries
- First, you need to import the requirements and create a SQLAlchemy engine to execute our queries. * Next, you will find the number of transactions (in
transaction_amt
column) that are greater than or equal to a specific amount, for example, 5000. - After executing our query, you can log this data metric using DBND's
log_metric
function. You will also log the mean average of the transaction amounts greater than or equal to 5000.
DBND's log_metric
function is versatile with the logged data type. It can be used to track collections and complex data types. For example, you can also log all the rows where transaction_amt
is greater than or equal to 5000 and the shape of the Postgres table as a tuple.
# Python 3.6.8
from sqlalchemy import create_engine
from dbnd import log_metric
conn_string = 'postgresql+psycopg2://dbnd_postgres:[email protected]/dbnd_postgres'
pg_db = create_engine(conn_string)
minimum_amount = 5000
# logging count of rows where transaction_amt >= 5000
num_large_transactions_query = f"""SELECT COUNT(transaction_amt) FROM transactions_data WHERE transaction_amt >= {minimum_amount};"""
results = pg_db.execute(num_large_transactions_query)
num_large_transactions = results.fetchone()[0]
log_metric("Number of large transactions(>= 5000)", num_large_transactions)
# logging average of rowsw where transaction_amt >= 5000
avg_large_transactions_query = f"""SELECT AVG(transaction_amt) FROM transactions_data WHERE transaction_amt >= {minimum_amount};"""
results = pg_db.execute(avg_large_transactions_query)
avg_large_transactions = results.fetchone()[0]
log_metric("mean large transactions", avg_large_transactions)
# logging all rows where transaction_amt >= 5000
large_transactions_query = f"""SELECT * FROM transactions_data WHERE transaction_amt >= {minimum_amount};"""
results = pg_db.execute(large_transactions_query)
large_transactions = [row for row in results]
log_metric("large transactions rows", large_transactions)
# logging table shape
num_rows_query = f"""SELECT COUNT (*) FROM transactions_data"""
results = pg_db.execute(num_rows_query)
num_rows = results.fetchone()[0]
num_cols_query = f"""SELECT COUNT(column_name) FROM INFORMATION_SCHEMA.COLUMNS WHERE table_name = 'transactions_data';"""
results = pg_db.execute(num_cols_query)
num_cols = results.fetchone()[0]
log_metric("table shape", (num_rows, num_cols))
SELECT COUNT(transaction_amt) FROM transactions_data WHERE transaction_amt >= 5000;
SELECT AVG(transaction_amt) FROM transactions_data WHERE transaction_amt >= 5000;
SELECT * FROM transactions_data WHERE transaction_amt >= 5000;
SELECT COUNT (*) FROM transactions_data;
SELECT COUNT(column_name) FROM INFORMATION_SCHEMA.COLUMNS WHERE table_name = 'transactions_data';
Executing the Scripts
Make sure that DBND Tracking is enabled
Use Python to execute the script normally:
python tracking_sql_data.py
Output from tracking SQL
The output for the first script will have all the metrics logged. The output will be similar to the output below:
====================
= Running Databand!
TRACKERS : ['console']
TASKS : total=1
====================
INFO tracking_sql_sql_based.py__6592c67f67 - Metric logged: Number of large transactions(>= 5000)=59
INFO tracking_sql_sql_based.py__6592c67f67 - Metric logged: mean large transactions=9234.756779661016
INFO tracking_sql_sql_based.py__6592c67f67 - Metric logged: large transactions rows=[(0, 'b6589fc6ab', 8605.19, 1609268915.350321, 0), (4, '1b64538924', 9895.93, 1608532450.936673, 1), (5, 'ac3478d69a', 9100.93, 1607337571.496427, 0), (6, 'c1dfd96eea', 11966.93, 1609333775.865714, 1), (8, 'fe5dbbcea5', 5712.27, 1607005440.311549, 0), (11, '17ba079149', 6821.97, 1608246364.187467, 1), (14, 'fa35e19212', 8744.28, 1608435414.527918, 1), (15, 'f1abd67035', 7348.49, 1609123983.843929, 1), (17, '0716d9708d', 7938.02, 1608793310.466077, 1), (18, '9e6a55b6b4', 10866.32, 1608067565.615092, 1), (20, '91032ad7bb', 10818.29, 1607411956.84893, 0), (21, '472b07b9fc', 9386.1, 1608287414.649073, 1), (24, '4d134bc072', 12831.9, 1607928988.334441, 1), (28, '0a57cb53ba', 11372.59, 1608144011.832888, 0), (29, '7719a1c782', 8130.35, 1607860202.245157, 0), (30, '22d200f867', 10835.7, 1606800489.896977, 0), (31, '632667547e', 9806.29, 1609162913.412595, 0), (32, 'cb4e5208b4', 7232.6, 1608872018.325546, 0), (33, 'b6692ea5df', 13043.42, 1607761880.822167, 0), (34, 'f1f836cb4e', 5135.19, 1607032350.512745, 0), (35, '972a67c481', 7442.91, 1608224422.998188, 1), (36, 'fc074d5013', 11131.99, 1608771202.485815, 0), (37, 'cb7a1d775e', 8327.12, 1607595284.81299, 0), (38, '5b384ce32d', 11561.62, 1609132168.081323, 1), (39, 'ca3512f4df', 7779.57, 1609154086.87106, 1), (40, 'af3e133428', 9471.65, 1607002903.51992, 1), (48, '64e095fe76', 8555.41, 1607686702.130967, 1), (50, 'e1822db470', 5024.09, 1606970278.722451, 0), (53, 'c5b76da3e6', 12558.46, 1608704912.885338, 0), (54, '80e28a51cb', 8719.69, 1608255831.737304, 1), (55, '8effee409c', 8202.24, 1607035064.815207, 0), (57, '9109c85a45', 9798.24, 1607206455.324834, 1), (59, '5a5b0f9b7d', 5147.44, 1608714640.934448, 1), (60, 'e6c3dd6304', 13261.65, 1607282094.854161, 0), (61, '6c1e671f9a', 8612.81, 1608434415.391349, 1), (62, '511a418e72', 7508.2, 1607662631.676266, 0), (63, 'a17554a0d2', 9206.2, 1606915475.870678, 1), (64, 'c66c65175f', 11310.84, 1606845154.938982, 1), (65, '2a45938070', 10421.68, 1607445722.816128, 0), (71, 'd02560dd9d', 12641.66, 1608294347.759727, 0), (72, 'c097638f92', 11756.62, 1606864214.230547, 1), (74, '1f1362ea41', 8818.16, 1608196663.67103, 1), (75, '450ddec8dd', 5362.6, 1608197118.847317, 1), (76, 'd54ad009d1', 12264.43, 1607285101.379916, 1), (77, 'd321d6f7cc', 6203.45, 1607690904.747362, 1), (80, 'b888b29826', 7566.97, 1608234017.086775, 1), (82, '76546f9a64', 7875.78, 1607349955.2563, 0), (83, '7d7116e23e', 12041.98, 1608822886.869622, 1), (84, 'be461a0cd1', 5412.72, 1608642507.837087, 1), (86, '3c26dffc8a', 13368.24, 1606997304.321356, 0), (87, 'e62d7f1eb4', 6877.45, 1607159997.837802, 1), (91, '4cd66dfabb', 8445.84, 1607950499.615994, 0), (92, '8ee51caaa2', 10635.54, 1607525326.751903, 1), (93, '08a35293e0', 5715.43, 1608266291.093912, 1), (95, '8e63fd3e77', 5176.22, 1608375642.515072, 1), (96, '6fb84aed32', 13349.4, 1607067348.342952, 1), (97, '812ed4562d', 7137.98, 1607894900.155571, 0), (98, '31bd9b9f5f', 13016.32, 1608229964.756917, 0), (99, '9a79be611e', 11549.29, 1608299479.108404, 0)]
INFO tracking_sql_sql_based.py__6592c67f67 - Metric logged: table shape=(100, 5)
INFO tracking_sql_sql_based.py__6592c67f67 - Task tracking_sql_sql_based.py has been completed!
====================
= Your run has been successfully executed!
TRACKERS : ['console']
Updated 5 months ago