Tracking Redshift

How to get started with tracking Redshift resource usage with DBND.

For pipelines that integrate AWS Redshift into their workflows, the performance of your queries could have a detrimental impact on your pipeline health. With DBND, you can monitor the resource usage of your Redshift queries.

This guide will demonstrate how to get started with tracking Redshift resource usage with DBND.

Installing requirements

This guide assumes that your Redshift is configured to accept inbound connections. We will be using psycopg2 to connect to Redshift.

πŸ“˜

Currently, DBND only supports psycopg2 connections as the connection parameter. If you are using a different driver (such as pyodbc), please provide the connection ODBC connection string instead.

You can install DBND’s Redshift plug in with the following command:

pip install dbnd_redshift

Tracking Redshift

First, import the necessary modules and establish a connection to Redshift:

import psycopg2 
from dbnd_redshift.dbnd_redshift import get_redshift_query_id, log_redshift_resource_usage  

def establish_connection():
    '''establish and return a Redshift connection'''
    redshift_conn = psycopg2.connect(
        dbname='database',
        host='cluster-name.acc-name.region.redshift.amazonaws.com',
        port='5439', user='user', password='password'
    )
    return redshift_conn

Next, execute the query with the cursor. After the query has been executed, you can simply get the query ID of the query you wish to track and use log_redshift_resource_usage to report the Redshift resource usage.

As an alternative to passing in a psycopg2 connection, you can pass in the connection_string (end point) by using the connection_string parameter of log_redshift_resource_usage and get_redshift_query_id.

def execute_and_log(redshift_connection, query):
    '''execute a query and log Redshift resource usage'''
    cursor = redshift_connection.cursor() 
    cursor.execute(query)

    query_id = get_redshift_query_id(query, connection=redshift_connection)
    log_redshift_resource_usage(query_id, connection=redshift_connection)

Complete Workflow

Now, we can just invoke the functions.

if __name__ == "__main__":
    redshift_connection = establish_connection()

    query = "SELECT * FROM transaction_data WHERE transaction_amt >= 5000;"
    execute_and_log(redshift_connection, query)

What’s Next

To track various metrics and tables from Redshift, see our guide on Tracking SQL

Did this page help you?