For pipelines that integrate AWS Redshift into their workflows, the performance of your queries could have a detrimental impact on your pipeline health. With DBND, you can monitor the resource usage of your Redshift queries.
This guide will demonstrate how to get started with tracking Redshift resource usage with DBND.
This guide assumes that your Redshift is configured to accept inbound connections. We will be using
psycopg2 to connect to Redshift.
Currently, DBND only supports
psycopg2connections as the connection parameter. If you are using a different driver (such as
pyodbc), please provide the connection ODBC connection string instead.
You can install DBND’s Redshift plug in with the following command:
pip install dbnd_redshift
First, import the necessary modules and establish a connection to Redshift:
import psycopg2 from dbnd_redshift.dbnd_redshift import get_redshift_query_id, log_redshift_resource_usage def establish_connection(): '''establish and return a Redshift connection''' redshift_conn = psycopg2.connect( dbname='database', host='cluster-name.acc-name.region.redshift.amazonaws.com', port='5439', user='user', password='password' ) return redshift_conn
Next, execute the query with the cursor. After the query has been executed, you can simply get the query ID of the query you wish to track and use
log_redshift_resource_usage to report the Redshift resource usage.
As an alternative to passing in a
psycopg2 connection, you can pass in the
connection_string (end point) by using the
connection_string parameter of
def execute_and_log(redshift_connection, query): '''execute a query and log Redshift resource usage''' cursor = redshift_connection.cursor() cursor.execute(query) query_id = get_redshift_query_id(query, connection=redshift_connection) log_redshift_resource_usage(query_id, connection=redshift_connection)
Now, we can just invoke the functions.
if __name__ == "__main__": redshift_connection = establish_connection() query = "SELECT * FROM transaction_data WHERE transaction_amt >= 5000;" execute_and_log(redshift_connection, query)
Updated about 1 month ago
To track various metrics and tables from Redshift, see our guide on Tracking SQL