GuidesAPI ReferenceDiscussions
GuidesBlogPlatform

Tracking Redshift

Logging your Redshift dataset operations with Databand

Databand allows you to log your dataset operations when using Python to call SQL commands on Redshift. Wrapping your Redshift cursor's execution with Databand's RedshiftTracker context manager will catch the cursor's result and extract Dataset Tracking operations from it. Currently only COPY INTO is supported.

Requirements

This guide assumes that your Redshift is configured to accept inbound connections. We will be using psycopg2 to connect to Redshift. Currently, DBND only supports psycopg2 connections as the connection parameter.

Make sure that the package "dbnd-redshift" is installed (via databand[redshift] for example). See more info at Installing DBND

Integration with RedshifTracker

Assume the following code is what you are currently using to copy some files into a Redshift table:

import psycopg2

SQL_QUERY = """
COPY DB.PUBLIC.TABLE 
FROM s3://path/to/some/file/file.csv 
iam_role '<role>' csv;
"""

with psycopg2.connect(
        host=REDSHIFT_HOST,
        port=REDSHIFT_PORT,
        database=REDSHIFT_DB,
        user=REDSHIFT_USER,
        password=REDSHIFT_PASSWORD
) as con:
    c = con.cursor()
    c.execute(SQL_QUERY)

To log the results of your query with Databand, you should run all your SQL queries in the context of RedshiftTracker:

from dbnd_redshift import RedshiftTracker

with RedshiftTracker():
        ... 
        c.execute(SQL_QUERY)
        ...

Under the hood, RedshiftTracker will catch the execution of c.execute(SQL_QUERY). Only one query execution should be provided for each RedshiftTracker context.

COPY INTO Command

Databand can track "COPY INTO" command. This will allow you to track both the read operation of your file from S3 as well as the write operation to DB.PUBLIC.TABLE in Redshift.

Redshift will log the read and write operations, capturing the number of rows and schema where possible.Redshift will log the read and write operations, capturing the number of rows and schema where possible.

Redshift will log the read and write operations, capturing the number of rows and schema where possible.

Current COPY INTO Limitations are:

  1. Schema tracking is only supported for tables, not files.
  2. Nested queries are not supported (e.g.: COPY (SELECT * FROM TABLE) table FROM...).

What’s Next
Did this page help you?