GuidesAPI ReferenceDiscussions
GuidesBlogPlatform

Tracking Snowflake

Logging your Snowflake dataset operations with Databand

Databand allows you to log your dataset operations when using Python to call COPY INTO SQL commands on Snowflake. Wrapping your Snowflake cursor's execution with Databand's SnowflakerTracker context manager will catch the cursor's result and extract dataset operations from it.

1. Installation

Begin by installing the required packages using the following command:

pip install dbnd dbnd-snowflake

ℹ️

Requirements

Although there are many alternatives for writing data to Snowflake using Python, logging your Snowflake dataset operations with Databand requires usage of Snowflake's official Python package: snowflake-connector-python.

2. Integration with SnowflakeTracker

Assume the following code is what you are currently using to copy some file into a Snowflake table:

from snowflake import connector

SQL_QUERY = """
COPY INTO DB.PUBLIC.TABLE
FROM s3://path/to/some/file/file.csv
CREDENTIALS = (AWS_KEY_ID = 'key' AWS_SECRET_KEY = 'key');
"""

with connector.connect(
        user=SNOWFLAKE_USER,
        password=SNOWFLAKE_PASSWORD,
        account=SNOWFLAKE_ACCOUNT,
        database=SNOWFLAKE_DATABASE,
        schema=SNOWFLAKE_SCHEMA) as con:
    c = con.cursor()
    c.execute(SQL_QUERY)

To log the results of SQL_QUERY with Databand, you can make the following changes:

from snowflake import connector
from dbnd_snowflake import SnowflakeTracker

SQL_QUERY = """
COPY INTO DB.PUBLIC.TABLE
FROM s3://path/to/some/file/file.csv
CREDENTIALS = (AWS_KEY_ID = 'key' AWS_SECRET_KEY = 'key');
"""

with connector.connect(
        user=SNOWFLAKE_USER,
        password=SNOWFLAKE_PASSWORD,
        account=SNOWFLAKE_ACCOUNT,
        database=SNOWFLAKE_DATABASE,
        schema=SNOWFLAKE_SCHEMA) as con:
    c = con.cursor()
    with SnowflakeTracker():
        c.execute(SQL_QUERY)

Under the hood, SnowflakeTracker will catch the execution of c.execute(SQL_QUERY). This will allow you to track both the read operation of your file from S3 as well as the write operation to DB.PUBLIC.TABLE in Snowflake.

SnowflakeTracker.png

Limitations

  • Schema tracking is only supported for tables, not files.
  • Only one query execution should be provided for each SnowflakerTracker context.
  • Nested queries are not supported (e.g.: COPY INTO (SELECT * FROM TABLE) table FROM...).

3. Nested queries for read operations.

Snowflake tracker supports nested queries for read operations.

Example:

`copy into dband_poc from ( SELECT $1:dt::date , $1:acount::varchar from @dband_poc/dt=20220316/);`

Limitations:

  • Only select nested queries are supported
  • Join operations are not supported