Tracking DataStage

Overview

With Databand, you can track the execution of your DataStage jobs. A syncer scans your DataStage project every few seconds and reports the metadata collected from any jobs that have run. With this metadata, you can set up alerts that notify your data team about the health of your jobs and the quality of their inputs and outputs.

Available Alerting

Databand natively offers the following alerts for your DataStage job runs:

  • Run and task state alerts (e.g. running, successful, failed)
  • Run duration alerts
    • anomaly detection
    • percent ranges (e.g. duration within 20% of 100 seconds)
    • basic comparison operators (e.g. duration > 100 seconds)
  • Schema changes for inputs and outputs
    • new columns added
    • old columns removed
    • datatypes of existing columns changed
  • Record counts for inputs and outputs
    • anomaly detection
    • percent ranges (e.g. record count within 20% of 100,000 rows)
    • basic comparison operators (e.g. record count > 100,000 rows)
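As a rough illustration of how the percent-range and schema-change conditions above can be read, here is a minimal Python sketch. It is not Databand's implementation: the function names `within_percent` and `schema_changes` are hypothetical, and it assumes that "within 20% of 100 seconds" means an inclusive range of 80 to 120 seconds.

```python
from typing import Dict, List


def within_percent(value: float, baseline: float, percent: float) -> bool:
    """Return True if value lies within +/- percent of baseline (inclusive).

    Assumption: the range is symmetric around the baseline.
    """
    tolerance = abs(baseline) * percent / 100.0
    return abs(value - baseline) <= tolerance


def schema_changes(old: Dict[str, str], new: Dict[str, str]) -> Dict[str, List[str]]:
    """Compare two {column name: datatype} mappings.

    Reports the three kinds of change listed above: new columns added,
    old columns removed, and datatypes of existing columns changed.
    """
    shared = old.keys() & new.keys()
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "type_changed": sorted(c for c in shared if old[c] != new[c]),
    }


# Duration within 20% of 100 seconds.
print(within_percent(115, 100, 20))  # True
print(within_percent(125, 100, 20))  # False

# Schema comparison between two runs of the same output.
print(schema_changes(
    {"id": "int", "name": "str"},
    {"id": "bigint", "email": "str"},
))
```

A basic comparison alert (e.g. duration > 100 seconds) is simply the corresponding Python comparison against the threshold.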

Creating a DataStage Syncer

To begin monitoring your DataStage project in Databand, start by creating a DataStage syncer in the Databand UI:

  1. Click on Integrations in the left-hand menu.

  2. Click the Connect button under DataStage.

  3. In the syncer configuration, provide the following details:

    • Source name - This will become the name of your DataStage syncer in the Databand UI and will allow you to filter flows based on their DataStage projects.

    • Project ID - The ID of the DataStage project you would like to monitor. The project ID can be found in the URL of your DataStage project.

    • API key - The API key allows Databand to authenticate with your DataStage project. To generate a new API key for your user identity, follow the steps outlined in the IBM documentation.

    Advanced Settings:

    • Hostname - The hostname for an on-prem deployment of DataStage.
    • IAM Service URL - The URL of the IAM authentication service for an on-prem deployment of DataStage.
    • Number of threads - The number of concurrent threads used by the DataStage API client. The recommended default value is 2.
  4. After providing the required parameters, click Save.
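For the Project ID field above, the ID has to be copied out of the project's URL. The following Python sketch shows one way to pull it out programmatically. The URL shape is an assumption, since the original page does not spell it out: the sketch assumes the ID is a UUID-shaped path segment that follows `/projects/`. The function name `project_id_from_url` and the example host are hypothetical.

```python
import re
from typing import Optional
from urllib.parse import urlparse

# Assumption: the project ID is a UUID-shaped segment right after "/projects/".
_PROJECT_ID = re.compile(
    r"/projects/"
    r"([0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12})"
    r"(?:/|$)"
)


def project_id_from_url(url: str) -> Optional[str]:
    """Return the project ID found in the URL path, or None if there is none."""
    path = urlparse(url).path  # query string and fragment are ignored
    match = _PROJECT_ID.search(path)
    return match.group(1) if match else None


# Hypothetical URL used only for illustration.
example = (
    "https://datastage.example.com/projects/"
    "0f1e2d3c-4b5a-6978-8796-a5b4c3d2e1f0/assets?context=demo"
)
print(project_id_from_url(example))
```

If your project URL has a different layout, copy the ID by hand from the address bar instead.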

Once these steps have been completed, the next time a job runs in your DataStage project, you will see it in your Databand UI. The name of your job in the DataStage UI will become the pipeline name in the Databand UI.

Editing an Existing DataStage Syncer

  1. Click on Settings in the left-hand menu.
  2. Click on Datasource Syncers in the settings menu.
  3. Click the button in the Actions column for the syncer you want to edit, and select Edit from the context menu.
  4. Make the necessary changes in the syncer configuration, and then click the Save button.

Metadata Collected

Databand collects high-level information about the execution of your DataStage jobs, as well as general information about the inputs and outputs of your stages. The metadata collected includes the following:

Graphical representation of the flow

A flow in DataStage

The same flow in Databand

Logs from each stage

Start and end times, duration, and run ID of the job execution

Job execution and dataset metrics

Inputs and outputs of the job

Current Limitations

  1. Each syncer supports only a single DataStage project. Support for adding multiple projects to a single syncer is planned.
  2. Subflows and custom steps are not yet supported.