Building Pipelines

A pipeline is a directed acyclic graph (DAG) of steps that execute in a defined order. Pipelines let you chain extraction, infrastructure provisioning, transformation, validation, and custom actions into a single automated workflow. You build pipelines visually — dragging steps onto a canvas and connecting them with dependency edges.

What a pipeline does

Without pipelines, each operation in Rime runs independently: you manually trigger a connector sync, then manually run a transformation, then manually check test results. A pipeline automates this sequence and handles failures, retries, and notifications.

A typical pipeline might look like:

Extract (PostgreSQL) ──> Transform (Kimball) ──> Validate ──> Webhook (notify Slack)
Extract (REST API)  ───┘

Both extraction steps run in parallel. The transformation step waits for both to complete. Validation runs after transformation. The webhook fires after validation. If any step fails, the pipeline either stops or continues, depending on that step’s failure policy (see Failure handling).
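To make the ordering concrete, the example pipeline can be modelled as a mapping from each step to the steps it depends on. The sketch below is illustrative only: the step names are hypothetical, and it uses Python's standard graphlib module (Python 3.9+) rather than anything Rime exposes. It groups steps into batches that could start together.

```python
from graphlib import TopologicalSorter

# Hypothetical step names; each step maps to the set of steps it depends on.
pipeline = {
    "extract_postgres": set(),
    "extract_rest_api": set(),
    "transform_kimball": {"extract_postgres", "extract_rest_api"},
    "validate": {"transform_kimball"},
    "notify_slack": {"validate"},
}


def execution_batches(dag):
    """Return lists of steps that can start together, in dependency order."""
    sorter = TopologicalSorter(dag)
    sorter.prepare()
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # steps whose dependencies are all done
        batches.append(ready)
        sorter.done(*ready)
    return batches


batches = execution_batches(pipeline)
for number, batch in enumerate(batches, start=1):
    print(number, batch)
```

The first batch contains both extraction steps, and each later batch contains a single step. Batching is a simplification: a scheduler can start a step as soon as its own dependencies finish, without waiting for the rest of a batch.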

The visual DAG builder

To create a pipeline:

Navigate to Pipelines in the project sidebar
Click New Pipeline
Enter a name and optional description
The DAG builder canvas opens

The canvas is a visual editor where you:

Add steps — click Add Step or drag a step type from the palette onto the canvas
Connect steps — drag from one step’s output handle to another step’s input handle to create a dependency edge
Remove connections — click an edge and press Delete
Rearrange — drag steps to reposition them on the canvas. Layout is cosmetic; execution order is determined entirely by the dependency edges.

Adding a step

When you add a step, you choose its type and configure it. Each step type has its own configuration panel.

Removing a step

Click a step and press Delete, or right-click and select Remove. Removing a step also removes all edges connected to it. Downstream steps that depended only on the removed step become root steps (no dependencies) unless you reconnect them.
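The effect of removing a step on the rest of the graph can be sketched with a plain dependency mapping. This is not Rime's internal model; the function and step names are hypothetical and only illustrate why a downstream step becomes a root step.

```python
def remove_step(dag, step):
    """Return a new DAG without `step` and without any edge that touched it."""
    return {
        name: {dep for dep in deps if dep != step}
        for name, deps in dag.items()
        if name != step
    }


def root_steps(dag):
    """Steps with no dependencies."""
    return sorted(name for name, deps in dag.items() if not deps)


dag = {
    "extract": set(),
    "transform": {"extract"},
    "validate": {"transform"},
}

after = remove_step(dag, "transform")
print(root_steps(dag))    # roots before removal
print(root_steps(after))  # roots after removal
```

Before the removal only extract is a root step. Afterwards validate has no remaining dependencies, so it would start alongside extract unless it is reconnected.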

Step types

Extract

Runs a connector sync. Equivalent to clicking Sync Now on a connector’s detail page, but triggered automatically as part of the pipeline.

Configuration:

Connector — select which connector to sync
Tables — optionally restrict to specific tables (default: all configured tables)
Timeout — maximum duration before the step is marked as failed (default: 60 minutes)

Provision

Applies infrastructure changes via Rime’s internal Terraform execution. Use this when your pipeline needs to create or modify Snowflake or AWS resources before other steps run.

Configuration:

Resources — select which infrastructure resources to include in this apply
Auto-approve — if enabled, changes are applied without a manual approval step. If disabled, the pipeline pauses and waits for approval. Default: disabled.
Timeout — maximum duration (default: 30 minutes)

Use Provision steps cautiously. Infrastructure changes are not easily reversed. Prefer running them manually until you are confident in the automation.

Transform

Executes a transformation project’s models. This is the equivalent of clicking Execute on a transformation project page.

Configuration:

Transformation project — select which project to run
Models — optionally restrict to specific models (default: all models in the project)
Full refresh — if enabled, all incremental models are rebuilt from scratch rather than appending only new data. Default: disabled.
Timeout — maximum duration (default: 120 minutes)

Validate

Runs data quality tests associated with a transformation project. Tests include not-null, unique, referential integrity, and accepted value checks.

Configuration:

Transformation project — select which project’s tests to run
Fail on warning — if enabled, test warnings (not just failures) cause the step to fail. Default: disabled.
Timeout — maximum duration (default: 30 minutes)

Validate steps are commonly placed after Transform steps to gate downstream processing on data quality. If validation fails, subsequent steps can be skipped to prevent propagating bad data.

SQL

Executes custom SQL against your Snowflake account. Use this for ad-hoc operations that do not fit into Rime’s standard transformation framework — data corrections, manual grants, or custom aggregations.

Configuration:

SQL statement — the SQL to execute. Supports multiple semicolon-separated statements.
Snowflake warehouse — which warehouse to use for execution
Timeout — maximum duration (default: 30 minutes)

SQL steps have read-write access to your Snowflake account. Use them carefully and test statements manually before including them in automated pipelines.

Webhook

Calls an external HTTP endpoint. Use this for notifications, triggering external systems, or integrating with tools outside Rime.

Configuration:

URL — the endpoint to call
Method — GET, POST, PUT, or DELETE
Headers — custom HTTP headers (e.g., authentication tokens)
Body — request body for POST/PUT requests. Supports template variables: {{pipeline.name}}, {{pipeline.run_id}}, {{step.name}}, {{step.status}}
Expected status code — the HTTP status code that indicates success (default: 200). Any other code marks the step as failed.
Timeout — maximum duration (default: 60 seconds)
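As a rough illustration of how the body template and the expected status code fit together, the sketch below substitutes the four documented variables into a JSON body and compares a response code against the expected one. The substitution logic, the example values (including the status string "succeeded"), and the function names are assumptions made for illustration; Rime's actual templating behaviour may differ.

```python
import json
import re

# Matches placeholders such as {{pipeline.name}} (exact form, no inner spaces).
PLACEHOLDER = re.compile(r"\{\{([a-z_]+\.[a-z_]+)\}\}")


def render_body(template, context):
    """Replace known placeholders; unknown ones are left untouched."""
    def substitute(match):
        return str(context.get(match.group(1), match.group(0)))
    return PLACEHOLDER.sub(substitute, template)


def webhook_succeeded(status_code, expected=200):
    """Only the expected code counts as success; any other code is a failure."""
    return status_code == expected


template = (
    '{"text": "{{pipeline.name}} run {{pipeline.run_id}}: '
    '{{step.name}} is {{step.status}}"}'
)
context = {  # hypothetical values
    "pipeline.name": "nightly_load",
    "pipeline.run_id": "run-0001",
    "step.name": "validate",
    "step.status": "succeeded",
}

body = render_body(template, context)
payload = json.loads(body)  # the rendered body is still valid JSON
print(payload["text"])
```

Plain substitution does not escape values, so a pipeline or step name containing a double quote would produce invalid JSON in a template like this one.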

Parallel execution

Steps without dependency edges between them run concurrently. In this example:

Extract A ──> Transform ──> Validate
Extract B ──┘
SQL setup ──┘

Extract A, Extract B, and SQL setup all start at the same time. Transform waits for all three to complete. Validate waits for Transform.

Parallel execution reduces total pipeline duration. Design your DAGs to maximise parallelism where steps are genuinely independent.
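The saving from parallelism can be estimated from the dependency edges alone: a step finishes at its own duration plus the latest finish time among its dependencies. The durations below are invented for illustration, and the calculation ignores scheduling overhead and resource limits.

```python
from functools import lru_cache

# Hypothetical step durations in minutes.
durations = {
    "extract_a": 20,
    "extract_b": 35,
    "sql_setup": 5,
    "transform": 40,
    "validate": 10,
}

# Dependency edges from the example above.
depends_on = {
    "extract_a": [],
    "extract_b": [],
    "sql_setup": [],
    "transform": ["extract_a", "extract_b", "sql_setup"],
    "validate": ["transform"],
}


@lru_cache(maxsize=None)
def finish_time(step):
    """Earliest finish time when every step starts as soon as its dependencies end."""
    start = max((finish_time(dep) for dep in depends_on[step]), default=0)
    return start + durations[step]


parallel_total = max(finish_time(step) for step in depends_on)
sequential_total = sum(durations.values())
print(parallel_total, sequential_total)
```

With these numbers the parallel run takes 85 minutes against 110 minutes sequentially. The slowest root step (extract_b) sets the start time of Transform, so shortening the other two root steps would not change the total.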

Pipeline validation

When you save a pipeline, Rime validates the DAG:

Cycle detection — the graph must be acyclic. If step A depends on step B and step B depends on step A (directly or transitively), the pipeline is rejected with an error identifying the cycle.
Missing dependencies — every step that references a connector, transformation project, or infrastructure resource must reference one that exists. If a referenced object has been deleted, validation fails.
Empty pipeline — a pipeline must contain at least one step.
Unreachable steps — Rime warns (but does not block) if steps are disconnected from the main graph. Disconnected steps execute as independent root steps.

Validation errors appear inline on the canvas, highlighting the affected steps and edges.
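Three of these checks can be sketched on a dependency mapping. This is a minimal illustration, not Rime's validator: it uses Python's standard graphlib module (Python 3.9+) for cycle detection, treats any graph with more than one connected component as having disconnected steps, and omits the missing-dependency check because that depends on looking up connectors, projects, and resources in Rime itself. All names and messages are hypothetical.

```python
from graphlib import CycleError, TopologicalSorter


def connected_components(dag):
    """Group steps into components, ignoring edge direction."""
    neighbours = {step: set() for step in dag}
    for step, deps in dag.items():
        for dep in deps:
            neighbours[step].add(dep)
            neighbours.setdefault(dep, set()).add(step)
    seen, components = set(), []
    for start in sorted(neighbours):
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node not in component:
                component.add(node)
                stack.extend(neighbours[node] - component)
        seen |= component
        components.append(component)
    return components


def validate_dag(dag):
    """Return (errors, warnings); errors block saving, warnings do not."""
    errors, warnings = [], []
    if not dag:
        return ["empty pipeline"], warnings
    try:
        TopologicalSorter(dag).prepare()  # raises CycleError if a cycle exists
    except CycleError as exc:
        errors.append("cycle: " + " -> ".join(exc.args[1]))
    components = connected_components(dag)
    if len(components) > 1:
        warnings.append(f"{len(components)} disconnected groups of steps")
    return errors, warnings


print(validate_dag({"a": {"b"}, "b": {"a"}}))
print(validate_dag({"a": set(), "b": {"a"}, "c": set()}))
```

The first call reports a cycle as an error. The second returns no errors and one warning, because step c is not connected to a or b.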

Failure handling

Each step has a failure policy that determines what happens when it fails:

Stop pipeline (default) — all pending steps are cancelled. Steps already running are allowed to complete.
Continue — the failed step is marked as failed, but the pipeline continues. Downstream steps that depend directly on the failed step are skipped. Other branches continue executing.

You can set failure policies per step. For example, a Webhook notification step might use “continue” (a notification failure should not block data processing), while a Validate step might use “stop” (bad data should not propagate).
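The two policies can be compared with a small simulation that walks a DAG in dependency order. It is a sketch under stated assumptions rather than a description of Rime's scheduler: steps run one at a time (so "already running" steps never arise), the status names are hypothetical, and a step is also skipped when one of its dependencies was itself skipped, which the text above does not specify.

```python
from graphlib import TopologicalSorter


def simulate(dag, policies, failing):
    """Return each step's final status for a given set of failing steps."""
    status = {}
    stopped = False
    for step in TopologicalSorter(dag).static_order():
        deps = dag.get(step, set())
        if stopped:
            status[step] = "cancelled"  # a "stop" policy fired earlier
        elif any(status[dep] in ("failed", "skipped") for dep in deps):
            status[step] = "skipped"    # a dependency did not succeed
        elif step in failing:
            status[step] = "failed"
            stopped = policies.get(step, "stop") == "stop"  # "stop" is the default
        else:
            status[step] = "succeeded"
    return status


# Hypothetical pipeline: notify and publish both depend on validate.
dag = {
    "extract": set(),
    "transform": {"extract"},
    "validate": {"transform"},
    "notify": {"validate"},
    "publish": {"validate"},
}
policies = {"notify": "continue"}  # every other step keeps the default, "stop"

notify_fails = simulate(dag, policies, failing={"notify"})
validate_fails = simulate(dag, policies, failing={"validate"})
print(notify_fails)
print(validate_fails)
```

When notify fails under "continue", publish still succeeds because it sits on a different branch. When validate fails under "stop", both notify and publish are cancelled.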

Next steps

Configure automated execution with Pipeline Scheduling
Understand how edits are tracked in Pipeline Versioning
Monitor running pipelines in Pipeline Execution