Dagster & Sling (Component)

The dagster-sling library provides a SlingReplicationCollectionComponent, which represents a collection of Sling replications as assets in Dagster.

1. Prepare a Dagster project

To begin, you'll need a Dagster project. You can use an existing components-ready project or create a new one:

create-dagster project my-project && cd my-project/src

Activate the project virtual environment:

source ../.venv/bin/activate

Finally, add the dagster-sling library to the project. We will also add duckdb to use as a destination for our Sling replication.

uv add dagster-sling duckdb

2. Scaffold a Sling component definition

Now that you have a Dagster project, you can scaffold a Sling component definition:

dg scaffold defs dagster_sling.SlingReplicationCollectionComponent sling_ingest
Creating defs at /.../my-project/src/my_project/defs/sling_ingest.

The dg scaffold defs call will generate a defs.yaml file and an unpopulated Sling replication.yaml file:

tree my_project/defs
my_project/defs
├── __init__.py
└── sling_ingest
    ├── defs.yaml
    └── replication.yaml

2 directories, 3 files

In its scaffolded form, the defs.yaml file contains the configuration for your Sling workspace:

my_project/defs/sling_ingest/defs.yaml
type: dagster_sling.SlingReplicationCollectionComponent

attributes:
  replications:
    - path: replication.yaml

The generated replication.yaml file is a template that still needs to be configured:

my_project/defs/sling_ingest/replication.yaml
source: {}
streams: {}
target: {}

3. Configure Sling replications

In the defs.yaml file, you can directly define the Sling connections that your replications use, keyed by connection name. Here, a DuckDB connection named DUCKDB is defined:

my_project/defs/sling_ingest/defs.yaml
type: dagster_sling.SlingReplicationCollectionComponent

attributes:
  connections:
    DUCKDB:
      type: duckdb
      instance: /tmp/my_project.duckdb
  replications:
    - path: ./replication.yaml

For this example replication, we will ingest a set of CSV files to DuckDB. You can use curl to download some sample data:

curl -O https://raw.githubusercontent.com/dbt-labs/jaffle-shop-classic/refs/heads/main/seeds/raw_customers.csv &&
curl -O https://raw.githubusercontent.com/dbt-labs/jaffle-shop-classic/refs/heads/main/seeds/raw_orders.csv &&
curl -O https://raw.githubusercontent.com/dbt-labs/jaffle-shop-classic/refs/heads/main/seeds/raw_payments.csv
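
Before wiring the files into a replication, it can help to confirm that the downloads succeeded. The following is a minimal sketch in Python using only the standard library. The helper names `preview_csv` and `check_downloads` are hypothetical and are not part of Dagster or Sling. It assumes the three files sit in the directory you pass in.

```python
import csv
from pathlib import Path

# File names taken from the curl commands above.
EXPECTED_FILES = ["raw_customers.csv", "raw_orders.csv", "raw_payments.csv"]


def preview_csv(path: Path) -> tuple[list[str], int]:
    """Return the header row and the number of data rows in a CSV file."""
    with path.open(newline="") as handle:
        reader = csv.reader(handle)
        header = next(reader, [])  # empty list if the file has no rows
        row_count = sum(1 for _ in reader)
    return header, row_count


def check_downloads(directory: Path) -> dict[str, tuple[list[str], int]]:
    """Preview each expected file, raising if one is missing or has no header."""
    results = {}
    for name in EXPECTED_FILES:
        path = directory / name
        if not path.is_file():
            raise FileNotFoundError(f"{name} not found in {directory}")
        header, row_count = preview_csv(path)
        if not header:
            raise ValueError(f"{name} has no header row")
        results[name] = (header, row_count)
    return results
```

Calling `check_downloads(Path.cwd())` from the directory where you ran curl returns the column names and row count for each file, or raises if a download is missing.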

Next, you can configure Sling replications for each CSV file in replication.yaml:

my_project/defs/sling_ingest/replication.yaml
source: LOCAL
target: DUCKDB

defaults:
  mode: full-refresh
  object: "{stream_table}"

streams:
  file://raw_customers.csv:
    object: "main.raw_customers"
  file://raw_orders.csv:
    object: "main.raw_orders"
  file://raw_payments.csv:
    object: "main.raw_payments"

Our newly configured Sling component will produce an asset for each replicated file:

dg list defs
┏━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Section ┃ Definitions                                                                            ┃
┡━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ Assets  │ ┏━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━┳━━━━━━━━━━━━━┓ │
│         │ ┃ Key                       ┃ Group   ┃ Deps                   ┃ Kinds ┃ Description ┃ │
│         │ ┡━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━╇━━━━━━━━━━━━━┩ │
│         │ │ file_raw_customers/csv    │ default │                        │       │             │ │
│         │ ├───────────────────────────┼─────────┼────────────────────────┼───────┼─────────────┤ │
│         │ │ file_raw_orders/csv       │ default │                        │       │             │ │
│         │ ├───────────────────────────┼─────────┼────────────────────────┼───────┼─────────────┤ │
│         │ │ file_raw_payments/csv     │ default │                        │       │             │ │
│         │ ├───────────────────────────┼─────────┼────────────────────────┼───────┼─────────────┤ │
│         │ │ target/main/raw_customers │ default │ file_raw_customers/csv │ sling │             │ │
│         │ ├───────────────────────────┼─────────┼────────────────────────┼───────┼─────────────┤ │
│         │ │ target/main/raw_orders    │ default │ file_raw_orders/csv    │ sling │             │ │
│         │ ├───────────────────────────┼─────────┼────────────────────────┼───────┼─────────────┤ │
│         │ │ target/main/raw_payments  │ default │ file_raw_payments/csv  │ sling │             │ │
│         │ └───────────────────────────┴─────────┴────────────────────────┴───────┴─────────────┘ │
└─────────┴────────────────────────────────────────────────────────────────────────────────────────┘
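
The asset keys in this listing follow a visible pattern: each stream name becomes an upstream key, and each target object becomes a key under `target`. The sketch below reproduces that pattern for the three streams in this example. It is an illustration inferred from the output above, not dagster-sling's actual implementation, and the function names `source_asset_key` and `target_asset_key` are hypothetical.

```python
import re


def source_asset_key(stream_name: str) -> str:
    """Approximate the upstream key shown for a stream such as file://raw_orders.csv."""
    # Collapse each run of characters outside [A-Za-z0-9_.] into one underscore,
    # then treat "." as the separator between key components.
    sanitized = re.sub(r"[^A-Za-z0-9_.]+", "_", stream_name)
    return "/".join(sanitized.split("."))


def target_asset_key(object_name: str, prefix: str = "target") -> str:
    """Approximate the key shown for a target object such as main.raw_orders."""
    return "/".join([prefix, *object_name.split(".")])


# Stream-to-object mapping copied from replication.yaml.
streams = {
    "file://raw_customers.csv": "main.raw_customers",
    "file://raw_orders.csv": "main.raw_orders",
    "file://raw_payments.csv": "main.raw_payments",
}

for stream, obj in streams.items():
    print(f"{target_asset_key(obj)} <- {source_asset_key(stream)}")
```

Each printed line pairs a key from the Key column with the entry in its Deps column, which makes it easier to predict the keys you will need when referencing these assets elsewhere.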

4. Customize Sling assets

Properties of the assets emitted by each replication can be customized in the defs.yaml file using the translation key:

my_project/defs/sling_ingest/defs.yaml
type: dagster_sling.SlingReplicationCollectionComponent

attributes:
  connections:
    DUCKDB:
      type: duckdb
      instance: /tmp/my_project.duckdb
  replications:
    - path: ./replication.yaml
      translation:
        group_name: sling_data
        description: "Loads data from Sling replication {{ stream_definition.name }}"

Listing the definitions again shows the new group and description on the replicated assets:

dg list defs
┏━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Section ┃ Definitions                                                                                                ┃
┡━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ Assets  │ ┏━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓ │
│         │ ┃ Key                       ┃ Group      ┃ Deps                   ┃ Kinds ┃ Description                  ┃ │
│         │ ┡━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩ │
│         │ │ file_raw_customers/csv    │ default    │                        │       │                              │ │
│         │ ├───────────────────────────┼────────────┼────────────────────────┼───────┼──────────────────────────────┤ │
│         │ │ file_raw_orders/csv       │ default    │                        │       │                              │ │
│         │ ├───────────────────────────┼────────────┼────────────────────────┼───────┼──────────────────────────────┤ │
│         │ │ file_raw_payments/csv     │ default    │                        │       │                              │ │
│         │ ├───────────────────────────┼────────────┼────────────────────────┼───────┼──────────────────────────────┤ │
│         │ │ target/main/raw_customers │ sling_data │ file_raw_customers/csv │ sling │ Loads data from Sling        │ │
│         │ │                           │            │                        │       │ replication                  │ │
│         │ │                           │            │                        │       │ file://raw_customers.csv     │ │
│         │ ├───────────────────────────┼────────────┼────────────────────────┼───────┼──────────────────────────────┤ │
│         │ │ target/main/raw_orders    │ sling_data │ file_raw_orders/csv    │ sling │ Loads data from Sling        │ │
│         │ │                           │            │                        │       │ replication                  │ │
│         │ │                           │            │                        │       │ file://raw_orders.csv        │ │
│         │ ├───────────────────────────┼────────────┼────────────────────────┼───────┼──────────────────────────────┤ │
│         │ │ target/main/raw_payments  │ sling_data │ file_raw_payments/csv  │ sling │ Loads data from Sling        │ │
│         │ │                           │            │                        │       │ replication                  │ │
│         │ │                           │            │                        │       │ file://raw_payments.csv      │ │
│         │ └───────────────────────────┴────────────┴────────────────────────┴───────┴──────────────────────────────┘ │
└─────────┴────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
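
The description values in this listing come from filling the `{{ stream_definition.name }}` placeholder once per stream. Dagster performs that templating itself. The sketch below only mimics the substitution with the standard library so the mechanism is easier to see. The `StreamDefinition` class and the `render` helper are hypothetical stand-ins, not Dagster APIs.

```python
import re
from dataclasses import dataclass


@dataclass
class StreamDefinition:
    """Hypothetical stand-in for the per-stream object exposed to the template."""

    name: str


def render(template: str, scope: dict[str, object]) -> str:
    """Replace each {{ a.b }} placeholder with the attribute looked up in scope."""

    def lookup(match: re.Match) -> str:
        root, *attrs = match.group(1).split(".")
        value = scope[root]
        for attr in attrs:
            value = getattr(value, attr)
        return str(value)

    return re.sub(r"\{\{\s*([\w.]+)\s*\}\}", lookup, template)


# Template string copied from the translation block in defs.yaml.
template = "Loads data from Sling replication {{ stream_definition.name }}"

for stream in ("file://raw_customers.csv", "file://raw_orders.csv"):
    scope = {"stream_definition": StreamDefinition(name=stream)}
    print(render(template, scope))
```

Because the template is evaluated once per stream, a single translation block is enough to give every replicated asset its own description.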

Pre-installing the Sling binary

The dagster-sling package includes the Sling Python CLI, but not the Sling binary. By default, the binary is downloaded at runtime, which may not work in environments where the download URL isn't on a network allowlist.

To pre-install the binary, you can download it during your build step and set the SLING_BINARY environment variable:

# Download and extract the binary (this example uses the macOS amd64 archive; choose the one that matches your platform)
curl -LO 'https://github.com/slingdata-io/sling-cli/releases/latest/download/sling_darwin_amd64.tar.gz'
tar xf sling_darwin_amd64.tar.gz
chmod +x sling

# Point to the binary
export SLING_BINARY="$(pwd)/sling"

Sling will use the specified binary instead of downloading it at runtime.
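
A misconfigured path is easier to diagnose at startup than in the middle of a run. As a minimal sketch, the check below confirms that SLING_BINARY is set and points at an executable file. The function name `resolve_sling_binary` is hypothetical and not part of dagster-sling. It accepts an optional mapping so it can be exercised without touching the real environment.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def resolve_sling_binary(env: Optional[Mapping[str, str]] = None) -> Path:
    """Return the path in SLING_BINARY, raising if it is unset or unusable."""
    env = os.environ if env is None else env
    value = env.get("SLING_BINARY")
    if not value:
        raise RuntimeError("SLING_BINARY is not set")
    path = Path(value)
    if not path.is_file():
        raise RuntimeError(f"SLING_BINARY does not point to a file: {path}")
    # os.access checks the execute permission for the current user.
    if not os.access(path, os.X_OK):
        raise RuntimeError(f"SLING_BINARY is not executable: {path}")
    return path
```

Calling `resolve_sling_binary()` in a container entrypoint or a deployment smoke test surfaces a missing or non-executable binary before any replication is launched.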