Dagster & Sling (Component)
The dagster-sling library provides a SlingReplicationCollectionComponent, which represents a collection of Sling replications as assets in Dagster.
1. Prepare a Dagster project
To begin, you'll need a Dagster project. You can use an existing components-ready project or create a new one:
create-dagster project my-project && cd my-project/src
Activate the project virtual environment:
source ../.venv/bin/activate
Finally, add the dagster-sling library to the project. We will also add duckdb to use as a destination for our Sling replication.
uv add dagster-sling duckdb
2. Scaffold a Sling component definition
Now that you have a Dagster project, you can scaffold a Sling component definition:
dg scaffold defs dagster_sling.SlingReplicationCollectionComponent sling_ingest
Creating defs at /.../my-project/src/my_project/defs/sling_ingest.
The dg scaffold defs call will generate a defs.yaml file and an unpopulated Sling replication.yaml file:
tree my_project/defs
my_project/defs
├── __init__.py
└── sling_ingest
    ├── defs.yaml
    └── replication.yaml
2 directories, 3 files
In its scaffolded form, the defs.yaml file contains the configuration for your Sling workspace:
type: dagster_sling.SlingReplicationCollectionComponent
attributes:
  replications:
    - path: replication.yaml
The generated replication.yaml file is a template, which still needs to be configured:
source: {}
streams: {}
target: {}
3. Configure Sling replications
In the defs.yaml file, you can directly specify a list of Sling connections which you can use in your replications. Here, you can specify a connection to DuckDB:
type: dagster_sling.SlingReplicationCollectionComponent
attributes:
  connections:
    DUCKDB:
      type: duckdb
      instance: /tmp/my_project.duckdb
  replications:
    - path: ./replication.yaml
For this example replication, we will ingest a set of CSV files to DuckDB. You can use curl to download some sample data:
curl -O https://raw.githubusercontent.com/dbt-labs/jaffle-shop-classic/refs/heads/main/seeds/raw_customers.csv &&
curl -O https://raw.githubusercontent.com/dbt-labs/jaffle-shop-classic/refs/heads/main/seeds/raw_orders.csv &&
curl -O https://raw.githubusercontent.com/dbt-labs/jaffle-shop-classic/refs/heads/main/seeds/raw_payments.csv
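If curl is not available, the same three files can be fetched with the Python standard library. This is a minimal sketch rather than part of dagster-sling: the helper names seed_url and download_seeds are hypothetical, and the URLs are the ones used in the curl commands above.

```python
from pathlib import Path
from urllib.request import urlretrieve

# Base URL and file names taken from the curl commands in this guide.
BASE_URL = "https://raw.githubusercontent.com/dbt-labs/jaffle-shop-classic/refs/heads/main/seeds"
SEED_FILES = ["raw_customers.csv", "raw_orders.csv", "raw_payments.csv"]


def seed_url(name):
    """Return the download URL for one seed file (hypothetical helper)."""
    return f"{BASE_URL}/{name}"


def download_seeds(dest_dir="."):
    """Download every seed file into dest_dir and return the local paths."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    paths = []
    for name in SEED_FILES:
        target = dest / name
        urlretrieve(seed_url(name), target)  # performs the network request
        paths.append(target)
    return paths

# Usage (requires network access): download_seeds(".")
```

Run it from the directory where the replication expects to find the CSV files, since the stream names in replication.yaml refer to the files by name.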
Next, you can configure Sling replications for each CSV file in replication.yaml:
source: LOCAL
target: DUCKDB
defaults:
  mode: full-refresh
  object: "{stream_table}"
streams:
  file://raw_customers.csv:
    object: "main.raw_customers"
  file://raw_orders.csv:
    object: "main.raw_orders"
  file://raw_payments.csv:
    object: "main.raw_payments"
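The streams section follows a regular pattern: each file://NAME.csv stream writes to the table main.NAME. With many files, it may be easier to generate that section than to type it. The sketch below is one way to do so with the standard library only; build_streams and to_yaml are hypothetical helper names, and the "main" schema is assumed from the example above.

```python
from pathlib import Path


def build_streams(csv_names, schema="main"):
    """Map each CSV file name to a stream entry targeting schema.table."""
    streams = {}
    for name in sorted(csv_names):
        table = Path(name).stem  # "raw_orders.csv" -> "raw_orders"
        streams[f"file://{name}"] = {"object": f"{schema}.{table}"}
    return streams


def to_yaml(streams):
    """Render the mapping as the text of a 'streams:' section."""
    lines = ["streams:"]
    for stream, cfg in streams.items():
        lines.append(f"  {stream}:")
        lines.append(f'    object: "{cfg["object"]}"')
    return "\n".join(lines)


if __name__ == "__main__":
    names = ["raw_customers.csv", "raw_orders.csv", "raw_payments.csv"]
    print(to_yaml(build_streams(names)))
```

The printed text can be pasted into replication.yaml in place of the hand-written streams section.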
Our newly configured Sling component will produce an asset for each replicated file:
dg list defs
┏━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Section ┃ Definitions                                                                            ┃
┡━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ Assets  │ ┏━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━┳━━━━━━━━━━━━━┓ │
│         │ ┃ Key                       ┃ Group   ┃ Deps                   ┃ Kinds ┃ Description ┃ │
│         │ ┡━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━╇━━━━━━━━━━━━━┩ │
│         │ │ file_raw_customers/csv    │ default │                        │       │             │ │
│         │ ├───────────────────────────┼─────────┼────────────────────────┼───────┼─────────────┤ │
│         │ │ file_raw_orders/csv       │ default │                        │       │             │ │
│         │ ├───────────────────────────┼─────────┼────────────────────────┼───────┼─────────────┤ │
│         │ │ file_raw_payments/csv     │ default │                        │       │             │ │
│         │ ├───────────────────────────┼─────────┼────────────────────────┼───────┼─────────────┤ │
│         │ │ target/main/raw_customers │ default │ file_raw_customers/csv │ sling │             │ │
│         │ ├───────────────────────────┼─────────┼────────────────────────┼───────┼─────────────┤ │
│         │ │ target/main/raw_orders    │ default │ file_raw_orders/csv    │ sling │             │ │
│         │ ├───────────────────────────┼─────────┼────────────────────────┼───────┼─────────────┤ │
│         │ │ target/main/raw_payments  │ default │ file_raw_payments/csv  │ sling │             │ │
│         │ └───────────────────────────┴─────────┴────────────────────────┴───────┴─────────────┘ │
└─────────┴────────────────────────────────────────────────────────────────────────────────────────┘
4. Customize Sling assets
Properties of the assets emitted by each replication can be customized in the defs.yaml file using the translation key:
type: dagster_sling.SlingReplicationCollectionComponent
attributes:
  connections:
    DUCKDB:
      type: duckdb
      instance: /tmp/my_project.duckdb
  replications:
    - path: ./replication.yaml
      translation:
        group_name: sling_data
        description: "Loads data from Sling replication {{ stream_definition.name }}"
Listing the definitions again shows the new group and description on each replicated asset:
dg list defs
┏━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Section ┃ Definitions                                                                                                ┃
┡━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ Assets  │ ┏━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓ │
│         │ ┃ Key                       ┃ Group      ┃ Deps                   ┃ Kinds ┃ Description                  ┃ │
│         │ ┡━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩ │
│         │ │ file_raw_customers/csv    │ default    │                        │       │                              │ │
│         │ ├───────────────────────────┼────────────┼────────────────────────┼───────┼──────────────────────────────┤ │
│         │ │ file_raw_orders/csv       │ default    │                        │       │                              │ │
│         │ ├───────────────────────────┼────────────┼────────────────────────┼───────┼──────────────────────────────┤ │
│         │ │ file_raw_payments/csv     │ default    │                        │       │                              │ │
│         │ ├───────────────────────────┼────────────┼────────────────────────┼───────┼──────────────────────────────┤ │
│         │ │ target/main/raw_customers │ sling_data │ file_raw_customers/csv │ sling │ Loads data from Sling        │ │
│         │ │                           │            │                        │       │ replication                  │ │
│         │ │                           │            │                        │       │ file://raw_customers.csv     │ │
│         │ ├───────────────────────────┼────────────┼────────────────────────┼───────┼──────────────────────────────┤ │
│         │ │ target/main/raw_orders    │ sling_data │ file_raw_orders/csv    │ sling │ Loads data from Sling        │ │
│         │ │                           │            │                        │       │ replication                  │ │
│         │ │                           │            │                        │       │ file://raw_orders.csv        │ │
│         │ ├───────────────────────────┼────────────┼────────────────────────┼───────┼──────────────────────────────┤ │
│         │ │ target/main/raw_payments  │ sling_data │ file_raw_payments/csv  │ sling │ Loads data from Sling        │ │
│         │ │                           │            │                        │       │ replication                  │ │
│         │ │                           │            │                        │       │ file://raw_payments.csv      │ │
│         │ └───────────────────────────┴────────────┴────────────────────────┴───────┴──────────────────────────────┘ │
└─────────┴────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
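The description in the translation block is a template: the {{ stream_definition.name }} placeholder is filled in once per stream, which is why each asset in the output above ends with its own file:// name. The sketch below imitates that substitution with a small regular expression. It is an illustration only, not Dagster's actual template engine, and the render function and the dictionary shape of the context are assumptions made for the example.

```python
import re

TEMPLATE = "Loads data from Sling replication {{ stream_definition.name }}"


def render(template, context):
    """Replace each {{ dotted.name }} placeholder with a value from context."""

    def lookup(match):
        value = context
        # Walk the dotted path, e.g. "stream_definition.name".
        for part in match.group(1).split("."):
            value = value[part]
        return str(value)

    return re.sub(r"\{\{\s*([\w.]+)\s*\}\}", lookup, template)


if __name__ == "__main__":
    for name in ["file://raw_customers.csv", "file://raw_orders.csv"]:
        # One context per stream, mirroring one description per asset.
        print(render(TEMPLATE, {"stream_definition": {"name": name}}))
```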
Pre-installing the Sling binary
The dagster-sling package includes the Sling Python CLI, but not the Sling binary. By default, the binary is downloaded at runtime, which may not work in environments where the download URL isn't on a network allowlist.
To pre-install the binary, you can download it during your build step and set the SLING_BINARY environment variable:
macOS:
# Download and extract the binary
curl -LO 'https://github.com/slingdata-io/sling-cli/releases/latest/download/sling_darwin_amd64.tar.gz'
tar xf sling_darwin_amd64.tar.gz
chmod +x sling
# Point to the binary
export SLING_BINARY=$(pwd)/sling
Linux:
# Download and extract the binary
curl -LO 'https://github.com/slingdata-io/sling-cli/releases/latest/download/sling_linux_amd64.tar.gz'
tar xf sling_linux_amd64.tar.gz
chmod +x sling
# Point to the binary
export SLING_BINARY=/path/to/sling
Windows:
# Download and extract the binary
Invoke-WebRequest -Uri 'https://github.com/slingdata-io/sling-cli/releases/latest/download/sling_windows_amd64.zip' -OutFile sling.zip
Expand-Archive -Path sling.zip -DestinationPath .
# Point to the binary
$env:SLING_BINARY = "$PWD\sling.exe"
Sling will use the specified binary instead of downloading it at runtime.
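Because a wrong path in SLING_BINARY would only surface when a replication runs, it can help to check the variable during the build or at startup. The following is a minimal sketch of such a check using only the standard library; resolve_sling_binary is a hypothetical helper and not part of dagster-sling or Sling.

```python
import os
from pathlib import Path


def resolve_sling_binary(env=None):
    """Return the path in SLING_BINARY, or None if the variable is unset.

    Raises FileNotFoundError when the variable is set but does not point
    to an executable file.
    """
    env = os.environ if env is None else env
    value = env.get("SLING_BINARY")
    if not value:
        return None  # unset: the binary would be downloaded at runtime
    path = Path(value)
    if path.is_file() and os.access(path, os.X_OK):
        return path
    raise FileNotFoundError(
        f"SLING_BINARY points to {value!r}, which is not an executable file"
    )


if __name__ == "__main__":
    binary = resolve_sling_binary()
    print(binary if binary else "SLING_BINARY is not set")
```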