- 01 May 2024
- 7 Minutes to read
- Print
- DarkLight
dbt Sync
- Updated on 01 May 2024
- 7 Minutes to read
- Print
- DarkLight
You need to have access to the Preset API in order to successfully use the Preset CLI.
dbt Sync
The dbt sync operations are compatible with both dbt Core and dbt Cloud. It allows you to sync your model definitions (along with its metadata) to Preset as physical datasets, and also sync Preset charts and dashboards to dbt as exposures.
dbt Core command
Use below syntax to trigger a dbt Core sync:
preset-cli --workspaces=$https://workspace_slug.workspace_region.app.preset.io/ \
superset sync dbt-core $/path/to/dbt_project.yml \
--project=$project_name --target=$target --profiles=$path/to/.dbt/profiles.yml
Replace:
$https://workspace_slug.workspace_region.app.preset.io/
with the actual link to the Preset Workspace.$/path/to/dbt_project.yml
with the path to the dbt project'sdbt_project.yml
file.$project_name
with the dbt project name.$target
with the target name.$path/to/.dbt/profiles.yml
with the path to your dbtprofiles.yml
file.
By default, the sync would search for a database connection named $project_name_$target_name
(for example, jaffle_shop_dev
) in the destination Workspace to associate the synced datasets to. It's possible to specify a different display name in the profiles YAML file for a correct mapping, under the target.meta.superset.database_name
field. For example:
jaffle_shop:
outputs:
dev:
meta:
superset:
database_name: Postgres - Production # <= specify the database connection/display name used on the workspace
Alternatively, it's possible to include the --import-db
flag to the CLI command to either create a new database connection or update the existing connection (based on the connection settings from the profiles YAML file).
dbt Cloud command
It's possible to configure a continuous dbt Cloud sync from the DB connection configuration in the Workspace. Further details on the dbt integration page.
Use below syntax to trigger a dbt Cloud sync:
preset-cli --workspaces=$https://workspace_slug.workspace_region.app.preset.io/ \
superset sync dbt-cloud $dbt_token $account_id $project_id $job_id
Replace:
$https://workspace_slug.workspace_region.app.preset.io/
with the actual link to the Preset Workspace.$dbt_token
with a dbt Cloud token with access to the project metadata.$account_id
with the ID of the dbt account that contains the project.$project_id
with the ID of the project that should be synced.$job_id
with the ID of the dbt job to fetch data from. If you don't provide this information, you'll be prompted to choose the account and job.
💡 The $account_id
, $project_id
and $job_id
are optional and if not specified the CLI will fetch available options and prompt you to choose.
The dbt Cloud sync only works in case the required database connection already exists in the destination Workspace. The sync will search for an existing DB connection with a display name matching the DB identifier specified in the dbt Cloud configuration (for example, the database
value for a Snowflake connection / the Project ID
for a BQ connection).
Optional flags
Syncing charts and dashboards as exposures back to dbt
Use the --exposures
flag (for example --exposures=$/path/to/dbt/project/models/exposures.yaml
) to specify the path to the dbt exposures file that should be updated. File structure:
version: 2
exposures:
- name: Example_Chart_chart_1
label: Example Chart [chart]
type: analysis
maturity: low
url: https://workspace_slug.region.app.preset.io/superset/explore/?form_data=%7B%22slice_id%22:+1%7D
description: 'Chart description'
depends_on:
- ref('dbt_vehicle_sales')
owner:
name: Owner Display Name
email: unknown
- name: Exposures_dash_dashboard_1
label: Exposures dash [dashboard]
type: dashboard
maturity: low
url: https://workspace.region.app.preset.io/superset/dashboard/1/
description: ''
depends_on:
- ref('dbt_vehicle_sales')
owner:
name: Owner Display Name
email: unknown
Some considerations:
type
is set toanalysis
for charts anddashboard
for dashboards.maturity
is set tohigh
for certified charts and published dashboards, otherwise is set tolow
.description
is imported from Preset for charts. Since dashboards don't have a description, it's set as''
.- Owner email address isn't synced to the file (
unknown
is shown, instead).
Note that this operation currently replaces all content from the existing file, so in case you have exposures from other sources, you might want to create a new file from the CLI sync and then manually combine the two.
Sync only exposures
The --exposures-only
flag can be added to the command, so that the operation doesn't sync dbt models to Preset as datasets, but instead only generate the exposures
file.
Disallow edits
The --disallow-edits
flag clan can be added to the command, so that the synced datasets can't be modified in Preset UI directly. This can be useful in case you want to make sure that dbt is your source of truth, enforcing changes to be synced from it.
Preserve Preset metadata
By default, the dbt sync performs below actions:
- Sync dbt metadata to Preset datasets.
- Delete any dataset metrics that only exists in Preset. Metrics should be synced from dbt.
- Sync columns from source, restoring any deleted column in Preset UI.
- Refresh the columns metadata, setting Is filterable and Is dimension to true.
While this behavior ensures that dbt is the source of truth, it can be disruptive specially for Organizations that have a lot of ad hoc metrics created in the Preset side.
To avoid impact, there are two flags that can be used to have a different behavior:
--merge-metadata
: The--merge-metadata
flag can be used so that the dbt metadata is applied to Preset datasets, however Preset-only changes aren't discarded:- New metrics are synced properly from dbt.
- Metrics created on Preset side are preserved.
- dbt metadata is applied to metrics that exist in both tools.
- Dataset and columns metadata synced from dbt.
--preserve-metadata
: The--preserve-metadata
flag can be used to consider the Preset metadata as the source of truth:- New metrics are synced properly from dbt.
- Metrics created on Preset side are preserved.
- For metrics that exist on both tools, Preset metadata is preserved.
- Dataset and columns metadata synced from dbt (if not conflicting with Preset information).
Raise sync failures
The dbt sync is not atomic, so in case a model fails to sync, the other models are still going to be synced (and the failure is logged in the terminal). While this behavior is definitely optimal to ensure data is updated, when running it via a CI/CD pipeline it gets difficult to monitor failures (since the execution always finish successfully).
The --raise-failures
flag can be included so that the execution ends with a failed status code in case any model failed to sync OR the command used any feature/flag marked for deprecation.
Filtering models
By default all the models are synced to the Workspace. The CLI supports a subset of the syntax used by the dbt command line to specify which models should be synced. It's possible to use the --select
flag to specify models to be synced:
preset-cli ... --select my_model # sync only "my_model"
preset-cli ... --select my_model+ # sync "my_model" and its children
preset-cli ... --select my_model+2 # sync "my_model" and its children up to 2 degrees
preset-cli ... --select +my_model # sync "my_model" and its parents
preset-cli ... --select +my_model+ # sync "my_model" with parents and children
preset-cli ... --select config.materialized:table # sync only models materialized as tables
Multiple selectors can be passed by repeating the --select
flag (note that this is slightly different from dbt, which doesn't require repeating the flag):
preset-cli ... --select my_model --select my_other_model
The CLI also support the intersection operator:
preset-cli ... --select my_model+,tag:test
The command above will synchronize my_model
and its children, as long as the models have the test
tag.
Finally, the CLI also supports the --exclude
flag in a similar way:
preset-cli ... --select my_model+ --exclude tag:test
The command above synchronizes my_model
and its children, as long as the models don't have the test
tag.
preset-cli ... --exclude config.materialized:ephemeral
The command above syncs all non-ephemeral models to Preset.
Controling Preset-specific metadata from dbt
The sync process would map common model metadata to your datasets (for example, the model and column descriptions, labels, etc). However, it's also possible to control Preset-specific columns directly from your dbt model definition, under the meta.superset.$field_name
key ($field_name
would be the name for the specific field in the Preset metadata).
In the example below, the dbt sync would also populate the CACHE TIMEOUT value (using the meta.superset.cache_timeout
field) and add warning markdown (using the meta.superset.extra.warning_markdown
field) to the dataset:
version: 2
models:
- name: dbt_vehicle_sales
meta:
superset:
cache_timeout: 1000
extra:
warning_markdown: This dataset is still under construction
By default, the dataset receives a generic certification This table is produced by dbt
, but it's also possible to define a specific certification information:
version: 2
models:
- name: dbt_vehicle_sales
meta:
superset:
cache_timeout: 1000
extra:
certified_by: The Data Team
certification_details: Sales data for vehicles.
Setting the meta.superset.extra
value to null
can also be used to prevent any certification info from being synced.
This control is also applicable to columns and metrics. In below example, the metric warning_text
and also d3_format
is specified:
version: 2
metrics:
- name: monthly_revenue
label: Monthly Revenue
model: ref('dbt_vehicle_sales')
description: Monthly revenue.
meta:
superset:
warning_text: Compatible only with monthly time grains.
d3format: '%b %Y'
calculation_method: sum
expression: price_each
timestamp: date
time_grains: [month]