dbt Sync
  • 19 Mar 2024
  • 7 Minutes to read
  • Dark
    Light

dbt Sync

  • Dark
    Light

Article Summary

API Requirement

You need to have access to the Preset API in order to successfully use the Preset CLI.

dbt Sync

The dbt sync operations are compatible with both dbt Core and dbt Cloud. It allows you to sync your model definitions (along with its metadata) to Preset as physical datasets, and also sync Preset charts and dashboards to dbt as exposures.

dbt Core command

Use below syntax to trigger a dbt Core sync:

preset-cli --workspaces=$https://workspace_slug.workspace_region.app.preset.io/ \
superset sync dbt-core $/path/to/dbt_project.yml \
--project=$project_name --target=$target --profiles=$path/to/.dbt/profiles.yml 

Replace:

  • $https://workspace_slug.workspace_region.app.preset.io/ with the actual link to the Preset Workspace.
  • $/path/to/dbt_project.yml with the path to the dbt project's dbt_project.yml file.
  • $project_name with the dbt project name.
  • $target with the target name.
  • $path/to/.dbt/profiles.yml with the path to your dbt profiles.yml file.

By default, the sync would search for a database connection named $project_name_$target_name (for example, jaffle_shop_dev) in the destination Workspace to associate the synced datasets to. It's possible to specify a different display name in the profiles YAML file for a correct mapping, under the target.meta.superset.database_name field. For example:

jaffle_shop:
  outputs:
    dev:
      meta:
        superset:
          database_name: Postgres - Production # <= specify the database connection/display name used on the workspace

Alternatively, it's possible to include the --import-db flag to the CLI command to either create a new database connection or update the existing connection (based on the connection settings from the profiles YAML file).

dbt Cloud command

dbt Cloud UI sync

It's possible to configure a continuous dbt Cloud sync from the DB connection configuration in the Workspace. Further details on the dbt integration page.

Use below syntax to trigger a dbt Cloud sync:

preset-cli --workspaces=$https://workspace_slug.workspace_region.app.preset.io/ \
superset sync dbt-cloud $dbt_token $account_id $project_id $job_id

Replace:

  • $https://workspace_slug.workspace_region.app.preset.io/ with the actual link to the Preset Workspace.
  • $dbt_token with a dbt Cloud token with access to the project metadata.
  • $account_id with the ID of the dbt account that contains the project.
  • $project_id with the ID of the project that should be synced.
  • $job_id with the ID of the dbt job to fetch data from. If you don't provide this information, you'll be prompted to choose the account and job.

💡 The $account_id, $project_id and $job_id are optional and if not specified the CLI will fetch available options and prompt you to choose.

The dbt Cloud sync only works in case the required database connection already exists in the destination Workspace. The sync will search for an existing DB connection with a display name matching the DB identifier specified in the dbt Cloud configuration (for example, the database value for a Snowflake connection / the Project ID for a BQ connection).

Optional flags

Syncing charts and dashboards as exposures back to dbt

Use the --exposures flag (for example --exposures=$/path/to/dbt/project/models/exposures.yaml) to specify the path to the dbt exposures file that should be updated. File structure:

version: 2
exposures:
- name: Example_Chart_chart_1
  label: Example Chart [chart]
  type: analysis
  maturity: low
  url: https://workspace_slug.region.app.preset.io/superset/explore/?form_data=%7B%22slice_id%22:+1%7D
  description: 'Chart description'
  depends_on:
  - ref('dbt_vehicle_sales')
  owner:
    name: Owner Display Name
    email: unknown
- name: Exposures_dash_dashboard_1
  label: Exposures dash [dashboard]
  type: dashboard
  maturity: low
  url: https://workspace.region.app.preset.io/superset/dashboard/1/
  description: ''
  depends_on:
  - ref('dbt_vehicle_sales')
  owner:
    name: Owner Display Name
    email: unknown

Some considerations:

  • type is set to analysis for charts and dashboard for dashboards.
  • maturity is set to high for certified charts and published dashboards, otherwise is set to low.
  • description is imported from Preset for charts. Since dashboards don't have a description, it's set as ''.
  • Owner email address isn't synced to the file (unknown is shown, instead).

Note that this operation currently replaces all content from the existing file, so in case you have exposures from other sources, you might want to create a new file from the CLI sync and then manually combine the two.

Sync only exposures

The --exposures-only flag can be added to the command, so that the operation doesn't sync dbt models to Preset as datasets, but instead only generate the exposures file.

Disallow edits

The --disallow-edits flag clan can be added to the command, so that the synced datasets can't be modified in Preset UI directly. This can be useful in case you want to make sure that dbt is your source of truth, enforcing changes to be synced from it.
image.png

Preserve Preset metadata

By default, the dbt sync performs below actions:

  • Sync dbt metadata to Preset datasets.
  • Delete any dataset metrics that only exists in Preset. Metrics should be synced from dbt.
  • Sync columns from source, restoring any deleted column in Preset UI.
  • Refresh the columns metadata, setting Is filterable and Is dimension to true.

While this behavior ensures that dbt is the source of truth, it can be disruptive specially for Organizations that have a lot of ad hoc metrics created in the Preset side.

To avoid impact, there are two flags that can be used to have a different behavior:

  • --merge-metadata: The --merge-metadata flag can be used so that the dbt metadata is applied to Preset datasets, however Preset-only changes aren't discarded:
    • New metrics are synced properly from dbt.
    • Metrics created on Preset side are preserved.
    • dbt metadata is applied to metrics that exist in both tools.
    • Dataset and columns metadata synced from dbt.
  • --preserve-metadata: The --preserve-metadata flag can be used to consider the Preset metadata as the source of truth:
    • New metrics are synced properly from dbt.
    • Metrics created on Preset side are preserved.
    • For metrics that exist on both tools, Preset metadata is preserved.
    • Dataset and columns metadata synced from dbt (if not conflicting with Preset information).

Raise sync failures

The dbt sync is not atomic, so in case a model fails to sync, the other models are still going to be synced (and the failure is logged in the terminal). While this behavior is definitely optimal to ensure data is updated, when running it via a CI/CD pipeline it gets difficult to monitor failures (since the execution always finish successfully).

The --raise-failures flag can be included so that the execution ends with a failed status code in case any model failed to sync OR the command used any feature/flag marked for deprecation.

Filtering models

By default all the models are synced to the Workspace. The CLI supports a subset of the syntax used by the dbt command line to specify which models should be synced. It's possible to use the --select flag to specify models to be synced:

preset-cli ... --select my_model    # sync only "my_model"
preset-cli ... --select my_model+   # sync "my_model" and its children
preset-cli ... --select my_model+2  # sync "my_model" and its children up to 2 degrees
preset-cli ... --select +my_model   # sync "my_model" and its parents
preset-cli ... --select +my_model+  # sync "my_model" with parents and children
preset-cli ... --select config.materialized:table # sync only models materialized as tables

Multiple selectors can be passed by repeating the --select flag (note that this is slightly different from dbt, which doesn't require repeating the flag):

preset-cli ... --select my_model --select my_other_model

The CLI also support the intersection operator:

preset-cli ... --select my_model+,tag:test

The command above will synchronize my_model and its children, as long as the models have the test tag.

Finally, the CLI also supports the --exclude flag in a similar way:

preset-cli ... --select my_model+ --exclude tag:test

The command above synchronizes my_model and its children, as long as the models don't have the test tag.

preset-cli ... --exclude config.materialized:ephemeral

The command above syncs all non-ephemeral models to Preset.

Controling Preset-specific metadata from dbt

The sync process would map common model metadata to your datasets (for example, the model and column descriptions, labels, etc). However, it's also possible to control Preset-specific columns directly from your dbt model definition, under the meta.superset.$field_name key ($field_name would be the name for the specific field in the Preset metadata).

In the example below, the dbt sync would also populate the CACHE TIMEOUT value (using the meta.superset.cache_timeout field) and add warning markdown (using the meta.superset.extra.warning_markdown field) to the dataset:

version: 2

models:
  - name: dbt_vehicle_sales
    meta:
      superset:
        cache_timeout: 1000
        extra:
          warning_markdown: This dataset is still under construction

By default, the dataset receives a generic certification This table is produced by dbt, but it's also possible to define a specific certification information:

version: 2

models:
  - name: dbt_vehicle_sales
    meta:
      superset:
        cache_timeout: 1000
        extra:
          certified_by: The Data Team
          certification_details: Sales data for vehicles. 

Setting the meta.superset.extra value to null can also be used to prevent any certification info from being synced.

This control is also applicable to columns and metrics. In below example, the metric warning_text and also d3_format is specified:

version: 2

metrics:
  - name: monthly_revenue
    label: Monhtly Revenue
    model: ref('dbt_vehicle_sales')
    description: Monthly revenue. 
    meta:
      superset:
        warning_text: Compatible only with monthly time grains.
        d3format: '%b %Y'
    calculation_method: sum
    expression: price_each
    timestamp: date
    time_grains: [month]

Was this article helpful?

What's Next