Managing content across Workspaces

👤 This documentation is intended for Workspace Admins. Check with your Team Admin for additional access.

Overview

Customers on the Professional and Enterprise plans have access to multiple Workspaces, which is helpful in a variety of scenarios. Since a Workspace is a completely isolated Superset installation, a common use case is implementing multiple environments:

  • Development: This Workspace is used for content creation. Permissions are more permissive and users have more freedom to modify the assets.
  • Staging: In this Workspace, changes are tested and validated to make sure they meet requirements. Permissions are more restrictive, and most users typically have view-only access.
  • Production: After creation and validation, content is migrated to the Production Workspace, so that it can be consumed by the desired audience. Modifications shouldn't happen in this Workspace.

To achieve a successful implementation, it's crucial to understand how to migrate these assets across Workspaces, to avoid duplications and ensure consistency.

Overwriting content via import

There are three different ways to import assets to a Preset Workspace:

  • Using the UI
  • Using the API
  • Using the CLI (which uses the API behind the scenes)

When you import content via the UI, only the actual asset being imported gets overwritten. For example, if you are importing a dashboard, modifications made to the dashboard would be properly applied. However, modifications made to the charts, datasets and DB connections used by the dashboard would not be applied. This is intentional, to prevent a dashboard import from affecting dependencies that could be in use by other dashboards. Similarly, importing a chart wouldn't sync changes to the dataset, etc.

When using the API, it's possible to overwrite all assets being imported through the Import Assets API endpoint. Note that this endpoint expects an Assets ZIP file, so if you would like to import a dashboard/chart/dataset export through this endpoint, you would have to:

  1. Extract the ZIP file.
  2. Modify the metadata.yaml file, setting the type value to assets. This is an example metadata.yaml file from an Assets ZIP file:
version: 1.0.0
type: assets
timestamp: '2023-11-03T18:14:40.992958+00:00'
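
After updating the file, re-compress the contents into a new ZIP before uploading it to the endpoint. The Python sketch below illustrates these steps; the file and folder names are just examples, and the upload itself is omitted, since the request details depend on your authentication setup:

import zipfile
from pathlib import Path

EXPORT_ZIP = "dashboard_export.zip"  # example: a dashboard export file
ASSETS_ZIP = "assets_import.zip"     # example: the file to send to the Import Assets endpoint
WORK_DIR = Path("extracted_export")

# 1. Extract the original export
with zipfile.ZipFile(EXPORT_ZIP) as zf:
    zf.extractall(WORK_DIR)

# 2. Set `type: assets` in metadata.yaml (a simple line rewrite is enough here)
metadata_path = next(WORK_DIR.rglob("metadata.yaml"))
lines = metadata_path.read_text().splitlines()
lines = ["type: assets" if line.startswith("type:") else line for line in lines]
metadata_path.write_text("\n".join(lines) + "\n")

# 3. Re-compress everything into a new ZIP, preserving the folder structure
with zipfile.ZipFile(ASSETS_ZIP, "w", zipfile.ZIP_DEFLATED) as zf:
    for path in WORK_DIR.rglob("*"):
        if path.is_file():
            zf.write(path, path.relative_to(WORK_DIR))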

Lastly, the CLI provides a seamless workflow, allowing you to export all assets (or specific ones); importing them overwrites all assets in the destination, since the CLI uses the Assets API endpoint behind the scenes.

How to avoid duplications

For internal operations in the Workspace, assets are identified by their ID: every asset has a unique ID for its type (Dashboard ID, Chart ID, Dataset ID, DB Connection ID). The ID is an integer that auto-increments as you create content.

For cross-Workspace operations, assets are identified by their UUID. The UUID is a unique identifier generated when the asset is created.

The same asset will have a different ID in each Workspace, but it should have the same UUID across all Workspaces (assuming it was created in one Workspace and imported into the others). During an import operation, the asset's UUID is checked to determine whether it's already in use in the destination Workspace:

  • If so, the asset is updated (if the import method allows the overwrite);
  • If not, a new asset is created.
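
Conceptually, the check works like the simplified sketch below, where assets are represented as plain dictionaries keyed by UUID. This is only an illustration of the decision flow, not the actual Superset implementation:

def import_asset(asset, existing_assets_by_uuid, overwrite_allowed):
    """Simplified sketch of how the UUID check drives an import."""
    existing = existing_assets_by_uuid.get(asset["uuid"])
    if existing is not None:
        # UUID already in use in the destination: update the asset,
        # but only if the import method allows overwrites
        if overwrite_allowed:
            existing.update(asset)
    else:
        # UUID not found: a brand new asset is created
        existing_assets_by_uuid[asset["uuid"]] = asset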

Matching mismatched UUIDs

Considering our example above, if all content in the Staging Workspace was created via a migration from Development, and all content in the Production Workspace was in turn created via a migration from Staging, then your assets have the same UUID across all Workspaces, and you can easily scale up your migrations without causing duplications.

On the other hand, if content was manually re-created in the other Workspaces, migrating an asset from one Workspace to another would actually create a duplicate. To avoid this scenario, there are two basic approaches to handle a UUID mismatch:

Manually updating ZIP files

Let's review how to manually update UUID references in an export file. Once you extract the ZIP file, the assets are organized in folders based on their type (dashboards, charts, datasets and databases). The asset configuration is exported to YAML files that can be easily modified with most text editors. The list below covers the references to update for each asset type; a scripted approach is sketched right after it:

  • Database YAML file:
    • If your DB connection has a different UUID in the destination Workspace, update the uuid field in the YAML file.
    • Also update the database_uuid field in all dataset YAML files powered by this connection.
  • Dataset YAML file:
    • If the dataset has a different UUID in the destination Workspace, update the uuid field in the YAML file.
    • Also update the dataset_uuid field in all charts powered by this dataset.
    • Also update the datasetUuid field in the dashboard YAML file for all filters using this dataset.
  • Chart YAML file:
    • Update the uuid field in the chart YAML file.
    • Update the uuid field in the dashboard YAML file, in the position configuration.
  • Dashboard YAML file:
    • Update the uuid field in the dashboard YAML file.
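
Since the source UUID appears verbatim in every file that references it, a simple search-and-replace across the extracted YAML files covers all of the cases above. Here is a minimal Python sketch, assuming you already know which UUIDs need to change; the two values below are placeholders taken from the examples in this article:

from pathlib import Path

EXPORT_DIR = Path("extracted_export")  # the folder where the export ZIP was extracted

# Map each UUID from the export to the UUID already in use in the destination Workspace
UUID_MAP = {
    "ac9a0e2b-73de-4208-b39a-d4db63f63e9b": "a2dc77af-e654-49bb-b321-40f6b559a1ee",
}

for yaml_file in EXPORT_DIR.rglob("*.yaml"):
    content = yaml_file.read_text()
    updated = content
    for old_uuid, new_uuid in UUID_MAP.items():
        updated = updated.replace(old_uuid, new_uuid)
    if updated != content:
        yaml_file.write_text(updated)
        print(f"Updated UUID references in {yaml_file}")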

Dynamically matching UUIDs via the CLI

If you are using the CLI to migrate content, it's possible to use Jinja templating and even Python functions in your YAML files, so that the desired UUID is used during the import operation.

Jinja templating

Consider the DB connection YAML below:

database_name: examples
sqlalchemy_uri: examples://
cache_timeout: null
expose_in_sqllab: true
allow_run_async: false
allow_ctas: false
allow_cvas: false
allow_dml: false
allow_csv_upload: false
extra:
  metadata_params: {}
  engine_params: {}
  metadata_cache_timeout: {}
  schemas_allowed_for_csv_upload: []
uuid: a2dc77af-e654-49bb-b321-40f6b559a1ee
version: 1.0.0

Jinja templating can be used to dynamically modify the uuid value for this connection. One variable available by default is instance, which holds the target Workspace URL, so a potential implementation is:

database_name: examples
sqlalchemy_uri: examples://
cache_timeout: null
expose_in_sqllab: true
allow_run_async: false
allow_ctas: false
allow_cvas: false
allow_dml: false
allow_csv_upload: false
extra:
  metadata_params: {}
  engine_params: {}
  metadata_cache_timeout: {}
  schemas_allowed_for_csv_upload: []
# UUID for Production Workspace
{% if instance.host == '12345678.region.app.preset.io' %}
uuid: a2dc77af-e654-49bb-b321-40f6b559a1ee
# UUID for the non-Prod Workspaces
{% else %}
uuid: ac9a0e2b-73de-4208-b39a-d4db63f63e9b
{% endif %}
version: 1.0.0

Note that the UUID is just an example: you can apply this method to dynamically modify any information in the YAML files. You can also pass your own parameters using the --option flag in your CLI command.
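
To see what this templating step does, you can render a similar template yourself with the jinja2 library. The snippet below is only an illustration of how Jinja resolves the conditional; it is not how the CLI is implemented, and the env variable stands in for a parameter passed via --option env=Prod:

from jinja2 import Template

template = Template("""
{% if env == 'Prod' %}
uuid: a2dc77af-e654-49bb-b321-40f6b559a1ee
{% else %}
uuid: ac9a0e2b-73de-4208-b39a-d4db63f63e9b
{% endif %}
""")

# Renders the Production UUID; passing any other value renders the non-Prod one
print(template.render(env="Prod"))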

Python functions

You can also implement more complex logic and validations using Python functions. To do so:

  1. Create a functions folder at the same level as your asset folders.
  2. Create a Python file with the desired functions (for example, my_functions.py).
  3. In the YAML file, call the function using $folder_name.$file_name.$function_name.

For example:

...
allow_dml: false
allow_csv_upload: false
extra:
  metadata_params: {}
  engine_params: {}
  metadata_cache_timeout: {}
  schemas_allowed_for_csv_upload: []
uuid: {{ functions.my_functions.get_db_connection_uuid() }}
...
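
The function itself lives in the functions folder, in this case functions/my_functions.py. A minimal, hypothetical implementation could pick the UUID from an environment variable; the variable name and UUID values below are assumptions for illustration only:

import os

def get_db_connection_uuid():
    """
    Hypothetical helper: return the DB connection UUID for the target
    Workspace, based on an environment variable set before running the CLI.
    """
    uuids = {
        "prod": "a2dc77af-e654-49bb-b321-40f6b559a1ee",
        "staging": "ac9a0e2b-73de-4208-b39a-d4db63f63e9b",
    }
    env = os.environ.get("TARGET_ENV", "staging").lower()
    return uuids.get(env, uuids["staging"])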

Next Steps

These YAML customizations are discarded during the import operation: the CLI renders the Jinja templates and imports the rendered version. As a consequence, if you later export the content again to migrate a new modification, the new export won't include these customizations.

To avoid having to manually re-apply these changes every time you export a new modification, it's possible to create an override file that replaces the information for specific fields. You define all your customizations in the .overrides file, so that you don't have to re-create them on every export. Let's take a look at a practical example:

Suppose you want to make the DB connection display name dynamic, so that it's called Examples - Staging or Examples - Production, according to the Workspace you're migrating the content to.

  1. Create the function get_database_name using the code below:
def get_database_name(env):
    """
    Returns the DB connection display name according to the target Workspace.
    """
    if env.lower() == 'prod' or env.lower() == 'production':
        return 'Examples - Production'
    else:
        return 'Examples - Staging'
  2. Create an examples.overrides.yaml file in the databases folder, with the following content:
database_name: {{ functions.my_functions.get_database_name(env) }}
  3. Use the Preset CLI, passing --option env=Prod as an argument.

Your .overrides file only needs to include the fields you want to replace.

