Sankey Diagram
  • 13 Oct 2023

Overview

A Sankey diagram is a popular flow diagram that conveys the relative size of a metric through the width of the flows running from a source to a target. Here is an example:

Reference Content

The following articles may be useful resources as you build your chart:

Creating a Sankey Diagram

Like a tree chart, a Sankey diagram uses nodes (drawn as rectangles), edges (drawn as flows), and a hierarchy (laid out left to right). Unlike a tree chart, however, it emphasizes the flow between nodes rather than the hierarchy itself. The hierarchy represents the stages of a system's flow.
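The node/edge model above can be sketched as a directed graph with weighted edges. This is a minimal Python sketch with made-up stage names. It assumes a common sizing convention in which each node's rectangle reflects the larger of its total inflow and total outflow. The article does not say how Preset sizes node rectangles.

```python
from collections import defaultdict

# Each edge is a (source, target, value) triple; the sample values are made up.
edges = [
    ("Visit", "Sign up", 60),
    ("Visit", "Bounce", 40),
    ("Sign up", "Purchase", 25),
]

def node_values(edges):
    """Size each node as the larger of its total inflow and total outflow (assumed convention)."""
    inflow = defaultdict(int)
    outflow = defaultdict(int)
    for source, target, value in edges:
        outflow[source] += value
        inflow[target] += value
    nodes = set(inflow) | set(outflow)
    # Sort node names so the result is printed in a stable order.
    return {node: max(inflow[node], outflow[node]) for node in sorted(nodes)}

print(node_values(edges))
# {'Bounce': 40, 'Purchase': 25, 'Sign up': 60, 'Visit': 100}
```

Note that "Sign up" receives 60 but passes only 25 onward. Taking the maximum keeps its rectangle large enough for every flow that touches it.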

Shape of your Data

Structuring your data in the right shape is crucial for building an effective Sankey diagram. Let's zoom in on a specific example of how a row in the dataset maps to a node-edge-node pair in the diagram.

  • The source column specifies the originating node in the chart.
  • The target column specifies the receiving (or destination) node in the chart.
  • The count value is used to scale the thickness of the connecting flow rectangle. This value is also shown when a user hovers over the connecting flow rectangle.
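The row-to-pair mapping above can be sketched in a few lines. In this minimal Python sketch, `Flow`, `row_to_flow`, and `thickness` are hypothetical names. It assumes linear scaling in which the largest count gets a hypothetical `max_px` width. The article says only that the count scales the thickness; it does not specify the formula Preset uses.

```python
from dataclasses import dataclass

@dataclass
class Flow:
    source: str   # originating node (source column)
    target: str   # receiving node (target column)
    count: float  # drives the flow's thickness and the hover value

def row_to_flow(row: dict) -> Flow:
    """Map one dataset row to a node-edge-node pair."""
    return Flow(row["source"], row["target"], row["count"])

def thickness(flow: Flow, max_count: float, max_px: float = 40.0) -> float:
    """Assumed linear scaling: the largest count is drawn max_px pixels thick."""
    return flow.count / max_count * max_px

rows = [
    {"source": "A", "target": "B", "count": 10},
    {"source": "A", "target": "C", "count": 5},
]
flows = [row_to_flow(r) for r in rows]
largest = max(f.count for f in flows)
for f in flows:
    print(f"{f.source} -> {f.target}: {f.count} ({thickness(f, largest):.1f}px)")
# A -> B: 10 (40.0px)
# A -> C: 5 (20.0px)
```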

To scale this approach up to a full Sankey diagram, Preset calculates all of the source-target pairs in your data and then orders them from left to right when generating the visualization. Which nodes go in the leftmost column (the first layer of nodes)? The first layer holds the unique values from the source column that never appear in the target column: nodes that send flow but never receive any.
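The first-layer rule above is a set difference between the source and target columns. The placement of later layers is an assumption in this minimal Python sketch. It uses longest-path layering, in which each node sits one column to the right of its deepest upstream node. That is one common choice, and the article does not say which method Preset uses. The sketch also assumes the flows contain no cycles.

```python
from collections import defaultdict

def first_layer(pairs):
    """Nodes that appear as a source but never as a target."""
    sources = {s for s, _ in pairs}
    targets = {t for _, t in pairs}
    return sorted(sources - targets)

def assign_layers(pairs):
    """Assumed longest-path layering; layer 0 is the leftmost column.

    Requires an acyclic set of flows (a cycle would recurse forever).
    """
    incoming = defaultdict(list)
    nodes = set()
    for s, t in pairs:
        incoming[t].append(s)
        nodes.update((s, t))
    layer = {}

    def depth(node):
        # A node with no incoming flow lands in layer 0.
        if node not in layer:
            layer[node] = max((depth(p) + 1 for p in incoming[node]), default=0)
        return layer[node]

    for node in nodes:
        depth(node)
    return layer

pairs = [("Visit", "Sign up"), ("Visit", "Bounce"), ("Sign up", "Purchase"), ("Ad", "Sign up")]
print(first_layer(pairs))                # ['Ad', 'Visit']
print(assign_layers(pairs)["Purchase"])  # 2
```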

Here's the full example:

Now we'll walk through the options we selected in Chart Builder and how the data was transformed:

Underlying Data
  • Our data in this example is pre-aggregated in the count column.
  • The source and target columns map to the nodes we want visualized.
Chart Builder Options
  • Source / Target: specify the column you want for the source nodes first and the column you want for the target nodes second.
  • Metric: specify the aggregate and the column used to set the thickness of the flow between nodes.
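The Chart Builder options listed above map onto clauses of a grouped query. This minimal Python sketch illustrates that mapping with a hypothetical helper, `build_sankey_query`. Source and target become the grouped columns, and the metric becomes the aggregated value. It is not how Preset builds queries internally; it only reproduces the shape of the query shown in this article.

```python
def build_sankey_query(table: str, source: str, target: str,
                       aggregate: str, metric: str) -> str:
    """Hypothetical helper: turn Chart Builder choices into a grouped query.

    source/target become the GROUP BY columns, and aggregate(metric)
    becomes the value that sets each flow's thickness.
    """
    label = f'"{aggregate.upper()}({metric})"'
    return (
        f"SELECT {source} AS {source},\n"
        f"       {target} AS {target},\n"
        f"       {aggregate}({metric}) AS {label}\n"
        f"FROM {table}\n"
        f"GROUP BY {source},\n"
        f"         {target}"
    )

print(build_sankey_query('main."Example: Sankey"', "source", "target", "sum", "count"))
```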

Here's the query that was generated:

SELECT source AS source,
       target AS target,
       sum(count) AS "SUM(count)"
FROM main."Example: Sankey"
GROUP BY source,
         target
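One way to see what the generated query does is to run it against a small table. This minimal sketch uses Python's built-in `sqlite3` module and an in-memory database. The rows are made up, and SQLite stands in for whatever database backs your Preset dataset. The query itself is unchanged apart from an appended `ORDER BY`, which makes the output order predictable. The sample deliberately repeats the A to B pair to show how the `sum` aggregate collapses duplicate pairs into one flow.

```python
import sqlite3

# In-memory database with made-up rows; the table name matches the query above.
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE "Example: Sankey" (source TEXT, target TEXT, count INTEGER)')
conn.executemany(
    'INSERT INTO "Example: Sankey" VALUES (?, ?, ?)',
    [("A", "B", 3), ("A", "B", 4), ("A", "C", 2), ("B", "D", 5)],
)

query = '''
SELECT source AS source,
       target AS target,
       sum(count) AS "SUM(count)"
FROM main."Example: Sankey"
GROUP BY source,
         target
'''
# ORDER BY is appended only so the rows come back in a predictable order.
results = conn.execute(query + " ORDER BY source, target").fetchall()
for row in results:
    print(row)
# ('A', 'B', 7)
# ('A', 'C', 2)
# ('B', 'D', 5)
```

With pre-aggregated data like the article's example, each source-target pair appears once, so the sum simply passes the count through.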
