Databricks
  • 24 Feb 2023

Overview

This article explains how to connect a Databricks database to Preset.


Allowlist Preset IPs

Preset Cloud runs in four regions. For Preset to access your data, first add the Preset IP addresses for your region to your inbound and outbound firewall rules.

us-west-2 (us1a)    us-east-1 (us2a)    eu-north-1 (eu5a)    ap-northeast-1 (ap1a)
35.161.45.11        44.193.153.196      13.48.95.3           35.74.159.67
54.244.23.85        52.70.123.52        13.51.212.165        35.75.171.157
52.32.136.34        54.83.88.93         16.170.49.24         52.193.196.211

If you are not sure where your Preset workspace is located, check the URL in your browser when accessing Preset. It should look like this: https://xxxxxxxx.us2a.app.preset.io/superset..., where us2a means the workspace is in us-east-1.
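The lookup described above can be sketched in a few lines of Python. This is only an illustration: the function name `preset_region_for` is hypothetical, and it assumes the workspace host always follows the `<workspace>.<region code>.app.preset.io` layout shown in the example URL. The region codes and IP addresses are copied from the table above.

```python
from urllib.parse import urlparse

# Region codes and IP addresses copied from the table in this article.
PRESET_REGIONS = {
    "us1a": ("us-west-2", ["35.161.45.11", "54.244.23.85", "52.32.136.34"]),
    "us2a": ("us-east-1", ["44.193.153.196", "52.70.123.52", "54.83.88.93"]),
    "eu5a": ("eu-north-1", ["13.48.95.3", "13.51.212.165", "16.170.49.24"]),
    "ap1a": ("ap-northeast-1", ["35.74.159.67", "35.75.171.157", "52.193.196.211"]),
}


def preset_region_for(workspace_url: str):
    """Return (aws_region, ip_list) for a Preset workspace URL."""
    host = urlparse(workspace_url).hostname or ""
    labels = host.split(".")
    # Assumed host layout: <workspace>.<region code>.app.preset.io
    if len(labels) < 2 or labels[1] not in PRESET_REGIONS:
        raise ValueError(f"Unrecognized Preset workspace URL: {workspace_url!r}")
    return PRESET_REGIONS[labels[1]]


region, ips = preset_region_for("https://xxxxxxxx.us2a.app.preset.io/superset/")
print(region, ips)
```

The returned list is the set of addresses to add to your firewall rules for that workspace.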


Retrieve Databricks Information

Step 1: Get Databricks Token

In Databricks, navigate to User Settings > Access Tokens > Generate New Token.


Step 2: Get Host and URL

Navigate to Clusters and select your cluster, then copy and save:

  • Server Hostname
  • Port
  • HTTP Path
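Before entering these values in Preset, it can help to confirm that they work together with the token from Step 1. The sketch below is not part of the Preset setup; it assumes the `databricks-sql-connector` Python package is installed, and the helper names `normalize_hostname` and `check_databricks_connection` are hypothetical. It runs from your own machine, so it verifies the credentials only, not the firewall rules for Preset's IP addresses.

```python
def normalize_hostname(value: str) -> str:
    """Strip a scheme and trailing slash so only the bare hostname remains."""
    value = value.strip()
    for prefix in ("https://", "http://"):
        if value.startswith(prefix):
            value = value[len(prefix):]
    return value.rstrip("/")


def check_databricks_connection(server_hostname: str, http_path: str, access_token: str) -> bool:
    """Run SELECT 1 against the cluster and report whether it succeeded."""
    # Imported lazily so normalize_hostname works without the package installed.
    from databricks import sql  # pip install databricks-sql-connector

    with sql.connect(
        server_hostname=normalize_hostname(server_hostname),
        http_path=http_path,
        access_token=access_token,
    ) as connection:
        with connection.cursor() as cursor:
            cursor.execute("SELECT 1")
            return cursor.fetchone()[0] == 1
```

Call `check_databricks_connection(...)` with the Server Hostname, HTTP Path, and token you saved; an exception usually points to a mistyped value or an expired token.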



Connect with the Databricks Connector

Let's start by selecting + Database. Have a look at Connecting your Data if you need help with this step.

Select Databricks.


Fill in the form with the information from your Databricks cluster:

  • Host with the Server Hostname
  • Port with the Port
  • Database name with the name of the database to be connected
  • Access Token with the generated token
  • HTTP Path with the HTTP Path
  • Display Name with the name to be used on Preset for this connection


Click on Connect to create the connection.

Configure default Catalog and Schema

By default, the connection uses the Catalog and Schema specified in your Databricks settings. However, you can set a different default Catalog or Schema for Preset in the Advanced Settings:

  1. Navigate to Settings > Database Connections to access the list of existing connections in your Workspace.
  2. Hover the mouse over your Databricks connection and click on the pencil icon under the Actions column to modify it.
  3. Navigate to the ADVANCED tab, and expand the Other section.
  4. The ENGINE PARAMETERS field should contain a JSON object similar to this one:
{
    "connect_args":
    {
        "http_path": "{{Your HTTP Path}}"
    }
}
  5. Specify the default Catalog and Schema to use for the connection inside connect_args:
{
    "connect_args":
    {
        "http_path": "{{Your HTTP Path}}",
        "catalog": "{{Catalog Name}}",
        "schema": "{{Schema Name}}"
    }
}
  6. Save changes.
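If you manage several connections, generating the ENGINE PARAMETERS value avoids hand-editing mistakes such as a missing comma. The following is a minimal sketch; the function name `engine_parameters` is hypothetical, and the placeholder values are the ones used in the steps above.

```python
import json


def engine_parameters(http_path, catalog=None, schema=None):
    """Build the ENGINE PARAMETERS JSON, adding catalog/schema only when given."""
    connect_args = {"http_path": http_path}
    if catalog:
        connect_args["catalog"] = catalog
    if schema:
        connect_args["schema"] = schema
    return json.dumps({"connect_args": connect_args}, indent=4)


print(engine_parameters("{{Your HTTP Path}}", "{{Catalog Name}}", "{{Schema Name}}"))
```

Paste the printed JSON into the ENGINE PARAMETERS field, replacing the placeholders with your own values.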

You can also query data from different catalogs in SQL Lab, using the structure below:

SELECT * FROM {{Catalog_Name}}.{{Schema_Name}}.{{Table_Name}}
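When the three-part name is assembled programmatically, quoting each part guards against names that contain spaces or other special characters. The sketch below assumes Databricks' backtick quoting for identifiers (a backtick inside a name is escaped by doubling it); the helper names and the `main`, `sales`, and `orders` values are hypothetical.

```python
def quote_identifier(name: str) -> str:
    """Wrap a name in backticks, doubling any backtick it contains."""
    return "`" + name.replace("`", "``") + "`"


def qualified_table(catalog: str, schema: str, table: str) -> str:
    """Join catalog, schema, and table into a quoted three-part name."""
    return ".".join(quote_identifier(part) for part in (catalog, schema, table))


query = f"SELECT * FROM {qualified_table('main', 'sales', 'orders')} LIMIT 10"
print(query)  # SELECT * FROM `main`.`sales`.`orders` LIMIT 10
```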
