Integration: Databricks

The Databricks integration enables you to ingest dataset metadata from your Databricks Unity Catalog into the Backstage Software Catalog. Databricks' legacy Hive metastore is not supported.

Configuration & Authentication

Create a Service Principal in Databricks. This can be at either the account or workspace level.
This service principal must be granted the USE CATALOG privilege on each catalog, the USE SCHEMA privilege on each schema, and the SELECT privilege on each table you wish to ingest datasets from. See the Databricks API documentation for more details.
For tags support: The service principal must also be able to query the system.information_schema.table_tags system table to retrieve tag information. This requires the USE SCHEMA privilege on the system.information_schema schema.
SQL Warehouse requirement: To retrieve table tags, you must configure a SQL Warehouse ID in your Databricks source configuration. The service principal must have CAN_USE permission on this warehouse.
Generate OAuth credentials for the service principal, and take note of the client ID and secret. Note the lifetime set for these credentials and rotate the secret before it expires; an expired secret causes ingestion to fail.
Back in Portal, visit the Databricks configuration page in Config Manager.
For each workspace you'd like to ingest datasets from, add a new source with the workspace URL, service principal credentials, and optionally a warehouse ID for tags support. The integration will discover all catalogs, schemas, and tables which the service principal has access to.
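The privilege grants described in the steps above can be sketched as a small helper that builds the corresponding Databricks SQL GRANT statements. This is an illustrative sketch only: the function names, the example catalog, schema, and table names, and the principal identifier `sp-app-id` are hypothetical, and the statements would still need to be reviewed and run by someone with sufficient rights in your workspace.

```python
def quote_ident(name: str) -> str:
    """Backtick-quote an identifier, doubling any embedded backticks."""
    return "`" + name.replace("`", "``") + "`"


def build_grants(principal, tables):
    """Build GRANT statements for a list of (catalog, schema, table) triples.

    `principal` is a placeholder for however your workspace identifies the
    service principal (an assumption for this sketch).
    Statements are de-duplicated while preserving order.
    """
    statements = []
    seen = set()

    def add(stmt):
        if stmt not in seen:
            seen.add(stmt)
            statements.append(stmt)

    who = quote_ident(principal)
    for catalog, schema, table in tables:
        c, s, t = quote_ident(catalog), quote_ident(schema), quote_ident(table)
        add(f"GRANT USE CATALOG ON CATALOG {c} TO {who}")
        add(f"GRANT USE SCHEMA ON SCHEMA {c}.{s} TO {who}")
        add(f"GRANT SELECT ON TABLE {c}.{s}.{t} TO {who}")
    return statements


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    for stmt in build_grants("sp-app-id", [("main", "sales", "orders")]):
        print(stmt + ";")
```

Generating the statements from a list of tables keeps the granted scope explicit, which makes it easier to audit what the integration will be able to discover.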

Naming

The naming structure for Datasets created from Databricks is as follows: [metastore_id].[catalog].[schema].[table_name]. The metastore ID ensures that names are unique in the Software Catalog, but you typically do not need to know it. Learn more about the metastore ID in the Databricks documentation.
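As a minimal sketch of the naming scheme, the helper below joins the four components with dots. It assumes the square brackets in the pattern are placeholders rather than literal characters, and the function name and example values are hypothetical, not part of the integration.

```python
def dataset_name(metastore_id: str, catalog: str, schema: str, table_name: str) -> str:
    """Join the four components into a dotted Dataset name."""
    parts = [metastore_id, catalog, schema, table_name]
    if any(not part for part in parts):
        raise ValueError("all four name components are required")
    return ".".join(parts)


if __name__ == "__main__":
    # Hypothetical metastore ID and table, for illustration only.
    print(dataset_name("1234abcd", "main", "sales", "orders"))
```

Two tables with the same catalog, schema, and table name in different metastores therefore produce different Dataset names.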

Ingestible Assets

The Databricks integration currently ingests the following asset types:

Tables
Views
Materialized Views

Tags and Labels

The Databricks integration supports ingesting table tags from the Unity Catalog system.information_schema.table_tags system table. Tags are converted to labels in the Backstage Software Catalog.

Requirements for Tags Support

To enable tag ingestion, you must:

Configure a SQL Warehouse: Add a warehouseId to your Databricks source configuration in Config Manager. This warehouse is used to execute SQL queries against the system tables.
Grant warehouse permissions: The service principal must have CAN_USE permission on the specified SQL Warehouse.
Grant system schema access: The service principal must have USE SCHEMA privilege on the system.information_schema schema to query the table_tags system table.

How Tags Work

Table tags defined in Databricks Unity Catalog are retrieved using SQL queries against system.information_schema.table_tags
Each tag consists of a tag_name and an optional tag_value
Tags are converted to labels in Backstage with the tag name as the key and tag value as the value
If a tag has no value, the label value will be an empty string
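The conversion rules above can be sketched as follows. The exact query the integration runs is not documented here, so `TAGS_QUERY` is only an assumption about what such a query might look like, and `tags_to_labels` is a hypothetical helper operating on (tag_name, tag_value) pairs.

```python
from typing import Dict, Iterable, Optional, Tuple

# Assumed shape of a query against the system table; not the integration's own query.
TAGS_QUERY = """
SELECT catalog_name, schema_name, table_name, tag_name, tag_value
FROM system.information_schema.table_tags
"""


def tags_to_labels(tags: Iterable[Tuple[str, Optional[str]]]) -> Dict[str, str]:
    """Map (tag_name, tag_value) pairs to labels.

    The tag name becomes the label key and the tag value becomes the label
    value. A tag without a value yields an empty-string label value.
    """
    labels: Dict[str, str] = {}
    for tag_name, tag_value in tags:
        labels[tag_name] = tag_value if tag_value is not None else ""
    return labels


if __name__ == "__main__":
    # Hypothetical tags for a single table.
    print(tags_to_labels([("owner", "data-eng"), ("pii", None)]))
```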

If warehouseId is not configured, the integration still runs, but it skips tag ingestion and logs a warning message.
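The fallback behavior can be illustrated with a short sketch. The source dictionary shape, the logger name, and the `fetch_tags` callable are all assumptions made for this example; only the `warehouseId` key and the skip-and-warn behavior come from the description above.

```python
import logging
from typing import Callable, List, Optional

logger = logging.getLogger("databricks_ingest")  # hypothetical logger name


def ingest_tags(source: dict, fetch_tags: Callable[[str], List[tuple]]) -> Optional[List[tuple]]:
    """Fetch tag rows if a warehouse is configured, otherwise skip with a warning.

    `fetch_tags` stands in for whatever runs the SQL query on the warehouse.
    Returns None when tag ingestion is skipped.
    """
    warehouse_id = source.get("warehouseId")
    if not warehouse_id:
        logger.warning("warehouseId is not configured; skipping tag ingestion")
        return None
    return fetch_tags(warehouse_id)


if __name__ == "__main__":
    logging.basicConfig(level=logging.WARNING)
    # Without a warehouse ID, tables are still ingested elsewhere; only tags are skipped.
    print(ingest_tags({"workspaceUrl": "https://example.invalid"}, lambda wid: []))
```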

Troubleshooting

If you are experiencing difficulties ingesting Databricks resources, verify your service principal has the necessary privileges to access the following APIs:
