Integration: Databricks
The Databricks integration enables you to ingest dataset metadata from your Databricks Unity Catalog into the Backstage Software Catalog. Databricks' legacy Hive metastore is not supported.
Configuration & Authentication
- Create a Service Principal in Databricks. This can be at either the account or workspace level.
  - This service principal must be granted the USE CATALOG, USE SCHEMA, and SELECT privileges on each catalog, schema, and table (respectively) you wish to ingest datasets from. See the Databricks API documentation for more details.
  - For tags support: The service principal must also have access to query the system.information_schema.table_tags system table to retrieve tag information. This requires the USE SCHEMA privilege on the system.information_schema schema.
  - SQL Warehouse requirement: To retrieve table tags, you must configure a SQL Warehouse ID in your Databricks source configuration. The service principal must have CAN_USE permission on this warehouse.
- Generate OAuth credentials for the service principal, and take note of the client ID and secret. Be mindful of the lifetime set for these credentials and remember to rotate the secrets before they expire to prevent ingestion failures.
- Back in Portal, visit the Databricks configuration page in Config Manager.
- For each workspace you'd like to ingest datasets from, add a new source with the workspace URL, service principal credentials, and optionally a warehouse ID for tags support. The integration will discover all catalogs, schemas, and tables which the service principal has access to.
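The privileges listed above are typically granted with Unity Catalog GRANT statements. The following is a minimal sketch that only renders the SQL text so it can be reviewed before being run in Databricks. The helper name render_grants, the with_tags flag, and the example catalog, schema, table, and principal values are illustrative assumptions, not part of the integration. It also assumes simple identifiers that need no extra quoting.

```python
def render_grants(
    principal: str,
    catalog: str,
    schema: str,
    tables: list[str],
    with_tags: bool = False,
) -> list[str]:
    """Render GRANT statements for one schema (hypothetical helper, illustration only)."""
    # Principals are wrapped in backticks; for a service principal this is
    # usually its application ID (placeholder value used in the example below).
    who = f"`{principal}`"
    statements = [
        f"GRANT USE CATALOG ON CATALOG {catalog} TO {who}",
        f"GRANT USE SCHEMA ON SCHEMA {catalog}.{schema} TO {who}",
    ]
    # One SELECT grant per table to be ingested.
    statements += [
        f"GRANT SELECT ON TABLE {catalog}.{schema}.{table} TO {who}"
        for table in tables
    ]
    if with_tags:
        # Tag ingestion additionally needs USE SCHEMA on the system schema.
        statements.append(
            f"GRANT USE SCHEMA ON SCHEMA system.information_schema TO {who}"
        )
    return statements


if __name__ == "__main__":
    for statement in render_grants("sp-app-id", "main", "sales", ["orders"], with_tags=True):
        print(statement + ";")
```

The CAN_USE permission on the SQL Warehouse is a workspace permission rather than a Unity Catalog privilege, so it is not covered by this sketch.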
 
Naming
The naming structure for Datasets created from Databricks is as follows: [metastore_id].[catalog].[schema].[table_name]. The metastore ID is used to ensure the uniqueness of names in the Software Catalog, but typically you will not need to be aware of it. Learn more about the metastore ID in the Databricks documentation.
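As a rough illustration of the naming structure, the sketch below joins the four parts with dots. It assumes the square brackets in the pattern are placeholders rather than literal characters, and that no further normalization is applied; the function name and the example values (including the metastore ID) are hypothetical.

```python
def dataset_name(metastore_id: str, catalog: str, schema: str, table_name: str) -> str:
    """Build a dataset name following the metastore.catalog.schema.table pattern."""
    # Assumption: parts are joined as-is, with no escaping or case changes.
    return ".".join([metastore_id, catalog, schema, table_name])


if __name__ == "__main__":
    # Placeholder metastore ID, for illustration only.
    print(dataset_name("example-metastore-id", "main", "sales", "orders"))
```

Two tables with the same catalog, schema, and table name in different metastores therefore produce different dataset names.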
Ingestible Assets
The Databricks integration currently ingests the following asset types:
- Tables
- Views
- Materialized Views
 
Tags and Labels
The Databricks integration supports ingesting table tags from the Unity Catalog system.information_schema.table_tags system table. Tags are converted to labels in the Backstage Software Catalog.
Requirements for Tags Support
To enable tag ingestion, you must:
- Configure a SQL Warehouse: Add a warehouseId to your Databricks source configuration in Config Manager. This warehouse is used to execute SQL queries against the system tables.
- Grant warehouse permissions: The service principal must have CAN_USE permission on the specified SQL Warehouse.
- Grant system schema access: The service principal must have USE SCHEMA privilege on the system.information_schema schema to query the table_tags system table.
How Tags Work
- Table tags defined in Databricks Unity Catalog are retrieved using SQL queries against system.information_schema.table_tags
- Each tag consists of a tag_name and optional tag_value
- Tags are converted to labels in Backstage with the tag name as the key and tag value as the value
- If a tag has no value, the label value will be an empty string
 
If warehouseId is not configured, the integration will still work but will skip tag ingestion and log a warning message.
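The tag-to-label behavior described in this section can be sketched as follows. This is not the integration's actual code: the function names, the logger name, and the shape of the input rows (pairs of tag_name and tag_value) are assumptions made for illustration.

```python
import logging
from typing import Iterable, Optional

logger = logging.getLogger("databricks_tags_sketch")  # hypothetical logger name


def tags_to_labels(rows: Iterable[tuple[str, Optional[str]]]) -> dict[str, str]:
    """Map (tag_name, tag_value) pairs to labels; a missing value becomes ''."""
    return {name: (value if value is not None else "") for name, value in rows}


def labels_for_table(
    warehouse_id: Optional[str],
    rows: Iterable[tuple[str, Optional[str]]],
) -> dict[str, str]:
    """Return labels for one table, skipping tags when no warehouse is configured."""
    if not warehouse_id:
        # Mirrors the documented behavior: ingestion continues without tags.
        logger.warning("warehouseId is not configured; skipping tag ingestion")
        return {}
    return tags_to_labels(rows)


if __name__ == "__main__":
    example_rows = [("team", "data-platform"), ("pii", None)]
    print(labels_for_table("example-warehouse-id", example_rows))
    print(labels_for_table(None, example_rows))
```

In this sketch a tag without a value still produces a label, so its presence is preserved in the catalog even though the label value is empty.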
Troubleshooting
If you are experiencing difficulties ingesting Databricks resources, verify your service principal has the necessary privileges to access the following APIs: