Integration: Databricks
The Databricks integration enables you to ingest dataset metadata from your Databricks Unity Catalog into the Backstage Software Catalog. Databricks' legacy Hive metastore is not supported.
Configuration & Authentication
- Create a Service Principal in Databricks. This can be at either the account or workspace level.
- This service principal must be granted the USE CATALOG and USE SCHEMA privileges on each catalog and schema you wish to ingest datasets from. See the Databricks API documentation for more details.
- Generate OAuth credentials for the service principal, and take note of the client ID and secret. Be mindful of the lifetime set for these credentials, and rotate the secrets before they expire to prevent ingestion failures.
- Back in Portal, visit the Databricks configuration page in Config Manager.
- For each workspace you'd like to ingest datasets from, add a new source with the workspace URL and service principal credentials. The integration will discover all catalogs, schemas, and tables which the service principal has access to.
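The discovery step can be sketched as a walk over the Unity Catalog hierarchy. The paths below are the public Unity Catalog REST list endpoints; `fetch` is a hypothetical helper (not part of the integration) that performs an authenticated GET and returns the decoded JSON response. Pagination is omitted for brevity.

```python
def discover_tables(fetch):
    """Walk the Unity Catalog hierarchy (catalogs -> schemas -> tables)
    visible to the caller's credentials, yielding one dict per table.
    `fetch(path, **params)` is assumed to wrap an authenticated GET
    against the workspace and return the parsed JSON body."""
    catalogs = fetch("/api/2.1/unity-catalog/catalogs").get("catalogs", [])
    for cat in catalogs:
        schemas = fetch("/api/2.1/unity-catalog/schemas",
                        catalog_name=cat["name"]).get("schemas", [])
        for sch in schemas:
            tables = fetch("/api/2.1/unity-catalog/tables",
                           catalog_name=cat["name"],
                           schema_name=sch["name"]).get("tables", [])
            yield from tables
```

Because the API only returns objects the caller can see, a service principal missing USE CATALOG or USE SCHEMA on some branch of the hierarchy will silently skip that branch.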
Naming
The naming structure for Datasets created from Databricks is as follows: [metastore_id].[catalog].[schema].[table_name]. The metastore ID is used to ensure the uniqueness of names in the Software Catalog, but typically you will not need to be aware of it. Learn more about the metastore ID in the Databricks documentation.
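As a small illustration of the scheme above, the four-part name can be composed and split like this (the function names are ours, not part of the integration):

```python
def dataset_name(metastore_id: str, catalog: str, schema: str, table: str) -> str:
    """Compose the Software Catalog name for a Unity Catalog table,
    following the [metastore_id].[catalog].[schema].[table_name] scheme."""
    return f"{metastore_id}.{catalog}.{schema}.{table}"

def split_dataset_name(name: str):
    """Recover the four components from a composed dataset name.
    Splits on at most three dots, so only the first three segments
    (metastore ID, catalog, schema) must be dot-free."""
    metastore_id, catalog, schema, table = name.split(".", 3)
    return metastore_id, catalog, schema, table
```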
Ingestable Assets
The Databricks integration currently ingests the following asset types:
- Tables
- Views
- Materialized Views
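A consumer of the discovery results might filter on the `table_type` field that the Unity Catalog tables API reports per table. This sketch assumes plain tables map to the MANAGED and EXTERNAL type values, alongside VIEW and MATERIALIZED_VIEW for the other two asset types listed above:

```python
# table_type values assumed to correspond to the ingested asset kinds above.
INGESTED_TABLE_TYPES = {"MANAGED", "EXTERNAL", "VIEW", "MATERIALIZED_VIEW"}

def ingestable(tables):
    """Keep only the table records whose type the integration ingests.
    `tables` is a list of dicts as returned by the tables list API."""
    return [t for t in tables if t.get("table_type") in INGESTED_TABLE_TYPES]
```

Other types (for example, foreign tables surfaced through Lakehouse Federation) would be dropped by this filter.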
Labels and Tags
Labels and tags are not currently ingested. This feature is planned for a future release.
Troubleshooting
If you are experiencing difficulties ingesting Databricks resources, verify that your service principal has the necessary privileges to access the following APIs: