Integration: Snowflake
Configuration & Authentication
To configure this module, list all the Snowflake accounts you want to collect metadata from, using Snowflake's format 2 account identifiers (the account locator in a region). For each account listed, every table in that account will be ingested as a dataset in the Registry.
Within each account block, the authentication type and credentials must also be provided. Of Snowflake's available authentication options, only key pair authentication is currently supported. Be sure to follow Snowflake's instructions for assigning the public key to a Snowflake user. Set the authenticator key to 'SNOWFLAKE_JWT' and add the details of the Snowflake user that Backstage will connect as.

Roles
Users in Snowflake can have multiple roles, but the user's default role is used for all queries to Snowflake. To override it, add the optional role key to the config under the relevant account. Note that the role selection affects which tables are ingested: tables that are accessible only to roles higher in the role hierarchy than the one used in the connection won't be ingested.

In the example config above, the role is explicitly set to PUBLIC. This means that any tables visible only to ACCOUNTADMIN would not be ingested into the Registry.
Naming
The naming structure for Datasets created from Snowflake is as follows: [database].[schema].[table_name].
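As a minimal sketch of the naming rule above, the following Python function joins the three parts with dots. The function name is hypothetical, and how the connector handles identifier case or quoted identifiers is not stated here, so the parts are passed through unchanged.

```python
def dataset_name(database: str, schema: str, table_name: str) -> str:
    """Build a dataset name in the form [database].[schema].[table_name].

    Illustrative only: identifier quoting and case normalisation are not
    covered by this document, so the inputs are joined as given.
    """
    return ".".join([database, schema, table_name])


# Example: a table ORDERS in schema PUBLIC of database SALES
print(dataset_name("SALES", "PUBLIC", "ORDERS"))
```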
Tags & Labels
Snowflake Tags are pieces of metadata that can be added to various resources. Since these are key-value pairs in the source system, the Snowflake connector for Registry converts them to labels (which are a key-value pair mapping) within the Software Catalog. Tags that can't be converted into Catalog labels (because they're too long, contain invalid UTF-8 characters, etc.) are omitted from the Dataset, and a WARN log is emitted for each.
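The conversion can be pictured with the Python sketch below. It is not the connector's implementation: the function name, the 63-character limit and the allowed-character pattern are assumptions modelled on Kubernetes-style label rules, since the exact validation rules are not listed in this document.

```python
import logging
import re

logger = logging.getLogger("snowflake_connector_sketch")

# ASSUMPTION: limits modelled on Kubernetes-style label rules. The real
# connector's validation rules are not documented in this section.
MAX_LEN = 63
VALID = re.compile(r"^[A-Za-z0-9]([A-Za-z0-9._-]*[A-Za-z0-9])?$")


def tags_to_labels(tags: dict[str, str]) -> dict[str, str]:
    """Keep tags that fit the assumed label rules; warn about the rest."""
    labels: dict[str, str] = {}
    for key, value in tags.items():
        convertible = all(
            len(part) <= MAX_LEN and VALID.match(part) is not None
            for part in (key, value)
        )
        if convertible:
            labels[key] = value
        else:
            # Mirrors the documented behaviour: omit the tag and log a warning.
            logger.warning("Omitting tag %r: cannot be converted to a label", key)
    return labels
```

With these assumed rules, a tag such as team=data is kept, while a tag whose key contains spaces or whose value is longer than 63 characters is dropped with a warning.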
The Snowflake connector parses the tags, looking for ones whose key is OWNER or LIFECYCLE, and uses them to populate those values on the dataset. If an OWNER key is not found within the tags, it uses the table's ownership metadata (which is usually the principal role on the table). Finally, it uses the values in the entityDefaults section of the config as a fallback for both owner and lifecycle.
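The fallback order described here can be sketched in Python as follows. The function names and the owner and lifecycle keys inside entityDefaults are assumptions made for illustration; only the order of precedence is taken from the text.

```python
from typing import Mapping, Optional


def resolve_owner(
    tags: Mapping[str, str],
    table_owner: Optional[str],
    entity_defaults: Mapping[str, str],
) -> Optional[str]:
    """OWNER tag first, then the table's ownership metadata, then the default."""
    # ASSUMPTION: entityDefaults exposes an "owner" key.
    return tags.get("OWNER") or table_owner or entity_defaults.get("owner")


def resolve_lifecycle(
    tags: Mapping[str, str],
    entity_defaults: Mapping[str, str],
) -> Optional[str]:
    """LIFECYCLE tag first, then the configured default."""
    # ASSUMPTION: entityDefaults exposes a "lifecycle" key.
    return tags.get("LIFECYCLE") or entity_defaults.get("lifecycle")
```

For example, a table with no OWNER tag that is owned by the SYSADMIN role would resolve to SYSADMIN, and only a table with neither would pick up the entityDefaults value.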
Incremental Ingestion
Each sync fetches tables that have changed since the last successful sync using Snowflake's LAST_DDL timestamp on each table. A full sync runs periodically to catch any missed changes and clean up deleted datasets. The default interval is 7 days, and can be configured in dataExperience -> registry -> incrementalSync -> fullSyncIntervalDays.
How it works
- Incremental syncs query INFORMATION_SCHEMA.TABLES filtered to tables where LAST_DDL is newer than the last sync time. Only changed tables have their columns and tags re-fetched.
- Full syncs still run on the first sync, when the full sync interval has elapsed, or if an incremental sync fails.
- Dataset cleanup (removing datasets that no longer exist in Snowflake) only happens during full syncs, since incremental syncs only see a subset of tables.
Incremental ingestion works well when paired with a more frequent sync schedule (e.g. every 30 minutes) since each run is lightweight. Full syncs will still occur at the configured interval regardless of schedule frequency.
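The choice between a full and an incremental sync can be sketched in Python as below. This is an illustration of the rules listed above, not the connector's code: the function and parameter names are hypothetical, and it assumes that a failed incremental sync is recorded as a flag that triggers a full sync on the next run.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def choose_sync_mode(
    now: datetime,
    last_full_sync: Optional[datetime],
    last_incremental_failed: bool,
    full_sync_interval_days: int = 7,  # documented default for fullSyncIntervalDays
) -> str:
    """Return "full" or "incremental" following the documented rules."""
    if last_full_sync is None:
        return "full"  # first sync
    if last_incremental_failed:
        return "full"  # ASSUMPTION: a failure is handled by a later full sync
    if now - last_full_sync >= timedelta(days=full_sync_interval_days):
        return "full"  # full sync interval has elapsed
    return "incremental"


# Example: a run one day after the last full sync, with the default interval
now = datetime(2024, 1, 10, tzinfo=timezone.utc)
print(choose_sync_mode(now, now - timedelta(days=1), False))
```

Because the decision depends only on the time since the last full sync, running the schedule more often (for example every 30 minutes) adds incremental runs without changing how often a full sync happens.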
Warehouses
Snowflake Warehouses are required to run certain queries. While the basic ingestion of tables doesn't rely on any such queries, ingesting the tags on each table does. For this step, the default warehouse of the user associated with the account in the config will be used. To override this selection, add the optional warehouse key-value pair to the relevant account. Refer to the screenshot in the Roles section above to see where to configure this.
Since warehouses are also scoped to roles, it's important to ensure the warehouse used in the connection (be it the user's default or defined via the config) can be accessed by the role used in the connection. Also note that the warehouse must be running (or have auto-resume enabled) in order for tag ingestion to occur.