Integration: BigQuery
Configuration & Authentication
The bigquery module config contains a sources block, in which you list each GCP project you'd like ingested into the registry. Every BigQuery table and view within each listed project is ingested as a Dataset in the registry.
Alongside each project, include its credentials as a JSON object. The account associated with the credentials needs the roles/bigquery.dataViewer role and the bigquery.jobs.create permission. Documentation on how to generate credentials for your GCP project can be found here.
Naming
The naming structure for Datasets created from BigQuery is as follows: [project].[dataset].[table_name/view_name].
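As a minimal illustration of this naming scheme, the TypeScript sketch below joins the three BigQuery identifiers with dots. The `datasetName` function is hypothetical and is not part of the connector's API; it only shows how the pieces of the name line up.

```typescript
// Hypothetical helper (not the connector's API): builds the registry
// Dataset name as [project].[dataset].[table_name/view_name].
function datasetName(project: string, dataset: string, tableOrView: string): string {
  return `${project}.${dataset}.${tableOrView}`;
}

// Example: a table named gcp-table-name in dataset gcp-dataset of project gcp-project.
console.log(datasetName("gcp-project", "gcp-dataset", "gcp-table-name"));
// prints "gcp-project.gcp-dataset.gcp-table-name"
```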
Tags & Labels
BigQuery tags and labels are both pieces of metadata that can be attached to various resources within GCP. Because both are key-value pairs in the source system, the registry's BigQuery connector converts both tags and labels into labels (which are also key-value mappings) within the Software Catalog. Tags or labels that cannot be converted to Catalog labels (for example, because they're too long or contain invalid UTF-8 characters) are omitted from the Dataset, and the connector emits a WARN log similar to this:
catalog warn Policy check failed for api:default/gcp-project.gcp-dataset.gcp-table-name; caused by Error: "metadata.name" is not valid; expected a string that is sequences of [a-zA-Z0-9] separated by any of [-_.], at most 63 characters in total but found "default/gcp-project.gcp-dataset.gcp-table-name". To learn more about catalog file format, visit: https://github.com/backstage/backstage/blob/master/docs/architecture-decisions/adr002-default-catalog-file-format.md entity="api:default/default/gcp-project.gcp-dataset.gcp-table-name"
BigQuery tag keys are often prefixed with the organization or project they're scoped to, followed by a '/'. The connector replaces that slash with a '.', since slashes in Software Catalog label keys are reserved for marking a domain prefix. In the BigQuery UI, tags inherited from a parent dataset don't appear on a table or view, so this module doesn't ingest dataset-level tags either.
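The conversion described above can be sketched in TypeScript as follows. This is a minimal sketch under stated assumptions, not the connector's implementation: the validation is a simplified version of the rule quoted in the WARN log (sequences of [a-zA-Z0-9] separated by any of [-_.], at most 63 characters), applied to both key and value, and it rejects all non-ASCII characters rather than only invalid UTF-8. The functions `toCatalogLabelKey`, `isValidLabelPart`, and `tagsToLabels` are hypothetical names.

```typescript
// Simplified Catalog rule (assumption): alphanumeric sequences separated by
// single '-', '_' or '.' characters, 1 to 63 characters in total.
const LABEL_PART = /^[a-zA-Z0-9]+(?:[-_.][a-zA-Z0-9]+)*$/;

function isValidLabelPart(value: string): boolean {
  return value.length >= 1 && value.length <= 63 && LABEL_PART.test(value);
}

// Hypothetical helper: turns a prefixed tag key such as "my-org/env"
// into "my-org.env" by replacing every '/' with '.'.
function toCatalogLabelKey(tagKey: string): string {
  return tagKey.replace(/\//g, ".");
}

// Hypothetical helper: converts tags to labels, omitting and warning about
// any pair that is still invalid after the key conversion.
function tagsToLabels(tags: Record<string, string>): Record<string, string> {
  const labels: Record<string, string> = {};
  for (const [rawKey, value] of Object.entries(tags)) {
    const key = toCatalogLabelKey(rawKey);
    if (isValidLabelPart(key) && isValidLabelPart(value)) {
      labels[key] = value;
    } else {
      console.warn(`omitting tag ${rawKey}=${value}: not a valid Catalog label`);
    }
  }
  return labels;
}

// "bad key" contains a space, so it is dropped with a warning.
console.log(JSON.stringify(tagsToLabels({ "my-org/env": "prod", "bad key": "x" })));
// prints {"my-org.env":"prod"}
```

Replacing the slash rather than dropping the prefix keeps tags from different organizations or projects distinguishable after conversion.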
The connector scans the BigQuery labels for the keys owner and lifecycle and uses their values to populate the corresponding fields on the Dataset. If either key is missing, the connector falls back to the value set in the entityDefaults section of the config.
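A minimal TypeScript sketch of this fallback logic follows. The `EntityDefaults` shape, the `resolveOwnerAndLifecycle` function, and the example values (such as "group:default/data-platform") are hypothetical and only illustrate the lookup order: label value first, entityDefaults value otherwise.

```typescript
// Hypothetical shape of the entityDefaults config section; the real
// field layout may differ.
interface EntityDefaults {
  owner: string;
  lifecycle: string;
}

// Prefers the BigQuery "owner"/"lifecycle" labels and falls back to the
// configured defaults for whichever one is missing.
function resolveOwnerAndLifecycle(
  labels: Record<string, string | undefined>,
  defaults: EntityDefaults,
): { owner: string; lifecycle: string } {
  return {
    owner: labels["owner"] ?? defaults.owner,
    lifecycle: labels["lifecycle"] ?? defaults.lifecycle,
  };
}

// Example values are made up for illustration.
const exampleDefaults: EntityDefaults = {
  owner: "group:default/data-platform",
  lifecycle: "production",
};
console.log(JSON.stringify(resolveOwnerAndLifecycle({ owner: "team-analytics" }, exampleDefaults)));
// prints {"owner":"team-analytics","lifecycle":"production"}
```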