Skip to main content

BigQuery

The BigQuery integration plugin for Soundcheck supports the collection of the following facts:

  • Query - contains information returned as a result of execution of Google SQL query.

Prerequisites

Configure BigQuery authentication in Backstage

The authentication is configured at the root level of app-config.yaml. Here's an example configuration for BigQuery:

soundcheck:
collectors:
bigquery:
auth:
- projectId: <GCP Project ID> # Google Cloud Project ID
keyFilename: <Key File Path> # Path to a .json, .pem, or .p12 key file.

Below are the details for each field.

auth [optional]

An array of Google Cloud authentication config. The authentication configuration is optional, alternatively you can set up Application Default Credentials by following the Google documentation.

projectId [required]

Google Cloud Project ID.

keyFilename [optional]

Path to a .json, .pem, or .p12 key file to use for authentication.

Add the BigQueryCollector to Soundcheck

First, add the @spotify/backstage-plugin-soundcheck-backend-module-bigquery package:

yarn workspace backend add @spotify/backstage-plugin-soundcheck-backend-module-bigquery

Then add the following to your packages/backend/src/index.ts file:

packages/backend/src/index.ts
const backend = createBackend();

backend.add(import('@spotify/backstage-plugin-soundcheck-backend'));
backend.add(
import('@spotify/backstage-plugin-soundcheck-backend-module-bigquery'),
);
// ...

backend.start();

Consult the Soundcheck Backend documentation for additional details on setting up the Soundcheck backend.

Legacy Backend

warning

If you are still using the Legacy Backend you can follow these instructions but we highly recommend migrating to the New Backend System.

First add the package: yarn workspace backend add @spotify/backstage-plugin-soundcheck-backend-module-bigquery

Then in packages/backend/src/plugins/soundcheck.ts, add the BigQueryFactCollector:

packages/backend/src/plugins/soundcheck.ts
import { SoundcheckBuilder } from '@spotify/backstage-plugin-soundcheck-backend';
import { Router } from 'express';
import { PluginEnvironment } from '../types';
import { BigQueryFactCollector } from '@spotify/backstage-plugin-soundcheck-backend-module-bigquery';

export default async function createPlugin(
env: PluginEnvironment,
): Promise<Router> {
return SoundcheckBuilder.create({ ...env })
.addFactCollectors(
BigQueryFactCollector.create(env.logger, env.cache),
)
.build();
}

Plugin Configuration

The collection of BigQuery facts is driven by configuration. To learn more about the configuration, consult the Defining BigQuery Fact Collectors section.

BigQuery Fact Collector can be configured via YAML or No-Code UI. If you configure it via both YAML and No-Code UI, the configurations will be merged. It's preferable to choose a single source for the Fact Collectors configuration (either No-Code UI or YAML) to avoid confusing merge results.

No-Code UI Configuration Option

  1. Make sure the prerequisite Configure BigQuery authentication in Backstage is completed and BigQuery authentication is configured.

  2. To enable the BigQuery Integration, go to Soundcheck > Integrations > BigQuery and click the Configure button. To learn more about the No-Code UI config, see the Configuring a fact collector (integration) via the no-code UI.

BigQuery Integration

YAML Configuration Option

  1. Create a bigquery-facts-collectors.yaml file in the root of your Backstage repository and fill in all your BigQuery Fact Collectors. A simple example BigQuery fact collector is listed below.
- factName: 'check_results'
type: query
projectId: test-certified
query: |
SELECT component_id, check_id, result
FROM `test-certified.checks.checks_*`
WHERE _TABLE_SUFFIX = (SELECT MAX(_TABLE_SUFFIX) FROM `test-certified.checks.checks_*`)
AND component_id IN UNNEST(@entity_names)
schema:
- name: component_id
type: string
isEntityName: true
- name: check_id
type: string
- name: result
type: number
filter:
kind: 'Component'
spec.type: 'service'
cache: false

Note: this file will be loaded at runtime along with the rest of your Backstage configuration files, so make sure it's available in deployed environments in the same way as your app-config.yaml files.

  1. Add a Soundcheck collects field to the app-config.yaml and reference the newly created bigquery-facts-collectors.yaml file.
# app-config.yaml
soundcheck:
collectors:
bigquery:
auth:
- projectId: <GCP Project ID> # Google Cloud Project ID
keyFilename: <Key File Path> # Path to a .json, .pem, or .p12 key file.
collects:
$include: ./bigquery-facts-collectors.yaml

Defining BigQuery Fact Collectors

This section describes the data shape and semantics of BigQuery Fact Collectors.

Shape Of A BigQuery Fact Collector

The following is an example of a BigQuery Fact Collector YAML configuration:

soundcheck:
collectors:
bigquery:
collects:
- factName: 'check_results'
type: query
projectId: test-certified
query: |
SELECT component_id, check_id, result
FROM `test-certified.checks.checks_*`
WHERE _TABLE_SUFFIX = (SELECT MAX(_TABLE_SUFFIX) FROM `test-certified.checks.checks_*`)
AND component_id IN UNNEST(@entity_names)
schema:
- name: component_id
type: string
isEntityName: true
- name: check_id
type: string
- name: result
type: number
filter:
kind: 'Component'
spec.type: 'service'
cache: false

Below are the details for each field.

collects [required]

An array describing which facts to collect.

factName [required]

The name of the fact to be collected.

  • Minimum length of 1
  • Maximum length of 100
  • Alphanumeric with single separator instances of periods, dashes, underscores, or forward slashes

type [required]

The type of the collector: query.

projectId [optional]

Google Cloud Project ID. If provided and matches one of the project IDs from the auth section, the corresponding config will be used for the authentication.

query [required]

SQL Query (GoogleSQL). The query can be parameterized, supported named parameters are @entity_refs and @entity_names. The values will be resolved automatically when the fact collection is scheduled.

  • @entity_refs - the references of all entities in a batch (values in the format kind:namespace/name). Value example: ['component:default/queue-proxy', 'component:default/petstore']. Usage example: WHERE column IN UNNEST(@entity_refs).

  • @entity_names - the names of all entities in a batch (entity's metadata.name value). Value example: ['queue-proxy', 'petstore']. Usage example: WHERE column IN UNNEST(@entity_names).

Note: By default fact collection is scheduled for one entity at a time. Consider updating batchSize to schedule the query for multiple entities at once and prevent executing a new query for every entity the collector is applicable to. The query response can be further filtered and sliced as smaller entity facts by configuring the schema field as described below.

By using these named parameters you can tailor the response to be relevant only to the entities the collector is applicable to. If neither @entity_refs nor @entity_names parameter is used within the query, the same data will be collected for each entity batch (this behaviour may be desired if you don't want to run a heavy query multiple times for different sets of entities). The same queries are executed only once during every scheduled run (the collector prevents executing the same query more than once even across different entity batches by caching the query response until the next scheduled run).

schema [optional]

An array describing the columns returned by the query. Optional, if not provided the available fact paths won't be auto-populated in Check Creation No-Code UI and you won't be able to group the query response by an entity if batchSize is configured.

  • name [required] - the name of the column.

  • type [optional] - the column's data type (optional, default value is string).

  • isEntityRef [optional] - set to true if the column contains entity references (values in the format kind:namespace/name). Such column will be treated as a foreign key that references Entity(entity_ref), so that the query response will be grouped by the values in this column and the resulting entity fact will only contain the group that corresponds to the entity.

  • isEntityName [optional] - set to true if the column contains entity names (entity's metadata.name value). Such column will be treated as a foreign key that references Entity(metadata.name), so that the query response will be grouped by the values in this column and the resulting entity fact will only contain the group that corresponds to the entity.

frequency [optional]

The frequency at which the fact collection should be executed. Possible values are either a cron expression { cron: ... } or HumanDuration. If provided, it overrides the default frequency provided at the top level. If not provided, it defaults to the frequency provided at the top level. If neither collector's frequency nor default frequency is provided, the fact will only be collected on demand. Example:

frequency:
minutes: 10

initialDelay [optional]

The amount of time that should pass before the first invocation happens. Possible values are either a cron expression { cron: ... } or HumanDuration.

Example:

initialDelay:
seconds: 30

batchSize [optional]

The number of entities to collect facts for at once. Optional, the default value is 1.

Example:

batchSize: 100

filter [optional]

A filter specifying which entities to collect the specified facts for. Matches the filter format used by the Catalog API. If provided, it overrides the default filter provided at the top level. If not provided, it defaults to the filter provided at the top level. If neither collector's filter nor default filter is provided, the fact will be collected for all entities.

exclude [optional]

Entities matching this filter will be skipped during the fact collection process. Can be used in combination with filter. Matches the filter format used by the Catalog API.

filter:
- kind: component
exclude:
- spec.type: documentation

cache [optional]

If the collected facts should be cached, and if so for how long. Possible values are either true or false or a nested { duration: HumanDuration } field. If provided it, overrides the default cache config provided at the top level. If not provided, it defaults to the cache config provided at the top level. If neither collector's cache nor default cache config is provided, the fact will not be cached. Example:

cache:
duration:
hours: 24

Rate Limiting (Optional)

This fact collector can be rate limited in Soundcheck using the following configuration:

soundcheck:
job:
workers:
bigquery:
limiter:
max: 400
duration: 60000

BigQuery has quotas and limits in place. We recommend setting your rate limit to something that will avoid exceeding your quota.

This fact collector handles rateLimitExceeded and quotaExceeded error codes from BigQuery. Soundcheck will automatically wait and retry requests that are rate limited.

Shape of A BigQuery Fact

The shape of a BigQuery Fact is based on the Fact Schema.

The following is an example of the collected BigQuery fact:

factRef: bigquery:default/check_results
entityRef: component:default/queue-proxy
scope: default
timestamp: 2025-01-09T15:50+00Z
data:
rows:
- component_id: queue-proxy
check_id: has-production-lifecycle
result: passed
- component_id: queue-proxy
check_id: coverage-collected
result: failed
- component_id: queue-proxy
check_id: tests-passed
result: passed
- component_id: queue-proxy
check_id: static-analysis-no-warnings
result: passed

The resulting fact data will always have rows as a top level field and collected rows as a nested array.

Shape of A BigQuery Fact Check

The shape of a BigQuery Fact Check matches the Shape of a Fact Check.

The following is an example of the BigQuery fact check:

soundcheck:
checks:
- id: static-analysis-no-warnings-bq
name: Static Analysis Has No Warnings
ownerEntityRef: group:default/toast-infra
description: Static Analysis Has No Warnings
filter:
kind: Component
spec.type: service
rule:
factRef: bigquery:default/check_results
path: $.rows[?(@.check_id=='static-analysis-no-warnings')].result
operator: all:equal
value: passed
passedMessage: Success! Static Analysis Has No Warnings
failedMessage: Failure! Static Analysis Has Warnings