Soundcheck SCM Integration

Spotify Plugins for Backstage: Soundcheck - SCM Integration

Source Control Management (SCM) integration for Soundcheck.

The soundcheck-backend-module-scm plugin allows Soundcheck to integrate with the following source control management providers:

  • azure
  • bitbucketCloud
  • bitbucketServer
  • gerrit
  • gitea
  • github
  • gitlab


SCM Integrations - Connecting the SCM module to SCM providers

To connect to external providers, an 'integration' must be provided in the main app-config.yaml file as follows:

integrations:
  github:
    - host:
      token: ${GITHUB_TOKEN}

The above provides a GitHub integration for the configured host. Authentication is provided via a token issued for the repository to which you'd like to connect.

Full integration configuration details can be found here.

Add the ScmFactCollector to Soundcheck

First add the package: yarn workspace backend add @spotify/backstage-plugin-soundcheck-backend-module-scm

Then in packages/backend/src/plugins/soundcheck.ts, add the ScmFactCollector:

  import { SoundcheckBuilder } from '@spotify/backstage-plugin-soundcheck-backend';
  import { Router } from 'express';
  import { PluginEnvironment } from '../types';
+ import { ScmFactCollector } from '@spotify/backstage-plugin-soundcheck-backend-module-scm';

  export default async function createPlugin(
    env: PluginEnvironment,
  ): Promise<Router> {
    return SoundcheckBuilder.create({ ...env })
+     .addFactCollectors(
+       ScmFactCollector.create(env.config, env.logger),
+     )
      // ...

New Backend System

If you are using the New Backend System, instead of the above, you can just add the following to your packages/backend/src/index.ts file:

const backend = createBackend();

+ backend.add(import('@spotify/backstage-plugin-soundcheck-backend-module-scm'));
// ...


See the soundcheck-backend documentation for additional details on creating the Soundcheck backend.

Adding SCM Entities

To use SCM integrations, an entity hosted by an SCM provider is needed. As an example, an entity could be added to the catalog with a type of 'url' and a target that corresponds to the entity's hosted location, like so:

# Soundcheck external demo
- type: url # Denotes SCM entities.

The configuration above adds a component that is hosted by the SCM provider and configured by the target yaml file.

Configuring the SCM Module

The facts to be collected by the SCM module must be defined in one or more yaml files, and then referenced in the soundcheck configuration in the app-config.yaml file like so:

soundcheck:
  collectors:
    scm:
      - $include: ./scm-fact-extraction-configurations.yaml
      - $include: ./more-scm-fact-extraction-configurations.yaml
      - $include: ./even-more-scm-fact-extraction-configurations.yaml

With an SCM entity in your catalog, an SCM integration in the main app-config.yaml, and SCM configuration files added to the soundcheck.collectors.scm field also in the main app-config.yaml, your Backstage instance is almost ready to extract facts from SCM providers.

The next section covers how to set up the fact extraction configuration files to extract facts from SCM.

Rate Limiting (Optional)

This fact collector can be rate limited in Soundcheck using the following configuration:

max: 4900
duration: 3600000

The rate limits for SCM are dependent on the source control API used. For example, GitHub has a rate limit of 5000 requests per hour; the above configuration is set for 4900 executions per hour to stay under this limit.

This fact collector handles 429 rate limit errors from SCM. Soundcheck will automatically wait and retry requests that are rate limited.
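As a quick sanity check on the numbers above, the following sketch normalizes a limiter setting to an hourly rate and computes the headroom left under a provider quota. The helper and its field names are hypothetical and are not part of Soundcheck; only the values 4900, 3600000 and 5000 come from the text.

```typescript
// Hypothetical helper, not part of Soundcheck: compares a limiter setting
// with a provider's hourly quota.
interface LimiterConfig {
  max: number; // maximum executions per window
  duration: number; // window length in milliseconds
}

const MS_PER_HOUR = 60 * 60 * 1000;

// Returns how many requests per hour remain unused under the provider quota.
// A negative result means the limiter allows more than the quota.
function hourlyHeadroom(limiter: LimiterConfig, providerQuotaPerHour: number): number {
  const executionsPerHour = limiter.max * (MS_PER_HOUR / limiter.duration);
  return providerQuotaPerHour - executionsPerHour;
}

const scmLimiter: LimiterConfig = { max: 4900, duration: 3600000 };
console.log(hourlyHeadroom(scmLimiter, 5000)); // 100
```

A duration of 3600000 milliseconds is exactly one hour, so the sample configuration leaves 100 requests per hour for other consumers of the same token.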

SCM Fact Extraction Configuration


SCM Fact Collection Configuration Files

The SCM fact collection configuration yaml files have the following structure:

frequency:
  cron: '0 * * * *' # Defines a schedule for when the facts defined in this file should be collected
  # This is optional and if omitted, facts will only be collected on demand.
filter: # A filter specifying which entities to collect the specified facts for
  kind: 'Component'
cache: # Defines if the collected facts should be cached, and if so for how long
  duration:
    hours: 2
collects: # An array of fact extractor configurations describing how to collect SCM facts.
  - SCM Fact Extractor Configuration One
  - SCM Fact Extractor Configuration Two
  - ...
  - SCM Fact Extractor Configuration N

Variables in this file are defined below:

frequency [Optional]

The frequency at which the collector should be executed. Possible values are either a cron expression { cron: ... } or HumanDuration. This is the default frequency for each extractor.

filter [Optional]

A filter specifying which entities to collect the specified facts for. Matches the filter format used by the Catalog API. This is the default filter for each extractor.

cache [Optional]

Whether the collected facts should be cached, and if so for how long. Possible values are either true or false or a nested { duration: HumanDuration } field. This is the default cache config for each extractor.


collects

An array of SCM Fact Extractor configurations describing how to collect SCM facts. See the section below for details on configuring the extractors.
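The frequency and cache fields above accept a HumanDuration, that is, an object of unit and amount pairs such as { hours: 2 }. The sketch below shows how such an object maps to a number of milliseconds. The SimpleDuration type is a local, simplified stand-in introduced for illustration (an assumption, limited to fixed-length units); it is not the type Soundcheck uses internally.

```typescript
// Local, simplified stand-in for a HumanDuration-style object.
// Assumption: only fixed-length units are modelled here.
interface SimpleDuration {
  days?: number;
  hours?: number;
  minutes?: number;
  seconds?: number;
  milliseconds?: number;
}

// Sums every unit that is present, converted to milliseconds.
function durationToMs(d: SimpleDuration): number {
  return (
    (d.days || 0) * 24 * 60 * 60 * 1000 +
    (d.hours || 0) * 60 * 60 * 1000 +
    (d.minutes || 0) * 60 * 1000 +
    (d.seconds || 0) * 1000 +
    (d.milliseconds || 0)
  );
}

console.log(durationToMs({ hours: 2 })); // 7200000
console.log(durationToMs({ minutes: 10 })); // 600000
```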

SCM Fact Extractors

The exists, regex and json/yaml fact extractor configurations are described in detail below, but all extractors share a common base, so we first cover that base schema before going into the detailed schemas of the individual fact extractors.

Common Fact Extractor Schema

All SCM fact extractors share a common base schema, the variables for which are defined below:


factName

The name of the fact to be extracted.

  • Minimum length of 1
  • Maximum length of 100
  • Alphanumeric with single separator instances of periods, dashes, underscores, or forward slashes
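The naming restrictions above can be sketched as a small validator. This is one possible reading of the rule (alphanumeric runs joined by exactly one separator character, with no leading or trailing separator); the function name and the pattern are assumptions, and the exact rule Soundcheck applies may differ.

```typescript
// Hypothetical validator for fact names; one interpretation of the rule above.
// Alphanumeric runs, each pair joined by a single '.', '-', '_' or '/'.
const FACT_NAME_PATTERN = /^[A-Za-z0-9]+(?:[._\/-][A-Za-z0-9]+)*$/;

function isValidFactName(name: string): boolean {
  return name.length >= 1 && name.length <= 100 && FACT_NAME_PATTERN.test(name);
}

console.log(isValidFactName('readme_and_catalog_info_files_exist_fact')); // true
console.log(isValidFactName('api..version')); // false (two separators in a row)
console.log(isValidFactName('')); // false (shorter than 1 character)
```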

filter [Optional]

A filter specifying which entities to collect the specified facts for. Matches the filter format used by the Catalog API. If provided it overrides the default filter provided at the top level. If not provided it defaults to the filter provided at the top level. If neither extractor's filter nor default filter is provided the fact will be collected for all entities.

cache [Optional]

Whether the collected facts should be cached, and if so for how long. Possible values are either true or false or a nested { duration: HumanDuration } field. If provided it overrides the default cache config provided at the top level. If not provided it defaults to the cache config provided at the top level. If neither extractor's cache nor default cache config is provided the fact will not be cached. Example:

cache:
  duration:
    hours: 24

frequency [Optional]

The frequency at which the fact extraction should be executed. Possible values are either a cron expression { cron: ... } or HumanDuration. If provided it overrides the default frequency provided at the top level. If not provided it defaults to the frequency provided at the top level. If neither extractor's frequency nor default frequency is provided the fact will only be collected on demand. Example:

frequency:
  minutes: 10

branch [Optional]

The branch to extract the fact from. If not provided, defaults to the repository's default branch.
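The override rules described for filter, cache and frequency all follow the same pattern: the extractor-level value wins, then the top-level default, then a documented fallback. A minimal sketch of that precedence, using a hypothetical helper and a simplified cache type that are not part of Soundcheck:

```typescript
// Hypothetical helper illustrating the documented precedence:
// extractor value, then top-level default, then fallback.
function resolveSetting<T>(
  extractorValue: T | undefined,
  topLevelDefault: T | undefined,
  fallback: T,
): T {
  if (extractorValue !== undefined) return extractorValue;
  if (topLevelDefault !== undefined) return topLevelDefault;
  return fallback;
}

// Simplified cache setting for illustration only.
type CacheSetting = boolean | { duration: { hours?: number; minutes?: number } };

const topLevelCache: CacheSetting = { duration: { hours: 2 } };

// The extractor-level value overrides the top-level default.
console.log(resolveSetting<CacheSetting>({ duration: { hours: 24 } }, topLevelCache, false));
// No extractor-level value: the top-level default applies.
console.log(resolveSetting<CacheSetting>(undefined, topLevelCache, false));
// Neither is set: the fact is not cached.
console.log(resolveSetting<CacheSetting>(undefined, undefined, false)); // false
```

Note that an explicit false at the extractor level still counts as provided, so it overrides a top-level cache setting in this sketch.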

Exists Fact Extractor

This extractor collects information on whether a given set of files exists in the SCM provider. The extensions to the base schema are as follows:


type

Must be exactly exists, like so:

type: exists


data

The data collected for this fact. This is an array of elements, each consisting of a name and a path:

  • name: An identifier for the data element.
  • path: The path to the file. Both name and path are subject to the naming restrictions of factName.
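To make the shape of the collected data concrete, here is a minimal sketch of how name and path pairs could map to a fact object of booleans keyed by name. Everything in it is hypothetical: the function, the types, the license_exists element, and the fileExists callback, which stands in for a lookup against the SCM provider.

```typescript
// Hypothetical model of an exists fact; not Soundcheck's implementation.
interface ExistsDataElement {
  name: string; // key under which the result is stored
  path: string; // file whose existence is checked
}

function collectExistsFact(
  data: ExistsDataElement[],
  fileExists: (path: string) => boolean, // stand-in for an SCM provider lookup
): Record<string, boolean> {
  const result: Record<string, boolean> = {};
  data.forEach(element => {
    result[element.name] = fileExists(element.path);
  });
  return result;
}

// Pretend repository contents, for illustration only.
const repoFiles = ['/catalog-info.yaml', '/src/index.ts'];

const existsFact = collectExistsFact(
  [
    { name: 'catalog_info_exists', path: '/catalog-info.yaml' },
    { name: 'license_exists', path: '/license.txt' },
  ],
  path => repoFiles.indexOf(path) !== -1,
);
console.log(existsFact); // { catalog_info_exists: true, license_exists: false }
```

Each name becomes a field that a check can later address with a path such as $.catalog_info_exists.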

Sample Exists Configuration

Here is a sample yaml configuration for a fact that gets information on the existence of two files, a readme file and catalog-info.yaml:

- factName:
    readme_and_catalog_info_files_exist_fact # This gives this fact an identifier which is
    # used to refer to the fact in other
    # configuration files.
  type: exists # This identifies the type of fact to collect.
  data: # This defines the data element which will be returned in the
    # fact object when the fact is collected.
    - name: readme_exists # Label for the data element.
      path: / # The file for which existence will be determined.
    - name: catalog_info_exists # Label for the data element.
      path: /catalog-info.yaml # The file for which existence will be determined.
  filter: # A filter to narrow the applicability of this fact.
    metadata.name:
      soundcheck-external-demo # This filter makes this fact applicable only to the
      # component with the given name, in this case
      # 'soundcheck-external-demo'

The checks that compare the data collected by this fact to the expected outcomes are specified in the app-config.yaml file. Because the fact collects two data elements, there are two checks, one for the value of each data element; those checks would look like this:

- id: has_readme_check # The name of the check
  rule: # How to evaluate this check
    factRef: scm:default/readme_and_catalog_info_files_exist_fact # The fact data to reference
    path: $.readme_exists # The path to the field to analyze
    operator: equal # Indicates the operation to apply
    value: true # The desired value of the field indicated in path, above.
- id: has_catalog_info_file_check
  rule:
    factRef: scm:default/readme_and_catalog_info_files_exist_fact
    path: $.catalog_info_exists
    operator: equal
    value: true
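The way a rule's path, operator and value relate to the collected fact data can be sketched as follows. This is a simplified, hypothetical evaluator written for illustration: it only handles paths of the form $.name and the equal and notEqual operators that appear in this document, and it is not how Soundcheck itself evaluates checks.

```typescript
// Hypothetical, simplified rule evaluator; not Soundcheck's implementation.
type FactData = Record<string, unknown>;

interface Rule {
  path: string; // only '$.<name>' is supported in this sketch
  operator: 'equal' | 'notEqual';
  value: unknown;
}

function evaluateRule(rule: Rule, data: FactData): boolean {
  if (rule.path.indexOf('$.') !== 0) {
    throw new Error('Unsupported path: ' + rule.path);
  }
  // Strip the leading '$.' to get the data element name.
  const actual = data[rule.path.slice(2)];
  return rule.operator === 'equal' ? actual === rule.value : actual !== rule.value;
}

// Pretend fact data, for illustration only.
const factData: FactData = { readme_exists: true, catalog_info_exists: false };

console.log(evaluateRule({ path: '$.readme_exists', operator: 'equal', value: true }, factData)); // true
console.log(evaluateRule({ path: '$.catalog_info_exists', operator: 'equal', value: true }, factData)); // false
```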

Finally, these two checks need to be listed in a program level within soundcheck-programs.yaml:

- id: demo
  name: Demo
  ownerEntityRef: group:default/owning_group
  description: >
    Demonstration of Soundcheck Exists Fact Extractor
  levels:
    - ordinal: 1
      name: First level
      description: Checks leveraging Soundcheck's SCM Exists Fact Extractor
      checks:
        - id: has_catalog_info_file_check # The identifier for the check.
          name: Has catalog-info.yaml # A human-readable name for this check
          description: > # The description to display on the Soundcheck page for this check.
            Repositories should contain a catalog-info.yaml file.
        - id: has_readme_check
          name: Has
          description: |
            Repositories should provide a file at root.

Regex Fact Extractor

The Regex Fact Extractor collects information about the contents of a file. Two modes are supported:

  • true/false
  • named capture groups

True/False Mode

The true/false mode uses a regular expression to search for a match in a specified file. If a match is found, the resulting fact data will contain true for a field named 'matches'; otherwise the 'matches' field will contain false.

Regex Capture Groups

Named capture groups let the extractor associate each capture group in a regex with a named value, so that checks can verify that the captured values are correct.

Regex Fact Extractor Schema

The extension schema for Regex Fact Extractors is as follows:


type

Must be exactly regex, like so:

type: regex


path

The path to the file to analyze.


regex

A valid regex string. This string is applied to the file either to collect data elements or to provide a true/false response indicating whether the regex found a match in the file.

data [Optional]

Defines the data to collect for this fact. This is an array of elements, each consisting of a name and a type:

  • name: An identifier for the data element. Subject to the naming restrictions of factName.
  • type: The expected type of data to be collected.

Each pair defined in the data field must correspond to a capture group in the given regex. A mismatch between the number of data element definitions and the number of regex capture groups is an error, and the fact data will not be collected.

If the data element is not present, the mode of the Regex Fact Extractor defaults to true/false.
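The two modes and the capture group count rule can be sketched as below. This is an illustration under assumptions, not Soundcheck's implementation: the function names are hypothetical, capture groups are mapped to data elements by position, the multiline flag is an assumed choice, and the sample file content is made up.

```typescript
// Hypothetical sketch of the two regex modes; not Soundcheck's implementation.
interface RegexDataElement {
  name: string;
  type: string;
}

// Counts capture groups by matching the empty string against "pattern|":
// the empty alternative always matches, and the result holds one slot per group.
function countCaptureGroups(pattern: string): number {
  const match = new RegExp(pattern + '|').exec('');
  return match ? match.length - 1 : 0;
}

function extractRegexFact(
  content: string,
  pattern: string,
  data?: RegexDataElement[],
): Record<string, unknown> {
  // Assumption: multiline matching, so ^ and $ apply per line.
  const match = new RegExp(pattern, 'm').exec(content);
  if (!data || data.length === 0) {
    // True/false mode: report only whether the pattern matched.
    return { matches: match !== null };
  }
  if (countCaptureGroups(pattern) !== data.length) {
    throw new Error('Capture group count does not match data definitions');
  }
  const result: Record<string, unknown> = {};
  data.forEach((element, index) => {
    // Capture groups are 1-based; index 0 is the whole match.
    result[element.name] = match ? match[index + 1] : undefined;
  });
  return result;
}

// Made-up file content for illustration.
const fileContent = 'apiVersion: example/v1\nkind: Component\n';

console.log(extractRegexFact(fileContent, 'kind: Component')); // { matches: true }
console.log(
  extractRegexFact(fileContent, '^apiVersion: example/(.+)$', [
    { name: 'captured_api_version', type: 'string' },
  ]),
); // { captured_api_version: 'v1' }
```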

Sample Regex Configuration

The yaml below defines both modes of the Regex extractor, true/false and data collection. Sample fact definitions are as follows:

- factName: apache_license_fact # Name of the fact
  type: regex # Type of the fact
  path: / # Path to the file whose contents will be searched
  regex: .*Apache License.*Version 2\.0.* # Regex to match.
  # Note lack of any 'data:' object definition, this implies this regex is a true/false type.

- factName: api_version_fact
  type: regex
  path: /catalog-info.yaml
  regex:
    '^apiVersion:' # Note the capture group! Each capture group in a regex
    # *must* correspond to a named data element, see below.
  data: # Data describing each capture group
    - name: captured_api_version # The name of the first capture group
      type: string # The type of the first capture group.

With fact collection specified, we now must define checks against the data that will be collected for each fact. We define the checks in the app-config.yaml file. Sample checks that correspond to the regex facts above are as follows:

- id: uses_recommended_license_check # ID for this check
  rule: # How to evaluate this check
    factRef: scm:default/apache_license_fact # The fact data to reference
    path:
      $.matches # The path to the field to analyze, note that this is always 'matches' for a
      # true/false type regex.
    operator: equal # Indicates the operation to apply
    value:
      true # The desired value of path field, above. True here indicates
      # that, indeed, we want to have found the 2.0 apache license version in the
      # given file.

- id: api_version_check
  rule:
    factRef: scm:default/api_version_fact
    path:
      $.captured_api_version # This path refers to the name given to the capture group in
      # the fact definition.
    operator: equal
    value:
      v1alpha1 # This is the value we expect to have captured via the regex for this
      # capture group. This can be any string.

Finally, the checks defined above must be present in the soundcheck-programs.yaml file under a level of a program. Such inclusion would look like this:

- id: demo
  name: Demo
  ownerEntityRef: group:default/owning_group
  description: >
    Demonstration of Soundcheck Regex Fact Extractor
  levels:
    - ordinal: 1
      name: First level
      description: Checks leveraging Soundcheck's SCM Regex Fact Extractor
      checks:
        - id: uses_recommended_license_check # Check ID to include
          name: Uses Apache License 2.0 # Human-readable name for this check.
          description: |
            Use of the Apache License 2.0 is required.
        - id: api_version_check
          name: Has correct API version
          description: >
            "Ensures that the component is using the correct api version, which is

JSON/YAML Fact Extractor

The final fact extractor type supported by the SCM module is the json/yaml fact extractor, which works similarly to the regex fact extractor in that it extracts json/yaml values from a file for use in checks.

JSON/YAML Fact Extractor Schema

The extensions to the base schema are as follows.


type

Must be one of json or yaml, like so:

type: json


path

The path to the file to analyze.


data

Defines the data to collect for this fact. This is an array of elements with the following fields:

  • name: An identifier for the data element. Subject to the naming restrictions of factName.
  • type: The expected type of data to be collected, either array or a primitive type (string, int, etc.)
  • jsonPath: A period delimited path to the desired json/yaml element.
  • items: An optional field with a single subfield, type. If included, the data returned by the fact will be an array of all matching elements of the specified type. If 'items' is omitted, the returned value will be a single element.
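A period-delimited path like the one described for jsonPath can be resolved against already-parsed JSON or YAML data by walking one key at a time. The sketch below is a hypothetical resolver written for illustration, with made-up sample data; it is not Soundcheck's implementation and does no type validation against the type or items fields.

```typescript
// Hypothetical resolver for a period-delimited path on parsed JSON/YAML data.
function resolvePath(root: unknown, jsonPath: string): unknown {
  const keys = jsonPath.split('.');
  let current: unknown = root;
  for (let i = 0; i < keys.length; i++) {
    // Stop early if the path runs through a primitive or a missing value.
    if (typeof current !== 'object' || current === null) return undefined;
    current = (current as Record<string, unknown>)[keys[i]];
  }
  return current;
}

// Made-up parsed file content for illustration.
const parsedFile = {
  metadata: {
    name: 'soundcheck-external-demo',
    tags: ['java', 'backend'],
  },
};

console.log(resolvePath(parsedFile, 'metadata.tags')); // [ 'java', 'backend' ]
console.log(resolvePath(parsedFile, 'metadata.annotations.missing')); // undefined
```

One limitation of this naive split: a key that itself contains a period cannot be addressed, since every period is treated as a path delimiter.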

Sample json/yaml Configuration

The yaml below defines both types of collection by the json/yaml extractor: single element and array capture. Sample fact definitions are as follows:

- factName: entity_metadata_fact # Name of the fact
  type: json # Type of the fact
  path: /catalog-info.yaml # Path to the file whose contents will be searched
  data: # Data describing the file contents collected at each jsonPath
    - name: tags # Name for this entry in the data element.
      jsonPath: metadata.tags # Path from which to pull data from the file.
      type: array # Type of element, array or primitive.
      items: # For the array type, this items specification and the type of the items is required.
        type: string
    - name: pager_duty_integration_key
      jsonPath: metadata.annotations.pagerduty_integration-key
      type: string # For non-array captures, just the type of the data is required.

The two data specifications above show the two capture styles supported by this extractor: an array and a single element. Here, metadata.tags will be extracted into an array named 'tags' with items of type string, and metadata.annotations.pagerduty_integration-key will be extracted into a value named pager_duty_integration_key of type string.

The checks for the fact data extracted based on the fact specification above could be as follows:

- id: entity_metadata_tags_check # ID of this check
  rule: # How to evaluate this check
    factRef: scm:default/entity_metadata_fact # The fact data to reference
    path: $.tags # The path to the data in the collected fact's 'data' element
    operator: notEqual # The operation to apply
    value: undefined # The value to compare with the extracted value.
- id: entity_metadata_key_check
  rule:
    factRef: scm:default/entity_metadata_fact
    path: $.pager_duty_integration_key
    operator: equal
    value: 12345

Above, we define two checks. The first ensures that the tags array is not undefined in the file, that is, that there are tags present. The second check ensures that the pager duty key is in the file and that it is equal to the given value.

Finally, we add these checks to the soundcheck-programs.yaml file under an appropriate ordinal. As an example:

- id: demo
  name: Demo
  ownerEntityRef: group:default/owning_group
  description: >
    Demonstration of Soundcheck JSON/YAML Fact Extractor
  levels:
    - ordinal: 1
      name: First level
      description: Checks leveraging Soundcheck's SCM JSON/YAML Fact Extractor
      checks:
        - id: entity_metadata_tags_check
          name: Entity Metadata Tags Check
          description: Check that metadata tags are present.
        - id: entity_metadata_key_check
          name: Entity Metadata Key Check
          description: Check that the pager duty key is correct.

Once the above is added to soundcheck-programs.yaml, the checks defined in this section must pass for the program and level to which they were added to be considered passing.