Source Control Management
The Source Control Management (SCM) integration plugin for Soundcheck enables integration with the following source control management providers:
- azure
- bitbucketCloud
- bitbucketServer
- gerrit
- gitea
- github
- gitlab
Prerequisites
SCM Integrations - Connecting the SCM module to SCM providers
To connect to external providers, an 'integration' must be provided in the main app-config.yaml file as follows:
integrations:
  github:
    - host: github.com
      token: ${GITHUB_TOKEN}
The above example provides a GitHub integration, with the host set to github.com. Authentication is provided via a token issued from github.com for the repository that you'd like to connect to.
Consult the Backstage GitHub integration instructions for full configuration details.
Add the ScmFactCollector to Soundcheck
Source Control Management integration for Soundcheck is not installed by default. It must be manually installed and configured.
First, add the @spotify/backstage-plugin-soundcheck-backend-module-scm package:
yarn workspace backend add @spotify/backstage-plugin-soundcheck-backend-module-scm
Then add the following to your packages/backend/src/index.ts file:
const backend = createBackend();
backend.add(import('@spotify/backstage-plugin-soundcheck-backend'));
backend.add(import('@spotify/backstage-plugin-soundcheck-backend-module-scm'));
// ...
backend.start();
Consult the Soundcheck Backend documentation for additional details on setting up the Soundcheck backend.
Adding SCM Entities
To use Source Control Management (SCM) integrations, an entity hosted by an SCM provider is needed.
As an example, an entity could be added to the catalog with a type set to url and a target set to the entity's hosted location, like so:
catalog:
  locations:
    # Soundcheck external demo
    - type: url # Denotes SCM entities.
      target: https://github.com/your_repo/blob/main/all-components.yaml
The configuration above adds a component hosted by github.com and configured by the target yaml file.
Entity configuration
To determine the location of a file, the SCM integration uses the value of the backstage.io/source-location annotation as its base. In many cases this annotation is set for you; if it is not, add it to your catalog-info.yaml file. Here's a simple example:
metadata:
  annotations:
    backstage.io/source-location: url:https://github.com/my-org/my-service/
Configuring the SCM Module
SCM Fact Collector can be configured via YAML, URL or No-Code UI. If you configure it via both YAML and No-Code UI, the configurations will be merged. It's preferable to choose a single source for the Fact Collectors configuration (either No-Code UI, YAML or URL) to avoid confusing merge results.
SCM Fact Collector: Default Configuration
To add the default initial configuration of SCM Fact Collector on startup, the following flag must be set to true in the app-config.yaml file:
soundcheck:
  addStartingConfigurations:
    collectors: true
This configuration is required to be able to collect the facts necessary for the pre-canned checks and tracks.
Note: The configuration will be stored in the database and will be configurable via No-Code UI. If you'd like to add this configuration via yaml, you should add the following config instead:
soundcheck:
  collectors:
    scm:
      collects:
        # Extracts if files exist at the given paths.
        - factName: required_files_exist
          type: exists
          data:
            - name: license
              path: LICENSE.md
            - name: code_of_conduct
              path: CODE_OF_CONDUCT.md
            - name: contributing
              path: CONTRIBUTING.md
            - name: readme
              path: README.md
            - name: api_report
              path: api-report.md
          frequency:
            cron: '7 9 * * *'
          filter:
            - kind: Component
            - spec.lifecycle: production
        # Checks that the api-report contains the correct 'do not edit' disclaimer.
        - factName: api-report-has-no-edit-warning
          type: regex
          path: api-report.md
          regex: .*Do not edit this file.*
          frequency:
            cron: '7 11 * * *'
          filter:
            - kind: Component
            - spec.lifecycle: production
SCM Fact Collector: No-Code UI Configuration Option
- Make sure the prerequisite SCM Integrations - Connecting the SCM module to SCM providers is completed and SCM instance details are configured.
- To configure the SCM Integration, go to Soundcheck > Integrations > Source Code Management and click the Configure button. To learn more about the No-Code UI config, see Configuring a fact collector (integration) via the no-code UI.
SCM Fact Collector: YAML Configuration Option
The facts to be collected by the Source Control Management (SCM) module must be defined in one or more yaml files, and then referenced in the Soundcheck configuration in the app-config.yaml file, like so:
soundcheck:
  collectors:
    scm:
      - $include: ./scm-fact-extraction-configurations.yaml
      - $include: ./more-scm-fact-extraction-configurations.yaml
      - $include: ./even-more-scm-fact-extraction-configurations.yaml
With an SCM entity in your catalog, an SCM integration set, and SCM configuration files added to the soundcheck.collectors.scm field, your Backstage instance is almost ready to extract facts from SCM providers.
The next section covers how to set up the fact extraction configuration files to extract facts from SCM.
SCM Fact Collector: URL Configuration Option
The SCM Fact Collector can be configured to fetch its configuration from a remote URL, allowing its configuration to be managed as code.
The configuration file still follows the rules and structure as outlined for collectors above.
Here is how the SCM Fact Collector can be configured to fetch its configuration from a remote URL:
soundcheck:
  collectors:
    scm:
      url: https://github.com/JasonSmithSpotify/soundcheck-external-demo/blob/main/scm-collector.yaml
Here, instead of the standard $include directive or local specification of the collector configuration, the scm collector is configured with a url field that points to the remote URL from which the configuration can be fetched.
Rate Limiting (Optional)
This fact collector can be rate limited in Soundcheck using the following configuration:
soundcheck:
  job:
    workers:
      scm:
        limiter:
          max: 4900
          duration: 3600000
The rate limits for SCM are dependent on the source control API used. For example, GitHub has a rate limit of 5000 requests per hour; the above configuration is set for 4900 executions per hour to account for this limit.
This fact collector handles 429 rate limit errors from SCM. Soundcheck will automatically wait and retry requests that are rate limited.
Where the underlying API supports them (e.g. GitHub), ETags and conditional requests are used to reduce the total number of API calls.
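As a rough illustration of the arithmetic behind the limiter settings, the sketch below converts a max/duration pair into an hourly rate and compares it with a provider quota. It assumes duration is expressed in milliseconds (which is consistent with 3600000 corresponding to one hour above); the interface and function names are hypothetical and are not part of Soundcheck.

```typescript
// Hypothetical shape mirroring soundcheck.job.workers.scm.limiter.
interface LimiterConfig {
  max: number; // maximum executions allowed per window
  duration: number; // window length, assumed to be in milliseconds
}

const MS_PER_HOUR = 60 * 60 * 1000;

// Executions per hour permitted by a limiter configuration.
function executionsPerHour(limiter: LimiterConfig): number {
  return (limiter.max * MS_PER_HOUR) / limiter.duration;
}

// Requests left unused under a provider's hourly quota.
function hourlyHeadroom(limiter: LimiterConfig, providerHourlyLimit: number): number {
  return providerHourlyLimit - executionsPerHour(limiter);
}

const sampleLimiter: LimiterConfig = { max: 4900, duration: 3600000 };
console.log(executionsPerHour(sampleLimiter)); // 4900
console.log(hourlyHeadroom(sampleLimiter, 5000)); // 100
```

The 100 requests of headroom leave room for other consumers of the same token, such as catalog ingestion.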
Caching Full ETag Responses (Optional)
The SCM Fact Collector can be configured to cache the full response from the SCM provider when using ETags. This will cache not only the extracted fact data, but the full response from the SCM provider, including the file tree structure and file contents. This is especially useful when using GLOB paths that may return many files as it can save significantly on API calls to the SCM provider(s). If you're encountering API rate limits and are using GLOB patterns, consider enabling this option.
WARNING: Caching full responses can lead to high memory usage, particularly when using GLOB patterns that match many files or if the files themselves are large.
Enable this option with:
soundcheck:
  collectors:
    scm:
      cacheFullResultsForEtags: true
This option is disabled by default, and can currently only be enabled via YAML. If you are using the No-Code UI to configure your collectors, you will need to add this option to your app-config.yaml file.
Enabling this option via YAML works alongside any No-Code UI configurations and does not otherwise affect them.
SCM Fact Extraction Configuration
The SCM Fact Collector configuration yaml files have the following structure:
frequency:
  cron: '0 * * * *' # Defines a schedule for when the facts defined in this file should be collected
  # This is optional and if omitted, facts will only be collected on demand.
initialDelay:
  seconds: 30
filter: # A filter specifying which entities to collect the specified facts for
  kind: 'Component'
cache: # Defines if the collected facts should be cached, and if so for how long
  duration:
    hours: 2
collects: # An array of fact extractor configuration describing how to collect SCM facts.
  - SCM Fact Extractor Configuration One
  - SCM Fact Extractor Configuration Two
  - ...
  - SCM Fact Extractor Configuration N
Below are the details for each field.
frequency
[Optional]
The frequency at which the collector should be executed. Possible values are either a cron expression { cron: ... } or HumanDuration.
This is the default frequency for each extractor.
initialDelay
[optional]
The amount of time that should pass before the first invocation happens. Possible values are either a cron expression { cron: ... } or HumanDuration.
batchSize
[optional]
The number of entities to collect facts for at once. Optional, the default value is 1.
Note: Fact collection for a batch of entities is still considered as one hit towards the rate limits by the Soundcheck Rate Limiting engine, while the actual number of hits will be equal to the batchSize.
Example:
batchSize: 100
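To make the note above concrete, here is a minimal sketch that estimates how many hits the Soundcheck limiter counts versus how many reach the provider for a given batchSize. It assumes one provider hit per entity and that the final batch may be smaller than batchSize; the function and field names are hypothetical and not part of Soundcheck.

```typescript
interface BatchEstimate {
  batches: number; // collection runs needed to cover all entities
  limiterHits: number; // hits counted by the Soundcheck rate limiter (one per batch)
  providerHits: number; // hits actually made, assuming one per entity
}

function estimateBatchHits(entityCount: number, batchSize = 1): BatchEstimate {
  if (!Number.isInteger(batchSize) || batchSize < 1) {
    throw new Error('batchSize must be a positive integer');
  }
  const batches = Math.ceil(entityCount / batchSize);
  return { batches, limiterHits: batches, providerHits: entityCount };
}

const estimate = estimateBatchHits(250, 100);
console.log(estimate.limiterHits); // 3
console.log(estimate.providerHits); // 250
```

Under these assumptions, a limiter max of 4900 combined with batchSize: 100 could allow up to 490,000 provider hits per window, so the limiter may need to be lowered when batching is enabled.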
filter
[Optional]
A filter specifying which entities to collect the specified facts for. Matches the filter format used by the Catalog API. This is the default filter for each extractor.
See filters for more details.
exclude
[optional]
Entities matching this filter will be skipped during the fact collection process. Can be used in combination with filter. Matches the filter format used by the Catalog API.
filter:
  - kind: component
exclude:
  - spec.type: documentation
cache
[Optional]
Whether the collected facts should be cached, and if so for how long. Possible values are true, false, or a nested { duration: HumanDuration } field.
This is the default cache config for each extractor.
collects
An array of SCM Fact Extractor configurations describing how to collect SCM facts. See the section below for details on configuring the extractors.
SCM Fact Extractors
The Exists, RegEx, and JSON/YAML Source Control Management (SCM) fact extractor configurations are described in detail below. Before going into the detailed schemas of the individual fact extractors, the base schema that they all share will be covered first.
Common Fact Extractor Schema
All SCM Fact Extractors share a common base schema, the variables for which are defined below:
factName
The name of the fact to be extracted.
- Minimum length of 1
- Maximum length of 100
- Alphanumeric with single separator instances of periods, dashes, underscores, or forward slashes
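The naming rules above could be checked before deploying a configuration with a small validator like the one below. This is a sketch: the exact pattern Soundcheck applies is not shown in this document, so the regular expression is an assumption (in particular, that a name must start and end with an alphanumeric character), and isValidFactName is a hypothetical helper.

```typescript
// Assumed pattern: alphanumeric runs joined by a single '.', '-', '_' or '/'.
const FACT_NAME_PATTERN = /^[A-Za-z0-9]+(?:[._\/-][A-Za-z0-9]+)*$/;

function isValidFactName(factName: string): boolean {
  return (
    factName.length >= 1 &&
    factName.length <= 100 &&
    FACT_NAME_PATTERN.test(factName)
  );
}

console.log(isValidFactName('required_files_exist')); // true
console.log(isValidFactName('api-report-has-no-edit-warning')); // true
console.log(isValidFactName('readme__exists')); // false: two separators in a row
```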
filter
[Optional]
A filter specifying which entities to collect the specified facts for. Matches the filter format used by the Catalog API. If provided, it overrides the default filter provided at the top level. If not provided, it defaults to the filter provided at the top level. If neither extractor's filter, nor default filter is provided, the fact will be collected for all entities.
exclude
[optional]
Entities matching this filter will be skipped during the fact collection process. Can be used in combination with filter. Matches the filter format used by the Catalog API.
filter:
  - kind: component
exclude:
  - spec.type: documentation
cache
[Optional]
Whether the collected facts should be cached, and if so for how long. Possible values are true, false, or a nested { duration: HumanDuration } field.
If provided, it overrides the default cache config provided at the top level. If not provided, it defaults to the cache config provided at the top level. If neither extractor's cache, nor default cache config is provided, the fact will not be cached.
Example:
cache:
  duration:
    hours: 24
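For reference when comparing cache durations with frequencies, the sketch below converts a HumanDuration-style object into milliseconds. It covers only the fixed-length fields (weeks down to milliseconds) and leaves out months and years, whose length varies; FixedDuration and durationToMs are hypothetical names used for illustration.

```typescript
// Subset of HumanDuration fields that have a fixed length.
interface FixedDuration {
  weeks?: number;
  days?: number;
  hours?: number;
  minutes?: number;
  seconds?: number;
  milliseconds?: number;
}

function durationToMs(d: FixedDuration): number {
  return (
    (d.weeks ?? 0) * 604_800_000 +
    (d.days ?? 0) * 86_400_000 +
    (d.hours ?? 0) * 3_600_000 +
    (d.minutes ?? 0) * 60_000 +
    (d.seconds ?? 0) * 1_000 +
    (d.milliseconds ?? 0)
  );
}

console.log(durationToMs({ hours: 24 })); // 86400000
console.log(durationToMs({ minutes: 10 })); // 600000
```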
frequency
[optional]
The frequency at which the fact extraction should be executed. Possible values are either a cron expression { cron: ... } or HumanDuration.
If provided, it overrides the default frequency provided at the top level. If not provided, it defaults to the frequency provided at the top level. If neither extractor's frequency, nor default frequency is provided, the fact will only be collected on demand.
Example:
frequency:
  minutes: 10
batchSize
[optional]
The number of entities to collect facts for at once. Optional, the default value is 1. If provided it overrides the default batchSize provided at the top level. If not provided it defaults to the batchSize provided at the top level. If neither collector's batchSize nor default batchSize is provided the fact will be collected for one entity at a time.
Note: Fact collection for a batch of entities is still considered as one hit towards the rate limits by the Soundcheck Rate Limiting engine, while the actual number of hits will be equal to the batchSize.
Example:
batchSize: 100
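The override rules described for filter, cache, frequency, and batchSize all follow the same pattern: the extractor's value wins, then the top-level value, then a built-in fallback. The sketch below models that resolution order; it is an illustration of the documented behavior rather than Soundcheck's implementation, and the Settings type and resolveSettings function are hypothetical.

```typescript
// Hypothetical, loosely typed view of the settings an extractor can override.
type Settings = {
  filter?: unknown;
  cache?: unknown;
  frequency?: unknown;
  batchSize?: number;
};

function resolveSettings(topLevel: Settings, extractor: Settings) {
  return {
    // undefined filter: the fact is collected for all entities
    filter: extractor.filter ?? topLevel.filter,
    // no cache config at either level: the fact is not cached
    cache: extractor.cache ?? topLevel.cache ?? false,
    // undefined frequency: the fact is only collected on demand
    frequency: extractor.frequency ?? topLevel.frequency,
    // no batchSize at either level: one entity at a time
    batchSize: extractor.batchSize ?? topLevel.batchSize ?? 1,
  };
}

const resolved = resolveSettings(
  { cache: { duration: { hours: 2 } }, batchSize: 100 },
  { cache: false },
);
console.log(resolved.cache); // false
console.log(resolved.batchSize); // 100
```

Using ?? rather than || keeps an explicit cache: false on the extractor from being replaced by the top-level value.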
branch
[optional]
The branch to extract the fact from. If not provided, defaults to the repository's default branch.
Exists Fact Extractor
The Exists Fact Extractor collects information on whether a given file exists in the SCM provider. The extensions to the base schema are as follows:
type
Must be exactly exists, like so:
type: exists
data
The data collected for this fact. This is an array of name and path pairs:
- name: An identifier for the data element.
- path: The path to the file. GLOB paths are supported.
Both name and path are subject to the naming restrictions of factName.
Sample Exists Configuration
Here's a sample yaml configuration for a fact that gets information on the existence of two files, README.md and catalog-info.yaml:
collects:
  - factName:
      readme_and_catalog_info_files_exist_fact # This gives this fact an identifier which is
      # used to refer to the fact in other
      # configuration files.
    type: exists # This identifies the type of fact to collect.
    data: # This defines the data element which will be returned in the
      # fact object when the fact is collected.
      - name: readme_exists # Label for the data element.
        path: /*/README.md # A GLOB path. If any file is found matching the GLOB path, the value for the exists condition will be true.
      - name: catalog_info_exists # Label for the data element.
        path: /catalog-info.yaml # The file for which existence will be determined.
    filter: # A filter to narrow the applicability of this fact.
      metadata.name:
        soundcheck-external-demo # This filter makes this fact applicable only to the
        # component with the given name, in this case
        # 'soundcheck-external-demo'
The checks that compare the data collected by this fact to the expected outcomes are specified in the app-config.yaml file. Since this fact collects two data elements, there will be two checks, one for the value of each data element. The two checks would look like this:
soundcheck:
  checks:
    - id: has_readme_check # The name of the check
      rule: # How to evaluate this check
        factRef: scm:default/readme_and_catalog_info_files_exist_fact # The fact data to reference
        path: $.readme_exists # The path to the field to analyze
        operator: equal # Indicates the operation to apply
        value: true # The desired value of the field indicated in path, above.
    - id: has_catalog_info_file_check
      rule:
        factRef: scm:default/readme_and_catalog_info_files_exist_fact
        path: $.catalog_info_exists
        operator: equal
        value: true
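To show how factRef, path, operator, and value relate to the collected fact data, here is a deliberately simplified model of rule evaluation. It handles only paths of the form $.field and the equal operator, and it is not how Soundcheck evaluates checks internally; the types and the evaluateRule function are hypothetical.

```typescript
type FactData = Record<string, unknown>;

interface Rule {
  factRef: string; // which collected fact to read
  path: string; // only the "$.field" form is handled in this sketch
  operator: 'equal';
  value: unknown; // the desired value at path
}

function evaluateRule(facts: Record<string, FactData>, rule: Rule): boolean {
  const data = facts[rule.factRef];
  if (data === undefined || !rule.path.startsWith('$.')) {
    return false;
  }
  const field = rule.path.slice(2); // drop the leading "$."
  return data[field] === rule.value;
}

// Example fact data for a repository that has a README but no catalog-info.yaml.
const collectedFacts: Record<string, FactData> = {
  'scm:default/readme_and_catalog_info_files_exist_fact': {
    readme_exists: true,
    catalog_info_exists: false,
  },
};

console.log(
  evaluateRule(collectedFacts, {
    factRef: 'scm:default/readme_and_catalog_info_files_exist_fact',
    path: '$.readme_exists',
    operator: 'equal',
    value: true,
  }),
); // true
```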
Finally, these two checks need to be listed in a program level ordinal within the soundcheck-programs.yaml file. Here's an example:
- id: demo
  name: Demo
  ownerEntityRef: group:default/owning_group
  description: >
    Demonstration of Soundcheck Exists Fact Extractor
  levels:
    - ordinal: 1
      name: First level
      description: Checks leveraging Soundcheck's SCM Exists Fact Extractor
      checks:
        - id: has_catalog_info_file_check # The identifier for the check.
          name: Has catalog-info.yaml # A human-readable name for this check
          description:
            > # The description to display on the Soundcheck page for this check.
            Repositories should contain a catalog-info.yaml file.
        - id: has_readme_check
          name: Has README.md
          description: |
            Repositories should provide a README.md file at root.
RegEx Fact Extractor
The RegEx Fact Extractor collects information about the contents of a file. Two modes are supported:
True/False Mode
True/False Mode uses a Regular Expression, or RegEx, to search for a match in a specified file. If a RegEx match is found, the resulting fact data will contain a value of true for a field named matches. If not, the matches field will contain a value of false.
RegEx Capture Groups Mode
Using RegEx Capture Groups Mode allows the extractor to associate capture groups within a RegEx to named values. This allows checks to verify that the captured values are correct.
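The difference between the two modes can be sketched with plain JavaScript regular expressions. This is an illustration only, not the extractor's implementation: the function names are hypothetical, the sample file content is made up, and returning undefined when nothing matches in capture mode is an assumption.

```typescript
// True/False Mode: report only whether the pattern matches somewhere in the file.
function trueFalseMode(fileContent: string, pattern: string): { matches: boolean } {
  return { matches: new RegExp(pattern).test(fileContent) };
}

// Capture Groups Mode: pair each capture group with a name, in order.
function captureGroupsMode(
  fileContent: string,
  pattern: string,
  groupNames: string[],
): Record<string, string> | undefined {
  const match = new RegExp(pattern).exec(fileContent);
  if (match === null) {
    return undefined; // assumption: behavior on no match is not specified here
  }
  const values: Record<string, string> = {};
  groupNames.forEach((groupName, index) => {
    values[groupName] = match[index + 1] ?? '';
  });
  return values;
}

const samplePackageJson = '{ "engines": { "node": ">=18.12" } }';
console.log(trueFalseMode(samplePackageJson, '"node":'));
// { matches: true }
console.log(
  captureGroupsMode(samplePackageJson, '"node": ">=(\\d+)\\.(\\d+)"', ['major', 'minor']),
);
// { major: '18', minor: '12' }
```

A check could then compare the captured major value against a minimum supported version instead of only testing for a match.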
RegEx Fact Extractor Schema
The extension schema for RegEx Fact Extractors is as follows:
type
Must be exactly regex, like so:
type: regex
path
The path to the file to analyze. GLOB paths are supported. When GLOB paths are used, the fact data will be an array, with each element corresponding to a file that matched the GLOB path. If True/False Mode is used, the array will contain objects with a matches field, which will be true if the file matched the RegEx, and false if it did not. If RegEx Capture Groups Mode is used, the array will contain objects with fields corresponding to the capture groups defined in the RegEx.
NOTE: When using GLOB paths, ensure your check is prepared to handle an array of results. See the example below.
regex
A valid RegEx string. This string is used on the file to collect data elements or to provide a true/false response corresponding to whether there is a match for the RegEx or not in the file.
flags
[Optional]
Accepts an optional regex flag parameter that must match /^[gimsuy]+$/. The supported flags are:
- g: Global search.
- i: Case-insensitive search.
- m: Allows ^ and $ to match next to newline characters.
- s: Allows . to match newline characters.
- u: "Unicode"; treat a pattern as a sequence of Unicode code points.
- y: Perform a "sticky" search that matches starting at the current position in the target string.
A sample fact using the configuration is as follows:
- factName: api-report-has-no-edit-warning
  type: regex
  flags: mi
  path: api-report.md
  regex: .*do not edit this file.*
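The effect of the flags in the sample above can be reproduced with a standard JavaScript RegExp. The sketch below validates a flags string against the documented pattern and shows that the lowercase regex only matches a capitalized disclaimer once the i flag is set; isValidFlags and the sample file content are hypothetical.

```typescript
// The pattern that the flags value must match, as documented above.
const FLAGS_PATTERN = /^[gimsuy]+$/;

function isValidFlags(flags: string): boolean {
  return FLAGS_PATTERN.test(flags);
}

console.log(isValidFlags('mi')); // true
console.log(isValidFlags('x')); // false

// Made-up file content with a capitalized disclaimer on its second line.
const sampleReport = '## API Report\n> Do not edit this file. It is generated.';
const samplePattern = '.*do not edit this file.*';

console.log(new RegExp(samplePattern).test(sampleReport)); // false
console.log(new RegExp(samplePattern, 'mi').test(sampleReport)); // true
```

Note that the pattern accepts repeated flags such as gg, which the JavaScript RegExp constructor itself rejects, so it is safest to list each flag once.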
data
[Optional]
Defines the data to collect for this fact. This is an array of name and type pairs:
- name: An identifier for the data element. Subject to the naming restrictions of factName.
- type: The expected type of data to be collected.
Each pair defined in the data field must correspond to a capture group in the given regex. A mismatch between the number of data elements and the number of RegEx capture groups is an error, and fact data will not be collected.
If the data element is not present, the mode of the RegEx Fact Extractor defaults to True/False Mode.
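A configuration could be sanity-checked ahead of time by counting the capture groups in the regex and comparing that count with the number of data elements. The sketch below does this with a common JavaScript technique (appending an empty alternative so the pattern always matches the empty string); it is not Soundcheck's validation logic, the function names are hypothetical, and the type value number used in the example is an assumption.

```typescript
interface DataElement {
  name: string;
  type: string; // assumed example value below: 'number'
}

// Counts capture groups by forcing a match against the empty string.
function countCaptureGroups(pattern: string): number {
  const match = new RegExp(pattern + '|').exec('');
  return match === null ? 0 : match.length - 1;
}

// Returns the mode implied by the configuration, or throws on a count mismatch.
function selectMode(pattern: string, data?: DataElement[]): 'true-false' | 'capture-groups' {
  if (data === undefined) {
    return 'true-false';
  }
  const groups = countCaptureGroups(pattern);
  if (groups !== data.length) {
    throw new Error(`regex has ${groups} capture groups but data defines ${data.length} elements`);
  }
  return 'capture-groups';
}

console.log(selectMode('.*Do not edit this file.*')); // true-false
console.log(
  selectMode('v(\\d+)\\.(\\d+)', [
    { name: 'major', type: 'number' },
    { name: 'minor', type: 'number' },
  ]),
); // capture-groups
```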