GitHub

Similar to the Source Control Management (SCM) integration plugin, the GitHub integration plugin for Soundcheck provides out-of-the-box integration with GitHub. It leverages Backstage's GitHub integration to extract and collect facts from GitHub repositories.

The purpose of the GitHub integration plugin is to provide GitHub-specific fact collection (like branch protections), while the SCM integration plugin provides the collection of facts based on repository content.

The GitHub integration plugin supports the extraction of the following facts:

  • BranchProtections
  • RepositoryDetails
  • RepositoryLanguages

Prerequisites

Configure GitHub integration in Backstage

Integrations are configured at the root level of app-config.yaml. Here's an example configuration for GitHub:

integrations:
  github:
    - host: github.com
      token: ${GITHUB_TOKEN}

Consult the Backstage GitHub integration instructions for full configuration details.

Add the GitHubFactCollector to Soundcheck

GitHub integration for Soundcheck is not installed by default. It must be manually installed and configured for the GitHub Fact Collector to work.

First, add the @spotify/backstage-plugin-soundcheck-backend-module-github package:

yarn workspace backend add @spotify/backstage-plugin-soundcheck-backend-module-github

Then add the following to your packages/backend/src/index.ts file:

packages/backend/src/index.ts
const backend = createBackend();

backend.add(import('@spotify/backstage-plugin-soundcheck-backend'));
backend.add(
  import('@spotify/backstage-plugin-soundcheck-backend-module-github'),
);
// ...

backend.start();

Consult the Soundcheck Backend documentation for additional details on setting up the Soundcheck backend.

Legacy Backend

Warning: If you are still using the Legacy Backend, you can follow these instructions, but we highly recommend migrating to the New Backend System.

First add the package: yarn workspace backend add @spotify/backstage-plugin-soundcheck-backend-module-github

Then in packages/backend/src/plugins/soundcheck.ts, add the GitHubFactCollector:

  import { SoundcheckBuilder } from '@spotify/backstage-plugin-soundcheck-backend';
  import { Router } from 'express';
  import { PluginEnvironment } from '../types';
+ import { GithubFactCollector } from '@spotify/backstage-plugin-soundcheck-backend-module-github';

  export default async function createPlugin(
    env: PluginEnvironment,
  ): Promise<Router> {
    return SoundcheckBuilder.create({ ...env })
+     .addFactCollectors(
+       GithubFactCollector.create(env.config, env.logger, env.cache),
+     )
      .build();
  }

Entity configuration

To determine which repository to use, the GitHub integration reads the value of the backstage.io/source-location annotation. In many cases this annotation is set for you. If it is not, add it to your catalog-info.yaml file. Here's a simple example:

metadata:
  annotations:
    backstage.io/source-location: url:https://github.com/my-org/my-service/
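As a rough illustration of how a repository could be derived from this annotation, the TypeScript sketch below splits a url: source location into host, owner, and repository. The parseSourceLocation helper and the GitHubRepoRef type are hypothetical names used only for this example. Soundcheck's actual parsing is not shown in this document and may differ.

```typescript
// Hypothetical type and helper for illustration; not part of Soundcheck or Backstage.
interface GitHubRepoRef {
  host: string;
  owner: string;
  repo: string;
}

// Parses an annotation value of the form "url:https://<host>/<owner>/<repo>/...".
// Returns undefined when the value does not have that shape.
function parseSourceLocation(annotation: string): GitHubRepoRef | undefined {
  const match = /^url:https?:\/\/([^/]+)\/([^/]+)\/([^/#?]+)/.exec(annotation);
  if (!match) {
    return undefined;
  }
  const [, host, owner, repo] = match;
  return { host, owner, repo };
}

const repoRef = parseSourceLocation('url:https://github.com/my-org/my-service/');
console.log(repoRef);
```

An annotation that does not start with url: (or that lacks an owner and repository segment) yields undefined in this sketch, which mirrors the case where the annotation has to be added manually.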

Plugin Configuration

The collection of facts is driven by configuration. To learn more about the configuration, jump to the Defining GitHub Fact Collectors section.

GitHub Fact Collector can be configured via YAML or No-Code UI. If you configure it via both YAML and No-Code UI, the configurations will be merged. It's preferable to choose a single source for the Fact Collectors configuration (either No-Code UI or YAML) to avoid confusing merge results.

Default Configuration

To add the default initial configuration of GitHub Fact Collector on startup, the following flag must be set to true in the app-config.yaml file:

soundcheck:
  addStartingConfigurations:
    collectors: true

This configuration is required to be able to collect the facts necessary for the pre-canned checks and tracks.

Note: The configuration will be stored in the database and will be configurable via the No-Code UI. If you'd like to add this configuration via YAML, add the following config instead:

soundcheck:
  collectors:
    github:
      collects:
        - factName: branch_protections
          type: BranchProtections
          cache: false
          filter:
            - kind:
                - Component
              spec.lifecycle:
                - production
          frequency:
            cron: '7 3 * * *'
        - factName: repository_details
          type: RepositoryDetails
          cache: false
          filter:
            - kind:
                - Component
              spec.lifecycle:
                - production
          frequency:
            cron: '7 5 * * *'
        - factName: repository_languages
          type: RepositoryLanguages
          cache: false
          filter:
            - kind:
                - Component
              spec.lifecycle:
                - production
          frequency:
            cron: '7 7 * * *'

No-Code UI Configuration Option

  1. Make sure the prerequisite Configure GitHub integration in Backstage is completed and GitHub instance details are configured.

  2. To enable the GitHub Integration, go to Soundcheck > Integrations > GitHub and click the Configure button. To learn more about the No-Code UI configuration, see Configuring a fact collector (integration) via the no-code UI.

YAML Configuration Option

  1. Create a github-facts-collectors.yaml file in the root of your Backstage repository and fill in all your GitHub Fact Collectors. A simple example GitHub Fact Collector is listed below.

    ---
    frequency:
      cron: '0 * * * *'
    collects:
      - factName: branch_protections
        type: BranchProtections
        branch: master
      - factName: repository_details
        type: RepositoryDetails
      - factName: repository_languages
        type: RepositoryLanguages

    Note: this file will be loaded at runtime along with the rest of your Backstage configuration files. Therefore, make sure that it's available in deployed environments in the same way as your app-config.yaml files are.

  2. Add a soundcheck collectors field to app-config.yaml and reference the newly created github-facts-collectors.yaml file.

    # app-config.yaml
    soundcheck:
      collectors:
        github:
          $include: ./github-facts-collectors.yaml

Rate Limiting (Optional)

This fact collector can be rate limited in Soundcheck using the following configuration:

soundcheck:
  job:
    workers:
      github:
        limiter:
          max: 4900
          duration: 3600000

The GitHub API has a limit of 5,000 requests per hour (15,000 for Enterprise). We recommend setting your rate limit somewhat below this. The example above allows 4,900 executions per hour (the duration of 3600000 milliseconds is one hour).

This fact collector handles rate limit errors per the recommendation from GitHub. Soundcheck will automatically wait and retry requests that are rate limited.
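To sanity-check a limiter setting against the GitHub quota, the arithmetic can be sketched in TypeScript. The helper names below are hypothetical and not part of Soundcheck. The sketch assumes that duration is a window length in milliseconds, which is consistent with 3600000 corresponding to one hour in the example above.

```typescript
// Mirrors the shape of soundcheck.job.workers.github.limiter.
interface LimiterConfig {
  max: number; // maximum executions per window
  duration: number; // window length in milliseconds (assumed unit)
}

const MS_PER_HOUR = 3600000;

// Normalizes a limiter setting to executions per hour.
function executionsPerHour(limiter: LimiterConfig): number {
  return (limiter.max * MS_PER_HOUR) / limiter.duration;
}

// Requests left per hour for other GitHub API consumers sharing the same token.
function hourlyHeadroom(limiter: LimiterConfig, apiLimitPerHour: number): number {
  return apiLimitPerHour - executionsPerHour(limiter);
}

const exampleLimiter: LimiterConfig = { max: 4900, duration: 3600000 };
console.log(executionsPerHour(exampleLimiter)); // 4900
console.log(hourlyHeadroom(exampleLimiter, 5000)); // 100
```

A negative headroom would indicate that the limiter permits more executions than the API quota allows, so rate limit errors and retries should be expected.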

Defining GitHub Fact Collectors

This section describes the data shape and semantics of GitHub Fact Collectors.

Overall Shape Of A GitHub Fact Collector

The following is an example of a descriptor file for a GitHub Fact Collector:

---
frequency:
  cron: '0 * * * *'
initialDelay:
  seconds: 30
filter:
  kind: 'Component'
cache:
  duration:
    hours: 2
collects:
  - factName: branch_protections
    type: BranchProtections
    branch: master
    filter:
      - spec.lifecycle: 'production'
        spec.type: 'website'
    cache: false
  - factName: repository_details
    type: RepositoryDetails
    cache: true
    exclude:
      - spec.type: 'documentation'
  - factName: repository_languages
    type: RepositoryLanguages

Below are the details for each field.

frequency [optional]

The frequency at which the collector should be executed. Possible values are either a cron expression { cron: ... } or HumanDuration. This is the default frequency for each extractor.

initialDelay [optional]

The amount of time that should pass before the first invocation happens. Possible values are either a cron expression { cron: ... } or HumanDuration.

batchSize [optional]

The number of entities to collect facts for at once. Optional, the default value is 1.

Note: Fact collection for a batch of entities is still considered as one hit towards the rate limits by the Soundcheck Rate Limiting engine, while the actual number of hits will be equal to the batchSize.

Example:

batchSize: 100
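The note above implies that the limiter budget and the real request volume diverge as batchSize grows. The TypeScript sketch below works through that arithmetic, assuming exactly one GitHub request per entity in a batch (the actual number of requests per entity is not stated in this document). Both helpers are hypothetical and not part of any Soundcheck API.

```typescript
// Upper bound on real GitHub requests when every limiter execution
// processes a full batch (assumes one request per entity).
function actualHitsPerWindow(limiterMax: number, batchSize: number): number {
  return limiterMax * batchSize;
}

// Largest limiter "max" that keeps real requests within the API limit
// minus a safety margin, for a given batch size.
function limiterMaxForBatch(
  apiLimitPerHour: number,
  batchSize: number,
  safetyMargin: number,
): number {
  if (batchSize < 1 || Math.floor(batchSize) !== batchSize) {
    throw new Error('batchSize must be a positive integer');
  }
  return Math.floor((apiLimitPerHour - safetyMargin) / batchSize);
}

console.log(actualHitsPerWindow(4900, 100)); // 490000
console.log(limiterMaxForBatch(5000, 100, 100)); // 49
console.log(limiterMaxForBatch(5000, 1, 100)); // 4900
```

Under these assumptions, keeping max at 4900 while raising batchSize to 100 could produce far more requests than the hourly quota, so the two settings are best tuned together.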

filter [optional]

A filter specifying which entities to collect the specified facts for. Matches the filter format used by the Catalog API. This is the default filter for each extractor.

exclude [optional]

Entities matching this filter will be skipped during the fact collection process. Can be used in combination with filter. Matches the filter format used by the Catalog API.

filter:
  - kind: component
exclude:
  - spec.type: documentation

cache [optional]

If the collected facts should be cached, and if so for how long. Possible values are either true or false or a nested { duration: HumanDuration } field. This is the default cache config for each extractor.
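For readers unfamiliar with duration objects such as { hours: 2 }, the following TypeScript sketch converts one into milliseconds. It is only an illustration: the FixedDuration type is a hypothetical, simplified stand-in for HumanDuration. Only hours, minutes, and seconds appear in this document. The remaining field names are assumptions, and variable-length units such as months and years are deliberately left out.

```typescript
// Simplified, hypothetical stand-in for a HumanDuration-style object.
// Only fixed-length units are supported in this sketch.
interface FixedDuration {
  weeks?: number;
  days?: number;
  hours?: number;
  minutes?: number;
  seconds?: number;
  milliseconds?: number;
}

// Sums all provided units into a single millisecond value.
function durationToMs(d: FixedDuration): number {
  const SECOND = 1000;
  const MINUTE = 60 * SECOND;
  const HOUR = 60 * MINUTE;
  const DAY = 24 * HOUR;
  const WEEK = 7 * DAY;
  return (
    (d.weeks ?? 0) * WEEK +
    (d.days ?? 0) * DAY +
    (d.hours ?? 0) * HOUR +
    (d.minutes ?? 0) * MINUTE +
    (d.seconds ?? 0) * SECOND +
    (d.milliseconds ?? 0)
  );
}

console.log(durationToMs({ hours: 2 })); // 7200000
console.log(durationToMs({ seconds: 30 })); // 30000
```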

collects [required]

An array describing which facts to collect and how to extract them. See below for details about the overall shape of a fact extractor.

Overall Shape Of A Fact Extractor

Each extractor supports the fields described below.

factName [required]

The name of the fact to be extracted.

  • Minimum length of 1
  • Maximum length of 100
  • Alphanumeric with single separator instances of periods, dashes, underscores, or forward slashes
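One possible reading of these constraints can be expressed as a regular expression. The TypeScript sketch below assumes that a name must start and end with an alphanumeric character and that separators may not appear back to back. The isValidFactName helper is hypothetical, and Soundcheck's own validation may be stricter or looser.

```typescript
// Alphanumeric runs joined by single '.', '-', '_' or '/' separators (assumed reading).
const FACT_NAME_PATTERN = /^[A-Za-z0-9]+(?:[._\/-][A-Za-z0-9]+)*$/;

function isValidFactName(factName: string): boolean {
  return (
    factName.length >= 1 &&
    factName.length <= 100 &&
    FACT_NAME_PATTERN.test(factName)
  );
}

console.log(isValidFactName('branch_protections')); // true
console.log(isValidFactName('repo/details-v1.2')); // true
console.log(isValidFactName('branch__protections')); // false
```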

type [required]

The type of the extractor (e.g. BranchProtections, RepositoryDetails).

frequency [optional]

The frequency at which the fact extraction should be executed. Possible values are either a cron expression { cron: ... } or HumanDuration. If provided, it overrides the default frequency provided at the top level. If not provided, it defaults to the frequency provided at the top level. If neither extractor's frequency, nor default frequency is provided, the fact will only be collected on demand.

Example:

frequency:
  minutes: 10

batchSize [optional]

The number of entities to collect facts for at once. Optional, the default value is 1. If provided it overrides the default batchSize provided at the top level. If not provided it defaults to the batchSize provided at the top level. If neither collector's batchSize nor default batchSize is provided the fact will be collected for one entity at a time.

Note: Fact collection for a batch of entities is still considered as one hit towards the rate limits by the Soundcheck Rate Limiting engine, while the actual number of hits will be equal to the batchSize.

Example:

batchSize: 100

branch [optional]

The branch to extract the fact from. If not provided, defaults to the repository's default branch.

filter [optional]

A filter specifying which entities to collect the specified facts for. Matches the filter format used by the Catalog API. If provided, it overrides the default filter provided at the top level. If not provided, it defaults to the filter provided at the top level. If neither extractor's filter, nor default filter is provided, the fact will be collected for all entities.

exclude [optional]

Entities matching this filter will be skipped during the fact collection process. Can be used in combination with filter. Matches the filter format used by the Catalog API.

filter:
  - kind: component
exclude:
  - spec.type: documentation

cache [optional]

If the collected facts should be cached, and if so for how long. Possible values are either true or false or a nested { duration: HumanDuration } field. If provided, it overrides the default cache config provided at the top level. If not provided, it defaults to the cache config provided at the top level. If neither extractor's cache nor default cache config is provided, the fact will not be cached.

Example:

cache:
  duration:
    hours: 24
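The override rules described for frequency, batchSize, filter, and cache follow one pattern: the extractor value wins, then the top-level default, then a built-in fallback. The TypeScript sketch below models that pattern with hypothetical types and a hypothetical resolveExtractor helper. It illustrates the documented semantics and is not Soundcheck's implementation.

```typescript
// Hypothetical, simplified config types for illustration.
type Frequency = { cron: string } | Record<string, number>;
type CacheConfig = boolean | { duration: Record<string, number> };

interface CollectorDefaults {
  frequency?: Frequency;
  batchSize?: number;
  filter?: unknown;
  cache?: CacheConfig;
}

interface ExtractorConfig extends CollectorDefaults {
  factName: string;
  type: string;
}

interface ResolvedExtractor {
  factName: string;
  type: string;
  frequency: Frequency | undefined; // undefined: collected on demand only
  batchSize: number;
  filter: unknown; // undefined: collected for all entities
  cache: CacheConfig;
}

function resolveExtractor(
  defaults: CollectorDefaults,
  extractor: ExtractorConfig,
): ResolvedExtractor {
  return {
    factName: extractor.factName,
    type: extractor.type,
    frequency: extractor.frequency ?? defaults.frequency,
    batchSize: extractor.batchSize ?? defaults.batchSize ?? 1,
    filter: extractor.filter ?? defaults.filter,
    // '??' only falls through on null/undefined, so an explicit
    // 'cache: false' on the extractor still overrides the default.
    cache: extractor.cache ?? defaults.cache ?? false,
  };
}

const topLevel: CollectorDefaults = {
  frequency: { cron: '0 * * * *' },
  filter: { kind: 'Component' },
  cache: { duration: { hours: 2 } },
};

console.log(
  resolveExtractor(topLevel, {
    factName: 'branch_protections',
    type: 'BranchProtections',
    cache: false,
  }),
);
```

Using the nullish coalescing operator rather than || matters here: with ||, an extractor-level cache: false would be silently replaced by the top-level cache setting.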

Collecting BranchProtections Fact

The BranchProtections fact contains information about configured branch protections for a given branch in a GitHub repository.

Shape of A BranchProtections Fact Collector

The shape of a BranchProtections Fact Collector matches the Overall Shape Of A GitHub Fact Collector (restriction: type: BranchProtections).

The following is an example of the BranchProtections Fact Collector configuration:

collects:
  - factName: branch_protections
    type: BranchProtections
    frequency:
      cron: '0 * * * *'
    branch: master
    filter:
      - spec.lifecycle: 'production'
        spec.type: 'website'
    cache: false

Shape of A BranchProtections Fact

The shape of a BranchProtections Fact is based on the Fact Schema.

For a description of the data collected regarding branch protection, refer to the GitHub API documentation.

The following is an example of the collected BranchProtections fact:

factRef: github:master/branch_protections
entityRef: component:default/queue-proxy
scope: master
timestamp: 2023-02-24T15:50+00Z
data:
  url: 'https://api.github.com/repos/backstage/backstage/branches/master/protection'
  required_pull_request_reviews:
    url: 'https://api.github.com/repos/backstage/backstage/branches/master/protection/required_pull_request_reviews'
    dismiss_stale_reviews: false
    require_code_owner_reviews: true
    required_approving_review_count: 2
    require_last_push_approval: false
  required_signatures:
    url: 'https://api.github.com/repos/backstage/backstage/branches/master/protection/required_signatures'
    enabled: false
  enforce_admins:
    url: 'https://api.github.com/repos/backstage/backstage/branches/master/protection/enforce_admins'
    enabled: false
  required_linear_history:
    enabled: false
  allow_force_pushes:
    enabled: true
  allow_deletions:
    enabled: true
  block_creations:
    enabled: true
  required_conversation_resolution:
    enabled: false
  lock_branch:
    enabled: false
  allow_fork_syncing:
    enabled: true

Shape of A BranchProtections Fact Check

The shape of a BranchProtections Fact Check matches the Shape of a Fact Check.

The following is an example of the BranchProtections fact checks:

soundcheck:
  checks:
    - id: requires_code_owner_reviews
      rule:
        factRef: github:master/branch_protections
        path: $.required_pull_request_reviews.require_code_owner_reviews
        operator: equal
        value: true
    - id: requires_at_least_two_approving_reviews
      rule:
        factRef: github:master/branch_protections
        path: $.required_pull_request_reviews.required_approving_review_count
        operator: greaterThanInclusive
        value: 2
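To make the relationship between a rule and the collected fact data concrete, here is a minimal TypeScript sketch of how such a rule could be evaluated. It is a toy evaluator under stated assumptions: it supports only simple dotted paths of the form $.a.b and only the operators that appear in this document (equal, greaterThanInclusive, lessThan). The resolvePath and evaluateRule helpers are hypothetical, and Soundcheck's real rule engine is not shown here.

```typescript
type Operator = 'equal' | 'greaterThanInclusive' | 'lessThan';

interface CheckRule {
  factRef: string;
  path: string;
  operator: Operator;
  value: unknown;
}

// Walks a simple "$.a.b.c" path through nested objects.
// Returns undefined for unsupported paths or missing keys.
function resolvePath(data: unknown, path: string): unknown {
  if (path.indexOf('$.') !== 0) {
    return undefined;
  }
  let current: unknown = data;
  for (const key of path.slice(2).split('.')) {
    if (typeof current !== 'object' || current === null) {
      return undefined;
    }
    current = (current as Record<string, unknown>)[key];
  }
  return current;
}

function evaluateRule(rule: CheckRule, factData: unknown): boolean {
  const actual = resolvePath(factData, rule.path);
  const expected = rule.value;
  switch (rule.operator) {
    case 'equal':
      return actual === expected;
    case 'greaterThanInclusive':
      return typeof actual === 'number' && typeof expected === 'number' && actual >= expected;
    case 'lessThan':
      return typeof actual === 'number' && typeof expected === 'number' && actual < expected;
    default:
      return false;
  }
}

// Excerpt of the "data" section from the example fact above.
const branchProtectionData = {
  required_pull_request_reviews: {
    require_code_owner_reviews: true,
    required_approving_review_count: 2,
  },
};

console.log(
  evaluateRule(
    {
      factRef: 'github:master/branch_protections',
      path: '$.required_pull_request_reviews.required_approving_review_count',
      operator: 'greaterThanInclusive',
      value: 2,
    },
    branchProtectionData,
  ),
); // true
```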

The following is an example of the Soundcheck program that utilizes these checks:

- id: demo
  name: Demo
  ownerEntityRef: group:default/owning_group
  description: Demonstration of Soundcheck BranchProtections Fact Extractor
  levels:
    - ordinal: 1
      name: First level
      description: Checks leveraging Soundcheck's GitHub BranchProtections Fact Extractor
      checks:
        - id: requires_code_owner_reviews
          name: Requires code owner reviews
          description: PR requires code owner reviews
        - id: requires_at_least_two_approving_reviews
          name: Requires at least two approving reviews
          description: PR requires at least two approving reviews
Collecting RepositoryDetails Fact

The RepositoryDetails fact contains information about a GitHub repository.

Shape of A RepositoryDetails Fact Collector

The shape of a RepositoryDetails Fact Collector matches the Overall Shape Of A GitHub Fact Collector (restriction: type: RepositoryDetails).

The following is an example of the RepositoryDetails Fact Collector configuration:

collects:
  - factName: repository_details
    type: RepositoryDetails
    frequency:
      cron: '0 * * * *'
    filter:
      - spec.lifecycle: 'production'
    cache: true

Shape of A RepositoryDetails Fact

The shape of a RepositoryDetails Fact is based on the Fact Schema.

For a description of the data collected about the repository, refer to the GitHub API documentation.

The following is an example of the collected RepositoryDetails fact:

factRef: github:default/repository_details
entityRef: component:default/queue-proxy
scope: default
timestamp: 2023-02-24T15:50+00Z
data:
  name: backstage
  full_name: backstage/backstage
  private: true
  html_url: 'https://github.com/backstage/backstage'
  description: null
  fork: false
  url: 'https://api.github.com/repos/backstage/backstage'
  homepage: null
  size: 3
  stargazers_count: 0
  watchers_count: 0
  language: null
  has_issues: true
  has_projects: true
  has_downloads: true
  has_wiki: true
  has_pages: false
  has_discussions: false
  forks_count: 0
  mirror_url: null
  archived: false
  disabled: false
  open_issues_count: 0
  license: null
  allow_forking: true
  is_template: false
  web_commit_signoff_required: false
  visibility: 'private'
  forks: 0
  open_issues: 0
  watchers: 0
  default_branch: master
  permissions:
    admin: true
    maintain: true
    push: true
    triage: true
    pull: true
  allow_squash_merge: true
  allow_merge_commit: true
  allow_rebase_merge: true
  allow_auto_merge: false
  delete_branch_on_merge: false
  allow_update_branch: false
  use_squash_pr_title_as_default: false
  squash_merge_commit_message: 'COMMIT_MESSAGES'
  squash_merge_commit_title: 'COMMIT_OR_PR_TITLE'
  merge_commit_message: 'PR_TITLE'
  merge_commit_title: 'MERGE_MESSAGE'
  security_and_analysis:
    secret_scanning:
      status: 'disabled'
    secret_scanning_push_protection:
      status: 'disabled'
  network_count: 0
  subscribers_count: 1
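The two example facts in this document (github:master/branch_protections and github:default/repository_details) suggest that a factRef has the shape collector:scope/factName. That shape is inferred from those examples only and is not a documented specification. Under that assumption, a hypothetical parseFactRef helper could be sketched in TypeScript as follows.

```typescript
// Hypothetical decomposition of a factRef, inferred from the examples in this document.
interface FactRefParts {
  collector: string;
  scope: string;
  factName: string;
}

// Splits "<collector>:<scope>/<factName>". The fact name keeps any further
// slashes, since forward slashes are allowed separators in factName.
function parseFactRef(factRef: string): FactRefParts | undefined {
  const colon = factRef.indexOf(':');
  if (colon <= 0) {
    return undefined;
  }
  const slash = factRef.indexOf('/', colon + 1);
  if (slash <= colon + 1 || slash === factRef.length - 1) {
    return undefined;
  }
  return {
    collector: factRef.slice(0, colon),
    scope: factRef.slice(colon + 1, slash),
    factName: factRef.slice(slash + 1),
  };
}

console.log(parseFactRef('github:master/branch_protections'));
console.log(parseFactRef('github:default/repository_details'));
```

In the examples, the scope matches the configured branch (master) for BranchProtections and is default for RepositoryDetails, where no branch is configured.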

Shape of A RepositoryDetails Fact Check

The shape of a RepositoryDetails Fact Check matches the Shape of a Fact Check.

The following is an example of the RepositoryDetails fact checks:

soundcheck:
  checks:
    - id: allows_rebase_merge
      rule:
        factRef: github:default/repository_details
        path: $.allow_rebase_merge
        operator: equal
        value: true
    - id: has_less_than_ten_open_issues
      rule:
        factRef: github:default/repository_details
        path: $.open_issues
        operator: lessThan
        value: 10

The following is an example of the Soundcheck program that utilizes these checks:

- id: demo
  name: Demo
  ownerEntityRef: group:default/owning_group
  description: Demonstration of Soundcheck RepositoryDetails Fact Extractor
  levels:
    - ordinal: 1
      name: First level
      description: Checks leveraging Soundcheck's GitHub RepositoryDetails Fact Extractor
      checks:
        - id: allows_rebase_merge
          name: Allows Rebase Merge
          description: Repository allows rebase merge
        - id: has_less_than_ten_open_issues
          name: Has Less Than 10 Open Issues
          description: GitHub Repository Has Less Than 10 Open Issues