Creating a new TPI FactCollector

Introduction

Fact Collectors are among the most fundamental components of Soundcheck. They are responsible for collecting data about entities. A single piece of data collected about a single entity is called a Fact in the Soundcheck system. Facts can then be referenced in Soundcheck Checks to determine whether an entity adheres to the rules defined by the Check.

Soundcheck comes with a set of built-in Fact Collectors, but adopters will inevitably want to customize their Soundcheck instance by adding new Fact Collectors specific to their organizational needs.

To that end, this guide will walk you through the process of creating a new FactCollector for a third party integration from scratch, using the PagerDuty API as an example and with callouts for best practices and tips from the Soundcheck team. Let's get started!

Pre-requisites

This guide assumes you have Soundcheck installed in your Backstage instance. If you don't, follow this guide to install the Soundcheck backend and this guide to install the Soundcheck frontend before proceeding.

Understanding FactCollectors

All Fact Collectors must implement the FactCollector interface. This interface is defined in the @spotify/backstage-plugin-soundcheck-node package, but we replicate it here for convenience:

/**
 * Fact collectors collect one or more facts about entities.
 *
 * @public
 */
export interface FactCollector {
  /**
   * A unique identifier for the {@link FactCollector}.
   *
   * Generally shorthand for the source (e.g., catalog, github).
   */
  id: string;

  /**
   * A name for the {@link FactCollector} suitable for display in a user interface.
   */
  name?: string;

  /**
   * A description of the {@link FactCollector} suitable for display in a user interface.
   */
  description?: string;

  /**
   * Collect the requested facts for the given entity.
   *
   * @param entities - The entities to collect facts on.
   * @param params - Optional parameters for the collection including:
   *     factRefs - references to the facts to be collected
   *     refresh - a hint that the specified facts should be refreshed rather than served from any
   *               sort of (local to the collector) cache.
   */
  collect(
    entities: Entity[],
    params?: {
      factRefs?: FactRef[];
      refresh?: FactRef[];
    },
  ): Promise<Fact[]>;

  /**
   * Returns the names of the facts that this collector can collect.
   */
  getFactNames(): Promise<string[]>;

  /**
   * Returns the data schema for the requested fact. The returned object should be a JSON schema
   * describing the data structure of the fact. This is used in Soundchecks NCUI to provide paths
   * for a selected fact when creating a new Check. Custom fact collectors that do not integrate
   * with NCUI do not need to implement this method.
   *
   * An example schema for the FileExists fact from the SCMFactCollector that comes with Soundcheck
   * is as follows:
   * {
   *   "title": "File Exists",
   *   "description": "File Exists",
   *   "type": "object",
   *   "properties": {
   *     "classpath": {
   *       "type": "boolean"
   *     },
   *     "project": {
   *       "type": "boolean"
   *     }
   *   }
   * }
   *
   * @param factRef - Reference to the fact whose data schema should be returned.
   */
  getDataSchema(factRef: FactRef): Promise<string | undefined>;

  /**
   * Returns the collection configurations for this {@link FactCollector}.
   */
  getCollectionConfigs(): Promise<CollectionConfig[]>;
}

The CollectionConfig type and the getCollectionConfigs() function

The final method in the FactCollector interface is getCollectionConfigs. This method returns an array of CollectionConfig objects. This type is defined in the @spotify/backstage-plugin-soundcheck-common package, but we again provide it here for convenience:

/**
 * Collection configuration for one or more facts such as schedules, filters, and cache settings.
 *
 * @public
 */
export type CollectionConfig = {
  /**
   * The facts to which this collection configuration applies.
   */
  factRefs?: FactRef[];

  /**
   * A filter specifying which entities to collect the specified facts for.
   *
   * Matches the filter format used by the Catalog API:
   * {@link https://backstage.io/docs/reference/catalog-client.entityfilterquery}
   */
  filter?: EntityFilter;

  /**
   * The frequency at which the specified facts should be collected.
   *
   * If not provided fact collection will not be scheduled.
   */
  frequency?:
    | {
        cron: string;
      }
    | HumanDuration;

  /**
   * If the collected facts should be cached, and if so for how long.
   *
   * If not provided facts will not be cached.
   */
  cache?: CacheConfig;
};

The CollectionConfigs returned by a FactCollector specify how the collector collects groups of facts by specifying:

  • factRefs: The facts to collect.
  • filter: The entities to collect the facts for.
  • frequency: How often to collect the facts.
  • cache: Whether to cache the facts and for how long.

Most collectors will only return a single CollectionConfig that applies to all facts the collector collects.
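As an illustration, a collector that gathers a single fact for all Component entities once an hour and caches it for ten minutes might return something like the following. This is only a sketch: the local types are simplified stand-ins for the real FactRef, EntityFilter, HumanDuration, and CacheConfig types, and the fact reference example:default/service is hypothetical.

```typescript
// Simplified stand-ins for the types exported by the Soundcheck packages.
// They only mirror the fields discussed above; they are not the real definitions.
type FactRef = string;
type HumanDuration = { hours?: number; minutes?: number; seconds?: number };
type CollectionConfig = {
  factRefs?: FactRef[];
  filter?: Record<string, string | string[]>;
  frequency?: { cron: string } | HumanDuration;
  cache?: { duration: HumanDuration };
};

// One configuration that applies to every fact the collector serves.
const collectionConfigs: CollectionConfig[] = [
  {
    factRefs: ['example:default/service'], // hypothetical fact reference
    filter: { kind: 'Component' },
    frequency: { hours: 1 },
    cache: { duration: { minutes: 10 } },
  },
];

console.log(JSON.stringify(collectionConfigs, null, 2));
```

Leaving out frequency would mean the facts are never collected on a schedule, and leaving out cache would mean they are not cached, as described in the type's comments.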

Configuring a FactCollector

FactCollector configurations are specified under the soundcheck.collectors key in the configuration file. These can be specified in the main app-config.yaml file, but more commonly they live in a separate file that is referenced through an $include key in the app-config.yaml file. Here is an example of an SCMFactCollector configuration:

soundcheck:
  collectors:
    scm:
      frequency:
        cron: '0 0 * * *'
      cache:
        duration:
          minutes: 10
      filter:
        kind: 'Component'
      collects:
        # Extracts if files exist at the given paths.
        - factName: files.eclipse
          type: exists
          data:
            - name: classpath
              path: .classpath
            - name: project
              path: .project

        # Extracts two data elements, ubi and date, from the file at the specified path
        # using the given regex. Each capture group corresponds to the data elements, in order.
        - factName: github.docker
          type: regex
          path: /pom.xml
          regex: gcr.io/spotify-base-images/(.*):(.*)@.*
          data:
            - name: ubi
              type: string
            - name: date
              type: string

        # Extracts json data from the file at the given path.
        # Here:
        #  metadata.tags will be extracted into an array named 'tags' of type String, and
        #  metadata.labels.service_type will be extracted into a variable called service_type of type string
        - factName: entity.metadata
          type: json
          path: /system-info.yaml
          data:
            - name: tags
              jsonPath: metadata.tags
              type: array
              items:
                type: string
            - name: service_type
              jsonPath: metadata.labels.service_type
              type: string

Note that the SCMFactCollector, and indeed all FactCollectors, take no specific configuration types other than a JSON object. This allows for flexibility in how collectors are configured. However, configurations should still be validated, and so best practice is to define a schema for the collector which should be used to validate the configuration object passed to the collector. We highly recommend using Zod for creating schemas and validating them.

Here is a snippet showing how the SCMFactCollector validates its configuration:

Zod verification
const schema = ScmFactCollectorSchema.safeParse(collectorConfig);
if (!schema.success) {
  this.#logger.error(`Failed to parse SCMFactCollector from schema.`);
  throw new InputError(schema.error.message);
}

Typically, FactCollectors take a configuration object similar to the one shown above. The frequency, cache, and filter keys are common and dictate when the collector collects facts, how long it caches them, and what entities it collects facts for, respectively. The 'collects' key is specific to each collector: it specifies which facts the collector should collect and is tailored to the needs of the organization.

Creating a New Module

With the basics of Fact Collectors covered, we can now use the backstage-cli to create a new backend module, in which we'll write the code for our new FactCollector.

Make sure you've run yarn install to install dependencies, then run yarn new on your command line. You will see the prompt below; pick the 'backend-module' option. Then enter "soundcheck" for the ID of the plugin and "pagerduty" for the ID of the module.

Backstage cli

This will create a new Backstage backend module based on the ID that was provided. It will be built and added to the Backstage App automatically.

Backstage cli

You will see the above output when the 'backend-module' is created successfully.

Creating the PagerDuty FactCollector

Now, let's walk through the creation of a new third-party FactCollector that we'll use to collect facts from PagerDuty. Unsurprisingly, we'll call it the PagerDutyFactCollector.

Note: This example collector focuses only on PagerDuty service details for entities in our Catalog. Other PagerDuty features can be integrated as well; if you need more, check out the PagerDuty plugin.

At this point, you're assumed to have created a new module to house the code we'll be writing. If you've deviated from the names in the examples above, you'll need to ensure that you continue to use the names you've chosen in the following steps, rather than the names used in the examples.

First, let's add the packages we need. You can do this with the following command:

From your Backstage root directory
yarn --cwd plugins/soundcheck-backend-module-pagerduty add @spotify/backstage-plugin-soundcheck-node @backstage/catalog-model @backstage/config @spotify/backstage-plugin-soundcheck-common @pagerduty/pdjs luxon qs zod

Create a new file called PagerDutyFactCollector.ts in the src folder of the 'backend-module' you just created and open it in your editor.

Let's add the imports we'll need:

PagerDutyFactCollector.ts
import { LoggerService } from '@backstage/backend-plugin-api';
import { Entity, stringifyEntityRef } from '@backstage/catalog-model';
import { api } from '@pagerduty/pdjs';
import {
  CollectionConfig,
  Fact,
  FactRef,
} from '@spotify/backstage-plugin-soundcheck-common';
import { FactCollector } from '@spotify/backstage-plugin-soundcheck-node';
import { DateTime } from 'luxon';
import qs from 'qs';

import { PagerDutyConfigSchema } from './schema';

The key interface when creating a Fact Collector is the FactCollector interface. We'll spend most of the tutorial on implementing this interface.

import { FactCollector } from '@spotify/backstage-plugin-soundcheck-node';

Note: The ConfigurableFactCollector interface, shown below, extends FactCollector and allows the collector to be updated when its configuration changes. We won't make use of this interface here for simplicity's sake, but it is worth knowing about, since it lets a collector be updated without restarting the Soundcheck backend.

import { ConfigurableFactCollector } from '@spotify/backstage-plugin-soundcheck-node';

We start by importing Entity and a function called stringifyEntityRef from the @backstage/catalog-model package. Entity is a representation of a single entity in the Backstage ecosystem against which a Fact Collector can collect facts. stringifyEntityRef is a function that takes an Entity and returns a string representation of it. It is commonly used in Soundcheck to represent entities in situations where sending the entire Entity is undesirable.

We also import the LoggerService type from the @backstage/backend-plugin-api package. This is the type of the logger that is passed to the FactCollector when it is created.

We next import the api function from the @pagerduty/pdjs package. This is the function we will use to communicate with the PagerDuty API.

The remaining import lines pull in common types from the @spotify/backstage-plugin-soundcheck-common package, the DateTime class from luxon, the qs query-string library, and the PagerDutyConfigSchema that we will define in schema.ts later in this guide.

Now, we'll define some constants:

PagerDutyFactCollector.ts
const TOKEN = '<REDACTED>';
const ID = 'pagerduty';
const SCOPE = 'default';
const FACT_REF_SERVICE = `${ID}:${SCOPE}/service`;

The TOKEN constant is a placeholder for the PagerDuty API token. This token is used to authenticate and authorize our communications with the PagerDuty API. It should be kept secret and not hard-coded, but we leave it here for simplicity of the example. If you are experimenting with this code, you should remember to replace this with a real token for testing, and then switch to getting a token from your preferred secret management system when you are ready to deploy.
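One simple way to avoid hard-coding the token is to read it from the environment and fail fast when it is missing. The sketch below assumes a hypothetical PAGERDUTY_TOKEN variable name and a hypothetical readPagerDutyToken helper; neither is part of Soundcheck or the PagerDuty client, and your secret management system may expose the token differently.

```typescript
// Hypothetical variable name; use whatever your secret manager exposes.
const TOKEN_ENV_VAR = 'PAGERDUTY_TOKEN';

// Reads the token from an environment-like map and throws when it is missing,
// so a misconfigured deployment does not silently send unauthenticated requests.
function readPagerDutyToken(env: Record<string, string | undefined>): string {
  const value = env[TOKEN_ENV_VAR]?.trim();
  if (!value) {
    throw new Error(`Missing ${TOKEN_ENV_VAR} environment variable`);
  }
  return value;
}

// In a Node.js backend you would typically pass process.env here.
const exampleToken = readPagerDutyToken({ PAGERDUTY_TOKEN: 'example-token' });
console.log(exampleToken.length > 0);
```

Taking the environment as a parameter, rather than reading a global directly, keeps the helper easy to exercise in unit tests.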

The ID is the identifier for this FactCollector. It is used to uniquely identify the collector. SCOPE is a reserved word in Soundcheck and should, for now, be set to 'default'. Some fact collectors support scope; for instance, the SCM collector uses scope to pull from different SCM branches.

The FACT_REF_SERVICE constant is a string that represents a complete fact reference in the 'collector:scope/factName' format. This is the type of fact that is returned by this collector.
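To make the layout of a fact reference concrete, here is a small sketch that builds and splits such strings. The buildFactRef and splitFactRef helpers are hypothetical and written only for this illustration; they are not functions from the Soundcheck packages.

```typescript
// Parts of a fact reference, following the `${ID}:${SCOPE}/service` template above.
type FactRefParts = { collector: string; scope: string; factName: string };

function buildFactRef({ collector, scope, factName }: FactRefParts): string {
  return `${collector}:${scope}/${factName}`;
}

// Returns undefined when the string does not match the expected layout.
function splitFactRef(ref: string): FactRefParts | undefined {
  const match = /^([^:/]+):([^:/]+)\/(.+)$/.exec(ref);
  if (!match) return undefined;
  const [, collector, scope, factName] = match;
  return { collector, scope, factName };
}

const serviceFactRef = buildFactRef({
  collector: 'pagerduty',
  scope: 'default',
  factName: 'service',
});
console.log(serviceFactRef); // pagerduty:default/service
console.log(splitFactRef(serviceFactRef)?.factName); // service
```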

Next, add the following constant to the file:

PagerDutyFactCollector.ts
const SERVICE_DETAILS = [
  'escalation_policies',
  'teams',
  'auto_pause_notifications_parameters',
  'integrations',
];

SERVICE_DETAILS is an array of strings that represent the details we want to pull from PagerDuty.

With our constants defined and our imports all set, we can begin to define our PagerDutyFactCollector in earnest. Let's begin by creating the PagerDutyFactCollector class and having it implement the FactCollector interface:

PagerDutyFactCollector.ts
export class PagerDutyFactCollector implements FactCollector {
  id = ID;
  name = 'PagerDuty';
  description = 'PagerDuty integration.';

  readonly #pd = api({ token: TOKEN });
  readonly #logger: LoggerService;
  readonly #config?: typeof PagerDutyConfigSchema;

  static create(logger: LoggerService): PagerDutyFactCollector {
    return new PagerDutyFactCollector(logger);
  }

  private constructor(logger: LoggerService) {
    this.#logger = logger.child({
      target: this.id,
    });
  }

  // The four interface methods are stubbed out here and implemented below.
  collect(
    entities: Entity[],
    params?: { factRefs?: FactRef[]; refresh?: FactRef[] },
  ): Promise<Fact[]> {
    throw new Error('Method not implemented.');
  }

  getFactNames(): Promise<string[]> {
    throw new Error('Method not implemented.');
  }

  getDataSchema(factRef: FactRef): Promise<string | undefined> {
    throw new Error('Method not implemented.');
  }

  getCollectionConfigs(): Promise<CollectionConfig[]> {
    throw new Error('Method not implemented.');
  }
}

Excellent, we've defined our new FactCollector, and given it an id, name, and description, as well as defining a few readonly variables for the logger, config, and the PagerDuty API client.

Note: The static create method is a common pattern in Soundcheck FactCollectors that facilitates the easy creation of the collector within the Soundcheck environment.

Let's get into the heart of the collector, the collect method:

PagerDutyFactCollector.ts
  async collect(
    entities: Entity[],
    _params: {
      factRefs?: FactRef[];
      refresh?: FactRef[];
    },
  ): Promise<Fact[]> {
    this.#logger.debug('Collecting fact(s) for PagerDutyFactCollector');

    // Collect the service fact for each entity, skipping entities without one.
    const facts: Fact[] = [];
    for (const entity of entities) {
      const fact = await this.#collectServiceFact(entity);
      if (fact) {
        facts.push(fact);
      }
    }
    return facts;
  }

Note: The _ before params in the example above is just a convention to mark a parameter as unused so that linters or TypeScript won't flag it.

The arguments to the collect method are entities and _params. The entities argument is an array of entities for which we want to collect facts. Note that in general this will only ever contain a single entity, due to Soundcheck's integration with job queues. The _params parameter is unused in this example, but it tells the collector which facts to collect and whether they should be fetched fresh rather than served from any cache provided by the collector itself. This does not impact the fact caching provided by Soundcheck.

The collect method is where the collector does its work. For all entities, and for any fact references provided by the _params argument, the collector must collect and return the facts for the entities. Above, we are calling to a private method, #collectServiceFact, to collect the service details for each entity. We add the fact for each entity to the facts array and return it.
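The example collector serves a single fact, so it can safely ignore the requested fact references. A collector that serves several facts might narrow its work to what was asked for. The following is a minimal sketch of that idea; the selectRequestedFacts helper is hypothetical, and treating a missing factRefs list as "collect everything" is an assumption rather than documented Soundcheck behavior.

```typescript
type FactRef = string;

// Facts this hypothetical collector knows how to collect.
const SUPPORTED_FACTS: FactRef[] = ['pagerduty:default/service'];

// Returns the supported facts that were actually requested.
// Assumption: when no factRefs are passed, every supported fact is collected.
function selectRequestedFacts(requested?: FactRef[]): FactRef[] {
  if (!requested || requested.length === 0) {
    return SUPPORTED_FACTS;
  }
  return SUPPORTED_FACTS.filter(factRef => requested.includes(factRef));
}

console.log(selectRequestedFacts()); // all supported facts
console.log(selectRequestedFacts(['other:default/thing'])); // nothing to collect
```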

Now let's look at the #collectServiceFact function:

PagerDutyFactCollector.ts
  async #collectServiceFact(entity: Entity): Promise<Fact | undefined> {
    const serviceId = entity.metadata.annotations?.['pagerduty.com/service-id'];
    if (serviceId) {
      const service = await this.#getService(serviceId, SERVICE_DETAILS);
      if (service) {
        return {
          factRef: FACT_REF_SERVICE,
          entityRef: stringifyEntityRef(entity),
          data: service,
          timestamp: DateTime.utc().toISO()!,
        };
      }
    }
    return undefined;
  }

The #collectServiceFact function takes an entity and returns a fact for that entity. It first checks that the entity is annotated with a pagerduty.com/service-id annotation, which is the PagerDuty service id associated with the entity. This means that this FactCollector will only collect facts from those entities that have this annotation. If the entity has the annotation, we call the #getService function, and return a fact for the entity with the service details as the fact's data.
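To show what the resulting fact looks like without calling PagerDuty, here is a self-contained sketch that builds a fact from an already-fetched payload. The EntityLike and ServiceFact types are minimal stand-ins for the real Entity and Fact types, toEntityRef is a local approximation of stringifyEntityRef, and the service ID and payload are made up.

```typescript
// Minimal stand-ins for the Backstage Entity and Soundcheck Fact types.
type EntityLike = {
  kind: string;
  metadata: { name: string; namespace?: string; annotations?: Record<string, string> };
};
type ServiceFact = { factRef: string; entityRef: string; data: unknown; timestamp: string };

// Local approximation of stringifyEntityRef: kind:namespace/name in lower case.
function toEntityRef(entity: EntityLike): string {
  const namespace = entity.metadata.namespace ?? 'default';
  return `${entity.kind}:${namespace}/${entity.metadata.name}`.toLowerCase();
}

// Builds a fact from an already-fetched service payload, or skips the entity
// when it has no PagerDuty annotation or no service was found.
function buildServiceFact(entity: EntityLike, service: unknown): ServiceFact | undefined {
  const serviceId = entity.metadata.annotations?.['pagerduty.com/service-id'];
  if (!serviceId || !service) return undefined;
  return {
    factRef: 'pagerduty:default/service',
    entityRef: toEntityRef(entity),
    data: service,
    timestamp: new Date().toISOString(),
  };
}

const annotatedEntity: EntityLike = {
  kind: 'Component',
  metadata: { name: 'checkout', annotations: { 'pagerduty.com/service-id': 'PABC123' } },
};
const plainEntity: EntityLike = { kind: 'Component', metadata: { name: 'docs' } };

console.log(buildServiceFact(annotatedEntity, { status: 'active' })?.entityRef); // component:default/checkout
console.log(buildServiceFact(plainEntity, { status: 'active' })); // undefined
```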

PagerDutyFactCollector.ts
  async #getService(serviceId: string, include?: string[]): Promise<any> {
    const query = qs.stringify(
      {
        'include[]': include,
      },
      {
        indices: false,
      },
    );

    const { data } = await this.#pd.get(`/services/${serviceId}?${query}`);
    return data?.service;
  }

The final function used by the collect method is #getService. This function takes a serviceId and an optional array of strings called include, which is specific to the PagerDuty API and informs PagerDuty what information we want about the service. The #getService function constructs a query string from the include array and calls to the PagerDuty API to get the service details for the ID associated with the entity. It then returns the service details.
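If you are curious what the resulting query string looks like, the sketch below builds repeated include[] parameters with the built-in URLSearchParams class instead of qs. It is meant only as an illustration and should produce an equivalent query; the buildIncludeQuery helper and the PABC123 service ID are hypothetical.

```typescript
// Builds repeated include[] query parameters, one per requested detail.
function buildIncludeQuery(include: string[] = []): string {
  const params = new URLSearchParams();
  for (const item of include) {
    params.append('include[]', item);
  }
  return params.toString();
}

const includeQuery = buildIncludeQuery(['escalation_policies', 'teams']);
console.log(includeQuery); // include%5B%5D=escalation_policies&include%5B%5D=teams
console.log(`/services/PABC123?${includeQuery}`);
```

The square brackets are percent-encoded as %5B and %5D, which is why the key appears as include%5B%5D in the final URL.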

The final three methods to implement are getFactNames, getDataSchema, and getCollectionConfigs.

Let's start with getFactNames:

PagerDutyFactCollector.ts
  async getFactNames(): Promise<string[]> {
    return [FACT_REF_SERVICE];
  }

This method returns an array of strings representing the fact references that this collector can collect. In this case, we only collect one type of fact, so we return an array with a single element. This method is used by Soundcheck's frontend to determine what facts are available for use in checks when creating a check via the Soundcheck No-Code User Interface (NCUI).

Next, let's implement getDataSchema:

PagerDutyFactCollector.ts
  async getDataSchema(_factRef: FactRef): Promise<string | undefined> {
    return JSON.stringify({
      title: 'Service Details',
      description: 'Pager Duty Service Details',
      type: 'object',
      properties: {
        escalation_policies: {
          type: 'object',
        },
        teams: {
          type: 'array',
        },
        acknowledgement_timeout: {
          type: 'number',
        },
        integrations: {
          type: 'object',
        },
      },
    });
  }

The getDataSchema method returns a JSON schema describing the data structure of the fact. This is used in Soundcheck's NCUI to provide paths for a selected fact when creating a new check. In this case, we return a schema that describes the structure of the service details fact that we collect from PagerDuty. Note that this schema is not the full schema of the data returned by the PagerDuty API, but rather a subset for the sake of brevity and demonstration.

With the getDataSchema and getFactNames methods implemented, we can now see the service details fact in the NCUI when creating a new check:

PagerDuty Fact in NCUI

Finally, let's implement getCollectionConfigs:

PagerDutyFactCollector.ts
  async getCollectionConfigs(): Promise<CollectionConfig[]> {
    if (this.#config) {
      const validationResult = PagerDutyConfigSchema.safeParse(this.#config);

      if (validationResult.success) {
        return validationResult.data.collects.map(c => {
          return {
            filter: c.filter,
            frequency: c.frequency,
            factRefs: [FACT_REF_SERVICE],
          };
        });
      }
    }
    return [];
  }

The getCollectionConfigs method returns an array of CollectionConfig objects. This method is used by Soundcheck to determine how often the collector should collect facts, what facts it should collect, and what entities it should collect facts for. In this case, we return the CollectionConfig that is specified in the pagerduty collector config.

Additionally, we will use the Zod library to create a config schema that validates the collector config specified by the user, which ensures that the collector is configured correctly before facts are collected. Create a new file named schema.ts in the plugin source directory, then copy and paste the code below into it.

schema.ts
import {
  FilterSchema,
  FrequencySchema,
} from '@spotify/backstage-plugin-soundcheck-common';
import { z } from 'zod';

export const PagerDutyConfigSchema = z.strictObject({
  server: z.string().optional(),
  collects: z.array(
    z.strictObject({
      type: z.literal('Service'),
      frequency: FrequencySchema.optional(),
      filter: FilterSchema.optional(),
    }),
  ),
});
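For reference, a configuration object that fits this schema, and the collection configs it would map to, could look roughly like the sketch below. The types are simplified stand-ins for the Zod-inferred types, and the five-minute frequency and Component filter are hypothetical values chosen for illustration.

```typescript
// Simplified stand-ins mirroring the shape accepted by PagerDutyConfigSchema.
type DurationLike = { hours?: number; minutes?: number };
type PagerDutyCollectorConfig = {
  server?: string;
  collects: {
    type: 'Service';
    frequency?: { cron: string } | DurationLike;
    filter?: Record<string, string | string[]>;
  }[];
};

// Hypothetical configuration: collect the service fact for Components every five minutes.
const exampleConfig: PagerDutyCollectorConfig = {
  collects: [
    { type: 'Service', frequency: { minutes: 5 }, filter: { kind: 'Component' } },
  ],
};

// The same mapping that getCollectionConfigs performs, applied to the plain object.
const mappedConfigs = exampleConfig.collects.map(c => ({
  filter: c.filter,
  frequency: c.frequency,
  factRefs: ['pagerduty:default/service'],
}));

console.log(JSON.stringify(mappedConfigs, null, 2));
```

Because the schema uses strict objects, any key outside server and collects (or outside type, frequency, and filter within an entry) would be rejected during validation.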

Notes on the PagerDutyFactCollector

Note that this fact collector does not implement the ConfigurableFactCollector interface, which would allow the collector to be updated when its configuration changes and to have its configuration specified in the Soundcheck configuration file. Switch to this interface if you need to update the collector without restarting the Soundcheck backend. This would also necessitate the addition of verification and validation of the configuration within the collector, including schema validation, and so was omitted for brevity.

Wrapping Up the PagerDutyFactCollector

As it stands, the code snippets provided above and tagged with 'PagerDutyFactCollector.ts' are enough to create a working PagerDutyFactCollector, but we have a bit of configuration left to make use of it. Let's do that next.

Updating the backend module

In the same folder as our PagerDutyFactCollector.ts file is a module.ts file that we need to make some changes to. Open the module.ts file and replace its entire contents with the following:

module.ts
import {
  createBackendModule,
  coreServices,
} from '@backstage/backend-plugin-api';
import { factCollectionExtensionPoint } from '@spotify/backstage-plugin-soundcheck-node';
import { PagerDutyFactCollector } from './PagerDutyFactCollector';

export const soundcheckModulePagerduty = createBackendModule({
  pluginId: 'soundcheck',
  moduleId: 'pagerduty',
  register(env) {
    env.registerInit({
      deps: {
        logger: coreServices.logger,
        soundcheck: factCollectionExtensionPoint,
      },
      async init({ logger, soundcheck }) {
        soundcheck.addFactCollector(PagerDutyFactCollector.create(logger));
      },
    });
  },
});

What this does is create a new backend module that uses the factCollectionExtensionPoint's addFactCollector method to add our new PagerDutyFactCollector.

Plugin Configuration

Add a metadata annotation to the catalog-info.yaml file of the entity to allow the plugin to map the entity to a service in PagerDuty.

metadata:
  annotations:
    pagerduty.com/service-id: pd-test-service-id # replace with your service ID

Create Checks and Tracks

To use our collector, we'll need to define a Soundcheck Track as well as a Soundcheck Check to analyze the Facts collected by our new PagerDuty Fact Collector. Let's start with the Check, which we'll define in YAML. Go ahead and add this Check to your app-config.yaml or app-config.local.yaml file:

soundcheck:
  checks:
    - id: requires_service_status_to_be_active
      description: Requires service status to be active
      passedMessage: The check has passed!
      failedMessage: The check has failed!
      rule:
        factRef: pagerduty:default/service
        path: $.status
        operator: equal
        value: active
      schedule:
        frequency:
          minutes: 1

The Check above is a simple example of how to evaluate the data collected by the PagerDuty Fact Collector: it ensures that the status of the service is active.
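Conceptually, the rule reads the value at the given path in the fact's data and compares it with the expected value. The sketch below illustrates that idea for simple dotted paths only; it is not Soundcheck's actual rule engine, and the resolvePath and evaluateRule helpers are hypothetical.

```typescript
type EqualRule = { path: string; operator: 'equal'; value: unknown };

// Resolves simple dotted paths such as $.status or $.teams.0.name.
// Real JSONPath supports far more syntax than this sketch handles.
function resolvePath(data: unknown, path: string): unknown {
  const segments = path.replace(/^\$\.?/, '').split('.').filter(Boolean);
  let current: unknown = data;
  for (const segment of segments) {
    if (current === null || typeof current !== 'object') return undefined;
    current = (current as Record<string, unknown>)[segment];
  }
  return current;
}

// Passes when the value found at the rule's path strictly equals the expected value.
function evaluateRule(rule: EqualRule, factData: unknown): boolean {
  return resolvePath(factData, rule.path) === rule.value;
}

const activeRule: EqualRule = { path: '$.status', operator: 'equal', value: 'active' };
console.log(evaluateRule(activeRule, { status: 'active' })); // true
console.log(evaluateRule(activeRule, { status: 'disabled' })); // false
```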

Now, let's define a Soundcheck Track that uses our new PagerDuty Fact Collector and the Checks we just defined. Go ahead and add the following to your app-config.yaml or app-config.local.yaml file as well:

soundcheck:
  programs:
    - id: pager_duty_track
      name: Pager Duty
      ownerEntityRef: group:default/backstage
      description: >
        Ensure that your component is properly set up to use PagerDuty for incident
        management.
      filter:
        catalog:
          spec.lifecycle: 'experimental'
      levels:
        - ordinal: 0
          name: Demonstration of Soundcheck PagerDuty Fact Collector
          description: Checks leveraging SoundCheck's PagerDuty Fact Collector
          checks:
            - id: requires_service_status_to_be_active
              name: Requires service status to be active
              description: Requires service status to be active

The simple Track defined above has one level, with the check we defined earlier. Start your Backstage instance and navigate to the Soundcheck tab to see your new Track. Note that in the example Track, we've filtered so that the Track only applies to components with experimental lifecycle.

Here's what the Track looks like in the Soundcheck UI: PagerDuty Track

Note that the Check defined above is scheduled to run every minute. Once the collector collects the fact, you'll see the Checks run and update the Track for your components:

PagerDuty Track - Checks Executed

That's it! Your new PagerDuty Fact Collector is up and running!