Onboarding Software to Backstage

45 MINS

Populating the Software Catalog basics

The Backstage Software Catalog makes it easy for one team to manage 10 services — and makes it possible for your company to manage thousands of them.

You can think of the Software Catalog as an inventory of all the software at your company. It is more than an inventory, though: it also tracks relationships between pieces of software, which makes it more of a software graph than a spreadsheet.

The fundamental unit within the Software Catalog is an entity, which represents some piece of your organization’s software ecosystem. Entities can be services, websites, libraries, data pipelines, or ML models, as well as broader concepts like systems or domains, or even the users and groups that own and maintain the software.

All entities share a common overall data structure, but each kind of entity may have its own schema, semantics, and relationships to other entities. This guide covers only Component entities, which represent pieces of software that a team might be responsible for, such as the services, websites, and libraries listed above.

For more, see “What is the Software Catalog?”

Where does the data live?

The Backstage Software Catalog is not the source of truth for your software ecosystem; it aggregates, syncs and exposes information from authoritative, external sources. At Spotify, we find that metadata about software entities is organically kept up to date when it lives in the same place as the software. Entity data is therefore checked into source control, alongside the software it describes, as declarative YAML files.
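
As a rough sketch, a metadata file for a Component might look like the following. By convention the file is often named catalog-info.yaml and sits at the root of the repository. The names example-service and team-a, along with the description text, are hypothetical placeholders, not values from this guide.

```yaml
# catalog-info.yaml, checked in next to the code it describes
apiVersion: backstage.io/v1alpha1
kind: Component            # the kind of entity this file declares
metadata:
  name: example-service    # hypothetical entity name
  description: Example service used for illustration only
spec:
  type: service            # e.g. service, website, library
  lifecycle: production
  owner: team-a            # hypothetical owning group
```

Because the owner field lives in the same repository as the code, a change of ownership can be made and reviewed like any other code change.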

Teams then register the locations of these metadata files (e.g. URLs to files hosted on GitHub, GitLab, or similar systems) in the Software Catalog, which reads, processes, and caches the metadata. The Software Catalog periodically reloads each metadata file from its source and processes any changes it finds, such as a change of ownership.
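
One way to register a location is a static entry in the Backstage app configuration (app-config.yaml); locations can also be registered in other ways, such as through the Backstage UI. The following is a minimal sketch, assuming a hypothetical repository URL under a made-up example-org organization.

```yaml
# app-config.yaml (fragment)
catalog:
  locations:
    # Each entry points the catalog at one metadata file.
    # The target URL below is a hypothetical placeholder.
    - type: url
      target: https://github.com/example-org/example-service/blob/main/catalog-info.yaml
```

How often the catalog re-reads registered files depends on how your Backstage instance is set up.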