A sneak peek at the Catalog Wizard in Spotify Portal

Author
Nurit Izrailov, Spotify
Icons representing software components being arranged in a grid by the Spotify Portal Catalog Wizard

tl;dr Spotify Portal for Backstage is a full-featured internal developer portal (IDP) that’s designed to be quick and simple to stand up inside an engineering org. Even better, you don’t have to be a JavaScript expert to get started — Portal’s built-in onboarding tools take you through the setup process step by step, no coding required. In this post, we focus on one of those tools: the Catalog Wizard. Keep reading to learn how the wizard integrates with GitHub and GitHub Enterprise (GHE) to automatically populate your new Software Catalog with all your teams’ services, websites, and libraries — in less than five minutes. Want to try it out for yourself? Apply for the Spotify Portal beta, and see just how quickly you can get a Backstage IDP up and running inside your org.

The wonders of a centralized Software Catalog

Rather than requiring you to build a custom dev portal from scratch, Spotify Portal includes the essential features of a Backstage IDP right out of the box. This includes Software Templates, our premium Soundcheck plugin, and the heart of any internal developer portal: the Software Catalog.

Less chaos, more clarity

Portal’s built-in Software Catalog centralizes your org’s software components in one place. All those services, websites, and libraries you have scattered across hundreds of repos? With Spotify Portal, your teams will be able to find and manage them all in a single, searchable catalog.

Manage and explore your tech ecosystem

From there, a component’s owner can check its CI/CD status, review its tech health, update its documentation, and more. Teams can also explore components that they don’t own, making it easier to understand dependencies, look up APIs, and reuse solutions someone else has already built. Learn more about the many benefits of a Backstage Software Catalog and how it improves developer experience.

Discoverability, ownership, software quality, hooray

In a nutshell, centralizing your components leads to better discoverability, stronger ownership, and improved software quality across your org. Sounds amazing, right? But how do you actually get all that stuff — all those services, websites, libraries, and ownership data — into the Software Catalog in the first place? Enter the Catalog Wizard.

The wizardry behind the Catalog Wizard

The Software Catalog is built from YAML files (conventionally named catalog-info.yaml) stored in each component’s repository. These files contain metadata and other descriptors about the component, including its type (service, website, library, etc.) and the name of the team that owns it. If your org has hundreds or thousands of components, writing each of these files by hand can be a daunting (and dull) task.
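For reference, here is a minimal sketch of what one of these files contains, written as a small TypeScript function that renders a Backstage Component descriptor. The apiVersion, kind, metadata.name, and spec.type/lifecycle/owner fields follow the standard Backstage descriptor format. The ComponentInfo type, the renderCatalogInfo function, and the default lifecycle of "production" are illustrative assumptions, not Portal’s implementation.

```typescript
// Hypothetical shape of the metadata needed to describe one component.
interface ComponentInfo {
  name: string; // Backstage entity name, e.g. "playlist-service"
  type: "service" | "website" | "library";
  owner: string; // owning team, as it appears in the catalog
  lifecycle?: string; // defaults to "production" in this sketch (assumption)
}

// Render a minimal Backstage Component descriptor as YAML text.
// Values are written as plain scalars, so this sketch assumes names and
// owners contain no characters that would need YAML quoting.
function renderCatalogInfo(c: ComponentInfo): string {
  const lines = [
    "apiVersion: backstage.io/v1alpha1",
    "kind: Component",
    "metadata:",
    `  name: ${c.name}`,
    "spec:",
    `  type: ${c.type}`,
    `  lifecycle: ${c.lifecycle ?? "production"}`,
    `  owner: ${c.owner}`,
  ];
  return lines.join("\n") + "\n";
}
```

Calling renderCatalogInfo({ name: "playlist-service", type: "service", owner: "chipmunks" }) (all hypothetical values) produces an eight-line file declaring a Component of type service owned by chipmunks. Multiply that by every repo in a large org, and the appeal of automating it becomes clear.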

Making YAML appear out of thin air

Portal’s built-in Catalog Wizard automates the tedious parts of adding components to the Software Catalog. By analyzing the CODEOWNERS files in your repos, the wizard proposes components to add and which teams to associate them with. Once an administrator approves these suggestions, the wizard automatically opens pull requests that add the necessary catalog-info.yaml files to the corresponding repos. You get all the YAML, with none of the typing.
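To make the CODEOWNERS step more concrete, the sketch below parses a CODEOWNERS file and proposes owning teams using one simple heuristic: take the team owners listed on the last "*" rule, since later matching rules take precedence in CODEOWNERS. This is an assumption for illustration only. The post does not describe the wizard’s actual algorithm, and parseCodeowners and proposeOwningTeams are hypothetical names.

```typescript
// A single CODEOWNERS rule: a path pattern followed by one or more owners.
interface CodeownersRule {
  pattern: string;
  owners: string[];
}

// Parse CODEOWNERS text into rules, skipping blank lines and comment lines.
function parseCodeowners(text: string): CodeownersRule[] {
  const rules: CodeownersRule[] = [];
  for (const raw of text.split("\n")) {
    const line = raw.trim(); // also strips a trailing "\r" from CRLF files
    if (line === "" || line.startsWith("#")) continue;
    const [pattern, ...owners] = line.split(/\s+/);
    rules.push({ pattern, owners });
  }
  return rules;
}

// Hypothetical heuristic: propose the teams on the last "*" rule, because a
// later matching rule overrides earlier ones. Only "@org/team" owners are
// kept; individual users ("@user") and email addresses are ignored.
function proposeOwningTeams(text: string): string[] {
  const starRules = parseCodeowners(text).filter((r) => r.pattern === "*");
  const last = starRules[starRules.length - 1];
  if (!last) return [];
  return last.owners
    .filter((o) => /^@[^\/\s]+\/[^\/\s]+$/.test(o))
    .map((o) => o.slice(1)); // drop the leading "@", leaving "org/team"
}
```

A real implementation would also have to handle repos with no "*" rule, multiple candidate teams, and path-specific rules, which is exactly where an administrator’s review comes in.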

No coding required

The magical part is you don’t even have to know how to spell “YAML” in order to step through this no-code process, let alone touch any of the files themselves. Anyone who has access to your org’s GitHub/GHE can be an administrator and use the Catalog Wizard. Depending on how big or complex your organization is, you could assign an administrator for each of your teams or have one administrator for your whole org.

Screenshots of the Catalog Wizard adding teams, finding components, and creating pull requests

How the Catalog Wizard works

If you haven’t had the opportunity to try the Portal beta yet, here’s what the Catalog Wizard process looks like:

  1. Connect Portal to your repos. The Portal beta currently supports integration with GitHub/GHE, but we plan to extend support to other version control systems in the future.
  2. Select which teams to onboard. By filtering by team, the wizard can recommend more relevant and accurate proposals for component ownership.
  3. The wizard does its thing. The wizard scans the CODEOWNERS files across connected GitHub/GHE repositories to automatically suggest components to the appropriate teams.
  4. Review and approve. Did the wizard find all the components? Here’s your chance to reject suggestions or claim ownership.
  5. The wizard conjures up the YAML. The wizard automatically creates pull requests adding the necessary catalog-info.yaml files for each approved component.
  6. Merge! Merging the PRs populates your new Software Catalog with your teams’ components. Voilà, you have a Backstage Software Catalog.
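The team filter, approval, and PR steps can be pictured as a small pipeline. In this hypothetical TypeScript sketch, proposals are first narrowed to the teams selected in step 2, then to the ones the administrator approved in step 4, and each remaining proposal becomes a planned pull request that adds catalog-info.yaml (assumed here to live at the repository root). The Proposal and PullRequestPlan types and the planPullRequests function are illustrative, not Portal’s API, and the sketch stops short of actually calling GitHub.

```typescript
// Hypothetical record for a component the wizard has proposed.
interface Proposal {
  repo: string; // e.g. "acme/playlist-service"
  component: string; // proposed entity name
  team: string; // proposed owner, as "org/team"
}

// Hypothetical description of a pull request to be opened later.
interface PullRequestPlan {
  repo: string;
  path: string; // file the PR adds
  title: string;
}

// Steps 2 to 5 in miniature: keep proposals for the onboarded teams,
// keep only approved ones (keyed as "repo:component"), and plan one PR
// per approved component.
function planPullRequests(
  proposals: Proposal[],
  selectedTeams: Set<string>,
  approved: Set<string>,
): PullRequestPlan[] {
  return proposals
    .filter((p) => selectedTeams.has(p.team))
    .filter((p) => approved.has(`${p.repo}:${p.component}`))
    .map((p) => ({
      repo: p.repo,
      path: "catalog-info.yaml", // assumption: repository root
      title: `Add ${p.component} to the Software Catalog`,
    }));
}
```

Keeping planning separate from the GitHub calls makes the approval step easy to preview: nothing is written to any repo until the administrator has signed off.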

But as simple as this process sounds, it’s not without its hiccups. It turns out, assigning ownership to components is not as straightforward as you might initially assume.

When magical solutions meet real-world complications

Developing the Catalog Wizard has been an enlightening journey for the Backstage team, particularly with Spotify serving as our test environment. Using the wizard in the real world validated its functionality (it works!), but it also uncovered intricate challenges, largely related to how organizational structures are represented in repos compared to real life, which we have been working to better understand and address.

Making sense of team structures

When an administrator first uses the Setup Wizard to connect Portal with GitHub/GHE, they begin by selecting the GitHub organizations relevant to their operations. Then, in the catalog-onboarding-wizard, we fetch team data from the chosen organizations via the GitHub API. This data is ingested into the catalog, with team names formatted as GithubOrgName/GithubTeamName.
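As a rough sketch of that naming step, the hypothetical function below takes team slugs already fetched for each selected organization (the GitHub API calls themselves are omitted) and formats each one as GithubOrgName/GithubTeamName. The TeamsByOrg type and the toBackstageTeamNames name are assumptions for illustration.

```typescript
// Hypothetical input: team slugs already fetched from the GitHub API,
// keyed by the GitHub organization selected in the Setup Wizard.
type TeamsByOrg = Record<string, string[]>;

// Format every team as "GithubOrgName/GithubTeamName".
function toBackstageTeamNames(teamsByOrg: TeamsByOrg): string[] {
  const result: string[] = [];
  for (const org of Object.keys(teamsByOrg)) {
    for (const team of teamsByOrg[org]) {
      result.push(`${org}/${team}`);
    }
  }
  return result;
}
```

For example, toBackstageTeamNames({ backstage: ["chipmunks"], tools: ["chipmunks", "owls"] }) returns ["backstage/chipmunks", "tools/chipmunks", "tools/owls"]. Because the org name is part of every team name, the same GitHub team slug in two orgs yields two distinct names.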

To better understand the issues this presents, let’s first define three types of teams:

  • Logical Team: A real-life group of individuals working together.
  • GitHub Team: The digital representation of a logical team, managed within GitHub.
  • Backstage Team: The representation of a logical team within Backstage, inferred from third-party sources like GitHub.

A significant challenge arises when a single logical team is represented across multiple GitHub organizations. For instance, a team known as “Chipmunks” might be present in different GitHub orgs and thus appear as backstage/chipmunks, chipmunks/chipmunks, and tools/chipmunks.

When these are ingested into Backstage, each representation is treated as a distinct team. This multiplicity can lead to ambiguity during the component onboarding process, especially when determining the “owner” in the catalog-info.yaml files, since the system prompts the user to select from multiple similar entries.
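One way to surface this ambiguity ahead of time is to group Backstage team names by their GitHub team slug and flag any slug that appears in more than one organization. The sketch below does that. It is a hypothetical heuristic, not something the wizard currently does: matching slugs do not guarantee the same logical team, and one logical team may use different slugs in different orgs, so the output is a prompt for human review rather than grounds for an automatic merge.

```typescript
// Group "org/team" names by the team slug and return only the slugs that
// appear in more than one GitHub organization (possible duplicate teams).
function findPossibleDuplicates(teamNames: string[]): Map<string, string[]> {
  const bySlug = new Map<string, string[]>();
  for (const teamName of teamNames) {
    const slash = teamName.indexOf("/");
    if (slash < 0) continue; // skip anything not in "org/team" form
    const slug = teamName.slice(slash + 1);
    const group = bySlug.get(slug) ?? [];
    group.push(teamName);
    bySlug.set(slug, group);
  }

  // Keep only slugs shared by two or more orgs.
  const duplicates = new Map<string, string[]>();
  bySlug.forEach((group, slug) => {
    if (group.length > 1) duplicates.set(slug, group);
  });
  return duplicates;
}
```

Applied to the Chipmunks example, the "chipmunks" slug maps to backstage/chipmunks, chipmunks/chipmunks, and tools/chipmunks, which is the set a user would otherwise have to choose from when picking an owner.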

How we’re helping you navigate your own teams

Running into multiple entries for the same team during setup is a common issue that can confuse Portal adopters. To help mitigate this, we’ve added a detailed explanation of this scenario to the Troubleshooting section of our product docs. There, we guide users through these situations so they can make informed decisions about team selection and component ownership.

Although this is a recognized issue, we have opted not to make changes to the wizard's current functionality during the limited beta. Instead, we are focusing on gathering user feedback and exploring optimal solutions to enhance future versions of the tool. Stay tuned for updates as the beta progresses.

The road ahead: more integrations, smarter suggestions

Our current roadmap includes several developments aimed at making the tool more versatile and effective for a broader range of users and organizational needs.

  • Integrations with more version control systems. Future integrations will include popular platforms such as GitLab and Bitbucket. This expansion will enable more Portal customers to leverage the automation and efficiency of the Catalog Wizard, ensuring a seamless onboarding process regardless of the underlying technology.

  • Broader identity provider support. We plan to make Portal more accessible to organizations of varying security and authentication infrastructures by adding support for LDAP, Microsoft Entra ID, Okta, and Workday.

  • Enhancing component suggestion algorithms. Currently, our wizard suggests components by analyzing CODEOWNERS files. We plan to refine this feature by incorporating additional factors into our suggestion algorithms. One such factor is each team’s commit history in specific repositories. By assessing recent commits and other relevant metrics, the wizard can make more informed suggestions, potentially increasing the accuracy and relevance of the components it recommends to teams.
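To illustrate what combining signals could look like, here is a hypothetical scoring sketch. It ranks candidate teams for one repository by adding a fixed weight for CODEOWNERS membership to a weight proportional to each team’s share of recent commits. The TeamSignals type, the rankOwners function, and the 0.6/0.4 weights are all assumptions made for this example; the roadmap item above does not specify how the factors would be combined.

```typescript
// Hypothetical per-team signals for a single repository.
interface TeamSignals {
  team: string; // "org/team"
  inCodeowners: boolean; // listed as an owner in CODEOWNERS
  recentCommits: number; // commits by team members in some recent window
}

// Hypothetical weights; a real system would need to tune or learn these.
const CODEOWNERS_WEIGHT = 0.6;
const COMMIT_WEIGHT = 0.4;

// Score = fixed bonus for CODEOWNERS membership + commit weight scaled by
// the team's share of recent commits. Results are sorted best-first.
function rankOwners(signals: TeamSignals[]): { team: string; score: number }[] {
  const totalCommits = signals.reduce((sum, s) => sum + s.recentCommits, 0);
  return signals
    .map((s) => ({
      team: s.team,
      score:
        (s.inCodeowners ? CODEOWNERS_WEIGHT : 0) +
        (totalCommits > 0 ? COMMIT_WEIGHT * (s.recentCommits / totalCommits) : 0),
    }))
    .sort((a, b) => b.score - a.score);
}
```

With these weights, a team listed in CODEOWNERS with 2 of 10 recent commits scores 0.68 and still outranks a team with the other 8 commits but no CODEOWNERS entry (0.32). Shifting weight toward commits would let recent activity override a stale CODEOWNERS file, which is the trade-off such a refinement would need to tune.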

These future plans are part of our commitment to continuously improve Portal, making it both a powerful and simple tool for developers and administrators alike. By expanding our support for various version control systems and identity providers, and by enhancing the intelligence of our component suggestion algorithms, we aim to provide a more inclusive, efficient experience.

Try the beta and follow us for updates

We are excited about the future of the Catalog Wizard and the positive impact it will have on software development practices by making it easier and faster for any organization to get up and running with Spotify Portal for Backstage. Follow us on LinkedIn for all the latest news from Spotify’s Backstage team, and stay tuned for updates as we roll out these enhancements.

If you haven’t already, apply to try the Spotify Portal beta for yourself. Our support team is ready to provide beta users with more information and assistance. We are dedicated to optimizing your experience with Backstage and ensuring you get the best out of our tools.