Troubleshooting Rate Limiting Issues

Your Portal instance retrieves data from multiple sources in order to provide a single aggregated view of your software ecosystem. As the size and complexity of that data grow, it becomes more likely that Portal will run into API rate limits against the services it consumes. When this happens, Portal will typically be unable to fetch data from the affected upstream API until the rate limit resets. In more extreme cases, high-volume batch jobs that are configured to run too frequently may repeatedly consume the entire rate limit quota, so that user-initiated requests consistently fail.

Rate limiting issues in Portal occur most frequently with the GitHub integration due to its broad usage across Portal. With that in mind, this page focuses on strategies to address rate limiting issues in GitHub specifically. However, the approaches described are applicable to other integrations, too.

Check Catalog processing interval

The catalog processing interval determines how frequently entities that are already registered in the catalog are updated. The load on the GitHub API from catalog processing grows in line with the number of entities based on GitHub repositories registered in the catalog. We generally suggest setting the processing interval to at least 30 minutes. For larger catalogs, it may be necessary to further increase this interval.

The catalog processing interval configuration can be found in the Catalog section of the Config Manager in Portal.
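
As a rough way to reason about the interval, the sketch below estimates how many GitHub requests catalog processing might issue per hour. It assumes, purely for illustration, a fixed number of requests per entity per processing pass (the real figure depends on your setup and on caching), and compares the result against the 5,000 requests per hour that GitHub documents for personal access tokens. The function name and its parameters are hypothetical and are not part of Portal.

```python
# Back-of-the-envelope estimate; this is not a Portal API.
GITHUB_PAT_HOURLY_LIMIT = 5000  # GitHub's documented REST limit for personal access tokens


def estimated_requests_per_hour(entity_count, interval_minutes, requests_per_entity=1):
    """Estimate hourly GitHub requests caused by catalog processing.

    Assumes every entity is refreshed once per interval and costs a fixed
    number of requests (an assumption; real usage varies).
    """
    if interval_minutes <= 0:
        raise ValueError("interval_minutes must be positive")
    passes_per_hour = 60 / interval_minutes
    return entity_count * requests_per_entity * passes_per_hour


# Compare a few candidate intervals for a hypothetical catalog of 2,000 entities.
for interval in (10, 30, 60):
    load = estimated_requests_per_hour(entity_count=2000, interval_minutes=interval)
    share = load / GITHUB_PAT_HOURLY_LIMIT
    print(f"{interval:>2} min interval: ~{load:.0f} requests/hour ({share:.0%} of limit)")
```

Under these assumptions, 2,000 entities at a 30-minute interval come to about 4,000 requests per hour, which leaves little headroom for other parts of Portal and is a sign that a longer interval may be needed.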

Check Catalog provider frequencies

Catalog providers scan systems such as GitHub in order to ingest certain classes of entity without manually registering them. Providers are configured with their own frequency configuration. The more frequently a provider runs, the more of the rate limit it will consume.

Portal ships with two built-in providers, which are listed below:

  • The githubOrg provider, which ingests all users and groups in specific organizations on GitHub into the catalog. This provider is enabled by default in order to provide user and group data in Portal. For GitHub organizations with very large numbers of users and groups, it may be necessary to decrease the frequency at which the provider runs. However, since this provider makes paginated requests to load all users and groups in batches, it consumes a far smaller proportion of the rate limit than other parts of Portal.
  • The github provider, which scans repositories for catalog-info.yaml files and automatically ingests them into the catalog. This provider is disabled by default in Portal.

You can control how frequently catalog providers run by adjusting the providers.<provider>.schedule.frequency configuration in the Catalog section of the Config Manager. As a default, we recommend running providers no more frequently than once every 15 minutes. It may be necessary to further decrease this frequency when the volume of data loaded by the provider is large.

Soundcheck

In Soundcheck, Fact Collectors collect raw data (facts) about your components. There are two fact collectors that collect data from GitHub: GitHub and SCM (Source Code Management). To help get you started, Soundcheck comes with pre-loaded configurations for these integrations, which are intended to be starting points for you to build upon and customize to fit your organization's needs.

When modifying these integrations, there are a few things to keep in mind to avoid exceeding GitHub rate limits. Soundcheck includes ETags in requests to GitHub, which means that unmodified responses usually don't count towards the rate limit. However, it's still a good idea to keep the number of requests to GitHub as low as possible. With that in mind, you can find some useful tips below.

  • Fact Details (File Path): When configuring the SCM collector's file path using a glob pattern, make sure it's not too "broad". The more files match the pattern, the more requests will be made to GitHub for a single entity (1 matching file = 1 request).

  • Frequency: When configuring the GitHub or SCM collector's frequency, make sure that the facts are not collected too often. In most cases, collecting facts once a day is enough. It's also recommended to use a cron schedule instead of a regular interval, so that the collection of different facts can be spaced apart in time and, consequently, the requests to GitHub will be spaced apart too.

  • Filters: When configuring the GitHub or SCM collector's filters, make sure the filters are not too "broad", so that the facts are collected only for the entities for which you're building checks. The more entities the collector applies to, the more requests will be made to GitHub.

  • Caching: Consider enabling fact caching if you have checks that are using a combination of different facts. When a fact is collected, all checks that depend on it will be executed, and the checks that depend on multiple facts will collect the remaining facts outside of their normal schedule if these facts are not available in cache. This will cause extra requests to GitHub.
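
The file path, filter, and frequency tips above multiply together, which the following sketch tries to make concrete. It uses the rule of thumb stated above (1 matching file = 1 request) to give an upper-bound estimate of daily requests, and it also generates daily cron expressions that are spaced apart. All function names, fact names, and numbers here are hypothetical, and the cron strings assume standard five-field cron syntax, so check the format that your Soundcheck configuration expects. ETags and caching may reduce the number of requests that actually count against the limit.

```python
def estimated_daily_requests(entities_in_filter, files_matching_glob, collections_per_day=1):
    """Upper-bound estimate of SCM collector requests per day.

    One matching file is treated as one request, repeated for every entity
    the filter selects and for every collection run.
    """
    return entities_in_filter * files_matching_glob * collections_per_day


def staggered_daily_crons(fact_names, start_hour=1, gap_minutes=30):
    """Return one daily five-field cron expression per fact, spaced apart in time."""
    crons = {}
    for index, name in enumerate(fact_names):
        total_minutes = start_hour * 60 + index * gap_minutes
        # Wrap around midnight so the hour always stays within 0-23.
        hour, minute = divmod(total_minutes % (24 * 60), 60)
        crons[name] = f"{minute} {hour} * * *"
    return crons


# Hypothetical narrow and broad collector configurations.
narrow = estimated_daily_requests(entities_in_filter=200, files_matching_glob=1)
broad = estimated_daily_requests(entities_in_filter=1500, files_matching_glob=12,
                                 collections_per_day=4)
print(f"narrow: {narrow} requests/day, broad: {broad} requests/day")

# Hypothetical fact names, collected once a day, 30 minutes apart.
schedule = staggered_daily_crons(["branch_protections", "repository_details", "readme_present"])
for fact, cron in schedule.items():
    print(f"{fact}: {cron}")
```

The point of the comparison is that a broad glob, a broad filter, and a frequent schedule compound each other, so tightening any one of them reduces the total proportionally.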

Configuring Soundcheck rate limiting for the GitHub & SCM fact collectors

The GitHub and SCM fact collectors can be rate limited in Soundcheck by specifying the maximum number of fact collection requests allowed within a specified duration (in milliseconds). Each request to collect a fact usually results in 1 call to the GitHub API. Go to Config Manager > Soundcheck and update the configuration under soundcheck.job.workers.github (or soundcheck.job.workers.scm for the SCM collector). Soundcheck will automatically wait and retry requests that are rate limited. The documentation for this feature can be found here (GitHub collector) and here (SCM collector).
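
To pick values for this setting, it can help to start from an hourly budget and work backwards. The sketch below converts a budget into a maximum request count for a given window length in milliseconds. The function name is hypothetical, and the `max` and `duration` keys in the returned dictionary are only illustrative labels for the two values described above; consult the collector documentation for the exact configuration key names.

```python
def limiter_settings(hourly_budget, window_ms=60_000):
    """Translate an hourly request budget into a (max, duration) pair.

    Rounds down so that the configured rate never exceeds the budget,
    but always allows at least one request per window.
    """
    if hourly_budget <= 0 or window_ms <= 0:
        raise ValueError("hourly_budget and window_ms must be positive")
    windows_per_hour = 3_600_000 / window_ms
    max_requests = max(1, int(hourly_budget // windows_per_hour))
    # Key names are illustrative assumptions, not confirmed Soundcheck keys.
    return {"max": max_requests, "duration": window_ms}


# Reserve a hypothetical 3,000 requests/hour for the GitHub collector.
print(limiter_settings(hourly_budget=3000))
```

A shorter window spreads requests more evenly across the hour, while a longer window permits larger bursts for the same hourly total.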

Checks

The checks that use GitHub / SCM facts should not be scheduled. When Soundcheck collects a fact, it will automatically execute all checks that depend on that fact, making scheduling checks unnecessary. If you schedule checks that use GitHub / SCM facts, such facts will be collected on both the collector's and the check's schedules, causing extra requests to GitHub. Additionally, the Soundcheck rate limiting configuration described above doesn't account for the requests emitted on a check schedule.

Obtaining a higher rate limit

The GitHub documentation provides some strategies for obtaining a higher rate limit. The simplest way to do this is by moving from a personal access token to a GitHub App for authentication. This process is documented for Portal here.
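
When changing authentication methods, it can be useful to confirm which limit a credential actually receives. The sketch below queries GitHub's `GET /rate_limit` REST endpoint and summarizes the `core` bucket of the response. The helper names are hypothetical, the token is one you would supply yourself, and the `api_base` default assumes github.com rather than a self-hosted instance.

```python
import json
import urllib.request
from datetime import datetime, timezone


def fetch_rate_limit(token, api_base="https://api.github.com"):
    """Call GitHub's GET /rate_limit endpoint and return the parsed JSON."""
    request = urllib.request.Request(
        f"{api_base}/rate_limit",
        headers={
            "Authorization": f"Bearer {token}",
            "Accept": "application/vnd.github+json",
        },
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def summarize_core_limit(payload):
    """Summarize the REST ("core") bucket of a /rate_limit response."""
    core = payload["resources"]["core"]
    # GitHub reports the reset time as UTC epoch seconds.
    reset_at = datetime.fromtimestamp(core["reset"], tz=timezone.utc)
    return {
        "limit": core["limit"],
        "remaining": core["remaining"],
        "resets_at": reset_at.isoformat(),
    }


# Example usage (requires network access and a valid token):
#   print(summarize_core_limit(fetch_rate_limit("<your token>")))
```

Comparing the reported `limit` before and after the switch shows whether the new credential has the higher quota you expected.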

Users of GitHub Enterprise Server can configure higher rate limits directly.