Troubleshooting Rate Limiting Issues
Your Portal instance retrieves data from multiple sources to provide a single aggregated view of your software ecosystem. As the size and complexity of that data grow, Portal may run into API rate limits on the services it consumes.
Rate limiting issues occur most frequently with the GitHub integration due to its broad usage across Portal. While this guide focuses on GitHub strategies, the approaches described are applicable to other integrations as well.
Understanding Rate Limiting Issues
When Portal hits rate limits, it will typically be unable to fetch data from the affected upstream API until the rate limit resets. In more extreme cases, high-volume batch jobs configured to run too frequently may repeatedly consume the entire rate limit quota, resulting in user-initiated requests consistently failing.
Optimization Strategies
Check Catalog Processing Interval
The catalog processing interval determines how frequently entities already registered in the catalog are updated. The load that catalog processing places on the GitHub API grows in proportion to the number of entities sourced from GitHub repositories.
We generally suggest:
- Setting the processing interval to at least 30 minutes
- For larger catalogs, further increasing this interval
The catalog processing interval configuration can be found in the Catalog section of the Config Manager in Portal.
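If you manage this setting as YAML rather than through the Config Manager UI, the equivalent Backstage-style key looks roughly like the sketch below. The exact key path may differ in your Portal version; treat this as illustrative, not authoritative.

```yaml
# Sketch of a Backstage-style app-config fragment (key path assumed):
# process already-registered catalog entities no more often than
# every 30 minutes.
catalog:
  processingInterval: { minutes: 30 }
```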
Check Catalog Provider Frequencies
Catalog providers scan systems such as GitHub to ingest certain classes of entity without manual registration. Each provider has its own frequency configuration that affects rate limit consumption.
Portal ships with two built-in providers:
- The githubOrg provider: ingests all users and groups from specific GitHub organizations
  - Enabled by default for user and group data in Portal
  - Makes paginated requests to load data in batches
  - Consumes a smaller proportion of the rate limit than other parts of Portal
- The github provider: scans repositories for catalog-info.yaml files
  - Disabled by default in Portal
You can control how frequently catalog providers run by adjusting the providers.<provider>.schedule.frequency configuration in the Catalog section of the Config Manager.
We recommend running providers no more frequently than once every 15 minutes by default. For large data volumes, consider decreasing this frequency further.
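Using the providers.<provider>.schedule.frequency key described above, a conservative schedule might look like the following sketch. The timeout values and exact nesting are assumptions; only the frequency key path comes from this guide.

```yaml
# Illustrative sketch: run each built-in provider no more often than
# every 15-30 minutes. Timeout values are assumptions.
catalog:
  providers:
    githubOrg:
      schedule:
        frequency: { minutes: 15 }
        timeout: { minutes: 3 }
    github:
      schedule:
        frequency: { minutes: 30 }
        timeout: { minutes: 3 }
```

Larger catalogs generally warrant even longer frequencies, since every provider run re-scans the configured GitHub organizations.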
Soundcheck Optimization
In Soundcheck, Fact Collectors gather raw data about your components. Two fact collectors interact with GitHub: GitHub and SCM (Source Code Management).
Optimizing Fact Collection
When modifying these integrations, consider these optimization strategies:
- Fact Details (File Path): When configuring the SCM collector's file path using glob patterns, avoid overly broad patterns. Each matching file requires one request to GitHub.
- Collection Frequency: Ensure facts aren't collected too often. In most cases, collecting facts once a day is sufficient. Consider using cron schedules instead of regular intervals to space requests over time.
- Filters: Use specific filters so facts are collected only for relevant entities. Broader filters mean more GitHub requests.
- Caching: Enable fact caching for checks using multiple facts to prevent extra GitHub requests when checks execute.
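The points above can be combined in a single collector definition. The sketch below is hypothetical: the field names (factName, cron, filter, path) are assumptions about the collector schema, not confirmed syntax, but it illustrates a narrow path, a daily cron schedule, and a scoped filter working together.

```yaml
# Hypothetical SCM fact collector sketch (field names assumed):
collects:
  - factName: readme-exists
    data:
      - path: README.md       # narrow path: one GitHub request per matching file
    cron: '0 3 * * *'         # once a day at an off-peak hour, not a fixed interval
    filter:
      kind: Component
      spec.owner: team-a      # collect facts only for relevant entities
```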
Configuring Rate Limiting for Collectors
The GitHub and SCM fact collectors can be rate limited in Soundcheck by specifying the maximum number of requests within a specified duration (in milliseconds).
To configure this:
- Go to Config Manager > Soundcheck
- Update the configuration under:
  - soundcheck.job.workers.github (for the GitHub collector)
  - soundcheck.job.workers.scm (for the SCM collector)
Soundcheck will automatically wait and retry rate-limited requests.
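As a sketch of what that configuration might look like, the fragment below caps each collector's request volume. The soundcheck.job.workers.github and soundcheck.job.workers.scm paths come from this guide; the sub-key names and the numbers are assumptions chosen for illustration.

```yaml
# Sketch: limit each Soundcheck collector's GitHub traffic.
# Sub-key names (maxRequests/duration) are assumptions.
soundcheck:
  job:
    workers:
      github:
        maxRequests: 500
        duration: 60000   # window in milliseconds
      scm:
        maxRequests: 200
        duration: 60000
```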
Check Scheduling
Checks that use GitHub/SCM facts should not be scheduled. When Soundcheck collects a fact, it automatically executes all dependent checks.
Scheduling these checks would cause facts to be collected on both the collector's and check's schedules, generating extra GitHub requests.
Obtaining Higher Rate Limits
GitHub provides strategies for obtaining higher rate limits. The simplest approach is moving from a personal access token to a GitHub App for authentication, as described in Portal's GitHub App setup documentation.
GitHub Enterprise Server users can configure higher rate limits directly.
Event-Driven Updates
An effective strategy for avoiding rate-limit issues is to use Event-Driven Catalog Component Updates instead of relying solely on scheduled catalog scans.
Benefits of Event-Driven Updates
- React in real-time to GitHub repository changes
- Significantly reduce API calls
- Lower scheduled scan frequency
- Decrease likelihood of hitting rate limits
For detailed setup instructions, see the Event-Driven Catalog Component Updates guide.