Overview
A scheduled monitoring task that logs aggregate statistics about Soundcheck’s job queue performance. Provides visibility into fact collection throughput, queue backlogs, and worker processing rates.

What it Reports
For each active queue (with non-zero activity):

- Waiting: Jobs queued but not yet started
- Active: Jobs currently being processed by workers
- Completed: Jobs successfully processed since last report (incremental count)
- Failed: Jobs that failed since last report (incremental count)
- Delayed: Jobs scheduled for future execution (BullMQ only)
- Jobs by Type: Breakdown of queued jobs by collector/check type
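For background, these states correspond to the standard job states in BullMQ (named above as one of the queue backends). Below is a minimal TypeScript sketch, not Soundcheck's actual code, showing how the same per-state counts can be read from a BullMQ queue; the queue name and Redis connection are hypothetical:

```typescript
import { Queue } from 'bullmq';

// Hypothetical queue name and Redis connection, for illustration only.
const queue = new Queue('soundcheck-collection', {
  connection: { host: 'localhost', port: 6379 },
});

async function reportCounts() {
  // BullMQ exposes per-state counts matching the metrics listed above.
  const counts = await queue.getJobCounts(
    'waiting',   // queued but not yet started
    'active',    // currently being processed
    'completed', // finished successfully
    'failed',    // finished with an error
    'delayed',   // scheduled for future execution
  );
  console.log(counts); // e.g. { waiting: 12, active: 3, completed: 480, failed: 2, delayed: 5 }
  await queue.close();
}

reportCounts().catch(console.error);
```

Note that BullMQ's completed and failed counts are cumulative, while the statistics task reports incremental counts since the last report, so it presumably tracks deltas between successive readings.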
Configuration
Add to `app-config.yaml`:
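A minimal example enabling the task; the `enabled` flag is the same setting referenced under Troubleshooting below, and any further options (such as a custom reporting interval) are not covered in this section:

```yaml
soundcheck:
  job:
    statistics:
      enabled: true
```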
Log Output
Statistics are logged to the backend logger at info level.

Understanding the Metrics
- Incremental counts: Completed and Failed counts reset after each report, showing jobs processed in the reporting interval
- Snapshot counts: Waiting, Active, and Delayed show current queue state
- Idle queues filtered: Only queues with activity are shown to reduce log noise
- Job type naming: Format is `soundcheck/collector/{collector}/{priority}/{namespace}:{scope}/{check-id}`
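For example (all names hypothetical), a `github` collector running at `high` priority for check `has-readme` against an entity in the `default` namespace with scope `component` would produce the job type `soundcheck/collector/github/high/default:component/has-readme`.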
Use Cases
- Capacity planning: Identify if Soundcheck pods/workers are struggling to keep up with job volume
- Bottleneck detection: High waiting counts may indicate that more pods/workers are needed or that job frequency should be reduced
- Error monitoring: Failed job counts reveal systematic collection issues
- Performance validation: Verify expected throughput after worker configuration changes
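As a worked illustration (hypothetical numbers): if a queue completes 50 jobs per reporting interval but its Waiting snapshot climbs by 200 between consecutive reports, jobs are arriving faster than workers can process them, and worker concurrency or pod count likely needs to increase.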
Troubleshooting
No logs appearing
- Verify `soundcheck.job.statistics.enabled: true` in config
- Check that the logger level allows info messages
- Confirm at least one queue has activity (the task logs nothing if all queues are idle)
High waiting counts
- Review worker configuration at `soundcheck.job.workers.{worker-name}.concurrency`
- Check rate limiter settings at `soundcheck.job.workers.{worker-name}.limiter.max` and `.duration` (see the sketch after this list)
- Consider switching from local queues to Redis queues for global rate limiting across instances
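A sketch of those settings in `app-config.yaml`; the worker name `fact-collector` and all values are hypothetical placeholders, while the key paths are the ones named above:

```yaml
soundcheck:
  job:
    workers:
      fact-collector:     # hypothetical {worker-name}
        concurrency: 10   # number of jobs processed in parallel
        limiter:
          max: 100        # at most 100 jobs...
          duration: 60000 # ...per 60,000 ms (one-minute) window
```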
Persistent failed counts
- Review backend logs for job failure stack traces
- Common causes: API rate limits, network timeouts, invalid check configurations
- Failed jobs are not retried by default