Job Queue Statistics
Overview
A scheduled monitoring task that logs aggregate statistics about Soundcheck's job queue performance. Provides visibility into fact collection throughput, queue backlogs, and worker processing rates.
What it Reports
For each active queue (with non-zero activity):
- Waiting: Jobs queued but not yet started
- Active: Jobs currently being processed by workers
- Completed: Jobs successfully processed since last report (incremental count)
- Failed: Jobs that failed since last report (incremental count)
- Delayed: Jobs scheduled for future execution (BullMQ only)
- Jobs by Type: Breakdown of queued jobs by collector/check type
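The difference between the incremental fields (Completed, Failed) and the snapshot fields can be pictured as subtracting the previous report's cumulative counters from the current ones. The sketch below is only an illustration of that idea. The `Counters` class and `incremental` function are hypothetical names, and this is not a description of how Soundcheck computes the values internally.

```python
from dataclasses import dataclass


@dataclass
class Counters:
    """Cumulative totals observed at one point in time (hypothetical shape)."""
    completed: int
    failed: int


def incremental(previous: Counters, current: Counters) -> Counters:
    """Return the jobs completed and failed since the previous report."""
    # Clamp at zero so a counter reset (e.g. after a restart) never yields a negative delta.
    return Counters(
        completed=max(current.completed - previous.completed, 0),
        failed=max(current.failed - previous.failed, 0),
    )


last_report = Counters(completed=10_000, failed=40)
this_report = Counters(completed=12_341, failed=52)
delta = incremental(last_report, this_report)
print(delta)  # Counters(completed=2341, failed=12)
```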
Configuration
Add to app-config.yaml:
soundcheck:
  job:
    statistics:
      # Enable queue statistics reporting
      enabled: true
      # Cron expression for reporting frequency (default: every 15 minutes)
      reportingFrequencyCron: '*/15 * * * *'
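As a quick check on what the default expression means: a `*/15` step in the minute field of a standard cron expression matches every minute of the hour divisible by 15. The helper below is a hypothetical illustration of that rule only, not a cron parser.

```python
def minutes_matching_step(step: int) -> list[int]:
    """Minutes of the hour matched by a '*/step' value in the cron minute field."""
    return [minute for minute in range(60) if minute % step == 0]


print(minutes_matching_step(15))  # [0, 15, 30, 45]
```

With the default setting this gives four reports per hour, so each Completed and Failed value covers roughly a 15-minute window.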
Log Output
Statistics are logged to the backend logger at info level:
=== Job Queue Statistics ===
Total queues: 8 (3 active)
Queue: scm
Waiting: 64715
Active: 1
Completed: 2341
Failed: 12
Delayed: 0
Jobs by type:
soundcheck/collector/scm/0/scm:default/required_files_exist: 32361
soundcheck/collector/scm/1/scm:default/api-report-has-no-edit-warning: 32354
Queue: github
Waiting: 156
Active: 2
Completed: 847
Failed: 3
Jobs by type:
soundcheck/collector/github/0/github:default/branch-protection: 156
=== End Job Queue Statistics ===
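If the report needs to feed an alert or dashboard, the block can be parsed into a dictionary. The following is a minimal sketch that assumes the message text looks exactly like the sample above, with any logger prefix (timestamp, level) already stripped. `parse_report` is a hypothetical helper, not part of Soundcheck.

```python
METRICS = {"Waiting", "Active", "Completed", "Failed", "Delayed"}


def parse_report(text: str) -> dict[str, dict]:
    """Parse one statistics block into {queue_name: {metric: count, 'jobs_by_type': {...}}}."""
    queues: dict[str, dict] = {}
    current = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        # Skip blank lines, the start/end markers, and the summary line.
        if not line or line.startswith("===") or line.startswith("Total queues"):
            continue
        if line.startswith("Queue:"):
            current = {"jobs_by_type": {}}
            queues[line.split(":", 1)[1].strip()] = current
            continue
        if current is None or line == "Jobs by type:":
            continue
        # Job type names contain ':' themselves, so split on the last colon only.
        key, _, value = line.rpartition(":")
        if key in METRICS:
            current[key.lower()] = int(value)
        else:
            current["jobs_by_type"][key] = int(value)
    return queues


sample = """\
=== Job Queue Statistics ===
Total queues: 8 (3 active)
Queue: github
Waiting: 156
Active: 2
Completed: 847
Failed: 3
Jobs by type:
soundcheck/collector/github/0/github:default/branch-protection: 156
=== End Job Queue Statistics ===
"""
report = parse_report(sample)
print(report["github"]["waiting"])  # 156
```

Fields that a queue does not report (such as Delayed for the github queue in the sample) are simply absent from the result.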
Understanding the Metrics
- Incremental counts: Completed and Failed counts reset after each report, showing jobs processed in the reporting interval
- Snapshot counts: Waiting, Active, and Delayed show current queue state
- Idle queues filtered: Only queues with activity are shown to reduce log noise
- Job type naming: Format is soundcheck/collector/{collector}/{priority}/{namespace}:{scope}/{check-id}
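To group or filter the "Jobs by type" entries, the job type name can be split into its parts. This is a sketch based only on the naming format listed above. `JobType` and `parse_job_type` are hypothetical names, and the priority is kept as a string because the format does not state its type.

```python
from typing import NamedTuple


class JobType(NamedTuple):
    collector: str
    priority: str
    namespace: str
    scope: str
    check_id: str


def parse_job_type(name: str) -> JobType:
    """Split soundcheck/collector/{collector}/{priority}/{namespace}:{scope}/{check-id}."""
    # maxsplit=5 keeps any further '/' characters inside the check id.
    prefix, kind, collector, priority, target, check_id = name.split("/", 5)
    if (prefix, kind) != ("soundcheck", "collector"):
        raise ValueError(f"unexpected job type prefix: {name!r}")
    namespace, scope = target.split(":", 1)
    return JobType(collector, priority, namespace, scope, check_id)


job = parse_job_type("soundcheck/collector/scm/0/scm:default/required_files_exist")
print(job.collector, job.check_id)  # scm required_files_exist
```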
Use Cases
- Capacity planning: Identify if Soundcheck pods/workers are struggling to keep up with job volume
- Bottleneck detection: High waiting counts may indicate a need for more pods/workers or a lower job frequency
- Error monitoring: Failed job counts reveal systematic collection issues
- Performance validation: Verify expected throughput after worker configuration changes
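For capacity planning, a rough backlog estimate can be derived from one report: divide Waiting by Completed and multiply by the reporting interval. The sketch below assumes throughput stays constant and no new jobs arrive, so it is a lower bound at best. `estimated_drain_minutes` is a hypothetical helper.

```python
def estimated_drain_minutes(waiting: int, completed_per_report: int, interval_minutes: float) -> float:
    """Minutes needed to clear the backlog at the throughput seen in the last report."""
    if completed_per_report <= 0:
        return float("inf")  # nothing completed, so the backlog is not shrinking
    return waiting / completed_per_report * interval_minutes


# Values taken from the scm queue in the sample log output, with the default 15-minute interval.
minutes = estimated_drain_minutes(waiting=64715, completed_per_report=2341, interval_minutes=15)
print(f"{minutes / 60:.1f} hours")  # 6.9 hours
```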
Troubleshooting
No logs appearing
- Verify soundcheck.job.statistics.enabled: true in config
- Check that the logger level allows info messages
- Confirm at least one queue has activity (the task logs nothing when all queues are idle)
High waiting counts
- Review worker configuration at soundcheck.job.workers.{worker-name}.concurrency
- Check rate limiter settings at soundcheck.job.workers.{worker-name}.limiter.max and .duration
- Consider switching from local queues to Redis queues for global rate limiting across instances
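When checking the limiter settings, it can help to compare the Completed count against the ceiling the limiter allows. The sketch below assumes the limiter follows BullMQ-style semantics, where at most `max` jobs are processed per `duration` milliseconds. That interpretation and the example values are assumptions, so confirm them against your Soundcheck version.

```python
def max_jobs_per_interval(limiter_max: int, limiter_duration_ms: int, interval_minutes: float) -> float:
    """Upper bound on jobs one rate-limited worker can process per reporting interval."""
    interval_ms = interval_minutes * 60_000
    return limiter_max * interval_ms / limiter_duration_ms


# Hypothetical settings: max=10 jobs per duration=1000 ms, default 15-minute reports.
ceiling = max_jobs_per_interval(limiter_max=10, limiter_duration_ms=1000, interval_minutes=15)
print(ceiling)  # 9000.0
```

A Completed count that sits close to this ceiling while Waiting keeps growing suggests the limiter, rather than worker concurrency, is the constraint.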
Persistent failed counts
- Review backend logs for job failure stack traces
- Common causes: API rate limits, network timeouts, invalid check configurations
- Failed jobs are not automatically retried by default
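To judge whether a Failed count is persistent rather than incidental, it can be expressed as a share of all jobs finished in the interval. This is a minimal sketch. `failure_rate` is a hypothetical helper and any alert threshold is left to the reader.

```python
def failure_rate(completed: int, failed: int) -> float:
    """Fraction of finished jobs that failed in one reporting interval."""
    total = completed + failed
    return failed / total if total else 0.0


# Values taken from the scm queue in the sample log output.
rate = failure_rate(completed=2341, failed=12)
print(f"{rate:.2%}")  # 0.51%
```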