Job Queue Statistics

Overview

Job Queue Statistics is a scheduled monitoring task that logs aggregate statistics about Soundcheck's job queue performance. It provides visibility into fact collection throughput, queue backlogs, and worker processing rates.

What it Reports

For each active queue (with non-zero activity):

  • Waiting: Jobs queued but not yet started
  • Active: Jobs currently being processed by workers
  • Completed: Jobs successfully processed since the last report (incremental count)
  • Failed: Jobs that failed since the last report (incremental count)
  • Delayed: Jobs scheduled for future execution (BullMQ only)
  • Jobs by Type: Breakdown of queued jobs by collector/check type

Configuration

Add to app-config.yaml:

soundcheck:
  job:
    statistics:
      # Enable queue statistics reporting
      enabled: true

      # Cron expression for reporting frequency (default: every 15 minutes)
      reportingFrequencyCron: '*/15 * * * *'

Log Output

Statistics are logged to the backend logger at info level:

=== Job Queue Statistics ===
Total queues: 8 (3 active)

Queue: scm
  Waiting: 64715
  Active: 1
  Completed: 2341
  Failed: 12
  Delayed: 0
  Jobs by type:
    soundcheck/collector/scm/0/scm:default/required_files_exist: 32361
    soundcheck/collector/scm/1/scm:default/api-report-has-no-edit-warning: 32354

Queue: github
  Waiting: 156
  Active: 2
  Completed: 847
  Failed: 3
  Jobs by type:
    soundcheck/collector/github/0/github:default/branch-protection: 156

=== End Job Queue Statistics ===
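
To chart these numbers over time, a report can be turned into structured data. The following is a minimal Python sketch, not part of Soundcheck: `parse_queue_stats` is a hypothetical helper, and it assumes the report text has already been separated from any logger prefix (timestamp, level, plugin name) and follows the layout of the sample above.

```python
def parse_queue_stats(report: str) -> dict:
    """Parse one report into {queue: {"counts": {...}, "jobs_by_type": {...}}}."""
    queues = {}
    current = None    # entry for the queue currently being read
    in_types = False  # True while reading the "Jobs by type" breakdown

    for raw in report.splitlines():
        line = raw.strip()
        # Skip blank lines, the === banners, and the summary line.
        if not line or line.startswith("===") or line.startswith("Total queues"):
            continue
        if line.startswith("Queue:"):
            name = line.split(":", 1)[1].strip()
            current = {"counts": {}, "jobs_by_type": {}}
            queues[name] = current
            in_types = False
            continue
        if current is None:
            continue
        if line.lower().startswith("jobs by type"):
            in_types = True
            continue
        # Split on the LAST colon: job type names contain ':' themselves.
        key, sep, value = line.rpartition(":")
        if not sep or not value.strip().isdigit():
            continue  # not a "name: number" line; ignore it
        if in_types:
            current["jobs_by_type"][key.strip()] = int(value)
        else:
            current["counts"][key.strip().lower()] = int(value)
    return queues
```

Metrics that a queue does not report (for example Delayed on a non-BullMQ queue) are simply absent from `counts`, so callers should use `counts.get("delayed")` rather than indexing directly.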

Understanding the Metrics

  • Incremental counts: Completed and Failed counts reset after each report, showing jobs processed in the reporting interval
  • Snapshot counts: Waiting, Active, and Delayed show current queue state
  • Idle queues filtered: Only queues with activity are shown to reduce log noise
  • Job type naming: Format is soundcheck/collector/{collector}/{priority}/{namespace}:{scope}/{check-id}
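
The job type format can be split into its parts when grouping or filtering the breakdown. This is a small Python sketch under the assumption that every job type follows the format listed above; `JobType` and `parse_job_type` are illustrative names, not Soundcheck APIs. The priority segment is kept as a string because the format does not state its type.

```python
from typing import NamedTuple


class JobType(NamedTuple):
    collector: str
    priority: str
    namespace: str
    scope: str
    check_id: str


def parse_job_type(name: str) -> JobType:
    """Split soundcheck/collector/{collector}/{priority}/{namespace}:{scope}/{check-id}."""
    # Limit to 5 splits so any '/' inside the check id stays in the last part.
    parts = name.split("/", 5)
    if len(parts) != 6 or parts[:2] != ["soundcheck", "collector"]:
        raise ValueError(f"unexpected job type name: {name!r}")
    _, _, collector, priority, ns_scope, check_id = parts
    namespace, sep, scope = ns_scope.partition(":")
    if not sep:
        raise ValueError(f"missing namespace:scope segment in {name!r}")
    return JobType(collector, priority, namespace, scope, check_id)
```

For the first job type in the sample log, this yields collector `scm`, priority `0`, namespace `scm`, scope `default`, and check id `required_files_exist`.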

Use Cases

  • Capacity planning: Identify if Soundcheck pods/workers are struggling to keep up with job volume
  • Bottleneck detection: High waiting counts may indicate a need to add pods/workers or reduce job frequency
  • Error monitoring: Failed job counts reveal systematic collection issues
  • Performance validation: Verify expected throughput after worker configuration changes
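
For capacity planning, the snapshot and incremental counts can be combined into rough indicators. The Python sketch below uses hypothetical helper names and makes two simplifying assumptions: the completion rate of the last interval stays constant, and no new jobs arrive while the backlog drains. Treat the result as an optimistic lower bound, not a prediction.

```python
from typing import Optional


def estimate_drain_minutes(
    waiting: int, completed_in_interval: int, interval_minutes: float = 15.0
) -> Optional[float]:
    """Minutes needed to clear the backlog at the last interval's completion rate."""
    if completed_in_interval <= 0:
        return None  # nothing completed, so no rate to extrapolate from
    return waiting / completed_in_interval * interval_minutes


def failure_rate(completed: int, failed: int) -> Optional[float]:
    """Share of finished jobs that failed during the interval."""
    finished = completed + failed
    if finished == 0:
        return None
    return failed / finished
```

With the scm queue from the sample log (64715 waiting, 2341 completed, 12 failed, 15-minute interval), the backlog would take roughly 415 minutes, or just under seven hours, to drain, and about 0.5% of finished jobs failed.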

Troubleshooting

No logs appearing

  • Verify that soundcheck.job.statistics.enabled is set to true in app-config.yaml
  • Check that the logger level allows info messages
  • Confirm that at least one queue has activity (the task logs nothing if all queues are idle)

High waiting counts

  • Review worker configuration at soundcheck.job.workers.{worker-name}.concurrency
  • Check rate limiter settings at soundcheck.job.workers.{worker-name}.limiter.max and .duration
  • Consider switching from local queues to Redis queues for global rate limiting across instances
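
When tuning the rate limiter, it helps to know the most jobs a worker could complete in one reporting interval. The Python sketch below assumes the limiter follows BullMQ semantics, that is, at most `max` jobs per `duration` milliseconds; whether local queues interpret the settings the same way is an assumption to verify. `limiter_ceiling` is a hypothetical helper, and the values in the example are illustrative rather than Soundcheck defaults.

```python
def limiter_ceiling(
    max_jobs: int, duration_ms: int, interval_minutes: float = 15.0
) -> float:
    """Upper bound on jobs processed per reporting interval under a rate limiter.

    Assumes "at most max_jobs per duration_ms milliseconds" semantics.
    """
    if max_jobs <= 0 or duration_ms <= 0:
        raise ValueError("max_jobs and duration_ms must be positive")
    interval_ms = interval_minutes * 60_000
    return max_jobs * interval_ms / duration_ms


# Illustrative values: 10 jobs per 1000 ms over a 15-minute interval.
print(limiter_ceiling(10, 1000))
```

If the reported Completed count sits close to this ceiling, the limiter is the likely bottleneck; if it is well below, look at concurrency or at the latency of the upstream API instead. With local queues the ceiling applies per instance, whereas Redis queues share it across instances.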

Persistent failed counts

  • Review backend logs for job failure stack traces
  • Common causes: API rate limits, network timeouts, invalid check configurations
  • Failed jobs are not automatically retried by default