Prometheus Metrics

Qtap exposes Prometheus-compatible metrics endpoints for monitoring traffic activity and agent health. These metrics enable you to build dashboards, set up alerts, and track system performance.


Configuration Requirement

To enable HTTP-level Prometheus metrics, you must add the http_metrics plugin to your stack:

stacks:
  my_stack:
    plugins:
      - type: http_capture  # Optional
        config:
          level: summary
      - type: http_metrics  # REQUIRED for Prometheus metrics

Without this plugin, only connection-level metrics will be available.

Metrics Endpoints

Qtap provides two metrics endpoints on localhost:10001:

/metrics

Application metrics for monitoring HTTP traffic activity on your system. Use these metrics to build dashboards tracking:

  • Request rates and volumes

  • Error rates and status code distributions

  • Response times and latency percentiles

  • Request/response payload sizes

  • Traffic patterns by host, method, and protocol

/system/metrics

System metrics related to the health and performance of the qtap agent itself. These include:

  • eBPF program performance

  • Memory and CPU usage

  • Event processing rates

  • Internal error counts

Accessing Metrics

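From the Host

Both endpoints are served over HTTP on port 10001, so any HTTP client on the host can read them at http://localhost:10001/metrics and http://localhost:10001/system/metrics.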

From Docker Container

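When running qtap in Docker, publish port 10001 to the host, for example by adding -p 10001:10001 to the docker run command.

Then access the endpoints from the host at http://localhost:10001/metrics and http://localhost:10001/system/metrics.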

From Kubernetes

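Forward port 10001 from the qtap pod to your local machine, for example with kubectl port-forward.

Then access the endpoints locally through the forwarded port, for example http://localhost:10001/metrics when the local port is also 10001.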

Key Application Metrics

The /metrics endpoint provides these HTTP traffic metrics:

Request Metrics

qtap_http_requests_total

  • Type: Counter

  • Description: Total number of HTTP requests observed

  • Labels: host, method, protocol, status_code

  • Use: Calculate request rates, error rates, and traffic distribution

qtap_http_requests_duration_ms_bucket

  • Type: Histogram

  • Description: Request duration distribution in milliseconds

  • Labels: host, method, protocol, status_code, le (histogram bucket)

  • Use: Calculate latency percentiles (p50, p95, p99)

qtap_http_requests_duration_ms_sum

  • Type: Counter

  • Description: Total request processing time in milliseconds

  • Labels: host, method, protocol, status_code

  • Use: Calculate average request duration

qtap_http_requests_duration_ms_count

  • Type: Counter

  • Description: Total number of requests measured for duration

  • Labels: host, method, protocol, status_code

  • Use: Paired with _sum for average calculations

qtap_http_requests_size_bytes_bucket

  • Type: Histogram

  • Description: Request payload size distribution in bytes

  • Labels: host, method, protocol, status_code, le

  • Use: Calculate payload size percentiles

qtap_http_requests_size_bytes_sum

  • Type: Counter

  • Description: Total size of HTTP request payloads in bytes

  • Labels: host, method, protocol, status_code

  • Use: Calculate average request size

qtap_http_requests_size_bytes_count

  • Type: Counter

  • Description: Number of requests measured for size

  • Labels: host, method, protocol, status_code

  • Use: Paired with _sum for average calculations

Response Metrics

qtap_http_responses_total

  • Type: Counter

  • Description: Total number of HTTP responses observed

  • Labels: host, method, protocol, status_code

  • Use: Track response counts

qtap_http_responses_duration_ms_bucket

  • Type: Histogram

  • Description: Response duration distribution (TTFB) in milliseconds

  • Labels: host, method, protocol, status_code, le

  • Use: Calculate response time percentiles

qtap_http_responses_duration_ms_sum

  • Type: Counter

  • Description: Total response time in milliseconds (TTFB - Time To First Byte)

  • Labels: host, method, protocol, status_code

  • Use: Calculate average response times

qtap_http_responses_duration_ms_count

  • Type: Counter

  • Description: Number of responses measured for duration

  • Labels: host, method, protocol, status_code

  • Use: Paired with _sum for average calculations

qtap_http_responses_size_bytes_bucket

  • Type: Histogram

  • Description: Response payload size distribution in bytes

  • Labels: host, method, protocol, status_code, le

  • Use: Calculate payload size percentiles

qtap_http_responses_size_bytes_sum

  • Type: Counter

  • Description: Total size of HTTP response payloads in bytes

  • Labels: host, method, protocol, status_code

  • Use: Calculate average response size

qtap_http_responses_size_bytes_count

  • Type: Counter

  • Description: Number of responses measured for size

  • Labels: host, method, protocol, status_code

  • Use: Paired with _sum for average calculations

Combined Metrics

qtap_http_duration_ms_bucket

  • Type: Histogram

  • Description: Combined request+response duration distribution in milliseconds

  • Labels: None (aggregates all traffic)

  • Use: Calculate overall latency percentiles across all hosts

qtap_http_duration_ms_sum

  • Type: Counter

  • Description: Total combined request+response time in milliseconds

  • Labels: None

  • Use: Calculate overall average transaction duration

qtap_http_duration_ms_count

  • Type: Counter

  • Description: Total number of completed HTTP transactions

  • Labels: None

  • Use: Paired with _sum for average calculations

Connection-Level Metrics

These metrics are always available, even without the http_metrics plugin:

qtap_connection_open_total

  • Type: Counter

  • Description: Total number of connections opened

  • Labels: direction, remote_addr, remote_port

qtap_connection_close_total

  • Type: Counter

  • Description: Total number of connections closed

  • Labels: direction, remote_addr, remote_port

qtap_connection_active_total

  • Type: Gauge

  • Description: Number of currently active connections

  • Labels: direction, remote_addr, remote_port

  • Use: Monitor real-time connection usage by destination

qtap_connection_protocol_total

  • Type: Counter

  • Description: Total connections by protocol type

  • Labels: protocol

  • Use: Track distribution of connection protocols (http1, http2, etc.)

qtap_connection_bytes_sent_total

  • Type: Counter

  • Description: Total bytes sent across all connections

  • Labels: direction, remote_addr, remote_port

qtap_connection_bytes_recv_total

  • Type: Counter

  • Description: Total bytes received across all connections

  • Labels: direction, remote_addr, remote_port

qtap_connection_duration_ms

  • Type: Histogram

  • Description: Connection duration distribution in milliseconds

  • Labels: direction, remote_addr, remote_port, le

qtap_connection_tls_handshake_total

  • Type: Counter

  • Description: Total number of TLS handshakes completed

  • Labels: direction, remote_addr, remote_port, sni, version

  • Use: Track TLS/HTTPS connection establishment by SNI and TLS version

Common Queries

Request Rate (per second)
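
One way to express this is as a Prometheus recording rule, sketched below. The group and rule names are hypothetical, and the expr can also be run directly as a query.

```yaml
groups:
  - name: qtap_request_rate            # hypothetical group name
    rules:
      - record: qtap:http_requests:rate5m   # hypothetical rule name
        # Requests per second, averaged over the last 5 minutes
        expr: sum(rate(qtap_http_requests_total[5m]))
```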

Error Rate (4xx and 5xx responses)
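
A minimal sketch of the error ratio, assuming the status_code label carries the numeric code as a string (for example "404"). The rule name is hypothetical.

```yaml
groups:
  - name: qtap_error_rate              # hypothetical group name
    rules:
      - record: qtap:http_requests:error_ratio5m   # hypothetical rule name
        # Fraction of requests with a 4xx or 5xx status over the last 5 minutes
        expr: |
          sum(rate(qtap_http_requests_total{status_code=~"[45].."}[5m]))
            /
          sum(rate(qtap_http_requests_total[5m]))
```

Multiply the result by 100 to display it as a percentage.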

Average Response Time
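
The average can be derived by dividing the rate of the _sum series by the rate of the _count series. The sketch below uses a hypothetical rule name.

```yaml
groups:
  - name: qtap_avg_response_time       # hypothetical group name
    rules:
      - record: qtap:http_responses_duration_ms:avg5m   # hypothetical rule name
        # Mean response time (TTFB) in milliseconds over the last 5 minutes
        expr: |
          sum(rate(qtap_http_responses_duration_ms_sum[5m]))
            /
          sum(rate(qtap_http_responses_duration_ms_count[5m]))
```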

95th Percentile Latency
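
Percentiles are estimated from the histogram buckets with histogram_quantile. This is a sketch with a hypothetical rule name; the le label must be kept in the aggregation.

```yaml
groups:
  - name: qtap_p95_latency             # hypothetical group name
    rules:
      - record: qtap:http_requests_duration_ms:p95_5m   # hypothetical rule name
        # Estimated 95th percentile request duration in milliseconds
        expr: |
          histogram_quantile(
            0.95,
            sum by (le) (rate(qtap_http_requests_duration_ms_bucket[5m]))
          )
```

Add host to the by clause (sum by (le, host)) to get one percentile per host.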

Top Hosts by Request Volume
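
A possible formulation using topk over per-host request rates. The limit of 10 and the rule name are illustrative assumptions.

```yaml
groups:
  - name: qtap_top_hosts               # hypothetical group name
    rules:
      - record: qtap:http_requests:top10_hosts_rate5m   # hypothetical rule name
        # The 10 hosts with the highest request rate over the last 5 minutes
        expr: topk(10, sum by (host) (rate(qtap_http_requests_total[5m])))
```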

Request Rate by Method
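
Grouping the request rate by the method label gives one series per HTTP method. The names below are hypothetical.

```yaml
groups:
  - name: qtap_rate_by_method          # hypothetical group name
    rules:
      - record: qtap:http_requests:rate5m_by_method   # hypothetical rule name
        # Requests per second for each HTTP method
        expr: sum by (method) (rate(qtap_http_requests_total[5m]))
```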

Overall Transaction Latency (Combined)
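
Because the combined histogram has no labels beyond le, the percentile covers all observed traffic. A sketch with hypothetical names and an illustrative choice of p95:

```yaml
groups:
  - name: qtap_overall_latency         # hypothetical group name
    rules:
      - record: qtap:http_duration_ms:p95_5m   # hypothetical rule name
        # Estimated 95th percentile of combined request+response duration
        expr: |
          histogram_quantile(
            0.95,
            sum by (le) (rate(qtap_http_duration_ms_bucket[5m]))
          )
```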

Active Connections
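
Since qtap_connection_active_total is a gauge, it is summed directly rather than wrapped in rate. The rule names are hypothetical.

```yaml
groups:
  - name: qtap_active_connections      # hypothetical group name
    rules:
      - record: qtap:connection_active:sum   # hypothetical rule name
        # Currently active connections across all destinations
        expr: sum(qtap_connection_active_total)
      - record: qtap:connection_active:sum_by_direction   # hypothetical rule name
        # The same count, split by connection direction
        expr: sum by (direction) (qtap_connection_active_total)
```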

TLS Handshake Rate
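
A sketch of the handshake rate split by TLS version, using the version label listed for qtap_connection_tls_handshake_total. The rule name is hypothetical.

```yaml
groups:
  - name: qtap_tls_handshakes          # hypothetical group name
    rules:
      - record: qtap:connection_tls_handshake:rate5m_by_version   # hypothetical rule name
        # TLS handshakes per second for each TLS version
        expr: sum by (version) (rate(qtap_connection_tls_handshake_total[5m]))
```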

Connection Throughput
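
Throughput can be approximated as the per-second rate of the byte counters. Both rule names below are hypothetical.

```yaml
groups:
  - name: qtap_connection_throughput   # hypothetical group name
    rules:
      - record: qtap:connection_bytes_sent:rate5m   # hypothetical rule name
        # Bytes sent per second across all connections
        expr: sum(rate(qtap_connection_bytes_sent_total[5m]))
      - record: qtap:connection_bytes_recv:rate5m   # hypothetical rule name
        # Bytes received per second across all connections
        expr: sum(rate(qtap_connection_bytes_recv_total[5m]))
```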

Grafana Dashboard

Qpoint provides a sample Grafana dashboard that visualizes these metrics:

Dashboard: qtap-http-overview.json

The dashboard includes:

  • Request Rate: Overall HTTP request volume over time

  • Error Rate: Percentage of failed requests (4xx/5xx)

  • Response Time: Average response duration and TTFB

  • Latency Percentiles: p50, p95, p99 request duration

  • Request/Response Sizes: Average payload sizes

  • Traffic by Host: Request distribution across hosts

  • Status Code Distribution: Breakdown of HTTP status codes

Importing the Dashboard

  1. Download the dashboard JSON from the link above

  2. In Grafana, navigate to Dashboards → Import

  3. Upload the JSON file or paste its contents

  4. Configure your Prometheus datasource

  5. Save the dashboard

Prometheus Configuration

Add qtap as a scrape target in your prometheus.yml:
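
A minimal sketch of such a scrape configuration, assuming Prometheus can reach qtap at localhost:10001. The job names and intervals are illustrative choices, not values required by qtap.

```yaml
scrape_configs:
  # HTTP traffic and connection metrics
  - job_name: qtap                     # illustrative job name
    metrics_path: /metrics
    scrape_interval: 15s
    static_configs:
      - targets: ["localhost:10001"]

  # Agent health metrics, scraped less often
  - job_name: qtap-system              # illustrative job name
    metrics_path: /system/metrics
    scrape_interval: 60s
    static_configs:
      - targets: ["localhost:10001"]
```

Two jobs are used because each Prometheus job scrapes a single metrics_path.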

For Kubernetes deployments, use ServiceMonitor (Prometheus Operator) or pod annotations for automatic discovery:
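
A ServiceMonitor might look like the sketch below. It assumes a Service in front of the qtap pods with a port named metrics that maps to 10001. The namespace, labels, and port name are hypothetical and need to match your deployment.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: qtap
  namespace: qpoint                    # assumption: namespace where qtap runs
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: qtap     # assumption: label on the qtap Service
  endpoints:
    - port: metrics                    # assumption: Service port name for 10001
      path: /metrics
      interval: 15s
    - port: metrics
      path: /system/metrics
      interval: 60s
```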

Alerting Examples

High Error Rate Alert
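
A sketch of an alerting rule for this case. The 5% threshold, the 5-minute hold time, the severity label, and the alert name are illustrative assumptions to tune for your traffic.

```yaml
groups:
  - name: qtap_error_alerts            # hypothetical group name
    rules:
      - alert: QtapHighErrorRate       # hypothetical alert name
        # Fires when more than 5% of requests return 5xx for 5 minutes
        expr: |
          sum(rate(qtap_http_requests_total{status_code=~"5.."}[5m]))
            /
          sum(rate(qtap_http_requests_total[5m]))
            > 0.05
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High HTTP 5xx error rate observed by qtap"
```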

High Latency Alert
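
A comparable sketch for latency, alerting on the estimated p95 request duration. The 1000 ms threshold, the 10-minute hold time, and the alert name are illustrative assumptions.

```yaml
groups:
  - name: qtap_latency_alerts          # hypothetical group name
    rules:
      - alert: QtapHighLatency         # hypothetical alert name
        # Fires when p95 request duration stays above 1000 ms for 10 minutes
        expr: |
          histogram_quantile(
            0.95,
            sum by (le) (rate(qtap_http_requests_duration_ms_bucket[5m]))
          ) > 1000
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "p95 HTTP request latency observed by qtap is above 1s"
```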

Best Practices

  1. Use labels wisely: The host, method, protocol, and status_code labels enable powerful filtering but can increase cardinality. Filter noisy traffic if needed.

  2. Set appropriate scrape intervals: 15-30 seconds is typical for application metrics. System metrics can be scraped less frequently.

  3. Use recording rules: Pre-compute expensive queries like percentiles to improve dashboard performance.

  4. Monitor agent health: Track /system/metrics to ensure qtap itself is healthy and not experiencing backpressure.

  5. Set up alerts: Don't just collect metrics; alert on anomalies such as error rate spikes or latency increases.

Troubleshooting

Metrics endpoint not accessible

Check that qtap is running and that port 10001 is reachable, for example by requesting http://localhost:10001/metrics from the host where qtap runs.

No metrics appearing

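Verify that qtap is capturing traffic: connection-level metrics such as qtap_connection_open_total should increase while traffic flows. HTTP metrics additionally require the http_metrics plugin in your stack.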

High cardinality issues

If you have too many unique label combinations:

  1. Use filters to exclude noisy processes (see Traffic Capture Settings)

  2. Aggregate metrics in Prometheus recording rules

  3. Limit capture to specific hosts with the endpoints configuration
