Prometheus Metrics

Qtap exposes Prometheus-compatible metrics endpoints for monitoring traffic activity and agent health. These metrics enable you to build dashboards, set up alerts, and track system performance.

Configuration Requirement

To enable HTTP-level Prometheus metrics, you must add the http_metrics plugin to your stack:

stacks:
  my_stack:
    plugins:
      - type: http_capture  # Optional
        config:
          level: summary
      - type: http_metrics  # REQUIRED for Prometheus metrics

Without this plugin, only connection-level metrics will be available.

Metrics Endpoints

Qtap provides two metrics endpoints on localhost:10001:

/metrics

Application metrics for monitoring HTTP traffic activity on your system. Use these metrics to build dashboards tracking:

  • Request rates and volumes

  • Error rates and status code distributions

  • Response times and latency percentiles

  • Request/response payload sizes

  • Traffic patterns by host, method, and protocol

/system/metrics

System metrics related to the health and performance of the qtap agent itself. These include:

  • eBPF program performance

  • Memory and CPU usage

  • Event processing rates

  • Internal error counts

Accessing Metrics

From the Host

curl http://localhost:10001/metrics
curl http://localhost:10001/system/metrics
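
For scripted checks, the same endpoints can be read programmatically. The following is a minimal Python sketch, assuming the default localhost:10001 address shown above. The helper names `parse_samples` and `fetch_metrics` are hypothetical, and the parser is deliberately simplified: it does not handle escaped quotes inside label values.

```python
import re
import urllib.request

# Matches sample lines such as: name{label="value",...} 12.5
# Simplified: escaped quotes inside label values are not supported.
SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="([^"]*)"')


def parse_samples(text):
    """Parse Prometheus text exposition into (name, labels, value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blanks, HELP and TYPE lines
            continue
        match = SAMPLE_RE.match(line)
        if match is None:
            continue
        name, raw_labels, value = match.groups()
        labels = dict(LABEL_RE.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples


def fetch_metrics(url="http://localhost:10001/metrics", timeout=5):
    """Fetch the raw exposition text from a running qtap agent."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

On a host where qtap is running, `parse_samples(fetch_metrics())` returns one tuple per time series, which can then be filtered by metric name or label.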

From Docker Container

When running qtap in Docker, expose port 10001:

# Add any other flags your deployment needs before the image name.
docker run -d --name qtap \
  -p 10001:10001 \
  us-docker.pkg.dev/qpoint-edge/public/qtap:v0

Then access from the host:

curl http://localhost:10001/metrics

From Kubernetes

Forward the metrics port to your local machine:

kubectl port-forward pod/qtap-xxxxx 10001:10001

Then access locally:

curl http://localhost:10001/metrics

Key Application Metrics

The /metrics endpoint provides these HTTP traffic metrics:

Request Metrics

qtap_http_requests_total

  • Type: Counter

  • Description: Total number of HTTP requests observed

  • Labels: host, method, protocol, status_code

  • Use: Calculate request rates, error rates, and traffic distribution

qtap_http_requests_duration_ms_bucket

  • Type: Histogram

  • Description: Request duration distribution in milliseconds

  • Labels: host, method, protocol, status_code, le (histogram bucket)

  • Use: Calculate latency percentiles (p50, p95, p99)

qtap_http_requests_duration_ms_sum

  • Type: Counter

  • Description: Total request processing time in milliseconds

  • Labels: host, method, protocol, status_code

  • Use: Calculate average request duration

qtap_http_requests_duration_ms_count

  • Type: Counter

  • Description: Total number of requests measured for duration

  • Labels: host, method, protocol, status_code

  • Use: Paired with _sum for average calculations

qtap_http_requests_size_bytes_bucket

  • Type: Histogram

  • Description: Request payload size distribution in bytes

  • Labels: host, method, protocol, status_code, le

  • Use: Calculate payload size percentiles

qtap_http_requests_size_bytes_sum

  • Type: Counter

  • Description: Total size of HTTP request payloads in bytes

  • Labels: host, method, protocol, status_code

  • Use: Calculate average request size

qtap_http_requests_size_bytes_count

  • Type: Counter

  • Description: Number of requests measured for size

  • Labels: host, method, protocol, status_code

  • Use: Paired with _sum for average calculations

Response Metrics

qtap_http_responses_total

  • Type: Counter

  • Description: Total number of HTTP responses observed

  • Labels: host, method, protocol, status_code

  • Use: Track response counts

qtap_http_responses_duration_ms_bucket

  • Type: Histogram

  • Description: Response duration distribution (TTFB) in milliseconds

  • Labels: host, method, protocol, status_code, le

  • Use: Calculate response time percentiles

qtap_http_responses_duration_ms_sum

  • Type: Counter

  • Description: Total response time in milliseconds (TTFB - Time To First Byte)

  • Labels: host, method, protocol, status_code

  • Use: Calculate average response times

qtap_http_responses_duration_ms_count

  • Type: Counter

  • Description: Number of responses measured for duration

  • Labels: host, method, protocol, status_code

  • Use: Paired with _sum for average calculations

qtap_http_responses_size_bytes_bucket

  • Type: Histogram

  • Description: Response payload size distribution in bytes

  • Labels: host, method, protocol, status_code, le

  • Use: Calculate payload size percentiles

qtap_http_responses_size_bytes_sum

  • Type: Counter

  • Description: Total size of HTTP response payloads in bytes

  • Labels: host, method, protocol, status_code

  • Use: Calculate average response size

qtap_http_responses_size_bytes_count

  • Type: Counter

  • Description: Number of responses measured for size

  • Labels: host, method, protocol, status_code

  • Use: Paired with _sum for average calculations

Combined Metrics

qtap_http_duration_ms_bucket

  • Type: Histogram

  • Description: Combined request+response duration distribution in milliseconds

  • Labels: None (aggregates all traffic)

  • Use: Calculate overall latency percentiles across all hosts

qtap_http_duration_ms_sum

  • Type: Counter

  • Description: Total combined request+response time in milliseconds

  • Labels: None

  • Use: Calculate overall average transaction duration

qtap_http_duration_ms_count

  • Type: Counter

  • Description: Total number of completed HTTP transactions

  • Labels: None

  • Use: Paired with _sum for average calculations

Connection-Level Metrics

These metrics are always available, even without the http_metrics plugin:

qtap_connection_open_total

  • Type: Counter

  • Description: Total number of connections opened

  • Labels: direction, remote_addr, remote_port

qtap_connection_close_total

  • Type: Counter

  • Description: Total number of connections closed

  • Labels: direction, remote_addr, remote_port

qtap_connection_active_total

  • Type: Gauge

  • Description: Number of currently active connections

  • Labels: direction, remote_addr, remote_port

  • Use: Monitor real-time connection usage by destination

qtap_connection_protocol_total

  • Type: Counter

  • Description: Total connections by protocol type

  • Labels: protocol

  • Use: Track distribution of connection protocols (http1, http2, etc.)

qtap_connection_bytes_sent_total

  • Type: Counter

  • Description: Total bytes sent across all connections

  • Labels: direction, remote_addr, remote_port

qtap_connection_bytes_recv_total

  • Type: Counter

  • Description: Total bytes received across all connections

  • Labels: direction, remote_addr, remote_port

qtap_connection_duration_ms

  • Type: Histogram

  • Description: Connection duration distribution in milliseconds

  • Labels: direction, remote_addr, remote_port, le

qtap_connection_tls_handshake_total

  • Type: Counter

  • Description: Total number of TLS handshakes completed

  • Labels: direction, remote_addr, remote_port, sni, version

  • Use: Track TLS/HTTPS connection establishment by SNI and TLS version

Common Queries

Request Rate (per second)

rate(qtap_http_requests_total[5m])
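
To make the semantics of this query concrete: rate() reports the per-second increase of a counter over the window and compensates for counter resets, such as those caused by an agent restart. A simplified Python sketch of that idea follows. The name `simple_rate` is hypothetical, and the sketch omits the extrapolation to window boundaries that Prometheus applies.

```python
def simple_rate(samples):
    """Approximate per-second rate from (timestamp_seconds, counter_value) pairs.

    Samples must be ordered by time. A drop in value is treated as a
    counter reset, so the post-reset value counts as new increase.
    """
    if len(samples) < 2:
        return None  # not enough data to compute a rate
    increase = 0.0
    for (_, prev), (_, curr) in zip(samples, samples[1:]):
        increase += (curr - prev) if curr >= prev else curr
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed if elapsed > 0 else None
```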

Error Rate (4xx and 5xx responses)

sum(rate(qtap_http_requests_total{status_code=~"4..|5.."}[5m]))
/
sum(rate(qtap_http_requests_total[5m]))
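
The same ratio can be sketched in Python to show what the query computes. This is an illustration only: `error_ratio` is a hypothetical helper, and its input stands in for per-status-code rates already obtained from Prometheus.

```python
def error_ratio(request_rates):
    """Return the share of requests whose status code is 4xx or 5xx.

    request_rates maps a status code string to its request rate,
    mirroring the status_code label on qtap_http_requests_total.
    """
    total = sum(request_rates.values())
    if total == 0:
        return None  # no traffic in the window, so the ratio is undefined
    errors = sum(
        rate for code, rate in request_rates.items()
        if len(code) == 3 and code[0] in ("4", "5")  # same as =~"4..|5.."
    )
    return errors / total
```

Multiply the result by 100 to display it as a percentage.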

Average Response Time

rate(qtap_http_responses_duration_ms_sum[5m])
/
rate(qtap_http_responses_duration_ms_count[5m])

95th Percentile Latency

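# Returns one value per label combination; wrap rate() in sum by (le) (...) for a single overall value
histogram_quantile(0.95,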
  rate(qtap_http_requests_duration_ms_bucket[5m])
)
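
For intuition about how a percentile is derived from the _bucket series, the sketch below estimates a quantile from cumulative bucket counts using linear interpolation inside the matching bucket, which is the approach Prometheus takes for classic histograms. It is a simplified illustration rather than a reimplementation: the function is written for this example, and the bucket boundaries in any sample input are hypothetical, not qtap's actual bucket layout.

```python
import math


def histogram_quantile(q, buckets):
    """Estimate quantile q from cumulative (upper_bound, count) buckets.

    buckets must be sorted by upper bound and end with the +Inf bucket,
    matching the le label on qtap_http_requests_duration_ms_bucket.
    """
    if not buckets or not math.isinf(buckets[-1][0]):
        raise ValueError("buckets must end with a +Inf bucket")
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0
    for bound, count in buckets:
        if count >= rank:
            if math.isinf(bound) or count == prev_count:
                # Overflow (or empty) bucket: the best available answer
                # is the highest bound seen so far.
                return prev_bound
            # Assume observations are spread evenly inside the bucket.
            fraction = (rank - prev_count) / (count - prev_count)
            return prev_bound + (bound - prev_bound) * fraction
        prev_bound, prev_count = bound, count
    return math.nan
```

Because the estimate interpolates within a bucket, its accuracy depends on how finely the bucket boundaries cover the latency range of interest.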

Top Hosts by Request Volume

topk(10, sum by (host) (rate(qtap_http_requests_total[5m])))
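
The query combines two steps: sum by (host) collapses the method, protocol, and status_code labels, and topk keeps the largest results. A small Python sketch of the same aggregation, using a hypothetical `top_hosts` helper over already-computed per-series rates:

```python
from collections import defaultdict


def top_hosts(series, k=10):
    """Sum per-series rates by host and return the k busiest hosts.

    series is an iterable of (labels, rate) pairs, where labels is a
    dict that includes a "host" key.
    """
    totals = defaultdict(float)
    for labels, rate in series:
        totals[labels.get("host", "")] += rate  # sum by (host)
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return ranked[:k]  # topk(k, ...)
```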

Request Rate by Method

sum by (method) (rate(qtap_http_requests_total[5m]))

Overall Transaction Latency (Combined)

# Average overall transaction time
rate(qtap_http_duration_ms_sum[5m])
/
rate(qtap_http_duration_ms_count[5m])

# 95th percentile overall latency
histogram_quantile(0.95,
  rate(qtap_http_duration_ms_bucket[5m])
)

Active Connections

# Total active connections
sum(qtap_connection_active_total)

# Active connections by destination
topk(10, qtap_connection_active_total)

# Active connections by direction
sum by (direction) (qtap_connection_active_total)

TLS Handshake Rate

# Overall handshake rate
sum(rate(qtap_connection_tls_handshake_total[5m]))

# Handshake rate by SNI (domain)
sum by (sni) (rate(qtap_connection_tls_handshake_total[5m]))

# Handshake rate by TLS version
sum by (version) (rate(qtap_connection_tls_handshake_total[5m]))

Connection Throughput

# Bytes sent per second
rate(qtap_connection_bytes_sent_total[5m])

# Bytes received per second
rate(qtap_connection_bytes_recv_total[5m])

Grafana Dashboard

Qpoint provides a sample Grafana dashboard that visualizes these metrics:

Dashboard: qtap-http-overview.json

The dashboard includes:

  • Request Rate: Overall HTTP request volume over time

  • Error Rate: Percentage of failed requests (4xx/5xx)

  • Response Time: Average response duration and TTFB

  • Latency Percentiles: p50, p95, p99 request duration

  • Request/Response Sizes: Average payload sizes

  • Traffic by Host: Request distribution across hosts

  • Status Code Distribution: Breakdown of HTTP status codes

Importing the Dashboard

  1. Download the dashboard JSON from the link above

  2. In Grafana, navigate to Dashboards > Import

  3. Upload the JSON file or paste its contents

  4. Configure your Prometheus datasource

  5. Save the dashboard

Prometheus Configuration

Add qtap as a scrape target in your prometheus.yml:

scrape_configs:
  - job_name: 'qtap'
    static_configs:
      - targets: ['localhost:10001']
    metrics_path: '/metrics'
    scrape_interval: 15s

  - job_name: 'qtap-system'
    static_configs:
      - targets: ['localhost:10001']
    metrics_path: '/system/metrics'
    scrape_interval: 30s

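For Kubernetes deployments, use a ServiceMonitor (Prometheus Operator) or pod annotations for automatic discovery. The annotations below are a convention and take effect only if your Prometheus scrape configuration includes relabeling rules that honor them: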

annotations:
  prometheus.io/scrape: "true"
  prometheus.io/port: "10001"
  prometheus.io/path: "/metrics"

Alerting Examples

High Error Rate Alert

groups:
  - name: qtap
    rules:
      - alert: HighHTTPErrorRate
        expr: |
          (
            sum(rate(qtap_http_requests_total{status_code=~"5.."}[5m]))
            /
            sum(rate(qtap_http_requests_total[5m]))
          ) > 0.05
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High HTTP 5xx error rate"
          description: "Error rate is {{ $value | humanizePercentage }}"

High Latency Alert

# Rule fragment: add it to the rules list of the qtap group shown above, indented to match
- alert: HighResponseLatency
  expr: |
    histogram_quantile(0.95,
      rate(qtap_http_requests_duration_ms_bucket[5m])
    ) > 1000
  for: 10m
  labels:
    severity: warning
  annotations:
    summary: "High response latency detected"
    description: "95th percentile latency is {{ $value }}ms"

Best Practices

  1. Use labels wisely: The host, method, protocol, and status_code labels enable powerful filtering but can increase cardinality. Filter noisy traffic if needed.

  2. Set appropriate scrape intervals: 15-30 seconds is typical for application metrics. System metrics can be scraped less frequently.

  3. Use recording rules: Pre-compute expensive queries like percentiles to improve dashboard performance.

  4. Monitor agent health: Track /system/metrics to ensure qtap itself is healthy and not experiencing backpressure.

  5. Set up alerts: Don't just collect metrics - alert on anomalies like error rate spikes or latency increases.

Troubleshooting

Metrics endpoint not accessible

Check that qtap is running and port 10001 is accessible:

docker logs qtap
netstat -tlnp | grep 10001

No metrics appearing

Verify qtap is capturing traffic:

# Check qtap logs for capture activity
docker logs qtap 2>&1 | grep -i http

# Generate test traffic
curl https://httpbin.org/get

# Check metrics again
curl -s http://localhost:10001/metrics | grep qtap_http_requests_total

High cardinality issues

If you have too many unique label combinations:

  1. Use filters to exclude noisy processes (see Traffic Capture Settings)

  2. Aggregate metrics in Prometheus recording rules

  3. Limit capture to specific hosts with endpoints configuration
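
Before applying these steps, it can help to see which metrics contribute the most series. The sketch below counts sample lines per metric name in the text returned by the /metrics endpoint. The helper name `series_per_metric` is hypothetical, and the input is assumed to be the raw exposition text, for example the output of the curl command shown earlier.

```python
from collections import Counter


def series_per_metric(exposition_text):
    """Count time series (sample lines) per metric name, largest first."""
    counts = Counter()
    for line in exposition_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blanks, HELP and TYPE lines
            continue
        # The metric name ends at the first "{" (labels) or space (value).
        name = line.split("{", 1)[0].split(" ", 1)[0]
        counts[name] += 1
    return counts.most_common()
```

Metrics at the top of the returned list are the first candidates for filtering or aggregation.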
