Prometheus Metrics

Qtap exposes Prometheus-compatible metrics endpoints for monitoring traffic activity and agent health. These metrics enable you to build dashboards, set up alerts, and track system performance.

REQUIRED: HTTP-level metrics require the http_metrics plugin in your stack configuration. See Configuration Requirement below.

Configuration Requirement

To enable HTTP-level Prometheus metrics, you must add the http_metrics plugin to your stack:

stacks:
  my_stack:
    plugins:
      - type: http_capture  # Optional
        config:
          level: summary
      - type: http_metrics  # REQUIRED for Prometheus metrics

Without this plugin, only connection-level metrics will be available.

Metrics Endpoints

Qtap provides two metrics endpoints on localhost:10001:

`/metrics`

Application metrics for monitoring HTTP traffic activity on your system. Use these metrics to build dashboards tracking:

Request rates and volumes
Error rates and status code distributions
Response times and latency percentiles
Request/response payload sizes
Traffic patterns by host, method, and protocol

`/system/metrics`

System metrics related to the health and performance of the qtap agent itself. These include:

eBPF program performance
Memory and CPU usage
Event processing rates
Internal error counts

Accessing Metrics

From the Host

curl http://localhost:10001/metrics
curl http://localhost:10001/system/metrics
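
If you would rather read the endpoint from a script than from curl, the following is a minimal sketch using only the Python standard library. The function names parse_metrics and fetch_metrics are illustrative and not part of qtap, and the parser handles only the common `name{labels} value` line shape of the Prometheus text format (it does not unescape label values).

```python
import re
import urllib.request

# A sample line looks like: metric_name{label="value",...} 123.0
_SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
_LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_metrics(text):
    """Parse Prometheus text exposition into (name, labels, value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blanks, HELP and TYPE lines
            continue
        match = _SAMPLE_RE.match(line)
        if match is None:
            continue
        name, raw_labels, raw_value = match.groups()
        labels = dict(_LABEL_RE.findall(raw_labels or ""))
        samples.append((name, labels, float(raw_value)))
    return samples


def fetch_metrics(url="http://localhost:10001/metrics", timeout=5.0):
    """Download the raw metrics text (assumes a qtap agent is listening)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

Calling parse_metrics(fetch_metrics()) against a running agent returns one tuple per exposed series; pass the /system/metrics URL to read agent health metrics instead.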

From Docker Container

When running qtap in Docker, expose port 10001:

# Add the other flags your deployment needs before the image name
docker run -d --name qtap \
  -p 10001:10001 \
  us-docker.pkg.dev/qpoint-edge/public/qtap:v0

Then access from the host:

curl http://localhost:10001/metrics

From Kubernetes

Forward the metrics port to your local machine:

kubectl port-forward pod/qtap-xxxxx 10001:10001

Then access locally:

curl http://localhost:10001/metrics

Key Application Metrics

The /metrics endpoint provides these HTTP traffic metrics:

Request Metrics

qtap_http_requests_total

Type: Counter
Description: Total number of HTTP requests observed
Labels: host, method, protocol, status_code
Use: Calculate request rates, error rates, and traffic distribution

qtap_http_requests_duration_ms_bucket

Type: Histogram
Description: Request duration distribution in milliseconds
Labels: host, method, protocol, status_code, le (histogram bucket)
Use: Calculate latency percentiles (p50, p95, p99)
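
To make the percentile calculation concrete, here is a simplified sketch of how a quantile can be estimated from cumulative le buckets by linear interpolation. It follows the general approach of PromQL's histogram_quantile but omits several of its edge cases; the bucket bounds in the usage note are made up for illustration and are not qtap's actual bucket layout.

```python
import math


def histogram_quantile(q, buckets):
    """Estimate the q-quantile from cumulative (upper_bound, count) buckets.

    `buckets` must include the +Inf bucket. Counts are cumulative, as in the
    `le` series of a Prometheus histogram (typically increases over a window).
    """
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be between 0 and 1")
    buckets = sorted(buckets)
    if not buckets or not math.isinf(buckets[-1][0]):
        raise ValueError("buckets must include the +Inf bucket")
    total = buckets[-1][1]
    if total == 0:
        return math.nan  # no observations in the window
    rank = q * total
    lower_bound, lower_count = 0.0, 0.0  # the first bucket is assumed to start at 0
    for upper_bound, count in buckets:
        if count >= rank:
            if math.isinf(upper_bound) or count == lower_count:
                # Overflow (or empty) bucket: fall back to the last finite bound.
                return lower_bound
            # Assume observations are spread evenly inside the bucket.
            fraction = (rank - lower_count) / (count - lower_count)
            return lower_bound + (upper_bound - lower_bound) * fraction
        lower_bound, lower_count = upper_bound, count
    return lower_bound
```

For example, with buckets [(100, 50), (250, 90), (500, 100), (math.inf, 100)] the 0.95 quantile falls halfway through the 250 to 500 ms bucket, so the estimate is 375 ms. The result is only as precise as the bucket boundaries allow.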

qtap_http_requests_duration_ms_sum

Type: Counter
Description: Total request processing time in milliseconds
Labels: host, method, protocol, status_code
Use: Calculate average request duration

qtap_http_requests_duration_ms_count

Type: Counter
Description: Total number of requests measured for duration
Labels: host, method, protocol, status_code
Use: Paired with _sum for average calculations
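
As a small illustration of how _sum and _count pair up, the sketch below computes the average duration between two scrapes. The function name and sample numbers are hypothetical. Dividing the two deltas gives the same result as dividing the two rate() values in PromQL, because the time window cancels out.

```python
def average_duration_ms(sum_before, count_before, sum_after, count_after):
    """Average request duration (ms) for requests seen between two scrapes.

    Returns None when no new requests were counted, or when the counters
    went backwards (for example after an agent restart).
    """
    delta_count = count_after - count_before
    delta_sum = sum_after - sum_before
    if delta_count <= 0 or delta_sum < 0:
        return None
    return delta_sum / delta_count
```

For instance, if _sum grows from 1200 to 4200 ms while _count grows from 10 to 30, the 20 new requests averaged 150 ms each.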

qtap_http_requests_size_bytes_bucket

Type: Histogram
Description: Request payload size distribution in bytes
Labels: host, method, protocol, status_code, le
Use: Calculate payload size percentiles

qtap_http_requests_size_bytes_sum

Type: Counter
Description: Total size of HTTP request payloads in bytes
Labels: host, method, protocol, status_code
Use: Calculate average request size

qtap_http_requests_size_bytes_count

Type: Counter
Description: Number of requests measured for size
Labels: host, method, protocol, status_code
Use: Paired with _sum for average calculations

Response Metrics

qtap_http_responses_total

Type: Counter
Description: Total number of HTTP responses observed
Labels: host, method, protocol, status_code
Use: Track response counts

qtap_http_responses_duration_ms_bucket

Type: Histogram
Description: Response duration distribution (TTFB) in milliseconds
Labels: host, method, protocol, status_code, le
Use: Calculate response time percentiles

qtap_http_responses_duration_ms_sum

Type: Counter
Description: Total response time in milliseconds (TTFB - Time To First Byte)
Labels: host, method, protocol, status_code
Use: Calculate average response times

qtap_http_responses_duration_ms_count

Type: Counter
Description: Number of responses measured for duration
Labels: host, method, protocol, status_code
Use: Paired with _sum for average calculations

qtap_http_responses_size_bytes_bucket

Type: Histogram
Description: Response payload size distribution in bytes
Labels: host, method, protocol, status_code, le
Use: Calculate payload size percentiles

qtap_http_responses_size_bytes_sum

Type: Counter
Description: Total size of HTTP response payloads in bytes
Labels: host, method, protocol, status_code
Use: Calculate average response size

qtap_http_responses_size_bytes_count

Type: Counter
Description: Number of responses measured for size
Labels: host, method, protocol, status_code
Use: Paired with _sum for average calculations

Combined Metrics

qtap_http_duration_ms_bucket

Type: Histogram
Description: Combined request+response duration distribution in milliseconds
Labels: None (aggregates all traffic)
Use: Calculate overall latency percentiles across all hosts

qtap_http_duration_ms_sum

Type: Counter
Description: Total combined request+response time in milliseconds
Labels: None
Use: Calculate overall average transaction duration

qtap_http_duration_ms_count

Type: Counter
Description: Total number of completed HTTP transactions
Labels: None
Use: Paired with _sum for average calculations

Connection-Level Metrics

These metrics are always available, even without the http_metrics plugin:

qtap_connection_open_total

Type: Counter
Description: Total number of connections opened
Labels: direction, remote_addr, remote_port

qtap_connection_close_total

Type: Counter
Description: Total number of connections closed
Labels: direction, remote_addr, remote_port

qtap_connection_active_total

Type: Gauge
Description: Number of currently active connections
Labels: direction, remote_addr, remote_port
Use: Monitor real-time connection usage by destination

qtap_connection_protocol_total

Type: Counter
Description: Total connections by protocol type
Labels: protocol
Use: Track distribution of connection protocols (http1, http2, etc.)

qtap_connection_bytes_sent_total

Type: Counter
Description: Total bytes sent across all connections
Labels: direction, remote_addr, remote_port

qtap_connection_bytes_recv_total

Type: Counter
Description: Total bytes received across all connections
Labels: direction, remote_addr, remote_port

qtap_connection_duration_ms

Type: Histogram
Description: Connection duration distribution in milliseconds
Labels: direction, remote_addr, remote_port, le

qtap_connection_tls_handshake_total

Type: Counter
Description: Total number of TLS handshakes completed
Labels: direction, remote_addr, remote_port, sni, version
Use: Track TLS/HTTPS connection establishment by SNI and TLS version

Common Queries

Request Rate (per second)

rate(qtap_http_requests_total[5m])
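
For intuition about what rate() does with a counter, here is a simplified sketch that computes a per-second rate from raw samples and tolerates counter resets. It is an approximation: the real PromQL function also extrapolates to the edges of the window, which this version skips. The function name is illustrative.

```python
def counter_rate(samples):
    """Per-second rate of increase for (timestamp_seconds, value) samples.

    Samples must be in time order. A drop in value is treated as a counter
    reset (for example an agent restart), so counting resumes from zero.
    """
    if len(samples) < 2:
        return None
    increase = 0.0
    previous = samples[0][1]
    for _, value in samples[1:]:
        if value < previous:
            increase += value  # counter restarted from zero
        else:
            increase += value - previous
        previous = value
    elapsed = samples[-1][0] - samples[0][0]
    if elapsed <= 0:
        return None
    return increase / elapsed
```

With samples [(0, 100), (15, 130), (30, 160)] the counter grew by 60 over 30 seconds, so the rate is 2 requests per second.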

Error Rate (4xx and 5xx responses)

sum(rate(qtap_http_requests_total{status_code=~"4..|5.."}[5m]))
/
sum(rate(qtap_http_requests_total[5m]))
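
The same ratio can be sketched outside PromQL. The example below assumes you already have the per-status-code increase of qtap_http_requests_total over some window (the dictionary shape is an assumption made for illustration) and applies the same 4xx/5xx pattern as the query above.

```python
import re

# Same pattern as the PromQL matcher; PromQL regexes are fully anchored.
_ERROR_STATUS = re.compile(r"4..|5..")


def error_ratio(increase_by_status):
    """Fraction of requests with a 4xx or 5xx status code.

    `increase_by_status` maps a status_code label value to the number of
    requests observed with that code in the window.
    """
    total = sum(increase_by_status.values())
    if total == 0:
        return None  # no traffic, so the ratio is undefined
    errors = sum(
        count
        for code, count in increase_by_status.items()
        if _ERROR_STATUS.fullmatch(code)
    )
    return errors / total
```

For example, error_ratio({"200": 90, "404": 6, "500": 4}) counts 10 errors out of 100 requests. Returning None for an idle window mirrors the way the PromQL division yields no result when the denominator is zero or absent.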

Average Response Time

rate(qtap_http_responses_duration_ms_sum[5m])
/
rate(qtap_http_responses_duration_ms_count[5m])

95th Percentile Latency

histogram_quantile(0.95,
  rate(qtap_http_requests_duration_ms_bucket[5m])
)

Top Hosts by Request Volume

topk(10, sum by (host) (rate(qtap_http_requests_total[5m])))
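
Queries like this one can also be run from a script through the Prometheus HTTP API (/api/v1/query). The sketch below assumes a Prometheus server that already scrapes qtap and is reachable at http://localhost:9090, which is the Prometheus default port and not something this page specifies; the helper names are illustrative.

```python
import json
import urllib.parse
import urllib.request

TOP_HOSTS_QUERY = "topk(10, sum by (host) (rate(qtap_http_requests_total[5m])))"


def build_query_url(prometheus_url, promql):
    """Build an instant-query URL for the Prometheus HTTP API."""
    query_string = urllib.parse.urlencode({"query": promql})
    return prometheus_url.rstrip("/") + "/api/v1/query?" + query_string


def parse_instant_vector(payload):
    """Turn an instant-vector API response into (labels, value) pairs."""
    if payload.get("status") != "success":
        raise RuntimeError("query failed: %s" % payload.get("error", "unknown error"))
    data = payload["data"]
    if data["resultType"] != "vector":
        raise ValueError("expected an instant vector, got %s" % data["resultType"])
    # Each value is [unix_timestamp, "value-as-string"].
    return [(item["metric"], float(item["value"][1])) for item in data["result"]]


def top_hosts(prometheus_url="http://localhost:9090"):
    """Return (host, requests_per_second) pairs, busiest host first."""
    url = build_query_url(prometheus_url, TOP_HOSTS_QUERY)
    with urllib.request.urlopen(url, timeout=10) as response:
        payload = json.load(response)
    rows = parse_instant_vector(payload)
    pairs = [(labels.get("host", ""), value) for labels, value in rows]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)
```

Calling top_hosts() prints nothing by itself; it returns a list you can format or feed into a report.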

Request Rate by Method

sum by (method) (rate(qtap_http_requests_total[5m]))

Overall Transaction Latency (Combined)

# Average overall transaction time
rate(qtap_http_duration_ms_sum[5m])
/
rate(qtap_http_duration_ms_count[5m])

# 95th percentile overall latency
histogram_quantile(0.95,
  rate(qtap_http_duration_ms_bucket[5m])
)

Active Connections

# Total active connections
sum(qtap_connection_active_total)

# Active connections by destination
topk(10, qtap_connection_active_total)

# Active connections by direction
sum by (direction) (qtap_connection_active_total)

TLS Handshake Rate

# Overall handshake rate
sum(rate(qtap_connection_tls_handshake_total[5m]))

# Handshake rate by SNI (domain)
sum by (sni) (rate(qtap_connection_tls_handshake_total[5m]))

# Handshake rate by TLS version
sum by (version) (rate(qtap_connection_tls_handshake_total[5m]))

Connection Throughput

# Bytes sent per second
rate(qtap_connection_bytes_sent_total[5m])

# Bytes received per second
rate(qtap_connection_bytes_recv_total[5m])

Grafana Dashboard

Qpoint provides a sample Grafana dashboard that visualizes these metrics:

Dashboard: qtap-http-overview.json

The dashboard includes:

Request Rate: Overall HTTP request volume over time
Error Rate: Percentage of failed requests (4xx/5xx)
Response Time: Average response duration and TTFB
Latency Percentiles: p50, p95, p99 request duration
Request/Response Sizes: Average payload sizes
Traffic by Host: Request distribution across hosts
Status Code Distribution: Breakdown of HTTP status codes

Importing the Dashboard

1. Download the dashboard JSON from the link above
2. In Grafana, navigate to Dashboards → Import
3. Upload the JSON file or paste its contents
4. Configure your Prometheus datasource
5. Save the dashboard

Prometheus Configuration

Add qtap as a scrape target in your prometheus.yml:

scrape_configs:
  - job_name: 'qtap'
    static_configs:
      - targets: ['localhost:10001']
    metrics_path: '/metrics'
    scrape_interval: 15s

  - job_name: 'qtap-system'
    static_configs:
      - targets: ['localhost:10001']
    metrics_path: '/system/metrics'
    scrape_interval: 30s

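For Kubernetes deployments, use a ServiceMonitor (Prometheus Operator) or pod annotations for automatic discovery. The prometheus.io/* annotations are a convention rather than a built-in Prometheus feature, so they only take effect when your Prometheus scrape configuration includes relabeling rules that honor them: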

annotations:
  prometheus.io/scrape: "true"
  prometheus.io/port: "10001"
  prometheus.io/path: "/metrics"

Alerting Examples

High Error Rate Alert

groups:
  - name: qtap
    rules:
      - alert: HighHTTPErrorRate
        expr: |
          (
            sum(rate(qtap_http_requests_total{status_code=~"5.."}[5m]))
            /
            sum(rate(qtap_http_requests_total[5m]))
          ) > 0.05
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High HTTP 5xx error rate"
          description: "Error rate is {{ $value | humanizePercentage }}"

High Latency Alert

# Add this rule to the rules: list of an existing group, indented to match
- alert: HighResponseLatency
  expr: |
    histogram_quantile(0.95,
      rate(qtap_http_requests_duration_ms_bucket[5m])
    ) > 1000
  for: 10m
  labels:
    severity: warning
  annotations:
    summary: "High response latency detected"
    description: "95th percentile latency is {{ $value }}ms"

Best Practices

Use labels wisely: The host, method, protocol, and status_code labels enable powerful filtering but can increase cardinality. Filter noisy traffic if needed.
Set appropriate scrape intervals: 15-30 seconds is typical for application metrics. System metrics can be scraped less frequently.
Use recording rules: Pre-compute expensive queries like percentiles to improve dashboard performance.
Monitor agent health: Track /system/metrics to ensure qtap itself is healthy and not experiencing backpressure.
Set up alerts: Don't just collect metrics - alert on anomalies like error rate spikes or latency increases.

Troubleshooting

Metrics endpoint not accessible

Check that qtap is running and port 10001 is accessible:

docker logs qtap
netstat -tlnp | grep 10001

No metrics appearing

Verify qtap is capturing traffic:

# Check qtap logs for capture activity (2>&1 includes output written to stderr)
docker logs qtap 2>&1 | grep -i http

# Generate test traffic
curl https://httpbin.org/get

# Check metrics again (-s hides curl's progress output)
curl -s http://localhost:10001/metrics | grep qtap_http_requests_total

High cardinality issues

If you have too many unique label combinations:

Use filters to exclude noisy processes (see Traffic Capture Settings)
Aggregate metrics in Prometheus recording rules
Limit capture to specific hosts with endpoints configuration
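
Before applying any of these mitigations, it can help to see which metric and which label drive the series count. The sketch below is one way to do that from the raw /metrics text, using only the standard library; cardinality_report is a hypothetical helper, not a qtap feature, and it relies on each non-comment line of the text format being one series.

```python
import re
from collections import Counter, defaultdict

_SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s')
_LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def cardinality_report(text):
    """Count series per metric and distinct values per label."""
    series = Counter()
    label_values = defaultdict(lambda: defaultdict(set))
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blanks, HELP and TYPE lines
            continue
        match = _SAMPLE_RE.match(line)
        if match is None:
            continue
        name, raw_labels = match.groups()
        series[name] += 1
        for label, value in _LABEL_RE.findall(raw_labels or ""):
            label_values[name][label].add(value)
    return {
        name: {
            "series": count,
            "distinct_values": {
                label: len(values) for label, values in label_values[name].items()
            },
        }
        for name, count in series.items()
    }
```

A label whose distinct-value count is close to the metric's series count (often host) is usually the one worth filtering or aggregating away.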

Last updated 3 months ago
