Prometheus Metrics
Qtap exposes Prometheus-compatible metrics endpoints for monitoring traffic activity and agent health. These metrics enable you to build dashboards, set up alerts, and track system performance.
REQUIRED: HTTP-level metrics require the http_metrics plugin in your stack configuration. See Configuration Requirement below.
Configuration Requirement
To enable HTTP-level Prometheus metrics, you must add the http_metrics plugin to your stack:
stacks:
  my_stack:
    plugins:
      - type: http_capture # Optional
        config:
          level: summary
      - type: http_metrics # REQUIRED for Prometheus metrics
Without this plugin, only connection-level metrics will be available.
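A quick way to catch a missing plugin before deploying is to check the loaded configuration programmatically. The sketch below assumes the YAML has already been parsed into a Python dict with the stacks / plugins / type layout shown above; the helper name `stacks_missing_http_metrics` and the `capture_only` stack are hypothetical and not part of qtap.

```python
def stacks_missing_http_metrics(config):
    """Return names of stacks whose plugin list lacks an http_metrics entry."""
    missing = []
    for name, stack in config.get("stacks", {}).items():
        plugin_types = {plugin.get("type") for plugin in stack.get("plugins", [])}
        if "http_metrics" not in plugin_types:
            missing.append(name)
    return missing


# The dict mirrors the YAML structure above, as a YAML parser would load it.
config = {
    "stacks": {
        "my_stack": {
            "plugins": [
                {"type": "http_capture", "config": {"level": "summary"}},
                {"type": "http_metrics"},
            ]
        },
        "capture_only": {  # hypothetical second stack, for illustration only
            "plugins": [{"type": "http_capture", "config": {"level": "summary"}}]
        },
    }
}
print(stacks_missing_http_metrics(config))  # ['capture_only']
```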
Metrics Endpoints
Qtap provides two metrics endpoints on localhost:10001:
/metrics
Application metrics for monitoring HTTP traffic activity on your system. Use these metrics to build dashboards tracking:
Request rates and volumes
Error rates and status code distributions
Response times and latency percentiles
Request/response payload sizes
Traffic patterns by host, method, and protocol
/system/metrics
System metrics related to the health and performance of the qtap agent itself. These include:
eBPF program performance
Memory and CPU usage
Event processing rates
Internal error counts
Accessing Metrics
From the Host
curl http://localhost:10001/metrics
curl http://localhost:10001/system/metrics
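For scripted checks, the same endpoints can be read from Python. The following is a minimal sketch using only the standard library. The helper names `parse_samples` and `fetch_metrics` are illustrative, not part of qtap, and the parser is deliberately simplified: it skips sample lines that carry timestamps and does not handle escaped quotes in label values.

```python
import re
import urllib.request

# Matches 'name{labels} value' or 'name value' lines in the text exposition format.
SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)$')
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="([^"]*)"')


def parse_samples(text):
    """Parse exposition text into (name, labels_dict, value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blanks, HELP and TYPE lines
            continue
        match = SAMPLE_RE.match(line)
        if match is None:
            continue  # ignore lines this simplified parser does not understand
        name, raw_labels, value = match.groups()
        labels = dict(LABEL_RE.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples


def fetch_metrics(url="http://localhost:10001/metrics", timeout=5):
    """Download the exposition text from a qtap metrics endpoint."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

Calling `parse_samples(fetch_metrics())` on a host where qtap is running would return one tuple per series, which can then be filtered by metric name or label.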
From Docker Container
When running qtap in Docker, expose port 10001:
docker run -d --name qtap \
  -p 10001:10001 \
  # ... other flags ...
  us-docker.pkg.dev/qpoint-edge/public/qtap:v0
Then access from the host:
curl http://localhost:10001/metrics
From Kubernetes
Forward the metrics port to your local machine:
kubectl port-forward pod/qtap-xxxxx 10001:10001
Then access locally:
curl http://localhost:10001/metrics
Key Application Metrics
The /metrics endpoint provides these HTTP traffic metrics:
Request Metrics
qtap_http_requests_total
Type: Counter
Description: Total number of HTTP requests observed
Labels: host, method, protocol, status_code
Use: Calculate request rates, error rates, and traffic distribution
qtap_http_requests_duration_ms_bucket
Type: Histogram
Description: Request duration distribution in milliseconds
Labels: host, method, protocol, status_code, le (histogram bucket)
Use: Calculate latency percentiles (p50, p95, p99)
qtap_http_requests_duration_ms_sum
Type: Counter
Description: Total request processing time in milliseconds
Labels: host, method, protocol, status_code
Use: Calculate average request duration
qtap_http_requests_duration_ms_count
Type: Counter
Description: Total number of requests measured for duration
Labels: host, method, protocol, status_code
Use: Paired with _sum for average calculations
qtap_http_requests_size_bytes_bucket
Type: Histogram
Description: Request payload size distribution in bytes
Labels: host, method, protocol, status_code, le
Use: Calculate payload size percentiles
qtap_http_requests_size_bytes_sum
Type: Counter
Description: Total size of HTTP request payloads in bytes
Labels: host, method, protocol, status_code
Use: Calculate average request size
qtap_http_requests_size_bytes_count
Type: Counter
Description: Number of requests measured for size
Labels: host, method, protocol, status_code
Use: Paired with _sum for average calculations
Response Metrics
qtap_http_responses_total
Type: Counter
Description: Total number of HTTP responses observed
Labels: host, method, protocol, status_code
Use: Track response counts
qtap_http_responses_duration_ms_bucket
Type: Histogram
Description: Response duration distribution (TTFB) in milliseconds
Labels: host, method, protocol, status_code, le
Use: Calculate response time percentiles
qtap_http_responses_duration_ms_sum
Type: Counter
Description: Total response time in milliseconds (TTFB - Time To First Byte)
Labels: host, method, protocol, status_code
Use: Calculate average response times
qtap_http_responses_duration_ms_count
Type: Counter
Description: Number of responses measured for duration
Labels: host, method, protocol, status_code
Use: Paired with _sum for average calculations
qtap_http_responses_size_bytes_bucket
Type: Histogram
Description: Response payload size distribution in bytes
Labels: host, method, protocol, status_code, le
Use: Calculate payload size percentiles
qtap_http_responses_size_bytes_sum
Type: Counter
Description: Total size of HTTP response payloads in bytes
Labels: host, method, protocol, status_code
Use: Calculate average response size
qtap_http_responses_size_bytes_count
Type: Counter
Description: Number of responses measured for size
Labels: host, method, protocol, status_code
Use: Paired with _sum for average calculations
Combined Metrics
qtap_http_duration_ms_bucket
Type: Histogram
Description: Combined request+response duration distribution in milliseconds
Labels: None (aggregates all traffic)
Use: Calculate overall latency percentiles across all hosts
qtap_http_duration_ms_sum
Type: Counter
Description: Total combined request+response time in milliseconds
Labels: None
Use: Calculate overall average transaction duration
qtap_http_duration_ms_count
Type: Counter
Description: Total number of completed HTTP transactions
Labels: None
Use: Paired with _sum for average calculations
Connection-Level Metrics
These metrics are always available, even without the http_metrics plugin:
qtap_connection_open_total
Type: Counter
Description: Total number of connections opened
Labels: direction, remote_addr, remote_port
qtap_connection_close_total
Type: Counter
Description: Total number of connections closed
Labels: direction, remote_addr, remote_port
qtap_connection_active_total
Type: Gauge
Description: Number of currently active connections
Labels: direction, remote_addr, remote_port
Use: Monitor real-time connection usage by destination
qtap_connection_protocol_total
Type: Counter
Description: Total connections by protocol type
Labels: protocol
Use: Track distribution of connection protocols (http1, http2, etc.)
qtap_connection_bytes_sent_total
Type: Counter
Description: Total bytes sent across all connections
Labels: direction, remote_addr, remote_port
qtap_connection_bytes_recv_total
Type: Counter
Description: Total bytes received across all connections
Labels: direction, remote_addr, remote_port
qtap_connection_duration_ms
Type: Histogram
Description: Connection duration distribution in milliseconds
Labels: direction, remote_addr, remote_port, le
qtap_connection_tls_handshake_total
Type: Counter
Description: Total number of TLS handshakes completed
Labels: direction, remote_addr, remote_port, sni, version
Use: Track TLS/HTTPS connection establishment by SNI and TLS version
Common Queries
Request Rate (per second)
rate(qtap_http_requests_total[5m])
Error Rate (4xx and 5xx responses)
sum(rate(qtap_http_requests_total{status_code=~"4..|5.."}[5m]))
/
sum(rate(qtap_http_requests_total[5m]))
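To make the arithmetic behind this ratio concrete, here is a small Python sketch that computes the same fraction from two scrapes of qtap_http_requests_total. Because numerator and denominator cover the same time window, the window length cancels and only the counter increases matter. The snapshot dictionaries and the `error_ratio` helper are hypothetical; in practice Prometheus performs this calculation, including the counter-reset handling that the sketch omits.

```python
def error_ratio(earlier, later):
    """Fraction of requests with a 4xx or 5xx status between two scrapes.

    Each argument maps a status_code string to the cumulative counter value
    at scrape time. Counter resets are not handled in this sketch.
    """
    total = 0.0
    errors = 0.0
    for status_code, value in later.items():
        increase = value - earlier.get(status_code, 0.0)
        total += increase
        if status_code.startswith(("4", "5")):
            errors += increase
    return errors / total if total else 0.0


# Hypothetical counter values summed by status_code at two scrape times.
earlier = {"200": 1000.0, "404": 20.0, "500": 5.0}
later = {"200": 1180.0, "404": 30.0, "500": 15.0}
print(error_ratio(earlier, later))  # 0.1
```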
Average Response Time
rate(qtap_http_responses_duration_ms_sum[5m])
/
rate(qtap_http_responses_duration_ms_count[5m])
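The same average can be reproduced by hand from two scrapes of the _sum and _count series: divide the growth in total time by the growth in observation count. The Python sketch below illustrates this; `average_ms` and the sample values are hypothetical.

```python
def average_ms(sum_before, count_before, sum_after, count_after):
    """Mean duration in ms for responses observed between two scrapes."""
    observed = count_after - count_before
    if observed <= 0:
        return None  # nothing measured in the interval (or a counter reset)
    return (sum_after - sum_before) / observed


# Hypothetical values of qtap_http_responses_duration_ms_sum and _count.
print(average_ms(12000.0, 100.0, 15000.0, 120.0))  # 150.0
```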
95th Percentile Latency
histogram_quantile(0.95,
rate(qtap_http_requests_duration_ms_bucket[5m])
)
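For intuition about what histogram_quantile returns, the Python sketch below reimplements a simplified version of the estimate: it finds the bucket containing the target rank and interpolates linearly inside it. It assumes a classic histogram with non-negative bucket bounds and a final +Inf bucket; the bucket values are hypothetical, and Prometheus itself handles further edge cases that are left out here.

```python
import math


def histogram_quantile(q, buckets):
    """Estimate quantile q from cumulative (upper_bound, count) buckets."""
    if not 0 < q <= 1:
        raise ValueError("q must be in (0, 1]")
    buckets = sorted(buckets)
    total = buckets[-1][1]
    if total == 0:
        return math.nan  # no observations yet
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0
    for bound, count in buckets:
        if count >= rank:
            if math.isinf(bound):
                return prev_bound  # fall back to the highest finite bound
            fraction = (rank - prev_count) / (count - prev_count)
            return prev_bound + (bound - prev_bound) * fraction
        prev_bound, prev_count = bound, count
    return prev_bound


# Hypothetical cumulative bucket counts: le=50, le=100, le=250, le=+Inf.
buckets = [(50.0, 60.0), (100.0, 90.0), (250.0, 98.0), (math.inf, 100.0)]
print(histogram_quantile(0.95, buckets))  # 193.75
```

The result depends on the bucket boundaries: the 95th request falls in the 100 to 250 ms bucket, so the estimate can only be as precise as that bucket is wide.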
Top Hosts by Request Volume
topk(10, sum by (host) (rate(qtap_http_requests_total[5m])))
Request Rate by Method
sum by (method) (rate(qtap_http_requests_total[5m]))
Overall Transaction Latency (Combined)
# Average overall transaction time
rate(qtap_http_duration_ms_sum[5m])
/
rate(qtap_http_duration_ms_count[5m])
# 95th percentile overall latency
histogram_quantile(0.95,
rate(qtap_http_duration_ms_bucket[5m])
)
Active Connections
# Total active connections
sum(qtap_connection_active_total)
# Active connections by destination
topk(10, qtap_connection_active_total)
# Active connections by direction
sum by (direction) (qtap_connection_active_total)
TLS Handshake Rate
# Overall handshake rate
sum(rate(qtap_connection_tls_handshake_total[5m]))
# Handshake rate by SNI (domain)
sum by (sni) (rate(qtap_connection_tls_handshake_total[5m]))
# Handshake rate by TLS version
sum by (version) (rate(qtap_connection_tls_handshake_total[5m]))
Connection Throughput
# Bytes sent per second
rate(qtap_connection_bytes_sent_total[5m])
# Bytes received per second
rate(qtap_connection_bytes_recv_total[5m])
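As a rough illustration of what rate() computes for these byte counters, the Python sketch below derives a per-second rate from a series of timestamped samples and treats any drop in value as a counter reset. The `per_second_rate` helper and the sample data are hypothetical, and the sketch skips the extrapolation to window boundaries that Prometheus applies.

```python
def per_second_rate(samples):
    """Average per-second increase over a list of (timestamp_s, value) samples.

    A drop in value is treated as a counter reset, so the new value counts
    as the increase since the reset.
    """
    if len(samples) < 2:
        return None
    increase = 0.0
    for (_, previous), (_, current) in zip(samples, samples[1:]):
        increase += (current - previous) if current >= previous else current
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed if elapsed > 0 else None


# Hypothetical scrapes of qtap_connection_bytes_sent_total, 15 s apart,
# with an agent restart (counter reset) before the last sample.
samples = [(0, 1000.0), (15, 4000.0), (30, 7000.0), (45, 3000.0)]
print(per_second_rate(samples))  # 200.0
```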
Grafana Dashboard
Qpoint provides a sample Grafana dashboard that visualizes these metrics:
Dashboard: qtap-http-overview.json
The dashboard includes:
Request Rate: Overall HTTP request volume over time
Error Rate: Percentage of failed requests (4xx/5xx)
Response Time: Average response duration and TTFB
Latency Percentiles: p50, p95, p99 request duration
Request/Response Sizes: Average payload sizes
Traffic by Host: Request distribution across hosts
Status Code Distribution: Breakdown of HTTP status codes
Importing the Dashboard
Download the dashboard JSON from the link above
In Grafana, navigate to Dashboards → Import
Upload the JSON file or paste its contents
Configure your Prometheus datasource
Save the dashboard
Prometheus Configuration
Add qtap as a scrape target in your prometheus.yml:
scrape_configs:
  - job_name: 'qtap'
    static_configs:
      - targets: ['localhost:10001']
    metrics_path: '/metrics'
    scrape_interval: 15s
  - job_name: 'qtap-system'
    static_configs:
      - targets: ['localhost:10001']
    metrics_path: '/system/metrics'
    scrape_interval: 30s
For Kubernetes deployments, use ServiceMonitor (Prometheus Operator) or pod annotations for automatic discovery:
annotations:
  prometheus.io/scrape: "true"
  prometheus.io/port: "10001"
  prometheus.io/path: "/metrics"
Alerting Examples
High Error Rate Alert
groups:
  - name: qtap
    rules:
      - alert: HighHTTPErrorRate
        expr: |
          (
            sum(rate(qtap_http_requests_total{status_code=~"5.."}[5m]))
            /
            sum(rate(qtap_http_requests_total[5m]))
          ) > 0.05
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High HTTP 5xx error rate"
          description: "Error rate is {{ $value | humanizePercentage }}"
High Latency Alert
- alert: HighResponseLatency
  expr: |
    histogram_quantile(0.95,
      rate(qtap_http_requests_duration_ms_bucket[5m])
    ) > 1000
  for: 10m
  labels:
    severity: warning
  annotations:
    summary: "High response latency detected"
    description: "95th percentile latency is {{ $value }}ms"
Best Practices
Use labels wisely: The host, method, protocol, and status_code labels enable powerful filtering but can increase cardinality. Filter noisy traffic if needed.
Set appropriate scrape intervals: 15-30 seconds is typical for application metrics. System metrics can be scraped less frequently.
Use recording rules: Pre-compute expensive queries like percentiles to improve dashboard performance.
Monitor agent health: Track /system/metrics to ensure qtap itself is healthy and not experiencing backpressure.
Set up alerts: Don't just collect metrics - alert on anomalies like error rate spikes or latency increases.
Troubleshooting
Metrics endpoint not accessible
Check that qtap is running and port 10001 is accessible:
docker logs qtap
netstat -tlnp | grep 10001
No metrics appearing
Verify qtap is capturing traffic:
# Check qtap logs for capture activity
docker logs qtap | grep -i http
# Generate test traffic
curl https://httpbin.org/get
# Check metrics again
curl http://localhost:10001/metrics | grep qtap_http_requests_total
High cardinality issues
If you have too many unique label combinations:
Use filters to exclude noisy processes (see Traffic Capture Settings)
Aggregate metrics in Prometheus recording rules
Limit capture to specific hosts with endpoints configuration
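Before applying any of these remedies, it helps to see which metrics and labels actually drive the series count. The Python sketch below takes the raw text returned by the /metrics endpoint and counts series per metric name and distinct values per label. The `cardinality_report` helper is hypothetical, the sample text is made up, and the label parsing is simplified (no escaped quotes).

```python
import re
from collections import Counter

SERIES_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s')
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="([^"]*)"')


def cardinality_report(text):
    """Count series per metric and distinct values per label name."""
    series = Counter()
    label_values = {}
    for line in text.splitlines():
        if not line or line.startswith("#"):  # skip blanks, HELP and TYPE lines
            continue
        match = SERIES_RE.match(line)
        if match is None:
            continue
        name, raw_labels = match.groups()
        series[name] += 1  # each sample line is one unique series
        for label, value in LABEL_RE.findall(raw_labels or ""):
            label_values.setdefault(label, set()).add(value)
    distinct = {label: len(values) for label, values in label_values.items()}
    return series, distinct


# Made-up exposition text for illustration.
sample_text = """\
qtap_http_requests_total{host="a.example",method="GET",protocol="http1",status_code="200"} 3
qtap_http_requests_total{host="b.example",method="GET",protocol="http1",status_code="200"} 1
qtap_http_requests_total{host="b.example",method="POST",protocol="http1",status_code="500"} 2
qtap_http_duration_ms_count 6
"""
series, distinct = cardinality_report(sample_text)
print(series.most_common(1))  # [('qtap_http_requests_total', 3)]
print(distinct["host"])       # 2
```

A label with many distinct values (often host) is usually the first candidate for filtering or aggregation.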