Monitoring Qtap with Prometheus and Grafana
This guide walks you through setting up comprehensive monitoring for qtap using Prometheus for metrics collection and Grafana for visualization. By the end, you'll have dashboards showing request rates, error rates, latency percentiles, and traffic patterns.
What You'll Build
Prometheus scraping qtap metrics every 15 seconds
Grafana dashboard visualizing HTTP traffic patterns
Alerts for high error rates and latency spikes
RED metrics (Rate, Errors, Duration) for all observed traffic
Prerequisites
Qtap installed and running (see Getting Started)
Docker or Kubernetes environment
Basic familiarity with Prometheus and Grafana
Step 0: Enable HTTP Metrics Plugin
CRITICAL FIRST STEP: Qtap requires the http_metrics plugin to expose HTTP-level Prometheus metrics. Without this plugin, you'll only see connection-level metrics.
Update Your Qtap Configuration
Edit your qtap configuration file to include the http_metrics plugin:
stacks:
monitoring_stack:
plugins:
# HTTP capture plugin (optional - for logging/storage)
- type: http_capture
config:
level: summary
format: json
# HTTP metrics plugin (REQUIRED for Prometheus metrics)
- type: http_metrics
tap:
direction: egress
http:
stack: monitoring_stack
Restart Qtap
After adding the plugin, restart qtap:
# Docker
docker restart qtap
# Kubernetes
kubectl rollout restart daemonset/qtap
Step 1: Verify Metrics Are Available
First, confirm qtap is exposing HTTP metrics:
# Check qtap is running
docker ps | grep qtap
# Access metrics endpoint
curl http://localhost:10001/metrics | grep qtap_http_requests_total
You should see Prometheus-formatted metrics:
# HELP qtap_http_requests_total Total HTTP requests observed
# TYPE qtap_http_requests_total counter
qtap_http_requests_total{host="httpbin.org",method="GET",protocol="http2",status_code="200"} 5
If you don't see qtap_http_requests_total, the http_metrics plugin is not configured. Go back to Step 0.
If metrics exist but show zero, generate test traffic:
# Generate test traffic
curl https://httpbin.org/get
# Check metrics again (wait 5 seconds for metrics to update)
sleep 5
curl http://localhost:10001/metrics | grep qtap_http_requests_total
Step 2: Deploy Prometheus
Option A: Docker Compose
Create prometheus.yml:
global:
scrape_interval: 15s
evaluation_interval: 15s
scrape_configs:
# Application metrics
- job_name: 'qtap'
static_configs:
- targets: ['host.docker.internal:10001']
metrics_path: '/metrics'
# Agent health metrics
- job_name: 'qtap-system'
static_configs:
- targets: ['host.docker.internal:10001']
metrics_path: '/system/metrics'
scrape_interval: 30s
Create docker-compose.yml:
services:
prometheus:
image: prom/prometheus:latest
container_name: prometheus
ports:
- "9090:9090"
volumes:
- ./prometheus.yml:/etc/prometheus/prometheus.yml
- prometheus-data:/prometheus
command:
- '--config.file=/etc/prometheus/prometheus.yml'
- '--storage.tsdb.path=/prometheus'
extra_hosts:
- "host.docker.internal:host-gateway"
grafana:
image: grafana/grafana:latest
container_name: grafana
ports:
- "3000:3000"
volumes:
- grafana-data:/var/lib/grafana
environment:
- GF_SECURITY_ADMIN_PASSWORD=admin
- GF_USERS_ALLOW_SIGN_UP=false
volumes:
prometheus-data:
grafana-data:
Start the stack:
docker compose up -d
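Once the containers are up, you can confirm both services respond before moving on (a quick sanity check, assuming the ports mapped above):
# Confirm both containers are running
docker compose ps
# Prometheus readiness endpoint should return HTTP 200 with a short "ready" message
curl http://localhost:9090/-/ready
# Grafana health endpoint should return a small JSON status
curl http://localhost:3000/api/health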
Option B: Kubernetes with ServiceMonitor
If using Prometheus Operator, create a ServiceMonitor:
apiVersion: v1
kind: Service
metadata:
name: qtap-metrics
labels:
app: qtap
spec:
selector:
app: qtap
ports:
- name: metrics
port: 10001
targetPort: 10001
---
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
name: qtap
labels:
app: qtap
spec:
selector:
matchLabels:
app: qtap
endpoints:
- port: metrics
path: /metrics
interval: 15s
- port: metrics
path: /system/metrics
interval: 30s
Apply it:
kubectl apply -f qtap-servicemonitor.yaml
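You can confirm the objects were created and that the Service has endpoints backing it (assuming the default namespace; adjust -n as needed):
# Service and ServiceMonitor should both exist
kubectl get service qtap-metrics
kubectl get servicemonitor qtap
# Endpoints should list one address per qtap pod
kubectl get endpoints qtap-metrics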
Option C: Kubernetes with Pod Annotations
For standard Prometheus server, add annotations to your qtap DaemonSet:
apiVersion: apps/v1
kind: DaemonSet
metadata:
name: qtap
spec:
template:
metadata:
annotations:
prometheus.io/scrape: "true"
prometheus.io/port: "10001"
prometheus.io/path: "/metrics"
spec:
# ... qtap pod spec
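These annotations only take effect if your Prometheus server runs a pod-discovery job that honors them. Many Helm-based installs include one by default; if yours does not, the conventional job looks roughly like this (a sketch of the standard kubernetes-pods relabeling pattern; adapt it to your environment):
scrape_configs:
  - job_name: 'kubernetes-pods'
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      # Only scrape pods annotated with prometheus.io/scrape: "true"
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: "true"
      # Honor a custom metrics path from prometheus.io/path
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
        action: replace
        target_label: __metrics_path__
        regex: (.+)
      # Honor a custom port from prometheus.io/port
      - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
        action: replace
        regex: ([^:]+)(?::\d+)?;(\d+)
        replacement: $1:$2
        target_label: __address__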
Add Qtap Scrape Config to Existing Prometheus
If you already have Prometheus running, add these scrape configs to your existing prometheus.yml:
scrape_configs:
# ... your existing jobs ...
# Qtap application metrics
- job_name: 'qtap'
static_configs:
- targets: ['<qtap-host>:10001'] # Replace with your qtap host
metrics_path: '/metrics'
scrape_interval: 15s
# Qtap system/health metrics
- job_name: 'qtap-system'
static_configs:
- targets: ['<qtap-host>:10001']
metrics_path: '/system/metrics'
scrape_interval: 30s
For Kubernetes with Prometheus Operator: If you already have Prometheus Operator installed, just apply the ServiceMonitor from Option B above - no need to modify Prometheus config files.
Reload Prometheus configuration:
# Docker
docker exec prometheus kill -HUP 1
# Kubernetes (if using Prometheus Operator, reload is automatic)
kubectl rollout restart deployment/prometheus-server -n monitoring
Step 3: Verify Prometheus Is Scraping
Open Prometheus UI at http://localhost:9090 and check:
Status → Targets: Verify the qtap and qtap-system jobs are "UP"
Run a query: Try qtap_http_requests_total in the query box
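You can also check target health from the command line via Prometheus's HTTP API (assuming Prometheus is reachable on localhost:9090):
# Each qtap target should report "health":"up"
curl -s http://localhost:9090/api/v1/targets | grep -o '"job":"qtap[^"]*"\|"health":"[^"]*"'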
If targets show as DOWN:
# Check network connectivity from Prometheus container
docker exec prometheus wget -O- http://host.docker.internal:10001/metrics
# For Kubernetes, check service and endpoints
kubectl get svc qtap-metrics
kubectl get endpoints qtap-metrics
Step 4: Import Grafana Dashboard
Access Grafana
Navigate to http://localhost:3000 and log in:
Username: admin
Password: admin (or the value of GF_SECURITY_ADMIN_PASSWORD)
Add Prometheus Data Source
Navigate to Configuration → Data Sources
Click Add data source
Select Prometheus
Set URL:
Docker Compose: http://prometheus:9090
Kubernetes: http://prometheus-server.monitoring.svc.cluster.local
Click Save & Test
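If you prefer configuration-as-code over clicking through the UI, the same data source can be provisioned from a file mounted into the Grafana container under /etc/grafana/provisioning/datasources/ (a sketch using Grafana's provisioning format; the file name and URL are assumptions matching the Docker Compose setup above):
# prometheus-datasource.yml
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://prometheus:9090
    isDefault: true
Mount it in the grafana service, for example ./prometheus-datasource.yml:/etc/grafana/provisioning/datasources/prometheus-datasource.yml, and Grafana will create the data source on startup.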
Import Qtap Dashboard
Download the dashboard: qtap-http-overview.json
In Grafana, navigate to Dashboards → Import
Click Upload JSON file and select the downloaded file
Select your Prometheus data source
Click Import
Dashboard Panels
The dashboard includes:
Request Rate: Total requests per second over time
Error Rate: Percentage of 4xx/5xx responses
Average Response Time: Mean response duration
Latency Percentiles: p50, p95, p99 request duration
Request/Response Sizes: Average payload sizes
Top Hosts: Highest traffic hosts
Status Code Distribution: Breakdown by HTTP status
Protocol Distribution: HTTP/1.1 vs HTTP/2 traffic
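If you want to build or extend panels yourself, the core RED panels boil down to a handful of PromQL expressions along these lines (a sketch, using the metric names verified in Step 1):
# Request rate (req/s)
sum(rate(qtap_http_requests_total[5m]))
# Error rate (share of 4xx/5xx responses)
sum(rate(qtap_http_requests_total{status_code=~"4..|5.."}[5m]))
  /
sum(rate(qtap_http_requests_total[5m]))
# p95 request duration (ms)
histogram_quantile(0.95,
  sum by (le) (rate(qtap_http_requests_duration_ms_bucket[5m]))
)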
Step 5: Set Up Alerts
Create alerts.yml for Prometheus:
groups:
- name: qtap-alerts
interval: 30s
rules:
# High error rate alert
- alert: HighHTTPErrorRate
expr: |
(
sum(rate(qtap_http_requests_total{status_code=~"5.."}[5m]))
/
sum(rate(qtap_http_requests_total[5m]))
) > 0.05
for: 5m
labels:
severity: warning
component: qtap
annotations:
summary: "High HTTP 5xx error rate detected"
description: "Error rate is {{ $value | humanizePercentage }} over the last 5 minutes"
# High latency alert
- alert: HighResponseLatency
expr: |
histogram_quantile(0.95,
rate(qtap_http_requests_duration_ms_bucket[5m])
) > 1000
for: 10m
labels:
severity: warning
component: qtap
annotations:
summary: "High response latency detected"
description: "95th percentile latency is {{ $value }}ms"
# Traffic spike alert
- alert: TrafficSpike
expr: |
rate(qtap_http_requests_total[5m])
> 2 * rate(qtap_http_requests_total[1h] offset 1h)
for: 5m
labels:
severity: info
component: qtap
annotations:
summary: "Unusual traffic spike detected"
description: "Current request rate is {{ $value | humanize }} req/s"
# Qtap agent health
- alert: QtapAgentDown
expr: up{job="qtap"} == 0
for: 2m
labels:
severity: critical
component: qtap
annotations:
summary: "Qtap agent is down"
description: "Qtap metrics endpoint is not responding"
Update prometheus.yml to load the rules:
global:
scrape_interval: 15s
rule_files:
- /etc/prometheus/alerts.yml
alerting:
alertmanagers:
- static_configs:
- targets: ['alertmanager:9093'] # Optional: configure Alertmanager
scrape_configs:
# ... existing scrape configs
Reload Prometheus configuration:
# Docker Compose
docker compose restart prometheus
# Kubernetes
kubectl rollout restart deployment prometheus-server
Configure Grafana Alerts (Alternative)
You can also set up alerts directly in Grafana:
Open a dashboard panel
Click Edit → Alert tab
Create alert conditions based on the queries
Configure notification channels (Slack, PagerDuty, email, etc.)
Step 6: Test the Monitoring Setup
Generate various types of traffic to see metrics flow through:
Successful Requests
for i in {1..50}; do
curl -s https://httpbin.org/get > /dev/null
sleep 0.1
done
Error Responses
for i in {1..10}; do
curl -s https://httpbin.org/status/500 > /dev/null
sleep 0.2
done
High Latency
for i in {1..5}; do
curl -s https://httpbin.org/delay/2 > /dev/null
done
View in Grafana
Open the qtap dashboard
Observe request rate increase
See error rate spike from 500 responses
Check latency percentiles from delayed requests
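You can also confirm the same signals without the UI by querying Prometheus's HTTP API directly (a sketch; adjust the address if Prometheus is not on localhost:9090):
# Current request rate
curl -s 'http://localhost:9090/api/v1/query' \
  --data-urlencode 'query=sum(rate(qtap_http_requests_total[5m]))'
# Current 5xx error ratio
curl -s 'http://localhost:9090/api/v1/query' \
  --data-urlencode 'query=sum(rate(qtap_http_requests_total{status_code=~"5.."}[5m])) / sum(rate(qtap_http_requests_total[5m]))'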
Step 7: Optimize for Production
Reduce Metric Cardinality
High cardinality (many unique label combinations) can impact Prometheus performance. To reduce it:
Filter noisy processes in qtap config:
filters:
groups:
- qpoint
custom:
- exe: /usr/bin/health-check
strategy: exact
- exe: /usr/local/bin/monitoring-
strategy: prefix
Focus on important domains:
tap:
endpoints:
- domain: 'api.production.example.com'
http:
stack: monitored_stack
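To gauge whether these changes are helping, you can track how many series the qtap metrics actually produce (queries to run in the Prometheus UI; a rough sketch):
# Number of distinct qtap_http_requests_total series (label combinations)
count(qtap_http_requests_total)
# Hosts contributing the most series
topk(10, count by (host) (qtap_http_requests_total))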
Use Recording Rules
Pre-compute expensive queries with Prometheus recording rules:
groups:
- name: qtap-recordings
interval: 15s
rules:
# Pre-compute request rate by host
- record: qtap:http_requests:rate5m
expr: |
sum by (host) (rate(qtap_http_requests_total[5m]))
# Pre-compute error rate
- record: qtap:http_errors:rate5m
expr: |
sum(rate(qtap_http_requests_total{status_code=~"5.."}[5m]))
/
sum(rate(qtap_http_requests_total[5m]))
# Pre-compute p95 latency
- record: qtap:http_latency:p95
expr: |
histogram_quantile(0.95,
sum(rate(qtap_http_requests_duration_ms_bucket[5m])) by (le)
)
Use these in your dashboards and alerts for better performance.
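For example, the error-rate alert from Step 5 can reference the recorded series instead of recomputing the ratio on every evaluation (a sketch, assuming the recording rules above are loaded):
- alert: HighHTTPErrorRate
  expr: qtap:http_errors:rate5m > 0.05
  for: 5m
  labels:
    severity: warning
    component: qtap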
Retention Settings
Configure Prometheus retention based on your needs:
# docker-compose.yml
command:
- '--config.file=/etc/prometheus/prometheus.yml'
- '--storage.tsdb.path=/prometheus'
- '--storage.tsdb.retention.time=30d' # Keep 30 days
- '--storage.tsdb.retention.size=10GB' # Or max 10GB
Common Queries for Troubleshooting
Find services with high error rates
topk(5,
sum by (host) (rate(qtap_http_requests_total{status_code=~"5.."}[5m]))
/
sum by (host) (rate(qtap_http_requests_total[5m]))
)
Identify slow endpoints
topk(10,
histogram_quantile(0.95,
sum by (host, le) (rate(qtap_http_requests_duration_ms_bucket[5m]))
)
)
HTTP/2 vs HTTP/1 Traffic
sum by (protocol) (rate(qtap_http_requests_total[5m]))
Average Request/Response Sizes
# Average request size
rate(qtap_http_requests_size_bytes_sum[5m])
/
rate(qtap_http_requests_size_bytes_count[5m])
# Average response size
rate(qtap_http_responses_size_bytes_sum[5m])
/
rate(qtap_http_responses_size_bytes_count[5m])
Compare traffic patterns over time
sum(rate(qtap_http_requests_total[5m]))
/
sum(rate(qtap_http_requests_total[5m] offset 1d))
Monitor overall system latency
# Average end-to-end transaction time (combined request+response)
rate(qtap_http_duration_ms_sum[5m])
/
rate(qtap_http_duration_ms_count[5m])
# 95th percentile overall latency
histogram_quantile(0.95,
rate(qtap_http_duration_ms_bucket[5m])
)
Check active connections
# Total active connections
sum(qtap_connection_active_total)
# Active connections by destination
topk(10, qtap_connection_active_total)
Track TLS/HTTPS usage
# TLS handshake rate (HTTPS connection establishment)
sum(rate(qtap_connection_tls_handshake_total[5m]))
# Connections by TLS version
sum by (version) (rate(qtap_connection_tls_handshake_total[5m]))
Troubleshooting
No HTTP metrics appearing in Prometheus
Most Common Issue: Missing http_metrics plugin.
# Check if HTTP metrics exist
curl http://localhost:10001/metrics | grep "qtap_http_requests_total"
If empty:
Verify the http_metrics plugin is in your qtap config:
stacks:
my_stack:
plugins:
- type: http_metrics # ← Must be present
Restart qtap after adding it:
docker restart qtap
Verify metrics now appear:
curl http://localhost:10001/metrics | grep "qtap_http_requests_total"
Only seeing connection metrics
If you see qtap_connection_* metrics but no qtap_http_* metrics, the http_metrics plugin is not configured. See Step 0.
Metrics show but dashboard is empty
Check label names: Qtap v0 uses host, not domain. Update queries:
# Wrong
sum by (domain) (rate(qtap_http_requests_total[5m]))
# Correct
sum by (host) (rate(qtap_http_requests_total[5m]))
Check data source: Ensure Grafana is connected to the right Prometheus
Check time range: Extend the time range in Grafana
Verify queries: Test dashboard queries directly in Prometheus
Prometheus can't reach qtap
# Docker Compose
docker exec prometheus wget -O- http://host.docker.internal:10001/metrics
# Kubernetes
kubectl run curl-test --image=curlimages/curl --rm -it -- \
curl http://qtap-metrics.default.svc.cluster.local:10001/metrics
High cardinality warnings
level=warn msg="Metric has too many labels"
This means you have too many unique label combinations. Solutions:
Filter processes in qtap configuration
Use recording rules to aggregate
Limit domain capture with endpoints configuration
Consider using Prometheus remote write to long-term storage
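To see which metrics and labels are driving series growth, Prometheus exposes TSDB head statistics you can inspect directly (assuming Prometheus on localhost:9090):
# Top metric names and label pairs by series count
curl -s http://localhost:9090/api/v1/status/tsdb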
Missing metrics after qtap restart
Prometheus counters reset when qtap restarts. This is normal. Use the rate() function, which handles counter resets automatically.
Next Steps
Customize dashboards: Add panels for your specific use cases
Set up Alertmanager: Route alerts to Slack, PagerDuty, or email
Create SLOs: Define service level objectives based on qtap metrics
Integrate with logs: Correlate metrics with qtap captured payloads in your object store
Multi-cluster monitoring: Federate metrics from multiple qtap deployments