Prometheus + Grafana Monitoring
This guide walks you through setting up comprehensive monitoring for qtap using Prometheus for metrics collection and Grafana for visualization. By the end, you'll have dashboards showing request rates, error rates, latency percentiles, and traffic patterns.
What You'll Build
Prometheus scraping qtap metrics every 15 seconds
Grafana dashboard visualizing HTTP traffic patterns
Alerts for high error rates and latency spikes
RED metrics (Rate, Errors, Duration) for all observed traffic
Prerequisites
Qtap installed and running (see Getting Started)
Docker or Kubernetes environment
Basic familiarity with Prometheus and Grafana
Step 0: Enable HTTP Metrics Plugin
CRITICAL FIRST STEP: Qtap requires the http_metrics plugin to expose HTTP-level Prometheus metrics. Without this plugin, you'll only see connection-level metrics.
Update Your Qtap Configuration
Edit your qtap configuration file to include the http_metrics plugin:
version: 2
services:
  event_stores:
    - type: stdout
  object_stores:
    - type: stdout
stacks:
  monitoring_stack:
    plugins:
      # HTTP capture plugin (optional - for logging/storage)
      - type: http_capture
        config:
          level: summary
          format: json
      # HTTP metrics plugin (REQUIRED for Prometheus metrics)
      - type: http_metrics
tap:
  direction: egress
  http:
    stack: monitoring_stack
Restart Qtap
After adding the plugin, restart qtap:
# Docker
docker restart qtap
# Kubernetes
kubectl rollout restart daemonset/qtap
Step 1: Verify Metrics Are Available
First, confirm qtap is exposing HTTP metrics:
# Check qtap is running
docker ps | grep qtap
# Access metrics endpoint
curl http://localhost:10001/metrics | grep qtap_http_requests_total
You should see Prometheus-formatted metrics:
# HELP qtap_http_requests_total Total HTTP requests observed
# TYPE qtap_http_requests_total counter
qtap_http_requests_total{host="httpbin.org",method="GET",protocol="http2",status_code="200"} 5
If you don't see qtap_http_requests_total, the http_metrics plugin is not configured. Go back to Step 0.
If metrics exist but show zero, generate test traffic:
# Generate test traffic
curl https://httpbin.org/get
# Check metrics again (wait 5 seconds for metrics to update)
sleep 5
curl http://localhost:10001/metrics | grep qtap_http_requests_total
Step 2: Deploy Prometheus
Option A: Docker Compose
Create prometheus.yml:
global:
  scrape_interval: 15s
  evaluation_interval: 15s
scrape_configs:
  # Application metrics
  - job_name: 'qtap'
    static_configs:
      - targets: ['host.docker.internal:10001']
    metrics_path: '/metrics'
  # Agent health metrics
  - job_name: 'qtap-system'
    static_configs:
      - targets: ['host.docker.internal:10001']
    metrics_path: '/system/metrics'
    scrape_interval: 30s
Create docker-compose.yml:
services:
  prometheus:
    image: prom/prometheus:latest
    container_name: prometheus
    ports:
      - "9090:9090"
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
      - prometheus-data:/prometheus
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--storage.tsdb.path=/prometheus'
    extra_hosts:
      - "host.docker.internal:host-gateway"  # On Linux you may need this to reach the host; see note below
  grafana:
    image: grafana/grafana:latest
    container_name: grafana
    ports:
      - "3000:3000"
    volumes:
      - grafana-data:/var/lib/grafana
    environment:
      - GF_SECURITY_ADMIN_PASSWORD=admin
      - GF_USERS_ALLOW_SIGN_UP=false
volumes:
  prometheus-data:
  grafana-data:
Start the stack:
docker compose up -d
Option B: Kubernetes with ServiceMonitor
If using Prometheus Operator, create a ServiceMonitor:
apiVersion: v1
kind: Service
metadata:
  name: qtap-metrics
  labels:
    app: qtap
spec:
  selector:
    app: qtap
  ports:
    - name: metrics
      port: 10001
      targetPort: 10001
---
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: qtap
  labels:
    app: qtap
spec:
  selector:
    matchLabels:
      app: qtap
  endpoints:
    - port: metrics
      path: /metrics
      interval: 15s
    - port: metrics
      path: /system/metrics
      interval: 30s
Apply it:
kubectl apply -f qtap-servicemonitor.yaml
Option C: Kubernetes with Pod Annotations
For standard Prometheus server, add annotations to your qtap DaemonSet:
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: qtap
spec:
  template:
    metadata:
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/port: "10001"
        prometheus.io/path: "/metrics"
    spec:
      # ... qtap pod spec
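These annotations only have an effect if your Prometheus server runs a Kubernetes service-discovery job that honors them. If yours does not, the following is a minimal sketch of such a job; the job name and relabel rules are the conventional prometheus.io annotation pattern, not qtap-specific configuration:
scrape_configs:
  - job_name: 'kubernetes-pods'
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      # Keep only pods annotated with prometheus.io/scrape: "true"
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: "true"
      # Scrape the path from prometheus.io/path (here /metrics)
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
        action: replace
        target_label: __metrics_path__
        regex: (.+)
      # Scrape the port from prometheus.io/port (here 10001)
      - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
        action: replace
        regex: ([^:]+)(?::\d+)?;(\d+)
        replacement: $1:$2
        target_label: __address__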
Add Qtap Scrape Config to Existing Prometheus
If you already have Prometheus running, add these scrape configs to your existing prometheus.yml:
scrape_configs:
  # ... your existing jobs ...
  # Qtap application metrics
  - job_name: 'qtap'
    # Docker for Mac/Windows exposes host.docker.internal automatically.
    # On Linux, add --add-host=host.docker.internal:host-gateway (Compose v2)
    # or replace with your bridge IP (commonly 172.17.0.1).
    static_configs:
      - targets: ['<qtap-host>:10001']  # Replace with your qtap host
    metrics_path: '/metrics'
    scrape_interval: 15s
  # Qtap system/health metrics
  - job_name: 'qtap-system'
    static_configs:
      - targets: ['<qtap-host>:10001']
    metrics_path: '/system/metrics'
    scrape_interval: 30s
For Kubernetes with Prometheus Operator: If you already have Prometheus Operator installed, just apply the ServiceMonitor from Option B above - no need to modify Prometheus config files.
Reload Prometheus configuration:
# Docker
docker exec prometheus kill -HUP 1
# Kubernetes (if using Prometheus Operator, reload is automatic)
kubectl rollout restart deployment/prometheus-server -n monitoring
Step 3: Verify Prometheus Is Scraping
Open Prometheus UI at http://localhost:9090 and check:
Status → Targets: Verify the qtap and qtap-system jobs are "UP"
Run a query: Try qtap_http_requests_total in the query box
If targets show as DOWN:
# Check network connectivity from Prometheus container
# macOS/Windows
docker exec prometheus wget -O- http://host.docker.internal:10001/metrics
# Linux (if host.docker.internal is unavailable)
# docker exec prometheus wget -O- http://172.17.0.1:10001/metrics
# For Kubernetes, check service and endpoints
kubectl get svc qtap-metrics
kubectl get endpoints qtap-metrics
Step 4: Import Grafana Dashboard
Access Grafana
Navigate to http://localhost:3000 and log in:
Username: admin
Password: admin (or the value of GF_SECURITY_ADMIN_PASSWORD)
Add Prometheus Data Source
Navigate to Configuration → Data Sources
Click Add data source
Select Prometheus
Set URL:
Docker Compose: http://prometheus:9090
Kubernetes: http://prometheus-server.monitoring.svc.cluster.local
Click Save & Test
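If you prefer configuration as code, Grafana can also provision the data source from a file at startup. A minimal sketch, assuming the Docker Compose setup above; mount it into the grafana container as /etc/grafana/provisioning/datasources/prometheus.yml:
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://prometheus:9090  # Kubernetes: http://prometheus-server.monitoring.svc.cluster.local
    isDefault: true
Add a matching bind mount under the grafana service's volumes so the file is picked up when the container starts.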
Import Qtap Dashboard
Download the dashboard: qtap-http-overview.json
In Grafana, navigate to Dashboards → Import
Click Upload JSON file and select the downloaded file
Select your Prometheus data source
Click Import
Dashboard Panels
The dashboard includes the following panels (rough example queries for a few of them appear after this list):
Request Rate: Total requests per second over time
Error Rate: Percentage of 4xx/5xx responses
Average Response Time: Mean response duration
Latency Percentiles: p50, p95, p99 request duration
Request/Response Sizes: Average payload sizes
Top Hosts: Highest traffic hosts
Status Code Distribution: Breakdown by HTTP status
Protocol Distribution: HTTP/1.1 vs HTTP/2 traffic
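The exact expressions ship inside the dashboard JSON, but a few of these panels correspond roughly to PromQL like the following (a sketch using the metric names from this guide, not the dashboard's literal queries):
# Request Rate: total requests per second
sum(rate(qtap_http_requests_total[5m]))
# Error Rate: share of 4xx/5xx responses
sum(rate(qtap_http_requests_total{status_code=~"4..|5.."}[5m]))
/
sum(rate(qtap_http_requests_total[5m]))
# Latency Percentiles: p95 (swap in 0.50 or 0.99 for p50/p99)
histogram_quantile(0.95,
  sum by (le) (rate(qtap_http_requests_duration_ms_bucket[5m]))
)
# Top Hosts: highest-traffic destinations
topk(5, sum by (host) (rate(qtap_http_requests_total[5m])))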
Step 5: Set Up Alerts
Create alerts.yml for Prometheus:
groups:
  - name: qtap-alerts
    interval: 30s
    rules:
      # High error rate alert
      - alert: HighHTTPErrorRate
        expr: |
          (
            sum(rate(qtap_http_requests_total{status_code=~"5.."}[5m]))
            /
            sum(rate(qtap_http_requests_total[5m]))
          ) > 0.05
        for: 5m
        labels:
          severity: warning
          component: qtap
        annotations:
          summary: "High HTTP 5xx error rate detected"
          description: "Error rate is {{ $value | humanizePercentage }} over the last 5 minutes"
      # High latency alert
      - alert: HighResponseLatency
        expr: |
          histogram_quantile(0.95,
            rate(qtap_http_requests_duration_ms_bucket[5m])
          ) > 1000
        for: 10m
        labels:
          severity: warning
          component: qtap
        annotations:
          summary: "High response latency detected"
          description: "95th percentile latency is {{ $value }}ms"
      # Traffic spike alert
      - alert: TrafficSpike
        expr: |
          rate(qtap_http_requests_total[5m])
            > 2 * rate(qtap_http_requests_total[1h] offset 1h)
        for: 5m
        labels:
          severity: info
          component: qtap
        annotations:
          summary: "Unusual traffic spike detected"
          description: "Current request rate is {{ $value | humanize }} req/s"
      # Qtap agent health
      - alert: QtapAgentDown
        expr: up{job="qtap"} == 0
        for: 2m
        labels:
          severity: critical
          component: qtap
        annotations:
          summary: "Qtap agent is down"
          description: "Qtap metrics endpoint is not responding"
Update prometheus.yml to load the rules:
global:
  scrape_interval: 15s
rule_files:
  - /etc/prometheus/alerts.yml
alerting:
  alertmanagers:
    - static_configs:
        - targets: ['alertmanager:9093']  # Optional: configure Alertmanager
scrape_configs:
  # ... existing scrape configs
If you are using the Docker Compose setup above, also mount the rules file into the container (for example, add ./alerts.yml:/etc/prometheus/alerts.yml under the prometheus service's volumes).
Reload Prometheus configuration:
# Docker Compose
docker compose restart prometheus
# Kubernetes
kubectl rollout restart deployment prometheus-server
Configure Grafana Alerts (Alternative)
You can also set up alerts directly in Grafana:
Open a dashboard panel
Click Edit → Alert tab
Create alert conditions based on the queries
Configure notification channels (Slack, PagerDuty, email, etc.)
Step 6: Test the Monitoring Setup
Generate various types of traffic to see metrics flow through:
Successful Requests
for i in {1..50}; do
curl -s https://httpbin.org/get > /dev/null
sleep 0.1
done
Error Responses
for i in {1..10}; do
curl -s https://httpbin.org/status/500 > /dev/null
sleep 0.2
done
High Latency
for i in {1..5}; do
curl -s https://httpbin.org/delay/2 > /dev/null
done
View in Grafana
Open the qtap dashboard
Observe request rate increase
See error rate spike from 500 responses
Check latency percentiles from delayed requests
Step 7: Optimize for Production
Reduce Metric Cardinality
High cardinality (many unique label combinations) can impact Prometheus performance. To reduce it:
Filter noisy processes in qtap config:
filters:
  groups:
    - qpoint
  custom:
    - exe: /usr/bin/health-check
      strategy: exact
    - exe: /usr/local/bin/monitoring-
      strategy: prefix
Focus on important domains:
tap:
  endpoints:
    - domain: 'api.production.example.com'
      http:
        stack: monitored_stack
Use Recording Rules
Pre-compute expensive queries with Prometheus recording rules:
groups:
  - name: qtap-recordings
    interval: 15s
    rules:
      # Pre-compute request rate by host
      - record: qtap:http_requests:rate5m
        expr: |
          sum by (host) (rate(qtap_http_requests_total[5m]))
      # Pre-compute error rate
      - record: qtap:http_errors:rate5m
        expr: |
          sum(rate(qtap_http_requests_total{status_code=~"5.."}[5m]))
          /
          sum(rate(qtap_http_requests_total[5m]))
      # Pre-compute p95 latency
      - record: qtap:http_latency:p95
        expr: |
          histogram_quantile(0.95,
            sum(rate(qtap_http_requests_duration_ms_bucket[5m])) by (le)
          )
Use these in your dashboards and alerts for better performance.
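For example, the HighHTTPErrorRate alert from Step 5 can reference the pre-computed series instead of recomputing the ratio on every evaluation (a sketch, assuming the recording rules above are loaded alongside your alert rules):
groups:
  - name: qtap-alerts
    rules:
      - alert: HighHTTPErrorRate
        expr: qtap:http_errors:rate5m > 0.05
        for: 5m
        labels:
          severity: warning
          component: qtap
        annotations:
          summary: "High HTTP 5xx error rate detected"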
Retention Settings
Configure Prometheus retention based on your needs:
# docker-compose.yml
command:
  - '--config.file=/etc/prometheus/prometheus.yml'
  - '--storage.tsdb.path=/prometheus'
  - '--storage.tsdb.retention.time=30d'   # Keep 30 days
  - '--storage.tsdb.retention.size=10GB'  # Or max 10GB
Common Queries for Troubleshooting
Find services with high error rates
topk(5,
sum by (host) (rate(qtap_http_requests_total{status_code=~"5.."}[5m]))
/
sum by (host) (rate(qtap_http_requests_total[5m]))
)
Identify slow endpoints
topk(10,
histogram_quantile(0.95,
sum by (host, le) (rate(qtap_http_requests_duration_ms_bucket[5m]))
)
)
HTTP/2 vs HTTP/1 Traffic
sum by (protocol) (rate(qtap_http_requests_total[5m]))
Average Request/Response Sizes
# Average request size
rate(qtap_http_requests_size_bytes_sum[5m])
/
rate(qtap_http_requests_size_bytes_count[5m])
# Average response size
rate(qtap_http_responses_size_bytes_sum[5m])
/
rate(qtap_http_responses_size_bytes_count[5m])
Compare traffic patterns over time
sum(rate(qtap_http_requests_total[5m]))
/
sum(rate(qtap_http_requests_total[5m] offset 1d))
Monitor overall system latency
# Average end-to-end transaction time (combined request+response)
rate(qtap_http_duration_ms_sum[5m])
/
rate(qtap_http_duration_ms_count[5m])
# 95th percentile overall latency
histogram_quantile(0.95,
rate(qtap_http_duration_ms_bucket[5m])
)
Check active connections
# Total active connections
sum(qtap_connection_active_total)
# Active connections by destination
topk(10, qtap_connection_active_total)
Track TLS/HTTPS usage
# TLS handshake rate (HTTPS connection establishment)
sum(rate(qtap_connection_tls_handshake_total[5m]))
# Connections by TLS version
sum by (version) (rate(qtap_connection_tls_handshake_total[5m]))
Troubleshooting
No HTTP metrics appearing in Prometheus
Most Common Issue: Missing http_metrics plugin.
# Check if HTTP metrics exist
curl http://localhost:10001/metrics | grep "qtap_http_requests_total"
If empty:
Verify the http_metrics plugin is in your qtap config:
stacks:
  my_stack:
    plugins:
      - type: http_metrics  # ← Must be present
Restart qtap after adding it:
docker restart qtap
Verify metrics now appear:
curl http://localhost:10001/metrics | grep "qtap_http_requests_total"
Only seeing connection metrics
If you see qtap_connection_* metrics but no qtap_http_* metrics, the http_metrics plugin is not configured. See Step 0.
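A quick way to list which qtap metric families your agent currently exposes (adjust the host and port to your setup):
# http_* families only appear once the http_metrics plugin is enabled
curl -s http://localhost:10001/metrics | grep '^qtap_' | cut -d'{' -f1 | awk '{print $1}' | sort -u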
Metrics show but dashboard is empty
Check label names: Qtap v0 uses host, not domain. Update queries:
# Wrong
sum by (domain) (rate(qtap_http_requests_total[5m]))
# Correct
sum by (host) (rate(qtap_http_requests_total[5m]))
Check data source: Ensure Grafana is connected to the right Prometheus
Check time range: Extend the time range in Grafana
Verify queries: Test dashboard queries directly in Prometheus
Prometheus can't reach qtap
# Docker Compose (Mac/Windows)
docker exec prometheus wget -O- http://host.docker.internal:10001/metrics
# Docker Compose (Linux):
# docker exec prometheus wget -O- http://172.17.0.1:10001/metrics
# Kubernetes
kubectl run curl-test --image=curlimages/curl --rm -it -- \
  curl http://qtap-metrics.default.svc.cluster.local:10001/metrics
High cardinality warnings
level=warn msg="Metric has too many labels"
This means you have too many unique label combinations. Solutions:
Filter processes in qtap configuration
Use recording rules to aggregate
Limit domain capture with endpoints configuration
Use Prometheus remote write to ship metrics to long-term storage (a minimal sketch follows this list)
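A minimal remote_write sketch for prometheus.yml; the endpoint URL is a placeholder for whatever long-term storage you use:
remote_write:
  - url: https://metrics-storage.example.com/api/v1/write  # placeholder endpoint
    # Optionally ship only qtap HTTP series to keep remote cardinality down
    write_relabel_configs:
      - source_labels: [__name__]
        regex: 'qtap_http_.*'
        action: keep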
Missing metrics after qtap restart
Prometheus counters reset when qtap restarts. This is normal. Use the rate() function, which handles counter resets automatically.
Next Steps
Customize dashboards: Add panels for your specific use cases
Set up Alertmanager: Route alerts to Slack, PagerDuty, or email
Create SLOs: Define service level objectives based on qtap metrics
Integrate with logs: Correlate metrics with qtap captured payloads in your object store
Multi-cluster monitoring: Federate metrics from multiple qtap deployments