Prometheus + Grafana Monitoring
This guide walks you through setting up comprehensive monitoring for qtap using Prometheus for metrics collection and Grafana for visualization. By the end, you'll have dashboards showing request rates, error rates, latency percentiles, and traffic patterns.
What You'll Build
Prometheus scraping qtap metrics every 15 seconds
Grafana dashboard visualizing HTTP traffic patterns
Alerts for high error rates and latency spikes
RED metrics (Rate, Errors, Duration) for all observed traffic
Prerequisites
Qtap installed and running (see Getting Started)
Docker or Kubernetes environment
Basic familiarity with Prometheus and Grafana
Step 0: Enable HTTP Metrics Plugin
CRITICAL FIRST STEP: Qtap requires the http_metrics plugin to expose HTTP-level Prometheus metrics. Without this plugin, you'll only see connection-level metrics.
Update Your Qtap Configuration
Edit your qtap configuration file to include the http_metrics plugin:
Restart Qtap
After adding the plugin, restart qtap:
Step 1: Verify Metrics Are Available
First, confirm qtap is exposing HTTP metrics:
You should see Prometheus-formatted metrics:
If you don't see qtap_http_requests_total, the http_metrics plugin is not configured. Go back to Step 0.
If metrics exist but show zero, generate test traffic:
Step 2: Deploy Prometheus
Option A: Docker Compose
Create prometheus.yml:
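A minimal sketch of what this file might contain. The job names match the qtap and qtap-system jobs checked later in this guide, and the 15-second interval matches the scrape cadence described above. The target address, port 10001, and both metrics paths are assumptions; substitute whatever your qtap deployment actually exposes.

```yaml
global:
  scrape_interval: 15s               # scrape every 15 seconds

scrape_configs:
  - job_name: qtap                   # HTTP-level metrics
    metrics_path: /metrics           # assumed path
    static_configs:
      - targets: ["host.docker.internal:10001"]   # assumed address and port

  - job_name: qtap-system            # qtap's own process/system metrics
    metrics_path: /system/metrics    # assumed path
    static_configs:
      - targets: ["host.docker.internal:10001"]   # assumed address and port
```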
Create docker-compose.yml:
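One possible layout, assuming qtap runs on the Docker host rather than inside this Compose project. The image tags and the admin password are placeholders chosen for local testing, not values taken from the qtap documentation.

```yaml
services:
  prometheus:
    image: prom/prometheus:latest
    volumes:
      # Mount the scrape configuration created in the previous step.
      - ./prometheus.yml:/etc/prometheus/prometheus.yml:ro
    ports:
      - "9090:9090"
    extra_hosts:
      # Lets the container resolve the Docker host on Linux.
      - "host.docker.internal:host-gateway"

  grafana:
    image: grafana/grafana:latest
    environment:
      - GF_SECURITY_ADMIN_PASSWORD=admin   # change this outside of local testing
    ports:
      - "3000:3000"
    depends_on:
      - prometheus
```

Within this Compose network, Grafana can reach Prometheus at http://prometheus:9090.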
Start the stack:
Option B: Kubernetes with ServiceMonitor
If using Prometheus Operator, create a ServiceMonitor:
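A ServiceMonitor selects a Kubernetes Service, so this sketch assumes a Service already fronts the qtap pods. The namespace, the app: qtap label, the port name metrics, the release label, and the /system/metrics path are all hypothetical; align them with your own manifests.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: qtap
  namespace: qpoint            # hypothetical namespace
  labels:
    release: prometheus        # must match your Prometheus serviceMonitorSelector, if one is set
spec:
  selector:
    matchLabels:
      app: qtap                # hypothetical label on the qtap Service
  endpoints:
    - port: metrics            # hypothetical named port on the Service
      path: /metrics
      interval: 15s
    - port: metrics
      path: /system/metrics    # assumed path for system metrics
      interval: 15s
```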
Apply it:
Option C: Kubernetes with Pod Annotations
For a standard Prometheus server, add annotations to your qtap DaemonSet:
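The prometheus.io/* annotations are a convention, not a built-in Prometheus feature: they only take effect if your Prometheus scrape configuration contains relabeling rules that read them, as the community Helm chart's default configuration does. A sketch of the pod template fragment, with the port and path as assumptions:

```yaml
# Fragment of the qtap DaemonSet; only the pod template annotations are shown.
spec:
  template:
    metadata:
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/port: "10001"      # assumed qtap metrics port
        prometheus.io/path: "/metrics"   # assumed metrics path
```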
Add Qtap Scrape Config to Existing Prometheus
If you already have Prometheus running, add these scrape configs to your existing prometheus.yml:
For Kubernetes with Prometheus Operator: If you already have Prometheus Operator installed, apply the ServiceMonitor from Option B above; there is no need to modify Prometheus config files.
Reload Prometheus configuration:
Step 3: Verify Prometheus Is Scraping
Open Prometheus UI at http://localhost:9090 and check:
Status → Targets: Verify the qtap and qtap-system jobs are "UP"
Run a query: Try qtap_http_requests_total in the query box
If targets show as DOWN:
Step 4: Import Grafana Dashboard
Access Grafana
Navigate to http://localhost:3000 and log in:
Username: admin
Password: admin (or the value of GF_SECURITY_ADMIN_PASSWORD)
Add Prometheus Data Source
Navigate to Configuration → Data Sources
Click Add data source
Select Prometheus
Set URL:
Docker Compose: http://prometheus:9090
Kubernetes: http://prometheus-server.monitoring.svc.cluster.local
Click Save & Test
Import Qtap Dashboard
Download the dashboard: qtap-http-overview.json
In Grafana, navigate to Dashboards → Import
Click Upload JSON file and select the downloaded file
Select your Prometheus data source
Click Import
Dashboard Panels
The dashboard includes:
Request Rate: Total requests per second over time
Error Rate: Percentage of 4xx/5xx responses
Average Response Time: Mean response duration
Latency Percentiles: p50, p95, p99 request duration
Request/Response Sizes: Average payload sizes
Top Hosts: Highest traffic hosts
Status Code Distribution: Breakdown by HTTP status
Protocol Distribution: HTTP/1.1 vs HTTP/2 traffic
Step 5: Set Up Alerts
Create alerts.yml for Prometheus:
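A sketch of two rules covering the error-rate and latency alerts this guide sets out to build. Only qtap_http_requests_total is confirmed by this guide; the status_code label and the qtap_http_request_duration_seconds_bucket histogram are hypothetical names, and the thresholds are illustrative. Check the names against your own /metrics output before using these rules.

```yaml
groups:
  - name: qtap-http
    rules:
      - alert: QtapHighErrorRate
        # Share of requests answered with a 5xx status over the last 5 minutes.
        # The status_code label name is an assumption.
        expr: |
          sum(rate(qtap_http_requests_total{status_code=~"5.."}[5m]))
            /
          sum(rate(qtap_http_requests_total[5m])) > 0.05
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "More than 5% of observed HTTP requests are failing"

      - alert: QtapHighLatency
        # 95th-percentile latency from a histogram.
        # The metric name and its unit (seconds) are assumptions.
        expr: |
          histogram_quantile(0.95,
            sum by (le) (rate(qtap_http_request_duration_seconds_bucket[5m]))
          ) > 1
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "p95 request latency has exceeded 1s for 10 minutes"
```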
Update prometheus.yml to load the rules:
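Rule files are listed under the top-level rule_files key. The path below assumes alerts.yml is mounted next to prometheus.yml inside the Prometheus container; with Docker Compose that means adding a second volume mount for it.

```yaml
# Addition to prometheus.yml; the rest of the file is unchanged.
rule_files:
  - /etc/prometheus/alerts.yml   # assumed mount point inside the container
```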
Reload Prometheus configuration:
Configure Grafana Alerts (Alternative)
You can also set up alerts directly in Grafana:
Open a dashboard panel
Click Edit → Alert tab
Create alert conditions based on the queries
Configure notification channels (Slack, PagerDuty, email, etc.)
Step 6: Test the Monitoring Setup
Generate various types of traffic to see metrics flow through:
Successful Requests
Error Responses
High Latency
View in Grafana
Open the qtap dashboard
Observe request rate increase
See error rate spike from 500 responses
Check latency percentiles from delayed requests
Step 7: Optimize for Production
Reduce Metric Cardinality
High cardinality (many unique label combinations) can impact Prometheus performance. To reduce it:
Filter noisy processes in qtap config:
Focus on important domains:
Use Recording Rules
Pre-compute expensive queries with Prometheus recording rules:
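A sketch of a recording-rule group that aggregates per host, following the level:metric:operations naming convention from the Prometheus documentation. The host label is the one qtap uses; the status_code label is an assumption.

```yaml
groups:
  - name: qtap-recording
    interval: 30s
    rules:
      # Per-host request rate over 5 minutes.
      - record: host:qtap_http_requests:rate5m
        expr: sum by (host) (rate(qtap_http_requests_total[5m]))

      # Per-host ratio of 5xx responses to all responses.
      # The status_code label name is an assumption.
      - record: host:qtap_http_errors:ratio_rate5m
        expr: |
          sum by (host) (rate(qtap_http_requests_total{status_code=~"5.."}[5m]))
            /
          sum by (host) (rate(qtap_http_requests_total[5m]))
```

Load this file through rule_files in prometheus.yml, the same way as an alerting rules file.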
Use these in your dashboards and alerts for better performance.
Retention Settings
Configure Prometheus retention based on your needs:
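Retention is controlled by Prometheus command-line flags rather than by prometheus.yml. A sketch for a Docker Compose deployment, assuming the service is named prometheus; the 15-day and 10 GB values are examples, not recommendations.

```yaml
# Fragment of docker-compose.yml; only the prometheus service's command is shown.
services:
  prometheus:
    command:
      # Overriding command replaces the image defaults, so restate them.
      - --config.file=/etc/prometheus/prometheus.yml
      - --storage.tsdb.path=/prometheus
      - --storage.tsdb.retention.time=15d    # drop samples older than 15 days
      - --storage.tsdb.retention.size=10GB   # also cap on-disk size; whichever limit is hit first applies
```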
Common Queries for Troubleshooting
Find services with high error rates
Identify slow endpoints
HTTP/2 vs HTTP/1 Traffic
Average Request/Response Sizes
Compare traffic patterns over time
Monitor overall system latency
Check active connections
Track TLS/HTTPS usage
Troubleshooting
No HTTP metrics appearing in Prometheus
Most Common Issue: Missing http_metrics plugin.
If empty:
Verify the http_metrics plugin is in your qtap config:
Restart qtap after adding it:
Verify metrics now appear:
Only seeing connection metrics
If you see qtap_connection_* metrics but no qtap_http_* metrics, the http_metrics plugin is not configured. See Step 0.
Metrics show but dashboard is empty
Check label names: Qtap v0 uses host, not domain. Update any queries that reference domain to use host instead.
Check data source: Ensure Grafana is connected to the right Prometheus
Check time range: Extend the time range in Grafana
Verify queries: Test dashboard queries directly in Prometheus
Prometheus can't reach qtap
High cardinality warnings
This means you have too many unique label combinations. Solutions:
Filter processes in qtap configuration
Use recording rules to aggregate
Limit domain capture with endpoints configuration
Consider using Prometheus remote write to long-term storage
Missing metrics after qtap restart
Qtap's counters reset to zero when qtap restarts, so raw counter values appear to drop. This is expected. Query counters through the rate() function, which handles counter resets automatically.
Next Steps
Customize dashboards: Add panels for your specific use cases
Set up Alertmanager: Route alerts to Slack, PagerDuty, or email
Create SLOs: Define service level objectives based on qtap metrics
Integrate with logs: Correlate metrics with qtap captured payloads in your object store
Multi-cluster monitoring: Federate metrics from multiple qtap deployments