Prometheus + Grafana Monitoring

This guide walks you through setting up comprehensive monitoring for qtap using Prometheus for metrics collection and Grafana for visualization. By the end, you'll have dashboards showing request rates, error rates, latency percentiles, and traffic patterns.

What You'll Build

  • Prometheus scraping qtap metrics every 15 seconds

  • Grafana dashboard visualizing HTTP traffic patterns

  • Alerts for high error rates and latency spikes

  • RED metrics (Rate, Errors, Duration) for all observed traffic

Prerequisites

  • Qtap installed and running (see Getting Started)

  • Docker or Kubernetes environment

  • Basic familiarity with Prometheus and Grafana

Step 0: Enable HTTP Metrics Plugin

Update Your Qtap Configuration

Edit your qtap configuration file to include the http_metrics plugin:

Restart Qtap

After adding the plugin, restart qtap:

Step 1: Verify Metrics Are Available

First, confirm qtap is exposing HTTP metrics:

You should see Prometheus-formatted metrics:

If metrics exist but show zero, generate test traffic:
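As a rough way to automate this check, the sketch below parses Prometheus text-format output and sums the samples for every metric whose name starts with qtap_http_. The sample text is illustrative only, not real qtap output, and the extra qtap_connection_open_total name in it is made up for the example.

```python
from collections import defaultdict


def summarize_http_metrics(exposition_text, prefix="qtap_http_"):
    """Sum sample values per metric name for metrics starting with `prefix`.

    Expects Prometheus text exposition lines of the form
    `name{labels} value [timestamp]`; lines starting with '#' are skipped.
    """
    totals = defaultdict(float)
    for line in exposition_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # The metric name ends at the first '{' (labels) or space (no labels).
        name = line.split("{", 1)[0].split(" ", 1)[0]
        if not name.startswith(prefix):
            continue
        # The value follows the label set, or the name when there are no labels.
        rest = line.rsplit("}", 1)[-1] if "{" in line else line[len(name):]
        totals[name] += float(rest.split()[0])
    return dict(totals)


# Illustrative input only (hypothetical label values and connection metric).
SAMPLE = """\
# HELP qtap_http_requests_total Total HTTP requests
# TYPE qtap_http_requests_total counter
qtap_http_requests_total{host="example.com"} 3
qtap_http_requests_total{host="api.example.com"} 2
qtap_connection_open_total 7
"""

print(summarize_http_metrics(SAMPLE))  # {'qtap_http_requests_total': 5.0}
```

To run it against a live instance, fetch the text from your qtap metrics endpoint (for example with urllib.request.urlopen) and pass it in. An empty result suggests the http_metrics plugin is not loaded; totals of zero suggest no traffic has been observed yet.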

Step 2: Deploy Prometheus

Already have Prometheus running? Skip the deployment sections below and jump to adding the qtap scrape configuration to your existing prometheus.yml.

Option A: Docker Compose

Create prometheus.yml:
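A minimal sketch of this file, written as a small Python helper that renders the YAML so the target can be parameterized. The 15-second interval and the job names qtap and qtap-system come from this guide; the target address and both metrics paths are assumptions, so check them against your qtap deployment before using the output.

```python
def render_prometheus_config(target, http_path="/metrics",
                             system_path="/system/metrics", interval="15s"):
    """Render a minimal prometheus.yml with two scrape jobs for one qtap target.

    `target` is host:port. The default paths are assumptions, not confirmed
    qtap endpoints.
    """
    return f"""\
global:
  scrape_interval: {interval}

scrape_configs:
  - job_name: qtap
    metrics_path: {http_path}
    static_configs:
      - targets: ["{target}"]

  - job_name: qtap-system
    metrics_path: {system_path}
    static_configs:
      - targets: ["{target}"]
"""
```

Calling render_prometheus_config("qtap:10001") and writing the result to prometheus.yml gives a starting point; the host name and port in that call are placeholders for wherever qtap serves its metrics.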

Create docker-compose.yml:

Start the stack:

Option B: Kubernetes with ServiceMonitor

If using Prometheus Operator, create a ServiceMonitor:

Apply it:

Option C: Kubernetes with Pod Annotations

For a standard Prometheus server, add annotations to your qtap DaemonSet:

Add Qtap Scrape Config to Existing Prometheus

If you already have Prometheus running, add these scrape configs to your existing prometheus.yml:

For Kubernetes with Prometheus Operator: If you already have Prometheus Operator installed, apply the ServiceMonitor from Option B above. There is no need to modify Prometheus config files.

Reload Prometheus configuration:
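One way to trigger the reload is the Prometheus lifecycle endpoint, sketched here in Python. This assumes Prometheus was started with --web.enable-lifecycle; without that flag the endpoint is disabled, and sending SIGHUP to the Prometheus process is the usual alternative. The default URL matches the one used elsewhere in this guide.

```python
import urllib.request


def build_reload_request(base_url="http://localhost:9090"):
    """Build the POST request for the Prometheus /-/reload lifecycle endpoint."""
    return urllib.request.Request(
        base_url.rstrip("/") + "/-/reload", data=b"", method="POST"
    )


def reload_prometheus(base_url="http://localhost:9090", timeout=5):
    """Send the reload request and return the HTTP status code."""
    with urllib.request.urlopen(build_reload_request(base_url), timeout=timeout) as resp:
        return resp.status
```

A return value of 200 from reload_prometheus() means the new configuration was loaded; an HTTPError usually means the lifecycle API is disabled or the configuration failed validation.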

Step 3: Verify Prometheus Is Scraping

Open Prometheus UI at http://localhost:9090 and check:

  1. Status → Targets: Verify qtap and qtap-system jobs are "UP"

  2. Run a query: Try qtap_http_requests_total in the query box

If targets show as DOWN:
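The scrape error behind a DOWN target is available from the Prometheus HTTP API at /api/v1/targets. The following sketch lists every unhealthy target with its last error; the base URL is the one used in this guide, and the function names are illustrative.

```python
import json
import urllib.request


def down_targets(targets_payload):
    """Return (job, scrape_url, last_error) for every active target that is not up.

    `targets_payload` is the decoded JSON body of GET /api/v1/targets.
    """
    active = targets_payload.get("data", {}).get("activeTargets", [])
    return [
        (
            t.get("labels", {}).get("job", "?"),
            t.get("scrapeUrl", "?"),
            t.get("lastError", ""),
        )
        for t in active
        if t.get("health") != "up"
    ]


def fetch_targets(base_url="http://localhost:9090", timeout=5):
    """Fetch the current target list from the Prometheus HTTP API."""
    url = base_url.rstrip("/") + "/api/v1/targets"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)
```

Running down_targets(fetch_targets()) prints nothing interesting when all targets are healthy. A "connection refused" or timeout error typically points to a wrong target address or port, or to qtap not being reachable from the network where Prometheus runs.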

Step 4: Import Grafana Dashboard

Already have Grafana running? Great! Skip ahead to Add Prometheus Data Source and use your existing Grafana instance.

Access Grafana

Navigate to http://localhost:3000 and log in:

  • Username: admin

  • Password: admin (or value from GF_SECURITY_ADMIN_PASSWORD)

Add Prometheus Data Source

Already have Prometheus configured as a data source? You can skip this section and go straight to Import Qtap Dashboard.

  1. Navigate to Configuration → Data Sources

  2. Click Add data source

  3. Select Prometheus

  4. Set URL:

    • Docker Compose: http://prometheus:9090

    • Kubernetes: http://prometheus-server.monitoring.svc.cluster.local

  5. Click Save & Test
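The same data source can also be created through the Grafana HTTP API instead of the UI. The sketch below builds the request with basic authentication; the credentials and URLs are the defaults mentioned in this guide, and the data source name "Prometheus" is an arbitrary choice.

```python
import base64
import json
import urllib.request


def build_datasource_request(grafana_url, prometheus_url,
                             user="admin", password="admin"):
    """Build a POST /api/datasources request that registers Prometheus in Grafana."""
    body = json.dumps({
        "name": "Prometheus",
        "type": "prometheus",
        "url": prometheus_url,
        "access": "proxy",  # Grafana's backend proxies queries to Prometheus
        "isDefault": True,
    }).encode()
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        grafana_url.rstrip("/") + "/api/datasources",
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
    )
```

For the Docker Compose setup, build_datasource_request("http://localhost:3000", "http://prometheus:9090") followed by urllib.request.urlopen(request) should create the data source. Grafana rejects the call if a data source with the same name already exists.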

Import Qtap Dashboard

  1. Download the dashboard: qtap-http-overview.json

  2. In Grafana, navigate to Dashboards → Import

  3. Click Upload JSON file and select the downloaded file

  4. Select your Prometheus data source

  5. Click Import

The official Grafana dashboard may need label adjustments. Qtap v0 uses host labels, not domain. If panels are empty, edit queries to replace domain with host.
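If many panels need this change, the dashboard JSON can be rewritten before import. The following sketch replaces the word domain with host in every panel query and legend format. It assumes the common Grafana layout of panels → targets → expr/legendFormat, and it is a blunt word replacement, so review the result (it also rewrites a template variable named $domain or a label value that is exactly the word domain).

```python
import re

_DOMAIN_WORD = re.compile(r"\bdomain\b")


def use_host_label(dashboard):
    """Return a copy of a Grafana dashboard dict that uses `host` instead of `domain`."""

    def fix_target(target):
        return {
            key: _DOMAIN_WORD.sub("host", value)
            if key in ("expr", "legendFormat") and isinstance(value, str)
            else value
            for key, value in target.items()
        }

    def fix_panel(panel):
        panel = dict(panel)
        if "targets" in panel:
            panel["targets"] = [fix_target(t) for t in panel["targets"]]
        if "panels" in panel:  # row panels can nest further panels
            panel["panels"] = [fix_panel(p) for p in panel["panels"]]
        return panel

    fixed = dict(dashboard)
    fixed["panels"] = [fix_panel(p) for p in dashboard.get("panels", [])]
    return fixed
```

Load qtap-http-overview.json with json.load, pass the result through use_host_label, write it back with json.dump, and import the rewritten file.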

Dashboard Panels

The dashboard includes:

  • Request Rate: Total requests per second over time

  • Error Rate: Percentage of 4xx/5xx responses

  • Average Response Time: Mean response duration

  • Latency Percentiles: p50, p95, p99 request duration

  • Request/Response Sizes: Average payload sizes

  • Top Hosts: Highest traffic hosts

  • Status Code Distribution: Breakdown by HTTP status

  • Protocol Distribution: HTTP/1.1 vs HTTP/2 traffic

Step 5: Set Up Alerts

Create alerts.yml for Prometheus:
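A sketch of what such a rules file might contain, rendered by a small Python helper. Only the metric name qtap_http_requests_total is taken from this guide. The status-code label and the latency histogram name are passed in as parameters because they are not confirmed here; the thresholds, alert names, and 5-minute windows are illustrative defaults.

```python
def render_alert_rules(status_label, duration_bucket_metric,
                       error_ratio=0.05, p95_threshold=1.0, window="5m"):
    """Render a Prometheus rules file with an error-rate and a latency alert.

    `status_label` and `duration_bucket_metric` must match the label and
    histogram bucket series your qtap version actually exposes.
    """
    errors = f'sum(rate(qtap_http_requests_total{{{status_label}=~"5.."}}[{window}]))'
    total = f"sum(rate(qtap_http_requests_total[{window}]))"
    p95 = ("histogram_quantile(0.95, "
           f"sum by (le) (rate({duration_bucket_metric}[{window}])))")
    return f"""\
groups:
  - name: qtap-http
    rules:
      - alert: QtapHighErrorRate
        expr: |
          {errors} / {total} > {error_ratio}
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "5xx ratio above {error_ratio:.0%} over {window}"
      - alert: QtapHighLatency
        expr: |
          {p95} > {p95_threshold}
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "p95 latency above {p95_threshold} (in the histogram's unit)"
"""
```

Check the label and histogram names against your metrics output before writing the result to alerts.yml, and express p95_threshold in whatever unit the histogram uses.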

Update prometheus.yml to load the rules:

Reload Prometheus configuration:

Configure Grafana Alerts (Alternative)

You can also set up alerts directly in Grafana:

  1. Open a dashboard panel

  2. Click Edit → Alert tab

  3. Create alert conditions based on the queries

  4. Configure notification channels (Slack, PagerDuty, email, etc.)

Step 6: Test the Monitoring Setup

Generate various types of traffic to see metrics flow through:

Successful Requests

Error Responses

High Latency
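One way to produce all three kinds of traffic is a short Python script run on a host that qtap is observing. It assumes an httpbin-compatible test server (httpbin.org here, which serves /get, /status/<code>, and /delay/<seconds>); any endpoint that can return errors and slow responses on demand works equally well.

```python
import time
import urllib.error
import urllib.request


def traffic_urls(base="https://httpbin.org", count=5):
    """Build a mix of URLs: fast 200s, 500 errors, and 2-second delays."""
    base = base.rstrip("/")
    return (
        [f"{base}/get"] * count
        + [f"{base}/status/500"] * count
        + [f"{base}/delay/2"] * count
    )


def send(url, timeout=10):
    """Request `url` and return (status_code, elapsed_seconds)."""
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            status = resp.status
    except urllib.error.HTTPError as err:
        status = err.code  # urllib raises HTTPError for 4xx/5xx responses
    return status, time.monotonic() - start
```

Looping over traffic_urls() and calling send(url) on each entry produces five successful requests, five 500 responses, and five slow requests, which is enough to move the request rate, error rate, and latency panels.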

View in Grafana

  1. Open the qtap dashboard

  2. Observe request rate increase

  3. See error rate spike from 500 responses

  4. Check latency percentiles from delayed requests

Step 7: Optimize for Production

Reduce Metric Cardinality

High cardinality (many unique label combinations) can impact Prometheus performance. To reduce it:

Filter noisy processes in qtap config:

Focus on important domains:

Use Recording Rules

Pre-compute expensive queries with Prometheus recording rules:

Use these in your dashboards and alerts for better performance.

Retention Settings

Configure Prometheus retention based on your needs:
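Retention is set with the Prometheus flags --storage.tsdb.retention.time and --storage.tsdb.retention.size. To pick values, a rough disk estimate helps. The sketch below applies the sizing rule from the Prometheus storage documentation (retention seconds × ingested samples per second × bytes per sample, with roughly 1 to 2 bytes per sample); the series count in the example is an arbitrary illustration.

```python
def estimate_disk_bytes(active_series, scrape_interval_s=15,
                        retention_days=15, bytes_per_sample=2.0):
    """Estimate Prometheus disk usage for a given series count and retention.

    Each active series produces one sample per scrape interval.
    """
    retention_seconds = retention_days * 24 * 60 * 60
    return retention_seconds * active_series * bytes_per_sample / scrape_interval_s


# Example: 10,000 series scraped every 15 s and kept for 15 days.
print(f"{estimate_disk_bytes(10_000) / 1e9:.2f} GB")  # 1.73 GB
```

Treat the result as an order-of-magnitude figure; actual usage depends on how well your samples compress.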

Common Queries for Troubleshooting

Find services with high error rates

Identify slow endpoints

HTTP/2 vs HTTP/1 Traffic

Average Request/Response Sizes

Compare traffic patterns over time

Monitor overall system latency

Check active connections

Track TLS/HTTPS usage

Troubleshooting

No HTTP metrics appearing in Prometheus

Most Common Issue: Missing http_metrics plugin.

If empty:

  1. Verify http_metrics plugin is in your qtap config:

  2. Restart qtap after adding it:

  3. Verify metrics now appear:

Only seeing connection metrics

If you see qtap_connection_* metrics but no qtap_http_* metrics, the http_metrics plugin is not configured. See Step 0.

Metrics show but dashboard is empty

  1. Check label names: Qtap v0 uses host, not domain. Update queries:

  2. Check data source: Ensure Grafana is connected to the right Prometheus

  3. Check time range: Extend the time range in Grafana

  4. Verify queries: Test dashboard queries directly in Prometheus

Prometheus can't reach qtap

High cardinality warnings

This means you have too many unique label combinations. Solutions:

  1. Filter processes in qtap configuration

  2. Use recording rules to aggregate

  3. Limit domain capture with the endpoints configuration

  4. Consider using Prometheus remote write to long-term storage
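Before applying any of these, it helps to see which metrics contribute the most series. The sketch below builds an instant query for the Prometheus HTTP API that counts series per qtap metric name and turns the response into a sorted list. The qtap_ prefix matches the metric names used in this guide; the base URL is the local Prometheus from the earlier steps.

```python
import urllib.parse

# Top 10 qtap metric names by number of time series.
CARDINALITY_QUERY = 'topk(10, count by (__name__) ({__name__=~"qtap_.+"}))'


def instant_query_url(query, base_url="http://localhost:9090"):
    """Build a GET URL for the Prometheus /api/v1/query endpoint."""
    return (base_url.rstrip("/") + "/api/v1/query?"
            + urllib.parse.urlencode({"query": query}))


def series_counts(query_response):
    """Turn an instant-query JSON response into sorted (metric, count) pairs."""
    pairs = [
        (r["metric"].get("__name__", ""), int(float(r["value"][1])))
        for r in query_response["data"]["result"]
    ]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)
```

Fetch instant_query_url(CARDINALITY_QUERY), decode the JSON, and pass it to series_counts. Swapping the grouping to count by (host) shows which hosts drive the series count.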

Missing metrics after qtap restart

Counters reset to zero when qtap restarts. This is normal. Use the rate() function, which handles counter resets automatically.
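To see why a reset does not corrupt the result, here is a simplified sketch of the reset handling that rate() and increase() apply: any drop in a counter is treated as a restart from zero. It leaves out the extrapolation to window boundaries and the division by the window length that the real functions perform.

```python
def increase_with_resets(samples):
    """Total increase of a counter across `samples`, treating any drop as a reset.

    When a value is lower than the previous one, the counter is assumed to
    have restarted from zero, so the new value itself counts as the increase.
    """
    total = 0.0
    for prev, cur in zip(samples, samples[1:]):
        total += cur - prev if cur >= prev else cur
    return total


# qtap restarts between the third and fourth scrape.
scrapes = [100, 120, 150, 10, 30]
print(increase_with_resets(scrapes))  # 80.0
print(scrapes[-1] - scrapes[0])       # -70, the naive difference
```

The naive difference goes negative across the restart, while the reset-aware sum reports the 80 requests that were actually counted.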

Next Steps

  • Customize dashboards: Add panels for your specific use cases

  • Set up Alertmanager: Route alerts to Slack, PagerDuty, or email

  • Create SLOs: Define service level objectives based on qtap metrics

  • Integrate with logs: Correlate metrics with qtap captured payloads in your object store

  • Multi-cluster monitoring: Federate metrics from multiple qtap deployments
