# Prometheus + Grafana Monitoring

This guide walks you through setting up comprehensive monitoring for qtap using Prometheus for metrics collection and Grafana for visualization. By the end, you'll have dashboards showing request rates, error rates, latency percentiles, and traffic patterns.

## What You'll Build

* **Prometheus** scraping qtap metrics every 15 seconds
* **Grafana dashboard** visualizing HTTP traffic patterns
* **Alerts** for high error rates and latency spikes
* **RED metrics** (Rate, Errors, Duration) for all observed traffic

## Prerequisites

* Qtap installed and running (see [Getting Started](/getting-started/qtap/getting-started.md))
* Docker or Kubernetes environment
* Basic familiarity with Prometheus and Grafana

## Step 0: Enable HTTP Metrics Plugin

{% hint style="danger" %}
**CRITICAL FIRST STEP**: Qtap requires the `http_metrics` plugin to expose HTTP-level Prometheus metrics. Without this plugin, you'll only see connection-level metrics.
{% endhint %}

### Update Your Qtap Configuration

Edit your qtap configuration file to include the `http_metrics` plugin:

```yaml
version: 2

services:
  event_stores:
    - type: stdout
  object_stores:
    - type: stdout

stacks:
  monitoring_stack:
    plugins:
      # HTTP capture plugin (optional - for logging/storage)
      - type: http_capture
        config:
          level: summary
          format: json

      # HTTP metrics plugin (REQUIRED for Prometheus metrics)
      - type: http_metrics

tap:
  direction: egress
  http:
    stack: monitoring_stack
```

### Restart Qtap

After adding the plugin, restart qtap:

```bash
# Docker
docker restart qtap

# Kubernetes
kubectl rollout restart daemonset/qtap
```

## Step 1: Verify Metrics Are Available

First, confirm qtap is exposing HTTP metrics:

```bash
# Check qtap is running
docker ps | grep qtap

# Access metrics endpoint
curl http://localhost:10001/metrics | grep qtap_http_requests_total
```

You should see Prometheus-formatted metrics:

```
# HELP qtap_http_requests_total Total HTTP requests observed
# TYPE qtap_http_requests_total counter
qtap_http_requests_total{host="httpbin.org",method="GET",protocol="http2",status_code="200"} 5
```

{% hint style="warning" %}
**If you don't see `qtap_http_requests_total`**, the `http_metrics` plugin is not configured. Go back to Step 0.
{% endhint %}

If metrics exist but show zero, generate test traffic:

```bash
# Generate test traffic
curl https://httpbin.org/get

# Check metrics again (wait 5 seconds for metrics to update)
sleep 5
curl http://localhost:10001/metrics | grep qtap_http_requests_total
```
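To compare the counter before and after generating traffic, it can help to reduce the exposition output to a single number. The helper below is a minimal sketch, not part of qtap: `sum_qtap_requests` is a hypothetical function name, and it assumes the sample format shown above, where the value is the last field on each line (no trailing timestamp).

```bash
# Hypothetical helper: sum every qtap_http_requests_total sample read from stdin.
# Assumes the Prometheus text format shown above, with the value as the last field.
sum_qtap_requests() {
  awk '/^qtap_http_requests_total[{ ]/ { total += $NF } END { print total + 0 }'
}

# Usage against a running agent (port 10001 as used in this guide):
#   curl -s http://localhost:10001/metrics | sum_qtap_requests
```

Run it once before and once after the test request; the total should increase once the request has been observed.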

## Step 2: Deploy Prometheus

{% hint style="info" %}
**Already have Prometheus running?** Skip the deployment sections below and jump to [adding the qtap scrape configuration](#add-qtap-scrape-config-to-existing-prometheus) to your existing `prometheus.yml`.
{% endhint %}

### Option A: Docker Compose

Create `prometheus.yml`:

```yaml
global:
  scrape_interval: 15s
  evaluation_interval: 15s

scrape_configs:
  # Application metrics
  - job_name: 'qtap'
    static_configs:
      - targets: ['host.docker.internal:10001']
    metrics_path: '/metrics'

  # Agent health metrics
  - job_name: 'qtap-system'
    static_configs:
      - targets: ['host.docker.internal:10001']
    metrics_path: '/system/metrics'
    scrape_interval: 30s
```

Create `docker-compose.yml`:

```yaml
services:
  prometheus:
    image: prom/prometheus:latest
    container_name: prometheus
    ports:
      - "9090:9090"
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
      - prometheus-data:/prometheus
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--storage.tsdb.path=/prometheus'
    extra_hosts:
      - "host.docker.internal:host-gateway"  # On Linux you may need this to reach the host; see note below

  grafana:
    image: grafana/grafana:latest
    container_name: grafana
    ports:
      - "3000:3000"
    volumes:
      - grafana-data:/var/lib/grafana
    environment:
      - GF_SECURITY_ADMIN_PASSWORD=admin
      - GF_USERS_ALLOW_SIGN_UP=false

volumes:
  prometheus-data:
  grafana-data:
```

Start the stack:

```bash
docker compose up -d
```

### Option B: Kubernetes with ServiceMonitor

If you are using Prometheus Operator, save the following Service and ServiceMonitor as `qtap-servicemonitor.yaml`:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: qtap-metrics
  labels:
    app: qtap
spec:
  selector:
    app: qtap
  ports:
    - name: metrics
      port: 10001
      targetPort: 10001
---
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: qtap
  labels:
    app: qtap
spec:
  selector:
    matchLabels:
      app: qtap
  endpoints:
    - port: metrics
      path: /metrics
      interval: 15s
    - port: metrics
      path: /system/metrics
      interval: 30s
```

Apply it:

```bash
kubectl apply -f qtap-servicemonitor.yaml
```

### Option C: Kubernetes with Pod Annotations

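For a standard Prometheus server (without the Operator), add scrape annotations to your qtap DaemonSet. These annotations are a convention rather than a built-in Prometheus feature: they only take effect if your Prometheus has a `kubernetes_sd_configs` job with relabeling rules that honor them, as the community Helm chart does by default. A pod carries a single `prometheus.io/path`, so this approach scrapes `/metrics` only; add a separate scrape job if you also want `/system/metrics`.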

```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: qtap
spec:
  template:
    metadata:
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/port: "10001"
        prometheus.io/path: "/metrics"
    spec:
      # ... qtap pod spec
```

### Add Qtap Scrape Config to Existing Prometheus

If you already have Prometheus running, add these scrape configs to your existing `prometheus.yml`:

```yaml
scrape_configs:
  # ... your existing jobs ...

  # Qtap application metrics
  - job_name: 'qtap'
    # Docker Desktop (macOS/Windows) resolves host.docker.internal automatically.
    # On Linux, start the Prometheus container with
    # --add-host=host.docker.internal:host-gateway (`extra_hosts` in Compose),
    # or replace the target with your bridge IP (commonly 172.17.0.1).
    static_configs:
      - targets: ['<qtap-host>:10001']  # Replace with your qtap host
    metrics_path: '/metrics'
    scrape_interval: 15s

  # Qtap system/health metrics
  - job_name: 'qtap-system'
    static_configs:
      - targets: ['<qtap-host>:10001']
    metrics_path: '/system/metrics'
    scrape_interval: 30s
```

**For Kubernetes with Prometheus Operator:** If you already have Prometheus Operator installed, just apply the ServiceMonitor from Option B above - no need to modify Prometheus config files.

Reload Prometheus configuration:

```bash
# Docker
docker exec prometheus kill -HUP 1

# Kubernetes (if using Prometheus Operator, reload is automatic)
kubectl rollout restart deployment/prometheus-server -n monitoring
```

## Step 3: Verify Prometheus Is Scraping

Open Prometheus UI at `http://localhost:9090` and check:

1. **Status → Targets**: Verify `qtap` and `qtap-system` jobs are "UP"
2. **Run a query**: Try `qtap_http_requests_total` in the query box

If targets show as DOWN:

```bash
# Check network connectivity from Prometheus container
# macOS/Windows
docker exec prometheus wget -O- http://host.docker.internal:10001/metrics
# Linux (if host.docker.internal is unavailable)
# docker exec prometheus wget -O- http://172.17.0.1:10001/metrics

# For Kubernetes, check service and endpoints
kubectl get svc qtap-metrics
kubectl get endpoints qtap-metrics
```
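The same target status is available from the Prometheus HTTP API at `/api/v1/targets`, which is convenient for scripted checks. The sketch below counts targets by health; `summarize_target_health` is a hypothetical helper, and it matches the `"health":"..."` fields with `grep` instead of a JSON parser, so treat it as a quick check only.

```bash
# Hypothetical helper: count scrape targets by health from the JSON returned by
# Prometheus' /api/v1/targets endpoint (read from stdin).
summarize_target_health() {
  grep -o '"health":"[a-z]*"' \
    | sort \
    | uniq -c \
    | awk '{ gsub(/"/, "", $2); sub(/^health:/, "", $2); print $2 "=" $1 }'
}

# Usage (Prometheus on localhost:9090 as in this guide):
#   curl -s http://localhost:9090/api/v1/targets | summarize_target_health
```

Any `down=` line in the output means at least one target needs the connectivity checks above.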

## Step 4: Import Grafana Dashboard

{% hint style="info" %}
**Already have Grafana running?** Great! Skip ahead to [Add Prometheus Data Source](#add-prometheus-data-source) and use your existing Grafana instance.
{% endhint %}

### Access Grafana

Navigate to `http://localhost:3000` and log in:

* **Username**: admin
* **Password**: admin (or value from `GF_SECURITY_ADMIN_PASSWORD`)

### Add Prometheus Data Source

{% hint style="info" %}
**Already have Prometheus configured as a data source?** You can skip this section and go straight to [Import Qtap Dashboard](#import-qtap-dashboard).
{% endhint %}

1. Navigate to **Connections** → **Data sources** (**Configuration** → **Data Sources** in Grafana 9 and earlier)
2. Click **Add data source**
3. Select **Prometheus**
4. Set URL:
   * Docker Compose: `http://prometheus:9090`
   * Kubernetes: `http://prometheus-server.monitoring.svc.cluster.local`
5. Click **Save & Test**

### Import Qtap Dashboard

1. Download the dashboard: [qtap-http-overview.json](https://github.com/qpoint-io/qtap/blob/main/examples/dashboards/qtap-http-overview.json)
2. In Grafana, navigate to **Dashboards** → **Import**
3. Click **Upload JSON file** and select the downloaded file
4. Select your Prometheus data source
5. Click **Import**

{% hint style="info" %}
The official Grafana dashboard may need label adjustments. Qtap v0 uses `host` labels, not `domain`. If panels are empty, edit queries to replace `domain` with `host`.
{% endhint %}

### Dashboard Panels

The dashboard includes:

* **Request Rate**: Total requests per second over time
* **Error Rate**: Percentage of 4xx/5xx responses
* **Average Response Time**: Mean response duration
* **Latency Percentiles**: p50, p95, p99 request duration
* **Request/Response Sizes**: Average payload sizes
* **Top Hosts**: Highest traffic hosts
* **Status Code Distribution**: Breakdown by HTTP status
* **Protocol Distribution**: HTTP/1.1 vs HTTP/2 traffic

## Step 5: Set Up Alerts

Create `alerts.yml` for Prometheus:

```yaml
groups:
  - name: qtap-alerts
    interval: 30s
    rules:
      # High error rate alert
      - alert: HighHTTPErrorRate
        expr: |
          (
            sum(rate(qtap_http_requests_total{status_code=~"5.."}[5m]))
            /
            sum(rate(qtap_http_requests_total[5m]))
          ) > 0.05
        for: 5m
        labels:
          severity: warning
          component: qtap
        annotations:
          summary: "High HTTP 5xx error rate detected"
          description: "Error rate is {{ $value | humanizePercentage }} over the last 5 minutes"

      # High latency alert
      - alert: HighResponseLatency
        expr: |
          histogram_quantile(0.95,
            rate(qtap_http_requests_duration_ms_bucket[5m])
          ) > 1000
        for: 10m
        labels:
          severity: warning
          component: qtap
        annotations:
          summary: "High response latency detected"
          description: "95th percentile latency is {{ $value }}ms"

      # Traffic spike alert
      - alert: TrafficSpike
        expr: |
          rate(qtap_http_requests_total[5m])
          > 2 * rate(qtap_http_requests_total[1h] offset 1h)
        for: 5m
        labels:
          severity: info
          component: qtap
        annotations:
          summary: "Unusual traffic spike detected"
          description: "Current request rate is {{ $value | humanize }}req/s"

      # Qtap agent health
      - alert: QtapAgentDown
        expr: up{job="qtap"} == 0
        for: 2m
        labels:
          severity: critical
          component: qtap
        annotations:
          summary: "Qtap agent is down"
          description: "Qtap metrics endpoint is not responding"
```
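Before loading the rules, you may want to lint the file with `promtool`, which ships in the `prom/prometheus` image. The wrapper below is a sketch under a few assumptions: `check_alert_rules` and the `DRY_RUN` switch are hypothetical names introduced here, the rules file lives in the current directory, and Docker is available locally.

```bash
# Hypothetical helper: lint a rules file with promtool from the prom/prometheus image.
# $1 is a path relative to the current directory (default: alerts.yml).
# Set DRY_RUN=1 to print the docker command instead of running it.
check_alert_rules() {
  rules_file="${1:-alerts.yml}"
  set -- docker run --rm \
    -v "$(pwd)/${rules_file}:/tmp/rules.yml:ro" \
    --entrypoint promtool prom/prometheus:latest \
    check rules /tmp/rules.yml
  if [ -n "${DRY_RUN:-}" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Usage: check_alert_rules alerts.yml
```

`promtool check rules` exits with a non-zero status if the file has syntax or expression errors, so the helper can also gate a deployment script.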

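Update `prometheus.yml` to load the rules. The `rule_files` path is resolved inside the Prometheus container, so with the Docker Compose setup from Step 2 you also need to mount the file, for example by adding `./alerts.yml:/etc/prometheus/alerts.yml` to the `prometheus` service's `volumes`: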

```yaml
global:
  scrape_interval: 15s

rule_files:
  - /etc/prometheus/alerts.yml

alerting:
  alertmanagers:
    - static_configs:
        - targets: ['alertmanager:9093']  # Optional: configure Alertmanager

scrape_configs:
  # ... existing scrape configs
```

Reload Prometheus configuration:

```bash
# Docker Compose (recreate the container so new volume mounts are picked up)
docker compose up -d --force-recreate prometheus

# Kubernetes
kubectl rollout restart deployment/prometheus-server -n monitoring
```

### Configure Grafana Alerts (Alternative)

You can also set up alerts directly in Grafana:

1. Open a dashboard panel
2. Click **Edit** → **Alert** tab
3. Create alert conditions based on the queries
4. Configure contact points, called notification channels in legacy alerting (Slack, PagerDuty, email, etc.)

## Step 6: Test the Monitoring Setup

Generate various types of traffic to see metrics flow through:

### Successful Requests

```bash
for i in {1..50}; do
  curl -s https://httpbin.org/get > /dev/null
  sleep 0.1
done
```

### Error Responses

```bash
for i in {1..10}; do
  curl -s https://httpbin.org/status/500 > /dev/null
  sleep 0.2
done
```

### High Latency

```bash
for i in {1..5}; do
  curl -s https://httpbin.org/delay/2 > /dev/null
done
```

### View in Grafana

1. Open the qtap dashboard
2. Observe request rate increase
3. See error rate spike from 500 responses
4. Check latency percentiles from delayed requests

## Step 7: Optimize for Production

### Reduce Metric Cardinality

High cardinality (many unique label combinations) can impact Prometheus performance. To reduce it:

**Filter noisy processes** in qtap config:

```yaml
filters:
  groups:
    - qpoint
  custom:
    - exe: /usr/bin/health-check
      strategy: exact
    - exe: /usr/local/bin/monitoring-
      strategy: prefix
```

**Focus on important domains**:

```yaml
tap:
  endpoints:
    - domain: 'api.production.example.com'
      http:
        stack: monitoring_stack
```

### Use Recording Rules

Pre-compute expensive queries with Prometheus recording rules:

```yaml
groups:
  - name: qtap-recordings
    interval: 15s
    rules:
      # Pre-compute request rate by host
      - record: qtap:http_requests:rate5m
        expr: |
          sum by (host) (rate(qtap_http_requests_total[5m]))

      # Pre-compute error rate
      - record: qtap:http_errors:rate5m
        expr: |
          sum(rate(qtap_http_requests_total{status_code=~"5.."}[5m]))
          /
          sum(rate(qtap_http_requests_total[5m]))

      # Pre-compute p95 latency
      - record: qtap:http_latency:p95
        expr: |
          histogram_quantile(0.95,
            sum(rate(qtap_http_requests_duration_ms_bucket[5m])) by (le)
          )
```

Use these in your dashboards and alerts for better performance.

### Retention Settings

Configure Prometheus retention based on your needs:

```yaml
# docker-compose.yml
command:
  - '--config.file=/etc/prometheus/prometheus.yml'
  - '--storage.tsdb.path=/prometheus'
  - '--storage.tsdb.retention.time=30d'    # Delete data older than 30 days
  - '--storage.tsdb.retention.size=10GB'   # ...or once storage exceeds 10GB, whichever comes first
```
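To pick values for these flags, the Prometheus storage documentation gives a rough sizing formula: disk space ≈ retention time in seconds × ingested samples per second × bytes per sample, with roughly 1 to 2 bytes per sample on average. The sketch below applies it; `estimate_tsdb_mb` is a hypothetical helper, and the series count in the example is an illustrative assumption, not a measured qtap figure.

```bash
# Hypothetical helper: rough TSDB size estimate in megabytes.
estimate_tsdb_mb() {
  # $1 = active series, $2 = scrape interval (s), $3 = retention (days),
  # $4 = bytes per sample (default 2, the upper end of the documented average)
  awk -v series="$1" -v interval="$2" -v days="$3" -v bps="${4:-2}" 'BEGIN {
    samples_per_second = series / interval
    bytes = days * 86400 * samples_per_second * bps
    printf "%.0f\n", bytes / 1000000
  }'
}

estimate_tsdb_mb 1000 15 30   # prints 346
```

With 1,000 series scraped every 15 seconds, 30 days of data comes to roughly 346 MB, so in this example the time limit would be reached long before the 10GB size limit.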

## Common Queries for Troubleshooting

### Find services with high error rates

```promql
topk(5,
  sum by (host) (rate(qtap_http_requests_total{status_code=~"5.."}[5m]))
  /
  sum by (host) (rate(qtap_http_requests_total[5m]))
)
```

### Identify slow endpoints

```promql
topk(10,
  histogram_quantile(0.95,
    sum by (host, le) (rate(qtap_http_requests_duration_ms_bucket[5m]))
  )
)
```

### HTTP/2 vs HTTP/1 Traffic

```promql
sum by (protocol) (rate(qtap_http_requests_total[5m]))
```

### Average Request/Response Sizes

```promql
# Average request size
rate(qtap_http_requests_size_bytes_sum[5m])
/
rate(qtap_http_requests_size_bytes_count[5m])

# Average response size
rate(qtap_http_responses_size_bytes_sum[5m])
/
rate(qtap_http_responses_size_bytes_count[5m])
```

### Compare traffic patterns over time

```promql
sum(rate(qtap_http_requests_total[5m]))
/
sum(rate(qtap_http_requests_total[5m] offset 1d))
```

### Monitor overall system latency

```promql
# Average end-to-end transaction time (combined request+response)
rate(qtap_http_duration_ms_sum[5m])
/
rate(qtap_http_duration_ms_count[5m])

# 95th percentile overall latency
histogram_quantile(0.95,
  rate(qtap_http_duration_ms_bucket[5m])
)
```

### Check active connections

```promql
# Total active connections
sum(qtap_connection_active_total)

# Active connections by destination
topk(10, qtap_connection_active_total)
```

### Track TLS/HTTPS usage

```promql
# TLS handshake rate (HTTPS connection establishment)
sum(rate(qtap_connection_tls_handshake_total[5m]))

# Connections by TLS version
sum by (version) (rate(qtap_connection_tls_handshake_total[5m]))
```

## Troubleshooting

### No HTTP metrics appearing in Prometheus

**Most Common Issue**: Missing `http_metrics` plugin.

```bash
# Check if HTTP metrics exist
curl http://localhost:10001/metrics | grep "qtap_http_requests_total"
```

If empty:

1. **Verify `http_metrics` plugin is in your qtap config:**

```yaml
stacks:
  my_stack:
    plugins:
      - type: http_metrics  # ← Must be present
```

2. **Restart qtap after adding it:**

```bash
docker restart qtap
```

3. **Verify metrics now appear:**

```bash
curl http://localhost:10001/metrics | grep "qtap_http_requests_total"
```

### Only seeing connection metrics

If you see `qtap_connection_*` metrics but no `qtap_http_*` metrics, the `http_metrics` plugin is not configured. See Step 0.

### Metrics show but dashboard is empty

1. **Check label names**: Qtap v0 uses `host`, not `domain`. Update queries:

   ```promql
   # Wrong
   sum by (domain) (rate(qtap_http_requests_total[5m]))

   # Correct
   sum by (host) (rate(qtap_http_requests_total[5m]))
   ```
2. **Check data source**: Ensure Grafana is connected to the right Prometheus
3. **Check time range**: Extend the time range in Grafana
4. **Verify queries**: Test dashboard queries directly in Prometheus

### Prometheus can't reach qtap

```bash
# Docker Compose (Mac/Windows)
docker exec prometheus wget -O- http://host.docker.internal:10001/metrics
# Docker Compose (Linux):
# docker exec prometheus wget -O- http://172.17.0.1:10001/metrics

# Kubernetes
kubectl run curl-test --image=curlimages/curl --rm -it -- \
  curl http://qtap-metrics.default.svc.cluster.local:10001/metrics
```

### High cardinality warnings

```
level=warn msg="Metric has too many labels"
```

This means you have too many unique label combinations. Solutions:

1. Filter processes in qtap configuration
2. Use recording rules to aggregate
3. Limit domain capture with endpoints configuration
4. Consider using Prometheus remote write to long-term storage
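To see which metrics contribute the most series before applying any of these, you can count samples per metric name straight from the qtap endpoint. This is a minimal sketch: `count_series_per_metric` is a hypothetical helper, and it counts histogram `_bucket`, `_sum`, and `_count` series under their own names.

```bash
# Hypothetical helper: count series per metric name from Prometheus text format on stdin.
count_series_per_metric() {
  grep -v '^#' \
    | grep -v '^[[:space:]]*$' \
    | sed 's/[{ ].*//' \
    | sort \
    | uniq -c \
    | sort -rn \
    | awk '{ print $1, $2 }'
}

# Usage: curl -s http://localhost:10001/metrics | count_series_per_metric | head
```

Metrics near the top of the list are the first candidates for filtering or aggregation.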

### Missing metrics after qtap restart

Counters reset to zero when qtap restarts. This is normal. Use the `rate()` function, which handles counter resets automatically.
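The reset handling can be illustrated with a few raw samples: any drop in a counter's value is treated as a restart from zero, so the value after the drop is counted in full. The sketch below is a simplification (the real `rate()` and `increase()` also extrapolate to the window boundaries), and `counter_increase` is a hypothetical helper.

```bash
# Hypothetical helper: total increase of a counter across resets.
# Reads one sample per line from stdin, oldest first.
counter_increase() {
  awk '{ v = $1 + 0 }
       NR > 1 { total += (v >= prev) ? v - prev : v }
       { prev = v }
       END { print total + 0 }'
}

# The counter climbs to 20, qtap restarts, then it climbs from 0 to 15.
printf '%s\n' 10 20 5 15 | counter_increase   # prints 25
```

The increase is 10 before the restart, then 5 + 10 after it, which is why graphs built on `rate()` stay continuous while the raw counter drops.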

## Next Steps

* **Customize dashboards**: Add panels for your specific use cases
* **Set up Alertmanager**: Route alerts to Slack, PagerDuty, or email
* **Create SLOs**: Define service level objectives based on qtap metrics
* **Integrate with logs**: Correlate metrics with qtap captured payloads in your object store
* **Multi-cluster monitoring**: Federate metrics from multiple qtap deployments

## Additional Resources

* [Metrics Configuration Reference](/getting-started/qtap/configuration/metrics.md)
* [Prometheus Metrics Reference](/appendix/prometheus-metrics.md)
* [Traffic Capture Settings](/getting-started/qtap/configuration/traffic-capture-settings.md)
* [Prometheus Documentation](https://prometheus.io/docs/)
* [Grafana Tutorials](https://grafana.com/tutorials/)

