# Prometheus Metrics

Qtap exposes Prometheus-compatible metrics endpoints for monitoring traffic activity and agent health. These metrics enable you to build dashboards, set up alerts, and track system performance.

{% hint style="warning" %}
**REQUIRED**: HTTP-level metrics require the `http_metrics` plugin in your stack configuration. See [Configuration Requirement](#configuration-requirement) below.
{% endhint %}

## Configuration Requirement

To enable HTTP-level Prometheus metrics, you **must** add the `http_metrics` plugin to your stack:

```yaml
stacks:
  my_stack:
    plugins:
      - type: http_capture  # Optional
        config:
          level: summary
      - type: http_metrics  # REQUIRED for Prometheus metrics
```

Without this plugin, only connection-level metrics will be available.

## Metrics Endpoints

Qtap provides two metrics endpoints on `localhost:10001`:

### `/metrics`

Application metrics for monitoring HTTP traffic activity on your system. Use these metrics to build dashboards tracking:

* Request rates and volumes
* Error rates and status code distributions
* Response times and latency percentiles
* Request/response payload sizes
* Traffic patterns by host, method, and protocol

### `/system/metrics`

System metrics related to the health and performance of the qtap agent itself. These include:

* eBPF program performance
* Memory and CPU usage
* Event processing rates
* Internal error counts

## Accessing Metrics

### From the Host

```bash
curl http://localhost:10001/metrics
curl http://localhost:10001/system/metrics
```
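
For scripted checks, the text that either endpoint returns can be parsed with a few lines of Python. The following is a minimal sketch, not an official client: `parse_sample` and `fetch_samples` are hypothetical helper names, the sample line uses made-up label values, and escape sequences inside label values are left as-is.

```python
import re
from urllib.request import urlopen

# Matches name="value" pairs inside the {...} label block.
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_sample(line: str) -> tuple[str, dict[str, str], float]:
    """Split one text-format sample line into (name, labels, value)."""
    if "{" in line:
        name, rest = line.split("{", 1)
        label_part, value_part = rest.rsplit("}", 1)
        labels = dict(LABEL_RE.findall(label_part))
    else:
        name, value_part = line.split(" ", 1)
        labels = {}
    # The value is the first field; an optional timestamp may follow it.
    return name.strip(), labels, float(value_part.split()[0])


def fetch_samples(url: str = "http://localhost:10001/metrics"):
    """Fetch an endpoint and yield parsed samples, skipping comment lines."""
    with urlopen(url, timeout=5) as resp:
        for raw in resp.read().decode("utf-8").splitlines():
            if raw and not raw.startswith("#"):
                yield parse_sample(raw)


# Illustrative line; the label values are made up.
print(parse_sample('qtap_http_requests_total{host="example.com",method="GET"} 42'))
```

On a host where qtap is running, `list(fetch_samples())` should yield one tuple per series.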

### From Docker Container

When running qtap in Docker, expose port 10001:

```bash
# Other required flags are omitted here; add them before the image name.
docker run -d --name qtap \
  -p 10001:10001 \
  us-docker.pkg.dev/qpoint-edge/public/qtap:v0
```

Then access from the host:

```bash
curl http://localhost:10001/metrics
```

### From Kubernetes

Forward the metrics port to your local machine:

```bash
kubectl port-forward pod/qtap-xxxxx 10001:10001
```

Then access locally:

```bash
curl http://localhost:10001/metrics
```

## Key Application Metrics

The `/metrics` endpoint provides these HTTP traffic metrics:

### Request Metrics

**`qtap_http_requests_total`**

* **Type**: Counter
* **Description**: Total number of HTTP requests observed
* **Labels**: `host`, `method`, `protocol`, `status_code`
* **Use**: Calculate request rates, error rates, and traffic distribution

**`qtap_http_requests_duration_ms_bucket`**

* **Type**: Histogram
* **Description**: Request duration distribution in milliseconds
* **Labels**: `host`, `method`, `protocol`, `status_code`, `le` (histogram bucket)
* **Use**: Calculate latency percentiles (p50, p95, p99)

**`qtap_http_requests_duration_ms_sum`**

* **Type**: Counter
* **Description**: Total request processing time in milliseconds
* **Labels**: `host`, `method`, `protocol`, `status_code`
* **Use**: Calculate average request duration

**`qtap_http_requests_duration_ms_count`**

* **Type**: Counter
* **Description**: Total number of requests measured for duration
* **Labels**: `host`, `method`, `protocol`, `status_code`
* **Use**: Paired with `_sum` for average calculations

**`qtap_http_requests_size_bytes_bucket`**

* **Type**: Histogram
* **Description**: Request payload size distribution in bytes
* **Labels**: `host`, `method`, `protocol`, `status_code`, `le`
* **Use**: Calculate payload size percentiles

**`qtap_http_requests_size_bytes_sum`**

* **Type**: Counter
* **Description**: Total size of HTTP request payloads in bytes
* **Labels**: `host`, `method`, `protocol`, `status_code`
* **Use**: Calculate average request size

**`qtap_http_requests_size_bytes_count`**

* **Type**: Counter
* **Description**: Number of requests measured for size
* **Labels**: `host`, `method`, `protocol`, `status_code`
* **Use**: Paired with `_sum` for average calculations

### Response Metrics

**`qtap_http_responses_total`**

* **Type**: Counter
* **Description**: Total number of HTTP responses observed
* **Labels**: `host`, `method`, `protocol`, `status_code`
* **Use**: Track response counts

**`qtap_http_responses_duration_ms_bucket`**

* **Type**: Histogram
* **Description**: Response duration distribution (TTFB) in milliseconds
* **Labels**: `host`, `method`, `protocol`, `status_code`, `le`
* **Use**: Calculate response time percentiles

**`qtap_http_responses_duration_ms_sum`**

* **Type**: Counter
* **Description**: Total response time in milliseconds (TTFB - Time To First Byte)
* **Labels**: `host`, `method`, `protocol`, `status_code`
* **Use**: Calculate average response times

**`qtap_http_responses_duration_ms_count`**

* **Type**: Counter
* **Description**: Number of responses measured for duration
* **Labels**: `host`, `method`, `protocol`, `status_code`
* **Use**: Paired with `_sum` for average calculations

**`qtap_http_responses_size_bytes_bucket`**

* **Type**: Histogram
* **Description**: Response payload size distribution in bytes
* **Labels**: `host`, `method`, `protocol`, `status_code`, `le`
* **Use**: Calculate payload size percentiles

**`qtap_http_responses_size_bytes_sum`**

* **Type**: Counter
* **Description**: Total size of HTTP response payloads in bytes
* **Labels**: `host`, `method`, `protocol`, `status_code`
* **Use**: Calculate average response size

**`qtap_http_responses_size_bytes_count`**

* **Type**: Counter
* **Description**: Number of responses measured for size
* **Labels**: `host`, `method`, `protocol`, `status_code`
* **Use**: Paired with `_sum` for average calculations

### Combined Metrics

**`qtap_http_duration_ms_bucket`**

* **Type**: Histogram
* **Description**: Combined request+response duration distribution in milliseconds
* **Labels**: None (aggregates all traffic)
* **Use**: Calculate overall latency percentiles across all hosts

**`qtap_http_duration_ms_sum`**

* **Type**: Counter
* **Description**: Total combined request+response time in milliseconds
* **Labels**: None
* **Use**: Calculate overall average transaction duration

**`qtap_http_duration_ms_count`**

* **Type**: Counter
* **Description**: Total number of completed HTTP transactions
* **Labels**: None
* **Use**: Paired with `_sum` for average calculations

### Connection-Level Metrics

These metrics are available even without the `http_metrics` plugin:

**`qtap_connection_open_total`**

* **Type**: Counter
* **Description**: Total number of connections opened
* **Labels**: `direction`, `remote_addr`, `remote_port`

**`qtap_connection_close_total`**

* **Type**: Counter
* **Description**: Total number of connections closed
* **Labels**: `direction`, `remote_addr`, `remote_port`

**`qtap_connection_active_total`**

* **Type**: Gauge
* **Description**: Number of currently active connections
* **Labels**: `direction`, `remote_addr`, `remote_port`
* **Use**: Monitor real-time connection usage by destination

**`qtap_connection_protocol_total`**

* **Type**: Counter
* **Description**: Total connections by protocol type
* **Labels**: `protocol`
* **Use**: Track distribution of connection protocols (http1, http2, etc.)

**`qtap_connection_bytes_sent_total`**

* **Type**: Counter
* **Description**: Total bytes sent across all connections
* **Labels**: `direction`, `remote_addr`, `remote_port`

**`qtap_connection_bytes_recv_total`**

* **Type**: Counter
* **Description**: Total bytes received across all connections
* **Labels**: `direction`, `remote_addr`, `remote_port`

**`qtap_connection_duration_ms`**

* **Type**: Histogram
* **Description**: Connection duration distribution in milliseconds
* **Labels**: `direction`, `remote_addr`, `remote_port`, `le`

**`qtap_connection_tls_handshake_total`**

* **Type**: Counter
* **Description**: Total number of TLS handshakes completed
* **Labels**: `direction`, `remote_addr`, `remote_port`, `sni`, `version`
* **Use**: Track TLS/HTTPS connection establishment by SNI and TLS version

## Common Queries

### Request Rate (per second)

```promql
rate(qtap_http_requests_total[5m])
```
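
To make the query concrete, the arithmetic behind `rate()` can be approximated as follows. This is a simplified sketch with made-up sample values: it accounts for counter resets but omits the extrapolation Prometheus applies at the window boundaries, so results can differ slightly from what Prometheus reports.

```python
def simple_rate(samples: list[tuple[float, float]]) -> float:
    """Per-second increase of a counter from (timestamp_seconds, value) pairs.

    Simplified: handles counter resets but skips the window-boundary
    extrapolation that Prometheus applies.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    increase = 0.0
    for (_, prev), (_, curr) in zip(samples, samples[1:]):
        # A drop means the counter restarted from zero, so the
        # post-reset value is itself the increase.
        increase += (curr - prev) if curr >= prev else curr
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed


# Made-up samples scraped every 15 s: 300 new requests over 60 s.
window = [(0, 1000), (15, 1075), (30, 1150), (45, 1225), (60, 1300)]
print(simple_rate(window))  # 5.0
```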

### Error Rate (4xx and 5xx responses)

```promql
sum(rate(qtap_http_requests_total{status_code=~"4..|5.."}[5m]))
/
sum(rate(qtap_http_requests_total[5m]))
```
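
The ratio above is the increase in error responses divided by the increase in all responses. A minimal sketch with made-up counts, assuming `status_code` values are three-digit strings (`error_ratio` is a hypothetical helper):

```python
import math
import re

# PromQL's =~ is fully anchored, so use fullmatch with the same pattern.
ERROR_STATUS = re.compile(r"4..|5..")


def error_ratio(increase_by_status: dict[str, float]) -> float:
    """Share of requests whose status_code matches 4xx or 5xx."""
    total = sum(increase_by_status.values())
    if total == 0:
        return math.nan  # no traffic in the window
    errors = sum(
        count
        for status, count in increase_by_status.items()
        if ERROR_STATUS.fullmatch(status)
    )
    return errors / total


# Made-up increases over a 5-minute window, keyed by status_code.
observed = {"200": 940, "404": 40, "500": 20}
print(error_ratio(observed))  # 0.06
```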

### Average Response Time

```promql
rate(qtap_http_responses_duration_ms_sum[5m])
/
rate(qtap_http_responses_duration_ms_count[5m])
```
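
Because both rates cover the same window, the window length cancels and the query reduces to the increase in `_sum` divided by the increase in `_count`. A worked sketch with made-up values (`average_ms` is a hypothetical helper):

```python
from typing import Optional


def average_ms(sum_start: float, sum_end: float,
               count_start: float, count_end: float) -> Optional[float]:
    """Mean duration of the observations recorded between two scrapes."""
    observed = count_end - count_start
    if observed <= 0:
        return None  # nothing recorded (or the counter reset) in the window
    return (sum_end - sum_start) / observed


# Made-up values: 250 responses took 30,000 ms in total.
print(average_ms(120_000, 150_000, 2_000, 2_250))  # 120.0
```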

### 95th Percentile Latency

```promql
histogram_quantile(0.95,
  rate(qtap_http_requests_duration_ms_bucket[5m])
)
```
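
`histogram_quantile` estimates a percentile by locating the bucket that contains the target rank and interpolating linearly inside it. The sketch below mirrors that approach for classic histograms as described in the Prometheus documentation; the bucket boundaries and counts are made up and do not reflect qtap's actual bucket layout.

```python
import math


def histogram_quantile(q: float, buckets: list[tuple[float, float]]) -> float:
    """Estimate a quantile from cumulative (upper_bound, count) buckets."""
    if not 0 < q <= 1:
        raise ValueError("q must be in (0, 1]")
    buckets = sorted(buckets)
    # A usable histogram needs a finite bucket plus the +Inf bucket.
    if len(buckets) < 2 or not math.isinf(buckets[-1][0]):
        return math.nan
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    # Find the first bucket whose cumulative count reaches the rank.
    index = next(i for i, (_, cum) in enumerate(buckets) if cum >= rank)
    upper, cumulative = buckets[index]
    if math.isinf(upper):
        # The rank falls in the +Inf bucket: report the highest finite bound.
        return buckets[-2][0]
    lower = buckets[index - 1][0] if index > 0 else 0.0
    below = buckets[index - 1][1] if index > 0 else 0.0
    # Linear interpolation between the bucket's lower and upper bounds.
    return lower + (upper - lower) * (rank - below) / (cumulative - below)


# Made-up cumulative counts for le = 100, 250, 500, 1000, +Inf (milliseconds).
example = [(100, 600), (250, 900), (500, 980), (1000, 1000), (math.inf, 1000)]
print(histogram_quantile(0.95, example))  # 406.25
```

The result is an estimate whose precision depends on how closely the bucket boundaries bracket the true value.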

### Top Hosts by Request Volume

```promql
topk(10, sum by (host) (rate(qtap_http_requests_total[5m])))
```
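
The query sums the per-series rates that share a `host` label, then keeps the largest values. A sketch of the same aggregation over made-up per-series rates (`top_hosts` is a hypothetical helper):

```python
from collections import Counter


def top_hosts(series: list[tuple[dict[str, str], float]], k: int) -> list[tuple[str, float]]:
    """Sum rates by the host label, then return the k largest totals."""
    totals: Counter = Counter()
    for labels, rate in series:
        totals[labels["host"]] += rate
    return totals.most_common(k)


# Made-up per-series request rates (requests per second).
rates = [
    ({"host": "api.example.com", "method": "GET"}, 12.0),
    ({"host": "api.example.com", "method": "POST"}, 3.0),
    ({"host": "cdn.example.com", "method": "GET"}, 20.0),
    ({"host": "auth.example.com", "method": "POST"}, 1.5),
]
print(top_hosts(rates, 2))
# [('cdn.example.com', 20.0), ('api.example.com', 15.0)]
```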

### Request Rate by Method

```promql
sum by (method) (rate(qtap_http_requests_total[5m]))
```

### Overall Transaction Latency (Combined)

```promql
# Average overall transaction time
rate(qtap_http_duration_ms_sum[5m])
/
rate(qtap_http_duration_ms_count[5m])

# 95th percentile overall latency
histogram_quantile(0.95,
  rate(qtap_http_duration_ms_bucket[5m])
)
```

### Active Connections

```promql
# Total active connections
sum(qtap_connection_active_total)

# Top 10 destinations by active connections
topk(10, qtap_connection_active_total)

# Active connections by direction
sum by (direction) (qtap_connection_active_total)
```

### TLS Handshake Rate

```promql
# Overall handshake rate
sum(rate(qtap_connection_tls_handshake_total[5m]))

# Handshake rate by SNI (domain)
sum by (sni) (rate(qtap_connection_tls_handshake_total[5m]))

# Handshake rate by TLS version
sum by (version) (rate(qtap_connection_tls_handshake_total[5m]))
```

### Connection Throughput

```promql
# Bytes sent per second
rate(qtap_connection_bytes_sent_total[5m])

# Bytes received per second
rate(qtap_connection_bytes_recv_total[5m])
```

## Grafana Dashboard

Qpoint provides a sample Grafana dashboard that visualizes these metrics:

**Dashboard**: [qtap-http-overview.json](https://github.com/qpoint-io/qtap/blob/main/examples/dashboards/qtap-http-overview.json)

The dashboard includes:

* **Request Rate**: Overall HTTP request volume over time
* **Error Rate**: Percentage of failed requests (4xx/5xx)
* **Response Time**: Average response duration and TTFB
* **Latency Percentiles**: p50, p95, p99 request duration
* **Request/Response Sizes**: Average payload sizes
* **Traffic by Host**: Request distribution across hosts
* **Status Code Distribution**: Breakdown of HTTP status codes

### Importing the Dashboard

1. Download the dashboard JSON from the link above
2. In Grafana, navigate to **Dashboards** → **Import**
3. Upload the JSON file or paste its contents
4. Configure your Prometheus datasource
5. Save the dashboard

## Prometheus Configuration

Add qtap as a scrape target in your `prometheus.yml`:

```yaml
scrape_configs:
  - job_name: 'qtap'
    static_configs:
      - targets: ['localhost:10001']
    metrics_path: '/metrics'
    scrape_interval: 15s

  - job_name: 'qtap-system'
    static_configs:
      - targets: ['localhost:10001']
    metrics_path: '/system/metrics'
    scrape_interval: 30s
```

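For Kubernetes deployments, use a ServiceMonitor (Prometheus Operator) or pod annotations for automatic discovery. The annotations below only take effect if your Prometheus scrape configuration is set up to honor the `prometheus.io/*` convention: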

```yaml
annotations:
  prometheus.io/scrape: "true"
  prometheus.io/port: "10001"
  prometheus.io/path: "/metrics"
```

## Alerting Examples

### High Error Rate Alert

```yaml
groups:
  - name: qtap
    rules:
      - alert: HighHTTPErrorRate
        expr: |
          (
            sum(rate(qtap_http_requests_total{status_code=~"5.."}[5m]))
            /
            sum(rate(qtap_http_requests_total[5m]))
          ) > 0.05
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High HTTP 5xx error rate"
          description: "Error rate is {{ $value | humanizePercentage }}"
```
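
The `for: 5m` clause means the expression must hold at every evaluation for five minutes before the alert fires; a single evaluation where it does not hold resets the timer. A simplified sketch of that behavior, using made-up evaluation results at a 60-second interval and ignoring options such as `keep_firing_for` (`first_firing_time` is a hypothetical helper):

```python
from typing import Optional


def first_firing_time(evaluations: list[tuple[float, bool]],
                      hold_seconds: float) -> Optional[float]:
    """Return the timestamp at which a pending alert would start firing."""
    active_since = None
    for timestamp, condition_met in evaluations:
        if not condition_met:
            active_since = None  # the pending state is dropped
            continue
        if active_since is None:
            active_since = timestamp  # the alert becomes pending
        if timestamp - active_since >= hold_seconds:
            return timestamp
    return None


# Made-up results: the condition dips once at t=180 s, which resets the timer.
results = [(0, False), (60, True), (120, True), (180, False), (240, True),
           (300, True), (360, True), (420, True), (480, True), (540, True)]
print(first_firing_time(results, 300))  # 540
```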

### High Latency Alert

```yaml
# Add this rule under `rules:` in the group shown above.
- alert: HighResponseLatency
  expr: |
    histogram_quantile(0.95,
      rate(qtap_http_requests_duration_ms_bucket[5m])
    ) > 1000
  for: 10m
  labels:
    severity: warning
  annotations:
    summary: "High response latency detected"
    description: "95th percentile latency is {{ $value }}ms"
```

## Best Practices

1. **Use labels wisely**: The `host`, `method`, `protocol`, and `status_code` labels enable powerful filtering but can increase cardinality. Filter noisy traffic if needed.
2. **Set appropriate scrape intervals**: 15-30 seconds is typical for application metrics. System metrics can be scraped less frequently.
3. **Use recording rules**: Pre-compute expensive queries like percentiles to improve dashboard performance.
4. **Monitor agent health**: Track `/system/metrics` to ensure qtap itself is healthy and not experiencing backpressure.
5. **Set up alerts**: Don't just collect metrics; alert on anomalies such as error rate spikes or latency increases.

## Troubleshooting

### Metrics endpoint not accessible

Check that qtap is running and port 10001 is accessible:

```bash
docker logs qtap
netstat -tlnp | grep 10001
```

### No metrics appearing

Verify qtap is capturing traffic:

```bash
# Check qtap logs for capture activity
docker logs qtap | grep -i http

# Generate test traffic
curl https://httpbin.org/get

# Check metrics again
curl -s http://localhost:10001/metrics | grep qtap_http_requests_total
```

### High cardinality issues

If you have too many unique label combinations:

1. Use filters to exclude noisy processes (see [Traffic Capture Settings](/getting-started/qtap/configuration/traffic-capture-settings.md))
2. Aggregate metrics in Prometheus recording rules
3. Limit capture to specific hosts with endpoints configuration
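
To see which metrics contribute the most series, count the lines per metric name in the `/metrics` output, since each non-comment line is one unique label combination. A minimal sketch over made-up sample text (`series_per_metric` is a hypothetical helper):

```python
from collections import Counter


def series_per_metric(exposition_text: str) -> Counter:
    """Count series (unique label combinations) per metric name."""
    counts: Counter = Counter()
    for line in exposition_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and HELP/TYPE comments
        # The metric name ends at the first '{' (labels) or space (value).
        name = line.split("{", 1)[0].split(" ", 1)[0]
        counts[name] += 1
    return counts


# Illustrative text, not real qtap output.
sample = """\
# TYPE qtap_http_requests_total counter
qtap_http_requests_total{host="a.example.com",method="GET",protocol="http1",status_code="200"} 10
qtap_http_requests_total{host="a.example.com",method="GET",protocol="http1",status_code="404"} 2
qtap_http_requests_total{host="b.example.com",method="POST",protocol="http2",status_code="200"} 7
qtap_connection_protocol_total{protocol="http1"} 5
"""
print(series_per_metric(sample).most_common(1))
# [('qtap_http_requests_total', 3)]
```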


