# Fluent Bit Batching

## Capturing All HTTP Traffic with Fluent Bit

### Overview

This guide shows you how to capture **every HTTP request and response** from Qtap and route it to storage backends using [Fluent Bit](https://fluentbit.io/). This pattern is ideal for comprehensive observability, troubleshooting, audit trails, and compliance requirements where you need complete traffic capture.

#### Why Fluent Bit?

While Qtap can write directly to S3-compatible object stores, **Fluent Bit provides critical advantages at scale**:

* **Performance**: Batches and buffers writes to reduce API calls
* **Reliability**: Built-in retry logic, buffering, and backpressure handling
* **Flexibility**: Route traffic to multiple destinations (S3, CloudWatch, Elasticsearch, etc.)
* **Filtering**: Apply additional filtering, enrichment, or transformation
* **Cost optimization**: Batch uploads reduce S3 API costs significantly
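To make the batching and cost points concrete, here is a rough back-of-the-envelope sketch. The traffic figures (200 transactions per second, 4 KiB per record) are hypothetical, and the model assumes an unbatched writer issues one PUT per transaction while a batched writer uploads whenever `total_file_size` or `upload_timeout` is reached, whichever comes first. Compression and retries are ignored.

```python
import math

SECONDS_PER_DAY = 86_400

def puts_per_day(tx_per_sec, avg_bytes, file_size_bytes, upload_timeout_sec):
    """Estimate daily uploads when batching by size or timeout (simplified model)."""
    bytes_per_sec = tx_per_sec * avg_bytes
    seconds_to_fill = file_size_bytes / bytes_per_sec
    # A batch is assumed to ship at the size limit or the timeout, whichever is sooner
    interval = min(seconds_to_fill, upload_timeout_sec)
    return math.ceil(SECONDS_PER_DAY / interval)

# Hypothetical workload: 200 transactions/s, 4 KiB each
tx_per_sec = 200
avg_bytes = 4096

unbatched = tx_per_sec * SECONDS_PER_DAY             # one PUT per transaction
batched = puts_per_day(tx_per_sec, avg_bytes,
                       file_size_bytes=50_000_000,   # roughly total_file_size 50M
                       upload_timeout_sec=600)       # upload_timeout 10m

print(f"unbatched PUTs/day: {unbatched}")
print(f"batched PUTs/day:   {batched}")
```

Under these assumptions the request count drops by several orders of magnitude, which is where the API cost savings come from.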

#### Architecture

```
┌──────────┐  forward  ┌──────────────┐  parse   ┌──────────┐
│   Qtap   │──────────▸│  Fluent Bit  │─────────▸│ Filter & │
│  (eBPF)  │  port     │  (Batching)  │          │  Route   │
└──────────┘  24224    └──────────────┘          └────┬─────┘
                                                       │
                                        ┌──────────────┴──────────────┐
                                        ▼                             ▼
                                  ┌──────────┐                 ┌──────────┐
                                  │    S3    │                 │ Stdout / │
                                  │ (MinIO,  │                 │  Other   │
                                  │ AWS, GCS)│                 └──────────┘
                                  └──────────┘
```

**How it works:**

1. Qtap captures HTTP traffic using eBPF (TLS inspection, no proxies)
2. Qtap writes HTTP transaction objects as structured logs
3. Docker forwards logs to Fluent Bit using the fluentd log driver
4. Fluent Bit parses JSON, filters, and tags HTTP transactions
5. Fluent Bit batches, buffers, and routes to storage backends
6. TTL policies on the storage backend expire old captures (recommended: 90 days)

### Docker Deployment

This setup uses Docker's fluentd log driver to forward logs directly to Fluent Bit over the network. This approach is simpler and more reliable than tailing log files.

#### Step 1: Create Qtap Configuration

Create `qtap-config.yaml`:

```yaml
version: 2

services:
  event_stores:
    - type: stdout          # Connection metadata

  object_stores:
    - type: stdout          # Full HTTP payloads

stacks:
  capture_all:
    plugins:
      - type: http_capture
        config:
          level: full       # (none|summary|headers|full) - Capture headers + bodies
          format: json      # (json|text) - JSON for Fluent Bit parsing

tap:
  direction: egress         # egress | egress-external | egress-internal | ingress | all
  ignore_loopback: true     # Skip localhost traffic
  audit_include_dns: false  # Skip DNS queries
  http:
    stack: capture_all
```

#### Step 2: Create Fluent Bit Configuration

Create `fluent-bit.conf`:

```conf
[SERVICE]
    Flush        5
    Daemon       Off
    Log_Level    info
    Parsers_File parsers.conf

# Input: Receive logs via forward protocol
[INPUT]
    Name        forward
    Listen      0.0.0.0
    Port        24224

# Filter: Only keep HTTP transactions (lines with metadata field)
# Match must equal the logging tag configured in docker-compose.
# Comments go on their own line: classic-mode config does not support inline comments.
[FILTER]
    Name    grep
    Match   docker.qtap
    Regex   log .*"metadata":.*

# Filter: Parse JSON logs from Qtap (runs after grep so the log field still exists)
[FILTER]
    Name        parser
    Match       docker.qtap
    Key_Name    log
    Parser      generic_json_parser
    Reserve_Data On
    Preserve_Key Off

# Output: Write to stdout for testing
[OUTPUT]
    Name         stdout
    Match        docker.qtap
    Format       json_lines
```

{% hint style="info" %}
**How the filters work:**

* The **grep** filter runs **before** parsing so it can match the raw `log` field while it still exists
* The **parser** filter then expands the JSON payload and removes the original `log` key
* This sequencing ensures only HTTP transaction data reaches downstream outputs

{% endhint %}
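The filter ordering can be sketched in a few lines of Python. This is an illustration of the logic, not Fluent Bit's implementation: the record shape (a `log` string plus driver fields such as `source`) is an assumption based on the example output in this guide, and the sketch simply drops lines that fail to parse.

```python
import json

def grep_then_parse(records):
    """Mimic the grep -> parser filter order on forwarded records."""
    kept = []
    for record in records:
        raw = record.get("log", "")
        # grep step: keep only lines that carry a "metadata" field
        if '"metadata":' not in raw:
            continue
        try:
            parsed = json.loads(raw)
        except json.JSONDecodeError:
            continue  # simplification for this sketch
        if not isinstance(parsed, dict):
            continue
        # Reserve_Data On keeps the other keys; Preserve_Key Off drops `log`
        merged = {k: v for k, v in record.items() if k != "log"}
        merged.update(parsed)
        kept.append(merged)
    return kept

# Hypothetical input: one HTTP transaction and one operational message
sample = [
    {"container_name": "/qtap", "source": "stdout",
     "log": '{"metadata": {"endpoint_id": "httpbin.org"}, "response": {"status": 200}}'},
    {"container_name": "/qtap", "source": "stdout",
     "log": "starting eBPF probes"},
]

result = grep_then_parse(sample)
print(result)
```

Only the first record survives, and it no longer has a `log` key. If the parse step ran first, there would be no `log` field left for the grep step to match.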

Create `parsers.conf`:

```conf
[PARSER]
    Name         docker
    Format       json
    Time_Key     time
    Time_Format  %Y-%m-%dT%H:%M:%S.%L
    Time_Keep    On

[PARSER]
    Name   generic_json_parser
    Format json
```

#### Step 3: Create Docker Compose

Create `docker-compose.yaml`:

```yaml
version: '3'

services:
  fluent-bit:
    image: fluent/fluent-bit:latest
    container_name: fluent-bit
    network_mode: host
    volumes:
      - ./fluent-bit.conf:/fluent-bit/etc/fluent-bit.conf
      - ./parsers.conf:/fluent-bit/etc/parsers.conf
    command: ["fluent-bit", "-c", "/fluent-bit/etc/fluent-bit.conf"]

  qtap:
    image: us-docker.pkg.dev/qpoint-edge/public/qtap:v0
    container_name: qtap
    depends_on:
      - fluent-bit
    privileged: true
    user: "0:0"
    cap_add:
      - CAP_BPF
      - CAP_SYS_ADMIN
    pid: host
    network_mode: host
    volumes:
      - /sys:/sys
      - ./qtap-config.yaml:/app/config/qtap.yaml
    environment:
      - TINI_SUBREAPER=1
    ulimits:
      memlock: -1
    logging:
      driver: fluentd
      options:
        tag: docker.qtap       # Must match the Fluent Bit Match pattern
        fluentd-address: 127.0.0.1:24224
    command:
      - --log-level=info
      - --log-encoding=console
      - --config=/app/config/qtap.yaml
```

#### Step 4: Start and Validate

```bash
# Start services
docker compose up -d

# Wait a few seconds for eBPF initialization
sleep 5

# Generate test traffic
docker run --rm curlimages/curl -s https://httpbin.org/get

# View captured HTTP traffic
docker logs fluent-bit | grep 'metadata'
```

**Expected output** (Fluent Bit emits one line per HTTP transaction; pretty-printed here for readability):

```json
{
  "metadata": {
    "process_id": "108520",
    "process_exe": "/usr/bin/curl",
    "bytes_sent": 41,
    "bytes_received": 395,
    "connection_id": "d3shhpg7p3qm85psn2rg",
    "endpoint_id": "httpbin.org"
  },
  "request": {
    "method": "GET",
    "url": "https://httpbin.org/get",
    "scheme": "https",
    "path": "/get",
    "authority": "httpbin.org",
    "protocol": "http2",
    "request_id": "d3shhpg7p3qm85psn2s0",
    "user_agent": "curl/8.12.1",
    "headers": {
      ":authority": "httpbin.org",
      ":method": "GET",
      ":path": "/get",
      ":scheme": "https",
      "Accept": "*/*",
      "User-Agent": "curl/8.12.1"
    }
  },
  "response": {
    "status": 200,
    "content_type": "application/json",
    "headers": {
      ":status": "200",
      "Access-Control-Allow-Credentials": "true",
      "Access-Control-Allow-Origin": "*",
      "Content-Length": "255",
      "Content-Type": "application/json",
      "Date": "Wed, 22 Oct 2025 17:48:54 GMT",
      "Server": "gunicorn/19.9.0"
    },
    "body": "ewogICJhcmdzIjoge30sIAogICJoZWFkZXJzIjogewogICAgIkFjY2VwdCI6ICIqLyoiLCAKICAgICJIb3N0IjogImh0dHBiaW4ub3JnIiwgCiAgICAiVXNlci1BZ2VudCI6ICJjdXJsLzguMTIuMSIsIAogICAgIlgtQW16bi1UcmFjZS1JZCI6ICJSb290PTEtNjhmOTE4ZTYtNmRiZGJlZDA0ZDllMjNhOTU1NjQ0YmEyIgogIH0sIAogICJvcmlnaW4iOiAiNzMuNzEuMTM4LjEwOCIsIAogICJ1cmwiOiAiaHR0cHM6Ly9odHRwYmluLm9yZy9nZXQiCn0K"
  },
  "transaction_time": "2025-10-22T17:48:54.114375279Z",
  "duration_ms": 31205,
  "direction": "egress-external",
  "container_id": "49c2e0af4ffbbfff89be4766c6f84d7a3ad8e528d88fafeb3010facdb6374fb7",
  "container_name": "/qtap",
  "source": "stdout"
}
```

{% hint style="info" %}
Response bodies are base64-encoded in the `body` field. Decode them before inspecting the payload.
{% endhint %}
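A minimal sketch of decoding a captured body in Python follows. The `decode_body` helper and the sample record are hypothetical and built only for illustration; the field names (`response`, `body`) follow the example output above.

```python
import base64
import json

def decode_body(transaction, part="response"):
    """Return the decoded body of the given part, or None if there is none."""
    encoded = transaction.get(part, {}).get("body")
    if not encoded:
        return None
    raw = base64.b64decode(encoded)
    # Bodies may be binary; replace undecodable bytes instead of failing
    return raw.decode("utf-8", errors="replace")

# Hypothetical record: encode a small JSON payload the way it appears in captures
payload = json.dumps({"url": "https://httpbin.org/get"})
record = {
    "response": {
        "status": 200,
        "content_type": "application/json",
        "body": base64.b64encode(payload.encode("utf-8")).decode("ascii"),
    }
}

text = decode_body(record)
print(json.loads(text)["url"])
```

Check `content_type` before treating the decoded text as JSON, since bodies can be any media type.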

### Production Outputs

#### AWS S3

Replace the stdout output in `fluent-bit.conf` with:

```conf
[OUTPUT]
    Name              s3
    Match             docker.qtap
    bucket            your-bucket-name
    region            us-east-1
    total_file_size   50M
    upload_timeout    10m
    compression       gzip
    s3_key_format     /qtap/year=%Y/month=%m/day=%d/hour=%H/$UUID.gz
    store_dir         /tmp/fluent-bit/s3
    use_put_object    On
```

Add environment variables to the fluent-bit service in docker-compose:

```yaml
environment:
  - AWS_ACCESS_KEY_ID=${AWS_ACCESS_KEY_ID}
  - AWS_SECRET_ACCESS_KEY=${AWS_SECRET_ACCESS_KEY}
```

**Set lifecycle policy:**

```bash
aws s3api put-bucket-lifecycle-configuration \
  --bucket your-bucket-name \
  --lifecycle-configuration '{
    "Rules": [{
      "ID": "DeleteAfter90Days",
      "Status": "Enabled",
      "Filter": {"Prefix": "qtap/"},
      "Expiration": {"Days": 90}
    }]
  }'
```

#### MinIO (S3-Compatible)

```conf
[OUTPUT]
    Name              s3
    Match             docker.qtap
    endpoint          http://minio:9000
    bucket            qtap-http-traffic
    region            us-east-1
    total_file_size   50M
    compression       gzip
    s3_key_format     /qtap/year=%Y/month=%m/day=%d/$UUID.gz
    use_put_object    On
```

#### AWS CloudWatch Logs

```conf
[OUTPUT]
    Name              cloudwatch_logs
    Match             docker.qtap
    region            us-east-1
    log_group_name    /qpoint/http-traffic
    log_stream_prefix qtap-
    auto_create_group On
```

**Set retention policy:**

```bash
aws logs put-retention-policy \
  --log-group-name /qpoint/http-traffic \
  --retention-in-days 90
```

#### Multiple Destinations

You can send data to multiple outputs simultaneously:

```conf
# Send to S3 for long-term storage
[OUTPUT]
    Name              s3
    Match             docker.qtap
    bucket            your-bucket-name
    region            us-east-1
    total_file_size   50M
    upload_timeout    10m
    compression       gzip
    s3_key_format     /qtap/year=%Y/month=%m/day=%d/hour=%H/$UUID.gz
    store_dir         /tmp/fluent-bit/s3
    use_put_object    On

# Also send to stdout for debugging
[OUTPUT]
    Name    stdout
    Match   docker.qtap
    Format  json_lines
```
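Once batches land in object storage, you can inspect them offline. The sketch below assumes each uploaded object is a gzip-compressed file of newline-delimited JSON records shaped like the example output earlier in this guide; verify that against a real object from your bucket. The `status_counts` helper is hypothetical, and the demo writes its own small batch file so it runs without S3.

```python
import gzip
import json
import os
import tempfile
from collections import Counter

def status_counts(path):
    """Count response status codes in one gzip-compressed JSON-lines batch."""
    counts = Counter()
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            record = json.loads(line)
            counts[record.get("response", {}).get("status")] += 1
    return counts

# Demo with a locally generated batch standing in for a downloaded object
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "batch.gz")
    with gzip.open(path, "wt", encoding="utf-8") as handle:
        handle.write('{"response": {"status": 200}}\n')
        handle.write('{"response": {"status": 503}}\n')
        handle.write('{"response": {"status": 200}}\n')
    counts = status_counts(path)

print(dict(counts))
```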

### Filtering and Optimization

#### Capture Only Errors (4xx/5xx)

To reduce volume, capture only failed requests using Qtap's Rulekit:

```yaml
stacks:
  error_only:
    plugins:
      - type: http_capture
        config:
          level: none           # Don't capture by default
          format: json
          rules:
            - name: "Capture errors"
              expr: http.res.status >= 400
              level: full       # Capture errors fully
```
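The intent of this stack can be expressed as a small decision function. It is an illustrative model only: it assumes a matching rule overrides the plugin's default `level`, and it does not reflect how Rulekit evaluates expressions internally.

```python
def capture_level(status, default="none"):
    """Return the capture level the error_only stack is expected to apply."""
    # Rule "Capture errors": http.res.status >= 400 -> level full
    if status >= 400:
        return "full"
    return default

for status in (200, 301, 404, 500):
    print(status, capture_level(status))
```

Successful and redirect responses fall through to `none`, so only 4xx and 5xx transactions reach Fluent Bit.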

#### Filter by Domain

Capture only specific domains:

```yaml
tap:
  http:
    stack: lightweight_stack    # Default: summary level

  endpoints:
    - domain: 'api.important.com'
      http:
        stack: capture_all      # Full capture for this domain
```

#### Exclude Noisy Processes

```yaml
tap:
  filters:
    groups:
      - qpoint              # Exclude qtap's own traffic
    custom:
      - exe: /usr/bin/healthcheck
        strategy: exact
      - exe: /usr/sbin/
        strategy: prefix    # Exclude all /usr/sbin/ processes
```
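The difference between the `exact` and `prefix` strategies can be sketched as follows. The matching semantics are inferred from the strategy names and the comments in the config above, so treat this as an assumption and not as Qtap's actual matcher.

```python
def is_excluded(exe, filters):
    """Return True if the executable path matches any custom filter."""
    for rule in filters:
        if rule["strategy"] == "exact" and exe == rule["exe"]:
            return True
        if rule["strategy"] == "prefix" and exe.startswith(rule["exe"]):
            return True
    return False

# Mirrors the `custom` entries in the config above
custom_filters = [
    {"exe": "/usr/bin/healthcheck", "strategy": "exact"},
    {"exe": "/usr/sbin/", "strategy": "prefix"},
]

for exe in ("/usr/bin/healthcheck", "/usr/sbin/nginx", "/usr/bin/curl"):
    print(exe, is_excluded(exe, custom_filters))
```

Under this reading, `exact` leaves a path such as `/usr/bin/healthcheck2` untouched, while `prefix` excludes everything under the given directory.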

#### Filter in Fluent Bit

You can also filter within Fluent Bit using the grep filter:

```conf
# These filters run on parsed records, so place them after the parser filter.
# Nested fields need record accessor syntax, e.g. $request['url'].

# Only capture requests to a specific domain
[FILTER]
    Name    grep
    Match   docker.qtap
    Regex   $request['url'] .*api\.important\.com.*

# Only keep stdout log lines (tail input exposes `stream`, the fluentd log driver exposes `source`)
[FILTER]
    Name    grep
    Match   docker.*
    Regex   stream stdout

# Exclude health checks
[FILTER]
    Name    grep
    Match   docker.qtap
    Exclude $request['url'] .*/health$
```

When logs arrive through Docker's fluentd log driver (the forward-protocol setup used in this guide), the channel is exposed as `source` instead of `stream`, so change the regex to `Regex source stdout` in that deployment model.

### Monitoring and Troubleshooting

#### Verify Qtap is Capturing Traffic

```bash
# Check Qtap logs for HTTP transactions
docker logs qtap | grep '"metadata"'

# Verify TLS detection
docker logs qtap | grep '"is_tls":true'
```

**What to look for:**

* `"exe": "/usr/bin/curl"` - Process identified correctly
* `"protocol": "http2"` or `"http1"` - NOT "other"
* `"is_tls": true` - TLS detected
* `"tlsProbeTypesDetected": ["openssl"]` - TLS library hooked
* Full HTTP details visible despite HTTPS

#### Verify Fluent Bit is Processing

```bash
# Check Fluent Bit logs for errors (Fluent Bit writes its own logs to stderr)
docker logs fluent-bit 2>&1 | grep -i "error\|warn"

# Count processed HTTP objects (check for the metadata field)
docker logs fluent-bit | grep -c '"metadata"'

# Check S3 uploads (if using S3 output)
docker logs fluent-bit 2>&1 | grep "s3"
```

#### Common Issues

**No HTTP objects captured:**

* Qtap must be running **before** traffic is generated
* Wait 5-10 seconds after starting qtap for eBPF initialization
* Check qtap logs for connection events
* Verify processes are using supported TLS libraries (OpenSSL, BoringSSL, GnuTLS)

**Fluent Bit not receiving logs:**

* Check that Fluent Bit started before Qtap
* Verify port 24224 is accessible from Qtap container
* Test connectivity: `docker exec qtap nc -zv 127.0.0.1 24224`
* Check Fluent Bit logs for connection messages
* If you see `Error binding socket`, remove any leftover Fluent Bit containers that already bound to port 24224 and re-run `docker compose up -d`
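If `nc` is not available, a small Python check does the same job. This is a sketch: `port_open` is a hypothetical helper, and the demo starts a throwaway listener on a random local port so it runs without Fluent Bit. In practice you would call `port_open("127.0.0.1", 24224)` on the host.

```python
import socket

def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Demo: a temporary listener stands in for Fluent Bit's forward input
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))   # port 0 lets the OS pick a free port
listener.listen(1)
demo_port = listener.getsockname()[1]

reachable = port_open("127.0.0.1", demo_port)
listener.close()
unreachable = port_open("127.0.0.1", demo_port)

print("while listening:", reachable)
print("after close:", unreachable)
```

A successful TCP connect only shows that something is listening on the port. Confirm in the Fluent Bit logs that the forward input is what accepted the connection.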

**No HTTP transactions in Fluent Bit output:**

* Qtap writes both HTTP transaction JSON and operational messages to **stdout**
* The grep filter `Regex log .*"metadata":.*` ensures only HTTP transactions are captured
* Verify HTTP transactions reached Fluent Bit: `docker logs fluent-bit | grep '"metadata"'`
* If you see records without `"metadata"`, those are operational logs that should be filtered out
* Make sure the grep filter runs **before** the parser filter, or set `Preserve_Key On` in the parser filter; otherwise the `log` field is removed before the grep match runs

**High memory usage:**

* Reduce Fluent Bit `Flush` interval (more frequent uploads)
* Adjust `total_file_size` for S3 batching (smaller = more frequent uploads)
* Add filtering to reduce captured volume
* Enable compression for outputs

**S3 upload failures:**

* Verify AWS credentials are correct
* Check IAM permissions (s3:PutObject required)
* Ensure bucket exists and is in the correct region
* Check network connectivity to S3 endpoint

### Alternative: File Tailing Approach

If you have existing logging infrastructure or cannot use the forward protocol, you can tail Docker's log files directly:

**fluent-bit.conf:**

```conf
[INPUT]
    Name              tail
    Path              /var/lib/docker/containers/*/*.log
    Parser            docker
    Tag               docker.*
    Refresh_Interval  5
    Read_from_Head    true

# Keep only stdout log lines (tail input exposes the field as `stream`)
[FILTER]
    Name    grep
    Match   docker.*
    Regex   stream stdout

[FILTER]
    Name    grep
    Match   docker.*
    Regex   log .*"metadata":.*

# Parse the JSON payload after filtering
[FILTER]
    Name        parser
    Match       docker.*
    Key_Name    log
    Parser      generic_json_parser
    Reserve_Data On
    Preserve_Key Off
```

**docker-compose.yaml changes:**

```yaml
fluent-bit:
  volumes:
    - /var/lib/docker/containers:/var/lib/docker/containers:ro

qtap:
  logging:
    driver: json-file
    options:
      max-size: "10m"
      max-file: "3"
```

Ensure your `parsers.conf` file defines both the `docker` parser and the `generic_json_parser` shown in Step 2, so the tail input can decode Docker's log wrapper before the parser filter expands the HTTP transaction JSON.

This approach requires managing host log paths and read access to `/var/lib/docker/containers`, which can cause permission issues. The forward protocol approach is recommended.

### Best Practices

1. **Start Small**: Begin with error-only capture or specific domains, then expand
2. **Set TTLs**: Always configure lifecycle policies (90 days recommended)
3. **Monitor Volume**: Track storage growth and adjust filtering as needed
4. **Secure Credentials**: Use `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` environment variables — never hardcode credentials in configuration files. Rotate keys regularly
5. **Compress**: Enable gzip compression for S3 uploads
6. **Batch Uploads**: Use an appropriate `total_file_size` (this guide uses 50M)
7. **Test Filtering**: Validate filters match expected objects
8. **Health Checks**: Monitor Fluent Bit metrics and error logs
9. **Backup Config**: Version control all configuration files
10. **Security**: Limit access to logs - they contain sensitive data

### Summary

This guide demonstrated a validated deployment pattern for capturing all HTTP traffic with Fluent Bit using Docker:

**Docker Deployment**: Qtap stdout → Forward protocol → Fluent Bit → S3/CloudWatch

This pattern provides:

* ✅ Complete HTTP capture (headers + bodies)
* ✅ TLS inspection without proxies
* ✅ Batched, buffered writes for performance
* ✅ Flexible routing to multiple destinations
* ✅ Data sovereignty (sensitive data stays in your infrastructure)

For questions or advanced configurations, see:

* [Traffic Processing with Plugins](https://github.com/qpoint-io/documentation/blob/main/guides/getting-started/qtap/configuration/traffic-processing-with-plugins.md)
* [Storage Configuration](https://github.com/qpoint-io/documentation/blob/main/guides/getting-started/qtap/configuration/storage-configuration.md)

