Capturing All HTTP Traffic with Fluent Bit
Overview
This guide shows you how to capture every HTTP request and response from Qtap and route it to storage backends using Fluent Bit. This pattern is ideal for comprehensive observability, troubleshooting, audit trails, and compliance requirements where you need complete traffic capture.
Why Fluent Bit?
While Qtap can write directly to S3-compatible object stores, Fluent Bit provides critical advantages at scale:
Performance: Batches and buffers writes to reduce API calls
Reliability: Built-in retry logic, buffering, and backpressure handling
Flexibility: Route traffic to multiple destinations (S3, CloudWatch, Elasticsearch, etc.)
Filtering: Apply additional filtering, enrichment, or transformation
Cost optimization: Batch uploads reduce S3 API costs significantly
Architecture
┌──────────┐ forward ┌──────────────┐ parse ┌──────────┐
│ Qtap │──────────▸│ Fluent Bit │─────────▸│ Filter & │
│ (eBPF) │ port │ (Batching) │ │ Route │
└──────────┘ 24224 └──────────────┘ └────┬─────┘
│
┌──────────────┴──────────────┐
▼ ▼
┌──────────┐ ┌──────────┐
│ S3 │ │ Stdout / │
│ (MinIO, │ │ Other │
│ AWS, GCS)│ └──────────┘
└──────────┘

How it works:
Qtap captures HTTP traffic using eBPF (TLS inspection, no proxies)
Qtap writes HTTP transaction objects as structured logs
Docker forwards logs to Fluent Bit using the fluentd log driver
Fluent Bit parses JSON, filters, and tags HTTP transactions
Fluent Bit batches, buffers, and routes to storage backends
Set TTL policies on storage (recommended: 90 days)
Docker Deployment
This setup uses Docker's fluentd log driver to forward logs directly to Fluent Bit over the network. This approach is simpler and more reliable than tailing log files.
Step 1: Create Qtap Configuration
Create qtap-config.yaml:
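A minimal sketch of what qtap-config.yaml can look like for this pattern: full HTTP capture written to stdout so Docker can forward it. The field names below are assumptions drawn from the pattern this guide describes, not a verified schema; check the Qtap configuration reference before using it.

```yaml
# Illustrative sketch only -- verify field names against the Qtap configuration reference.
version: 2

services:
  # Write events and captured HTTP objects to stdout so the
  # Docker fluentd log driver can ship them to Fluent Bit.
  event_stores:
    - type: stdout
  object_stores:
    - type: stdout

stacks:
  default_stack:
    plugins:
      - type: http_capture
        config:
          level: full      # capture headers and bodies
          format: json

tap:
  direction: egress
  http:
    stack: default_stack
```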
Step 2: Create Fluent Bit Configuration
Create fluent-bit.conf:
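A working starting point for fluent-bit.conf: the forward input listens on port 24224, and a stdout output is used for validation (it is swapped for S3/CloudWatch later). The grep and parser filters match the troubleshooting notes later in this guide; regex and field details beyond that are reasonable defaults, not verbatim from the original.

```ini
[SERVICE]
    Flush        5
    Log_Level    info
    Parsers_File parsers.conf

[INPUT]
    Name   forward
    Listen 0.0.0.0
    Port   24224

# Keep only records whose log line contains an HTTP transaction
# (Qtap operational messages lack the "metadata" key).
[FILTER]
    Name   grep
    Match  *
    Regex  log .*"metadata":.*

# Expand the JSON payload into structured fields. Preserve_Key keeps
# the raw log field; Reserve_Data keeps the container metadata keys.
[FILTER]
    Name         parser
    Match        *
    Key_Name     log
    Parser       generic_json_parser
    Preserve_Key On
    Reserve_Data On

[OUTPUT]
    Name   stdout
    Match  *
    Format json_lines
```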
Create parsers.conf:
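A matching parsers.conf; it also defines a docker parser so the same file works with the file-tailing alternative described at the end of this guide.

```ini
[PARSER]
    Name        docker
    Format      json
    Time_Key    time
    Time_Format %Y-%m-%dT%H:%M:%S.%L
    Time_Keep   On

[PARSER]
    Name        generic_json_parser
    Format      json
```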
Step 3: Create Docker Compose
Create docker-compose.yaml:
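A sketch of docker-compose.yaml wiring the two services together. The Qtap image name, config path, and exact privileges are assumptions; take the real values from your Qtap install docs.

```yaml
services:
  fluent-bit:
    image: fluent/fluent-bit:latest
    container_name: fluent-bit
    ports:
      - "24224:24224"
    volumes:
      - ./fluent-bit.conf:/fluent-bit/etc/fluent-bit.conf:ro
      - ./parsers.conf:/fluent-bit/etc/parsers.conf:ro

  qtap:
    image: qpoint/qtap:latest   # placeholder image name
    container_name: qtap
    privileged: true            # eBPF requires elevated privileges
    pid: host                   # observe host processes
    volumes:
      - ./qtap-config.yaml:/etc/qtap/qtap.yaml:ro
    depends_on:
      - fluent-bit
    logging:
      driver: fluentd
      options:
        # The Docker daemon (not the container) ships the logs,
        # so it dials the port published on the host.
        fluentd-address: 127.0.0.1:24224
        fluentd-async: "true"
```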
Step 4: Start and Validate
Start the stack with docker compose up -d, generate traffic from inside the qtap container (for example, docker exec qtap curl -s https://example.com), then inspect docker logs fluent-bit.
Expected output (one line per HTTP transaction):
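The shape of one captured transaction looks roughly like the record below. This is illustrative, not literal output: the exe, protocol, is_tls, tlsProbeTypesDetected, and metadata fields are referenced elsewhere in this guide, while the rest of the payload structure is an assumption.

```json
{
  "metadata": {"timestamp": "2024-01-01T12:00:00Z"},
  "exe": "/usr/bin/curl",
  "protocol": "http1",
  "is_tls": true,
  "tlsProbeTypesDetected": ["openssl"],
  "request": {"method": "GET", "host": "example.com", "path": "/"},
  "response": {"status": 200}
}
```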
Production Outputs
AWS S3
Replace the stdout output in fluent-bit.conf with:
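For example, a batched, compressed S3 output (the bucket name and key format are placeholders):

```ini
[OUTPUT]
    Name             s3
    Match            *
    bucket           my-qtap-traffic
    region           us-east-1
    total_file_size  50M     # upload once 50M is buffered...
    upload_timeout   5m      # ...or after 5 minutes, whichever comes first
    compression      gzip
    use_put_object   On
    s3_key_format    /qtap/%Y/%m/%d/%H/$UUID.gz
```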
Add environment variables to the fluent-bit service in docker-compose:
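For example (prefer IAM roles in production, as noted under Best Practices):

```yaml
  fluent-bit:
    environment:
      - AWS_ACCESS_KEY_ID=${AWS_ACCESS_KEY_ID}
      - AWS_SECRET_ACCESS_KEY=${AWS_SECRET_ACCESS_KEY}
      - AWS_REGION=us-east-1
```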
Set lifecycle policy:
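One way to expire objects after 90 days with the AWS CLI (bucket name and prefix are placeholders matching the S3 output example):

```shell
aws s3api put-bucket-lifecycle-configuration \
  --bucket my-qtap-traffic \
  --lifecycle-configuration '{
    "Rules": [{
      "ID": "expire-qtap-traffic",
      "Status": "Enabled",
      "Filter": {"Prefix": "qtap/"},
      "Expiration": {"Days": 90}
    }]
  }'
```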
MinIO (S3-Compatible)
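The same S3 output works against MinIO by pointing endpoint at the MinIO service and supplying its credentials through the same AWS_* environment variables (hostname and bucket below are placeholders):

```ini
[OUTPUT]
    Name             s3
    Match            *
    bucket           qtap-traffic
    endpoint         http://minio:9000
    region           us-east-1
    use_put_object   On      # required for S3-compatible stores like MinIO
    total_file_size  50M
    compression      gzip
```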
AWS CloudWatch Logs
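A CloudWatch Logs output might look like this (the log group name is a placeholder):

```ini
[OUTPUT]
    Name              cloudwatch_logs
    Match             *
    region            us-east-1
    log_group_name    /qtap/http-traffic
    log_stream_prefix qtap-
    auto_create_group On
```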
Set retention policy:
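For example, with the AWS CLI:

```shell
aws logs put-retention-policy \
  --log-group-name /qtap/http-traffic \
  --retention-in-days 90
```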
Multiple Destinations
You can send data to multiple outputs simultaneously:
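Fluent Bit fans records out to every [OUTPUT] section whose Match pattern hits, so declaring several outputs is enough:

```ini
[OUTPUT]
    Name    s3
    Match   *
    bucket  my-qtap-traffic
    region  us-east-1

[OUTPUT]
    Name              cloudwatch_logs
    Match             *
    region            us-east-1
    log_group_name    /qtap/http-traffic
    auto_create_group On
```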
Filtering and Optimization
Capture Only Errors (4xx/5xx)
To reduce volume, capture only failed requests using Qtap's Rulekit:
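A sketch of what such a rule could look like in qtap-config.yaml. The rules key and the response.status field name are assumptions about Rulekit's expression language; consult the Rulekit reference for the exact syntax.

```yaml
# Illustrative sketch -- verify against the Qtap/Rulekit documentation.
stacks:
  default_stack:
    plugins:
      - type: http_capture
        config:
          level: full
          rules:
            - name: capture-errors
              expr: response.status >= 400   # keep only 4xx/5xx transactions
```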
Filter by Domain
Capture only specific domains:
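For instance, a host-matching rule in the same hedged style (the request.host field name is an assumption; the domain is a placeholder):

```yaml
# Illustrative sketch -- verify field names against the Rulekit documentation.
rules:
  - name: capture-payments-api
    expr: request.host == "api.payments.example.com"
```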
Exclude Noisy Processes
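For example, skipping a noisy infrastructure process at the tap level. The filters schema shown here is an assumption; check the Qtap configuration reference for the real shape.

```yaml
# Illustrative sketch -- verify the filter schema against the Qtap docs.
tap:
  filters:
    custom:
      - exe: /usr/bin/dockerd   # skip Docker daemon traffic
        strategy: exact
```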
Filter in Fluent Bit
You can also filter within Fluent Bit using the grep filter:
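For instance, keeping only records from the stdout channel:

```ini
[FILTER]
    Name   grep
    Match  *
    Regex  stream stdout
```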
When logs arrive via Docker's fluentd log driver (the forward input), Fluent Bit surfaces the output channel under the source key instead of stream, so update the rule to Regex source stdout in that deployment model.
Monitoring and Troubleshooting
Verify Qtap is Capturing Traffic
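For example, generate a request from inside the qtap container and inspect qtap's own logs (adjust container names to your deployment):

```shell
# Generate HTTPS traffic from a process qtap can observe,
# then look for connection/capture events in qtap's logs.
docker exec qtap curl -s https://api.github.com/ > /dev/null
docker logs qtap 2>&1 | tail -20
```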
What to look for:
"exe": "/usr/bin/curl"- Process identified correctly"protocol": "http2"or"http1"- NOT "other""is_tls": true- TLS detected"tlsProbeTypesDetected": ["openssl"]- TLS library hookedFull HTTP details visible despite HTTPS
Verify Fluent Bit is Processing
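Each captured transaction should appear as one JSON line on Fluent Bit's stdout:

```shell
docker logs fluent-bit --tail 20
docker logs fluent-bit 2>&1 | grep -c '"metadata"'   # count captured transactions
```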
Common Issues
No HTTP objects captured:
Qtap must be running before traffic is generated
Wait 5-10 seconds after starting qtap for eBPF initialization
Check qtap logs for connection events
Verify processes are using supported TLS libraries (OpenSSL, BoringSSL, GnuTLS)
Fluent Bit not receiving logs:
Check that Fluent Bit started before Qtap
Verify port 24224 is accessible from Qtap container
Test connectivity:
docker exec qtap nc -zv 127.0.0.1 24224
Check Fluent Bit logs for connection messages
If you see "Error binding socket", remove any leftover Fluent Bit containers that already bound to port 24224, then re-run docker compose up -d
No HTTP transactions in Fluent Bit output:
Qtap writes both HTTP transaction JSON and operational messages to stdout
The grep filter Regex log .*"metadata":.* ensures only HTTP transactions are captured
Verify HTTP transactions reached Fluent Bit:
docker logs fluent-bit | grep '"metadata"'
If you see records without "metadata", those are operational logs that should be filtered out
Make sure the grep filter runs before the parser filter, or leave Preserve_Key On; otherwise the log field will disappear before the match runs
High memory usage:
Reduce the Fluent Bit Flush interval (more frequent flushes)
Adjust total_file_size for S3 batching (smaller = more frequent uploads)
Add filtering to reduce captured volume
Enable compression for outputs
S3 upload failures:
Verify AWS credentials are correct
Check IAM permissions (s3:PutObject required)
Ensure bucket exists and is in the correct region
Check network connectivity to S3 endpoint
Alternative: File Tailing Approach
If you have existing logging infrastructure or cannot use the forward protocol, you can tail Docker's log files directly:
fluent-bit.conf:
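A tail input replacing the forward input (paths assume Docker's default json-file log driver):

```ini
[INPUT]
    Name              tail
    Path              /var/lib/docker/containers/*/*-json.log
    Parser            docker
    Tag               qtap.*
    Refresh_Interval  5
    Mem_Buf_Limit     50MB
```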
docker-compose.yaml changes:
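The Fluent Bit service then needs read access to Docker's log directory:

```yaml
  fluent-bit:
    user: root   # Docker's log files are root-owned
    volumes:
      - ./fluent-bit.conf:/fluent-bit/etc/fluent-bit.conf:ro
      - ./parsers.conf:/fluent-bit/etc/parsers.conf:ro
      - /var/lib/docker/containers:/var/lib/docker/containers:ro
```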
Ensure your parsers.conf file includes both the built-in docker parser and the generic_json_parser definition so the tail input can decode Docker log metadata before emitting HTTP transactions.
This approach requires more complex path management and may have permission issues. The forward protocol approach is recommended.
Best Practices
Start Small: Begin with error-only capture or specific domains, then expand
Set TTLs: Always configure lifecycle policies (90 days recommended)
Monitor Volume: Track storage growth and adjust filtering as needed
Use IAM Roles: Never hardcode S3 credentials in production
Compress: Enable gzip compression for S3 uploads
Batch Uploads: Use an appropriate total_file_size (50M in this guide)
Test Filtering: Validate that filters match the expected objects
Health Checks: Monitor Fluent Bit metrics and error logs
Backup Config: Version control all configuration files
Security: Limit access to logs - they contain sensitive data
Summary
This guide demonstrated a validated deployment pattern for capturing all HTTP traffic with Fluent Bit using Docker:
Docker Deployment: Qtap stdout → Forward protocol → Fluent Bit → S3/CloudWatch
This pattern provides:
✅ Complete HTTP capture (headers + bodies)
✅ TLS inspection without proxies
✅ Batched, buffered writes for performance
✅ Flexible routing to multiple destinations
✅ Data sovereignty (sensitive data stays in your infrastructure)
For questions or advanced configurations, see the Qtap documentation.