Tracing
Qtap can export OpenTelemetry traces that provide network-level observability of service connections, configured entirely through standard OpenTelemetry SDK environment variables. These traces capture the complete TCP connection lifecycle, including TLS handshake timing, protocol detection, and data transfer events.
When to Use Tracing
Enable tracing when you need to:
Monitor service response times at the network level - Track complete connection duration including TLS handshake and data transfer timing
Understand connection lifecycle - See when connections open, TLS handshake completes, protocol detection occurs, and data is transmitted
Debug network-level issues - Identify slow TLS handshakes, protocol negotiation delays, or data transfer timing problems
Correlate network timing with HTTP transactions - Use connection.id to link network-level traces with HTTP transaction logs
Monitor event store performance - Track how long qtap takes to write events to backends
Traces complement logs by showing the network-level view while logs provide the HTTP transaction details.
Enabling Tracing
Tracing is opt-in via standard OpenTelemetry environment variables. No qtap.yaml configuration changes are required.
Required Environment Variables
OTEL_TRACES_EXPORTER=otlp
OTEL_EXPORTER_OTLP_PROTOCOL=grpc # or 'http/protobuf'
Optional Environment Variables
# OTLP endpoint (defaults to localhost:4317 for gRPC, localhost:4318 for HTTP)
OTEL_EXPORTER_OTLP_ENDPOINT=http://your-collector:4317
# TLS configuration (if needed)
OTEL_EXPORTER_OTLP_INSECURE=false
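If your OTLP backend requires authentication, the standard OTEL_EXPORTER_OTLP_HEADERS variable can carry the necessary request headers. The header name below is a placeholder; the real name depends on your backend (see Backend-Specific Notes):
# Authentication headers (backend-specific; the header name shown is illustrative)
OTEL_EXPORTER_OTLP_HEADERS=x-api-key=YOUR_API_KEY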
Deployment Examples
Docker
docker run -d --name qtap \
--user 0:0 --privileged \
--cap-add CAP_BPF --cap-add CAP_SYS_ADMIN \
--pid=host --network=host \
-v /sys:/sys \
-v /var/run/docker.sock:/var/run/docker.sock \
-v "$(pwd)/qtap.yaml:/app/config/qtap.yaml" \
-e TINI_SUBREAPER=1 \
-e OTEL_TRACES_EXPORTER=otlp \
-e OTEL_EXPORTER_OTLP_PROTOCOL=grpc \
--ulimit=memlock=-1 \
us-docker.pkg.dev/qpoint-edge/public/qtap:v0 \
--config="/app/config/qtap.yaml"
Kubernetes (Helm)
extraEnv:
  - name: OTEL_TRACES_EXPORTER
    value: "otlp"
  - name: OTEL_EXPORTER_OTLP_PROTOCOL
    value: "grpc"
  - name: OTEL_EXPORTER_OTLP_ENDPOINT
    value: "http://otel-collector.monitoring.svc.cluster.local:4317"
Kubernetes (DaemonSet)
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: qtap
spec:
  template:
    spec:
      containers:
        - name: qtap
          image: us-docker.pkg.dev/qpoint-edge/public/qtap:v0
          env:
            - name: OTEL_TRACES_EXPORTER
              value: "otlp"
            - name: OTEL_EXPORTER_OTLP_PROTOCOL
              value: "grpc"
            - name: OTEL_EXPORTER_OTLP_ENDPOINT
              value: "http://otel-collector.monitoring.svc.cluster.local:4317"
          # ... other config
What Traces Contain
Qtap exports traces showing network-level connection behavior and service response timing. These spans capture the complete TCP connection lifecycle from the kernel's perspective.
Connection Lifecycle Spans
Tracks the complete TCP connection lifecycle including network timing:
Span Name: Connection
Attributes:
- connection.id: Unique connection identifier (correlates with logs)
- connection.cookie: Internal tracking ID
- connection.handler: Handler type (e.g., "raw")
- connection.strategy: "observe"
Events:
- OpenEvent: Connection established (timestamp)
- TLSClientHelloEvent: TLS handshake initiated (timestamp + duration from open)
- ProtocolEvent: L7 protocol detected (e.g., http2) (timestamp)
- DataEvent: Network data transmitted (timestamp, byte count)
- CloseEvent: Connection closed (timestamp)
Duration: Total connection lifetime (matches service response time)
Key Insight: The span duration shows the complete service response time at the network level. For example, a request to a service with a 2-second delay shows a Connection span of ~2 seconds, with DataEvents revealing exactly when the server responded.
Use case: Monitor service response times at the network level, identify slow TLS handshakes, track protocol negotiation, and correlate network timing with HTTP transaction details using connection.id.
Event Store Operation Spans
Instrumentation Scope: qtap.eventstore
Tracks qtap's internal performance when writing events to configured backends:
Span Name: (varies by operation)
Duration: Time qtap spent writing to event stores
Use case: Monitor event store write latency to identify slow backends or performance bottlenecks in qtap's event processing.
Sample Connection Trace
Real example from qtap v0.11.2 showing a request to https://httpbin.org/delay/2 (a service endpoint with a 2-second delay):
{
"traceId": "2897161d955cf0cc50bb54778a15e99f",
"spanId": "8388bab4ebafb20f",
"name": "Connection",
"kind": "Internal",
"startTime": "2025-10-16T21:06:23.337719776Z",
"endTime": "2025-10-16T21:06:26.363734867Z",
"attributes": {
"connection.id": "d3olsjo7p3qod2n6hdd0",
"connection.cookie": 479863,
"connection.handler": "raw",
"connection.strategy": "observe"
},
"events": [
{
"name": "OpenEvent",
"timestamp": "2025-10-16T21:06:23.337922855Z"
},
{
"name": "TLSClientHelloEvent",
"timestamp": "2025-10-16T21:06:23.4135878Z"
},
{
"name": "ProtocolEvent",
"timestamp": "2025-10-16T21:06:23.568508654Z",
"attributes": { "protocol": "http2" }
},
{
"name": "DataEvent",
"timestamp": "2025-10-16T21:06:23.568554984Z",
"attributes": { "bytes": 108 }
},
{
"name": "DataEvent",
"timestamp": "2025-10-16T21:06:26.362644285Z",
"attributes": { "bytes": 436 }
},
{
"name": "CloseEvent",
"timestamp": "2025-10-16T21:06:26.363614848Z"
}
],
"resource": {
"service.name": "tap",
"service.namespace": "qpoint",
"service.version": "v0.11.2",
"service.instance.id": "qpoint"
},
"scope": {
"name": "github.com/qpoint-io/qtap/pkg/connection"
}
}
Understanding the Timing
Span Duration: 3.026 seconds (the 2-second service delay plus TLS handshake, protocol negotiation, and network overhead)
Event Timeline:
21:06:23.337: Connection opened (OpenEvent)
21:06:23.413: TLS handshake initiated (+76ms)
21:06:23.568: HTTP/2 protocol detected (+231ms total)
21:06:23.568: Request data sent (108 bytes)
21:06:26.362: Server response received (+2.794s after the request - includes the 2-second delay!)
21:06:26.363: Connection closed
This trace shows the complete network-level view of the service interaction. The 2.794-second gap between the request and response DataEvents reveals the actual server processing time, matching the expected 2-second delay plus network latency (and the 2794 ms duration reported in the corresponding event log).
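If you export spans as JSON (for example via a collector file or debug exporter), the same gap analysis can be scripted. A minimal sketch in Python, assuming the span structure shown above; exporters may nest or name these fields differently:
from datetime import datetime

# Connection span as shown above (abridged); timestamps are RFC 3339 with nanoseconds.
span = {
    "startTime": "2025-10-16T21:06:23.337719776Z",
    "endTime": "2025-10-16T21:06:26.363734867Z",
    "events": [
        {"name": "OpenEvent", "timestamp": "2025-10-16T21:06:23.337922855Z"},
        {"name": "TLSClientHelloEvent", "timestamp": "2025-10-16T21:06:23.4135878Z"},
        {"name": "ProtocolEvent", "timestamp": "2025-10-16T21:06:23.568508654Z"},
        {"name": "DataEvent", "timestamp": "2025-10-16T21:06:23.568554984Z"},
        {"name": "DataEvent", "timestamp": "2025-10-16T21:06:26.362644285Z"},
        {"name": "CloseEvent", "timestamp": "2025-10-16T21:06:26.363614848Z"},
    ],
}

def parse(ts: str) -> datetime:
    # Drop the trailing 'Z' and truncate to microseconds so fromisoformat() accepts it.
    date, _, frac = ts.rstrip("Z").partition(".")
    return datetime.fromisoformat(f"{date}.{frac[:6]}")

start = parse(span["startTime"])
print(f"span duration: {(parse(span['endTime']) - start).total_seconds():.3f}s")
for event in span["events"]:
    offset = (parse(event["timestamp"]) - start).total_seconds()
    print(f"+{offset:6.3f}s  {event['name']}")

# The gap between the request and response DataEvents is the server processing time (~2.794s here).
data = [parse(e["timestamp"]) for e in span["events"] if e["name"] == "DataEvent"]
print(f"request -> response gap: {(data[-1] - data[0]).total_seconds():.3f}s")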
Correlating Traces with Logs
Use connection.id to link network-level traces with HTTP transaction logs for complete observability:
In Traces (Network Timing):
{
"name": "Connection",
"attributes": {
"connection.id": "d3olsjo7p3qod2n6hdd0"
},
"startTime": "2025-10-16T21:06:23.337719776Z",
"endTime": "2025-10-16T21:06:26.363734867Z",
"events": [
{ "name": "OpenEvent", "timestamp": "2025-10-16T21:06:23.337922855Z" },
{ "name": "TLSClientHelloEvent", "timestamp": "2025-10-16T21:06:23.4135878Z" },
{ "name": "DataEvent", "timestamp": "2025-10-16T21:06:26.362644285Z" }
]
}
In Event Logs (HTTP Transaction Details):
{
"event.type": "artifact_record",
"event": {
"summary.connection_id": "d3olsjo7p3qod2n6hdd0",
"summary.request_method": "GET",
"summary.request_host": "httpbin.org",
"summary.request_path": "/delay/2",
"summary.response_status": 200,
"summary.duration_ms": 2794
}
}
Complete Picture:
Trace shows: Connection took 3.026s total, TLS handshake started at +76ms, server responded 2.794s after the request
Log shows: GET request to httpbin.org/delay/2 returned 200 OK in 2.794s
Together: Full observability of both network-level timing and HTTP transaction details
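If you land both signals somewhere queryable, the correlation itself is a simple equality join on the connection ID. As an illustration only, here is a small in-memory sketch using the sample records above; a real backend would express this in its own query language:
# Hypothetical exports: one Connection span and one qtap event record, as sampled above.
spans = [
    {
        "name": "Connection",
        "attributes": {"connection.id": "d3olsjo7p3qod2n6hdd0"},
        "startTime": "2025-10-16T21:06:23.337719776Z",
        "endTime": "2025-10-16T21:06:26.363734867Z",
    },
]
events = [
    {
        "event.type": "artifact_record",
        "event": {
            "summary.connection_id": "d3olsjo7p3qod2n6hdd0",
            "summary.request_method": "GET",
            "summary.request_path": "/delay/2",
            "summary.response_status": 200,
            "summary.duration_ms": 2794,
        },
    },
]

# Index HTTP transactions by connection ID, then attach them to the matching network span.
transactions = {e["event"]["summary.connection_id"]: e["event"] for e in events}

for span in spans:
    conn_id = span["attributes"]["connection.id"]
    txn = transactions.get(conn_id)
    if txn is None:
        continue
    print(
        f"{conn_id}: network {span['startTime']} -> {span['endTime']} | "
        f"{txn['summary.request_method']} {txn['summary.request_path']} "
        f"returned {txn['summary.response_status']} in {txn['summary.duration_ms']}ms"
    )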
Querying Traces
Monitor Service Response Times
// Find slow service responses (> 1 second at network level)
service.name="tap" AND span.name="Connection" AND span.duration > 1000
Identify TLS Handshake Issues
// Find connections with slow TLS handshakes (> 100ms)
service.name="tap" AND span.name="Connection" AND span.events.name="TLSClientHelloEvent"
// Then examine event timestamps to calculate TLS handshake duration
Track Protocol Detection Timing
// Find connections where protocol detection took time
service.name="tap" AND span.name="Connection" AND span.events.name="ProtocolEvent"
Monitor Event Store Performance
// Track event store write latency
instrumentation.scope="qtap.eventstore"
Correlate Trace with Specific HTTP Transaction
// Find network-level trace for specific HTTP transaction
service.name="tap" AND span.name="Connection" AND attributes.connection.id="d3olsjo7p3qod2n6hdd0"
// Then correlate with logs using the same connection.id
OpenTelemetry Collector Configuration
Ensure your OTel Collector has a traces pipeline configured:
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317

processors:
  batch:
    timeout: 10s

exporters:
  otlphttp:
    endpoint: "https://your-backend.com/v1/traces"

service:
  pipelines:
    traces: # Required for qtap tracing
      receivers: [otlp]
      processors: [batch]
      exporters: [otlphttp]
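For local testing, one way to run a collector with this configuration is the upstream contrib image. This sketch assumes the config above is saved as otel-config.yaml and mounted over the image's default config path (/etc/otelcol-contrib/config.yaml):
docker run -d --name otel-collector \
  -p 4317:4317 -p 4318:4318 \
  -v "$(pwd)/otel-config.yaml:/etc/otelcol-contrib/config.yaml" \
  otel/opentelemetry-collector-contrib:latest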
Troubleshooting
No Traces Appearing
Verify environment variables are set:
# Docker
docker inspect qtap | grep -A 5 OTEL
# Kubernetes
kubectl describe pod qtap-xxxxx | grep -A 5 OTEL
Check OTel Collector has traces pipeline:
# Should see "traces" in pipelines section
kubectl logs -n monitoring otel-collector-xxxxx | grep pipeline
Traces But No Event Logs
Tracing and logging are independent. Traces require environment variables, logs require event_stores configuration in qtap.yaml. Ensure both are configured if you want both.
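As a point of reference only, an event store section in qtap.yaml looks roughly like the sketch below; treat the exact fields and backend types as an assumption to verify against the Storage Configuration guide for your qtap version:
# qtap.yaml (sketch; verify fields against the Storage Configuration guide)
services:
  event_stores:
    - type: stdout # replace with your configured backend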
High Trace Volume
Qtap creates Connection spans for every TCP connection it processes. To reduce volume:
Use sampling in your OTel Collector:
processors:
  probabilistic_sampler:
    sampling_percentage: 10 # Sample 10% of traces
Filter by instrumentation scope:
processors:
  filter/traces:
    traces:
      exclude:
        match_type: strict
        instrumentation_libraries:
          - name: qtap.eventstore # Exclude event store traces
Best Practices
Use traces for network-level service monitoring - Monitor Connection span duration to track service response times at the network level, including TLS and protocol negotiation overhead
Combine with logs for complete observability - Use connection.id to correlate network timing (traces) with HTTP transaction details (logs)
Use sampling in production - Sample 10-20% of traces to reduce volume while maintaining visibility into network-level performance patterns
Analyze DataEvent timing - Examine DataEvent timestamps to identify when servers actually respond, revealing server-side delays
Monitor TLS and protocol timing - Track TLSClientHelloEvent and ProtocolEvent to identify connection establishment bottlenecks
Investigate Connection span outliers - High-duration Connection spans indicate slow service responses; correlate with logs to see which endpoints/requests are affected
Monitor event store latency - Track qtap.eventstore spans to identify slow event store writes
What Traces Show vs. What Logs Show
Traces, logs (events), and objects provide complementary views of the same traffic. Here's what each contains using a real httpbin.org example:
Traces Provide Network-Level Timing
Connection span showing network lifecycle (via OpenTelemetry Traces):
{
"name": "Connection",
"startTime": "2025-10-16T21:06:23.337719776Z",
"endTime": "2025-10-16T21:06:26.363734867Z",
"attributes": {
"connection.id": "d3olsjo7p3qod2n6hdd0"
},
"events": [
{"name": "OpenEvent", "timestamp": "2025-10-16T21:06:23.337922855Z"},
{"name": "TLSClientHelloEvent", "timestamp": "2025-10-16T21:06:23.4135878Z"},
{"name": "ProtocolEvent", "timestamp": "2025-10-16T21:06:23.568508654Z"},
{"name": "DataEvent", "timestamp": "2025-10-16T21:06:26.362644285Z"}
]
}
Shows: Connection took 3.026s, TLS handshake at +76ms, server responded ~2.8s after the request
Logs Provide HTTP Transaction Metadata
Event showing HTTP transaction summary (via OpenTelemetry Logs to event_stores):
{
"event.type": "artifact_record",
"event": {
"summary.connection_id": "d3olsjo7p3qod2n6hdd0",
"summary.request_method": "GET",
"summary.request_host": "httpbin.org",
"summary.request_path": "/delay/2",
"summary.response_status": 200,
"summary.duration_ms": 2794,
"process.exe": "/usr/bin/curl"
}
}
Shows: What happened (GET to /delay/2, returned 200 in 2.794s, from the curl process)
Does NOT contain: Headers or bodies
Objects Provide Sensitive Data
HTTP artifact with headers and bodies (captured by http_capture plugin, sent to object_stores like S3):
{
"connection_id": "d3olsjo7p3qod2n6hdd0",
"request": {
"method": "GET",
"url": "https://httpbin.org/delay/2",
"headers": {
":authority": "httpbin.org",
":method": "GET",
":path": "/delay/2",
"User-Agent": "curl/8.14.1"
}
},
"response": {
"status": 200,
"headers": {
":status": "200",
"Content-Type": "application/json",
"Content-Length": "305",
"Server": "gunicorn/19.9.0"
},
"body": "{\"args\": {}, \"headers\": {...}, \"origin\": \"73.71.138.108\"}"
}
}
Shows: Complete HTTP details including headers and response body
Stored in: S3-compatible storage for data sovereignty (keeps sensitive data in your network)
Using All Three Together
For the same request (connection.id: d3olsjo7p3qod2n6hdd0):
Trace (OpenTelemetry): Network timing (3.026s total, TLS handshake at +76ms, server response ~2.8s after the request)
Log (OpenTelemetry): What happened (GET /delay/2 returned 200 in 2.794s from curl)
Object (http_capture plugin): Complete details (all headers, response body with actual data)
Correlate using connection.id to get the complete picture.
Note: Traces and logs use OpenTelemetry. Objects are captured by the http_capture plugin and sent to object_stores.
Backend-Specific Notes
Jaeger
Jaeger works well with qtap traces:
# Deploy Jaeger with OTLP support
kubectl apply -f https://github.com/jaegertracing/jaeger-operator/releases/latest/download/jaeger-operator.yaml
Set endpoint to Jaeger's OTLP collector:
OTEL_EXPORTER_OTLP_ENDPOINT=http://jaeger-collector:4317
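For quick local experiments without the operator, the Jaeger all-in-one image can also receive OTLP directly. A sketch (ports and the COLLECTOR_OTLP_ENABLED flag follow the Jaeger docs; newer releases enable OTLP by default):
docker run -d --name jaeger \
  -p 16686:16686 -p 4317:4317 -p 4318:4318 \
  -e COLLECTOR_OTLP_ENABLED=true \
  jaegertracing/all-in-one:latest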
Datadog APM
OTEL_EXPORTER_OTLP_ENDPOINT=https://trace.agent.datadoghq.com
# Note: Datadog also requires API key via DD-API-KEY header
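If you export directly to Datadog's intake rather than through a local Datadog Agent or an OTel Collector with the Datadog exporter, the API key is typically passed as a request header. The header name below is our assumption; confirm the current name and endpoint in Datadog's OTLP documentation:
OTEL_EXPORTER_OTLP_HEADERS=dd-api-key=YOUR_DATADOG_API_KEY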
Honeycomb
OTEL_EXPORTER_OTLP_ENDPOINT=https://api.honeycomb.io
OTEL_EXPORTER_OTLP_HEADERS=x-honeycomb-team=YOUR_API_KEY,x-honeycomb-dataset=qtap-traces
Grafana Tempo
OTEL_EXPORTER_OTLP_ENDPOINT=https://tempo-prod-us-central-0.grafana.net:443
# Requires authentication via headers
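For Grafana Cloud Tempo, authentication is typically HTTP basic auth built from your stack's instance ID and an API token. The lines below are illustrative only; some SDKs require percent-encoding the space in the header value, so confirm the exact format in Grafana's OTLP documentation:
# <credentials> = base64("<instance-id>:<api-token>")
OTEL_EXPORTER_OTLP_HEADERS="Authorization=Basic <credentials>"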
Related Documentation
Storage Configuration - Configure event stores for HTTP transaction logs (primary observability data)
Sending Qtap Events to OpenTelemetry - Complete OTLP integration guide for logs
Metrics - Prometheus metrics for monitoring HTTP traffic