Tracing

Qtap can export OpenTelemetry Traces that provide network-level observability of service connections. Tracing is configured with the standard OpenTelemetry SDK environment variables. These traces capture the complete TCP connection lifecycle, including TLS handshake timing, protocol detection, and data transfer events.

Tracing vs Event Logs - Complementary Data:

  • Traces (this page): Network-level timing - connection lifecycle, TLS handshake, data transfer events, total service response time

  • Event Logs (event stores): HTTP transaction metadata - method, URL, path, status code, duration, process context

Use both together for complete observability: traces show network timing, event logs show HTTP transaction metadata. For headers/bodies, use object stores.

When to Use Tracing

Enable tracing when you need to:

  • Monitor service response times at the network level - Track complete connection duration including TLS handshake and data transfer timing

  • Understand connection lifecycle - See when connections open, TLS handshake completes, protocol detection occurs, and data is transmitted

  • Debug network-level issues - Identify slow TLS handshakes, protocol negotiation delays, or data transfer timing problems

  • Correlate network timing with HTTP transactions - Use connection.id to link network-level traces with HTTP transaction logs

  • Monitor event store performance - Track how long qtap takes to write events to backends

Traces complement logs by showing the network-level view while logs provide the HTTP transaction details.

Enabling Tracing

Tracing is opt-in via standard OpenTelemetry environment variables. No qtap.yaml configuration changes are required.
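Here is a minimal sketch that collects the OTLP trace export settings in one place. It assumes qtap honors the standard OpenTelemetry SDK variables the same way other OpenTelemetry SDKs do. The variable names come from the OpenTelemetry specification. The helper `otel_trace_env`, the default service name `qtap`, and the collector address `http://otel-collector:4317` are illustrative assumptions. The sketch does not establish which variables qtap treats as required.

```python
import os
from typing import Dict, Optional

# Standard OpenTelemetry SDK variable names; assumption: qtap reads them like any OTel SDK.
VALID_PROTOCOLS = ("grpc", "http/protobuf")


def otel_trace_env(endpoint: str,
                   service_name: str = "qtap",  # illustrative default
                   protocol: str = "grpc",
                   resource_attrs: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Return OTEL_* variables for exporting traces over OTLP."""
    if protocol not in VALID_PROTOCOLS:
        raise ValueError(f"unsupported OTLP protocol: {protocol!r}")
    env = {
        "OTEL_EXPORTER_OTLP_ENDPOINT": endpoint,  # collector or backend address
        "OTEL_EXPORTER_OTLP_PROTOCOL": protocol,  # port 4317 is usually gRPC, 4318 HTTP
        "OTEL_SERVICE_NAME": service_name,        # becomes the service.name resource attribute
    }
    if resource_attrs:
        # OTEL_RESOURCE_ATTRIBUTES is a comma-separated list of key=value pairs
        env["OTEL_RESOURCE_ATTRIBUTES"] = ",".join(
            f"{k}={v}" for k, v in sorted(resource_attrs.items()))
    return env


if __name__ == "__main__":
    # Merge with the current environment before launching qtap (hypothetical collector host).
    env = {**os.environ, **otel_trace_env("http://otel-collector:4317",
                                          resource_attrs={"deployment.environment": "dev"})}
    for key in sorted(k for k in env if k.startswith("OTEL_")):
        print(f"{key}={env[key]}")
```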

Required Environment Variables

Optional Environment Variables

Deployment Examples
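The same variables apply to every deployment style below. Only the syntax for passing them differs. This hypothetical sketch renders one mapping two ways: as `docker run -e` flags, and as a Kubernetes container `env:` list made of the `name`/`value` pairs of a core/v1 EnvVar. The collector address is a placeholder. Where such a list belongs in a Helm chart's values depends on the chart, so that part is not shown.

```python
import json
from typing import Dict, List


def docker_env_flags(env: Dict[str, str]) -> List[str]:
    """Render KEY=VALUE pairs as `docker run` arguments: -e KEY=VALUE ..."""
    flags: List[str] = []
    for key, value in sorted(env.items()):
        flags += ["-e", f"{key}={value}"]
    return flags


def k8s_env(env: Dict[str, str]) -> List[Dict[str, str]]:
    """Render the same pairs as a container `env:` list for a DaemonSet or Pod spec."""
    return [{"name": key, "value": value} for key, value in sorted(env.items())]


if __name__ == "__main__":
    env = {  # hypothetical collector address
        "OTEL_EXPORTER_OTLP_ENDPOINT": "http://otel-collector:4317",
        "OTEL_EXPORTER_OTLP_PROTOCOL": "grpc",
    }
    print(" ".join(docker_env_flags(env)))
    print(json.dumps(k8s_env(env), indent=2))  # the list that goes under the container's env:
```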

Docker

Kubernetes (Helm)

Kubernetes (DaemonSet)

What Traces Contain

Qtap exports traces showing network-level connection behavior and service response timing. These spans capture the complete TCP connection lifecycle from the kernel's perspective.

Connection Lifecycle Spans

Tracks the complete TCP connection lifecycle including network timing:

Key Insight: The span duration shows the complete service response time at the network level. For example, a request to a service with a 2-second delay produces a Connection span of at least 2 seconds, with DataEvents revealing exactly when the server responded.

Use case: Monitor service response times at the network level, identify slow TLS handshakes, track protocol negotiation, and correlate network timing with HTTP transaction details using connection.id.
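This minimal sketch derives phase timings from a Connection span's events. It assumes each exported event has been reduced to a name and a Unix-nanosecond timestamp. It uses the event names that appear on this page: `TLSClientHelloEvent`, `ProtocolEvent`, and `DataEvent`. The `Event` type and the `offset_ms`/`phases` helpers are hypothetical and not part of qtap.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Event:
    name: str   # span event name, e.g. "OpenEvent" or "DataEvent"
    t_ns: int   # event timestamp in Unix nanoseconds


def offset_ms(events: List[Event], name: str, start_ns: int) -> Optional[float]:
    """Milliseconds from span start to the first event called `name`, or None."""
    for e in events:
        if e.name == name:
            return (e.t_ns - start_ns) / 1e6
    return None


def phases(events: List[Event], start_ns: int, end_ns: int) -> Dict[str, Optional[float]]:
    """Summarize one Connection span into setup, data, and total timings (ms)."""
    data = [e.t_ns for e in events if e.name == "DataEvent"]
    return {
        "tls_start_ms": offset_ms(events, "TLSClientHelloEvent", start_ns),
        "protocol_ms": offset_ms(events, "ProtocolEvent", start_ns),
        # First-to-last DataEvent gap; for one request/response exchange this
        # approximates how long the client waited for the server.
        "data_gap_ms": (max(data) - min(data)) / 1e6 if len(data) >= 2 else None,
        "span_ms": (end_ns - start_ns) / 1e6,
    }
```

The DataEvent gap only approximates server time when the connection carries a single request/response exchange. On a reused connection it spans every exchange.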

Event Store Operation Spans

Instrumentation Scope: qtap.eventstore

Tracks qtap's internal performance when writing events to configured backends:

Use case: Monitor event store write latency to identify slow backends or performance bottlenecks in qtap's event processing.
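One way to act on these spans is to summarize write latency per operation. The sketch assumes spans have been exported and flattened into dicts with `scope`, `name`, and `duration_ms` keys. That shape is hypothetical, as is the operation name `write` used for illustration. It uses the nearest-rank percentile, which always returns an observed value.

```python
from collections import defaultdict
from typing import Dict, List


def nearest_rank(values: List[float], pct: int) -> float:
    """Nearest-rank percentile: smallest value with at least pct% of the data at or below it."""
    ordered = sorted(values)
    k = max(1, (pct * len(ordered) + 99) // 100)  # ceil(pct * n / 100) in integer math
    return ordered[k - 1]


def eventstore_latency(spans: List[dict], pct: int = 95) -> Dict[str, float]:
    """Per-operation latency percentile (ms) for spans from the qtap.eventstore scope."""
    by_op: Dict[str, List[float]] = defaultdict(list)
    for span in spans:
        if span["scope"] == "qtap.eventstore":
            by_op[span["name"]].append(span["duration_ms"])
    return {op: nearest_rank(durs, pct) for op, durs in by_op.items()}
```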

Sample Connection Trace

Real example from qtap v0.11.2 showing a request to https://httpbin.org/delay/2 (service with 2-second delay):

Understanding the Timing

Span Duration: 3.026 seconds (the 2-second service delay plus connection setup, TLS, protocol detection, and network overhead)

Event Timeline:

  • 21:06:23.337: Connection opened (OpenEvent)

  • 21:06:23.413: TLS handshake initiated (+76ms)

  • 21:06:23.568: HTTP/2 protocol detected (+231ms total)

  • 21:06:23.568: Request data sent (108 bytes)

  • 21:06:26.362: Server response received (2.794s after the request was sent; includes the 2-second delay)

  • 21:06:26.363: Connection closed

This trace shows the complete network-level view of the service interaction. The 2.794-second gap between the request and response DataEvents reflects how long the client waited for the server: the expected 2-second delay plus network latency and server-side overhead.
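The offsets in this timeline follow directly from the event timestamps. This short check recomputes them with Python's `datetime`. Only time-of-day values are needed because all events fall on the same day.

```python
from datetime import datetime

# Event times from the timeline above (same day, so time-of-day is enough).
TIMELINE = {
    "open": "21:06:23.337",
    "tls_hello": "21:06:23.413",
    "protocol": "21:06:23.568",
    "request": "21:06:23.568",
    "response": "21:06:26.362",
    "close": "21:06:26.363",
}
T = {k: datetime.strptime(v, "%H:%M:%S.%f") for k, v in TIMELINE.items()}


def gap_ms(a: str, b: str) -> int:
    """Whole milliseconds between two timeline events."""
    return round((T[b] - T[a]).total_seconds() * 1000)


if __name__ == "__main__":
    print("TLS ClientHello after open:", gap_ms("open", "tls_hello"), "ms")
    print("protocol detected after open:", gap_ms("open", "protocol"), "ms")
    print("request -> response:", gap_ms("request", "response"), "ms")
    print("span duration:", gap_ms("open", "close"), "ms")
```

Run as a script, it prints 76, 231, 2794, and 3026 ms. The 231 ms between the connection opening and the request being sent is setup time (TLS handshake and protocol detection). The HTTP-level duration does not include it.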

Correlating Traces with Logs

Use connection.id to link network-level traces with HTTP transaction logs for complete observability:

In Traces (Network Timing):

In Event Logs (HTTP Transaction Details):

Complete Picture:

  • Trace shows: Connection took 3.026s total, the TLS handshake started 76ms after the connection opened, and the server responded 2.794s after the request was sent

  • Log shows: GET request to httpbin.org/delay/2 returned 200 OK in 2.794s

  • Together: Full observability of both network-level timing and HTTP transaction details
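In code, the correlation is a join on `connection.id`. This minimal sketch assumes traces and event logs have both been exported as flat dicts keyed by attribute name. The field names `duration_s`, `method`, `path`, and `status` are hypothetical stand-ins for whatever your backend returns. The values are from the httpbin.org example on this page.

```python
from collections import defaultdict
from typing import Dict, List


def correlate(spans: List[dict], logs: List[dict]) -> Dict[str, dict]:
    """Group Connection spans and HTTP log records by their shared connection.id."""
    logs_by_conn: Dict[str, List[dict]] = defaultdict(list)
    for rec in logs:
        logs_by_conn[rec["connection.id"]].append(rec)
    return {
        s["connection.id"]: {"span": s, "logs": logs_by_conn.get(s["connection.id"], [])}
        for s in spans
    }


if __name__ == "__main__":
    spans = [{"connection.id": "d3olsjo7p3qod2n6hdd0", "duration_s": 3.026}]
    logs = [{"connection.id": "d3olsjo7p3qod2n6hdd0", "method": "GET",
             "path": "/delay/2", "status": 200, "duration_s": 2.794}]
    joined = correlate(spans, logs)["d3olsjo7p3qod2n6hdd0"]
    # Time the connection spent outside the logged HTTP transaction (setup and close).
    outside = joined["span"]["duration_s"] - joined["logs"][0]["duration_s"]
    print(f"outside HTTP transaction: {outside:.3f}s")
```

The script prints `outside HTTP transaction: 0.232s`. That matches the 231 ms of setup before the request plus about 1 ms before the close.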

Querying Traces

Monitor Service Response Times
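Whatever backend stores the traces, the core query is the same: find Connection spans whose duration exceeds a threshold, slowest first. This sketch runs over spans already exported as dicts. It assumes the span name is `Connection` and that a hypothetical `duration_s` field holds the duration.

```python
from typing import List


def slow_connections(spans: List[dict], threshold_s: float = 2.0, top: int = 10) -> List[dict]:
    """Connection spans longer than threshold_s, slowest first, at most `top` of them."""
    slow = [s for s in spans
            if s["name"] == "Connection" and s["duration_s"] > threshold_s]
    return sorted(slow, key=lambda s: s["duration_s"], reverse=True)[:top]
```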

Identify TLS Handshake Issues
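A rough handshake-duration signal is the gap between `TLSClientHelloEvent` and `ProtocolEvent`. This sketch assumes protocol detection happens only once application data flows after the handshake. Under that assumption, the gap is an upper bound on handshake time: 155 ms in the sample trace, from 21:06:23.413 to 21:06:23.568. The span and event dict layout (`events`, `name`, `t_ns`, `connection.id`) is hypothetical.

```python
from typing import Dict, List, Optional


def tls_gap_ms(span: dict) -> Optional[float]:
    """ms from the first TLSClientHelloEvent to the first ProtocolEvent, if both exist."""
    first: Dict[str, int] = {}
    for event in span["events"]:  # events assumed to be in time order
        first.setdefault(event["name"], event["t_ns"])
    if "TLSClientHelloEvent" not in first or "ProtocolEvent" not in first:
        return None  # plaintext connection, or an event was not captured
    return (first["ProtocolEvent"] - first["TLSClientHelloEvent"]) / 1e6


def slow_tls(spans: List[dict], threshold_ms: float = 100.0) -> Dict[str, float]:
    """connection.id -> TLS gap for connections whose gap exceeds threshold_ms."""
    flagged: Dict[str, float] = {}
    for span in spans:
        gap = tls_gap_ms(span)
        if gap is not None and gap > threshold_ms:
            flagged[span["connection.id"]] = gap
    return flagged
```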

Track Protocol Detection Timing

Monitor Event Store Performance

Correlate Trace with Specific HTTP Transaction

OpenTelemetry Collector Configuration

Ensure your OTel Collector has a traces pipeline configured:
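A minimal traces pipeline receives OTLP from qtap, batches it, and forwards it to a backend. This sketch builds that configuration as a Python dict and prints it as JSON. YAML 1.2 parsers accept JSON, so the output can serve as a starting `config.yaml`. The `otlp` receiver, `batch` processor, and `otlp` exporter are standard OpenTelemetry Collector components. The exporter endpoint `backend.example:4317` and the insecure TLS setting are placeholders.

```python
import json

# Minimal traces pipeline: OTLP in from qtap, batch, OTLP out to a backend.
collector_config = {
    "receivers": {
        "otlp": {
            "protocols": {
                "grpc": {"endpoint": "0.0.0.0:4317"},
                "http": {"endpoint": "0.0.0.0:4318"},
            }
        }
    },
    "processors": {"batch": {}},
    "exporters": {
        # Placeholder backend; insecure TLS is only appropriate inside a trusted network.
        "otlp": {"endpoint": "backend.example:4317", "tls": {"insecure": True}}
    },
    "service": {
        "pipelines": {
            "traces": {"receivers": ["otlp"], "processors": ["batch"], "exporters": ["otlp"]}
        }
    },
}

if __name__ == "__main__":
    print(json.dumps(collector_config, indent=2))
```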

Troubleshooting

No Traces Appearing

Verify environment variables are set:
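A quick sanity check over the environment qtap will see can catch common mistakes. This is a heuristic sketch, not a qtap tool. It checks three things:

- whether an OTLP endpoint is set (the general variable or the traces-specific one);
- whether that endpoint has an `http://` or `https://` scheme;
- whether `OTEL_TRACES_EXPORTER=none` is set, which the OpenTelemetry specification defines as disabling the exporter.

That qtap reads each of these variables is an assumption.

```python
import os
from typing import List, Mapping


def check_otel_env(env: Mapping[str, str]) -> List[str]:
    """Heuristic checks on OTLP trace settings; returns a list of problems."""
    problems: List[str] = []
    # The traces-specific variable takes precedence over the general one.
    endpoint = (env.get("OTEL_EXPORTER_OTLP_TRACES_ENDPOINT")
                or env.get("OTEL_EXPORTER_OTLP_ENDPOINT"))
    if not endpoint:
        problems.append("no OTLP endpoint set")
    elif not endpoint.startswith(("http://", "https://")):
        problems.append(f"endpoint {endpoint!r} has no http:// or https:// scheme")
    # Per the OpenTelemetry spec, "none" disables the trace exporter.
    if env.get("OTEL_TRACES_EXPORTER", "").strip().lower() == "none":
        problems.append("OTEL_TRACES_EXPORTER=none disables trace export")
    return problems


if __name__ == "__main__":
    # Run with the same environment (for example, the same env file) that qtap uses.
    issues = check_otel_env(os.environ)
    print("\n".join(issues) if issues else "OTLP trace settings look plausible")
```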

Check OTel Collector has traces pipeline:

Traces But No Event Logs

Tracing and logging are independent. Traces are enabled through environment variables; logs require an event_stores configuration in qtap.yaml. Configure both if you want both.

High Trace Volume

Qtap creates Connection spans for every TCP connection it processes. To reduce volume:

  1. Use sampling in your OTel Collector:

  2. Filter by instrumentation scope:
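Both techniques map onto OpenTelemetry Collector processors from the contrib distribution. `probabilistic_sampler` handles trace-ID-based sampling, and `filter` drops spans using an OTTL condition on `instrumentation_scope.name`. This fragment is a sketch to merge into an existing configuration that already defines the `otlp` receiver, the `otlp` exporter, and the `batch` processor. The 15% rate and the choice to drop `qtap.eventstore` spans are illustrative.

```python
import json

# Processors to add alongside an existing collector configuration (contrib distribution).
processors = {
    # Keep roughly 15% of traces, decided from the trace ID.
    "probabilistic_sampler": {"sampling_percentage": 15},
    # Spans matching any listed condition are dropped.
    "filter/drop_eventstore": {
        "error_mode": "ignore",
        "traces": {"span": ['instrumentation_scope.name == "qtap.eventstore"']},
    },
}

pipeline = {
    "receivers": ["otlp"],
    # Filter first so dropped spans never reach the sampler; batch last.
    "processors": ["filter/drop_eventstore", "probabilistic_sampler", "batch"],
    "exporters": ["otlp"],
}

if __name__ == "__main__":
    print(json.dumps({"processors": processors,
                      "service": {"pipelines": {"traces": pipeline}}}, indent=2))
```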

Best Practices

  1. Use traces for network-level service monitoring - Monitor Connection span duration to track service response times at the network level, including TLS and protocol negotiation overhead

  2. Combine with logs for complete observability - Use connection.id to correlate network timing (traces) with HTTP transaction details (logs)

  3. Use sampling in production - Sample 10-20% of traces to reduce volume while maintaining visibility into network-level performance patterns

  4. Analyze DataEvent timing - Examine DataEvent timestamps to identify when servers actually respond, revealing server-side delays

  5. Monitor TLS and protocol timing - Track TLSClientHelloEvent and ProtocolEvent to identify connection establishment bottlenecks

  6. Investigate Connection span outliers - High-duration Connection spans indicate slow service responses; correlate with logs to see which endpoints/requests are affected

  7. Monitor event store latency - Track qtap.eventstore spans to identify slow event store writes

What Traces Show vs. What Logs Show

Traces, logs (events), and objects provide complementary views of the same traffic. Here's what each contains using a real httpbin.org example:

Traces Provide Network-Level Timing

Connection span showing network lifecycle (via OpenTelemetry Traces):

Shows: Connection took 3.026s, TLS handshake started at +76ms, server responded 2.794s after the request was sent

Logs Provide HTTP Transaction Metadata

Event showing HTTP transaction summary (via OpenTelemetry Logs to event_stores):

Shows: What happened (GET to /delay/2, returned 200 in 2.794s, from the curl process)

Does NOT contain: Headers or bodies

Objects Provide Sensitive Data

HTTP artifact with headers and bodies (captured by http_capture plugin, sent to object_stores like S3):

Shows: Complete HTTP details, including headers and the response body

Stored in: S3-compatible storage for data sovereignty (keeps sensitive data in your network)

Using All Three Together

For the same request (connection.id: d3olsjo7p3qod2n6hdd0):

  • Trace (OpenTelemetry): Network timing (3.026s total, TLS handshake started at +76ms, server response 2.794s after the request)

  • Log (OpenTelemetry): What happened (GET /delay/2 returned 200 in 2.794s from curl)

  • Object (http_capture plugin): Complete details (all headers, response body with actual data)

Correlate using connection.id to get the complete picture.

Note: Traces and logs use OpenTelemetry. Objects are captured by the http_capture plugin and sent to object_stores.

Backend-Specific Notes

Jaeger

Jaeger works well with qtap traces:

Set endpoint to Jaeger's OTLP collector:
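Recent Jaeger releases accept OTLP directly on the standard ports: 4317 for gRPC and 4318 for HTTP. This sketch assumes a Jaeger instance with OTLP ingestion enabled (older 1.x releases required turning it on explicitly). The hostname `jaeger` is hypothetical.

```python
from typing import Dict

# Hypothetical hostname "jaeger"; 4317 (gRPC) and 4318 (HTTP) are the standard OTLP ports.
JAEGER_OTLP: Dict[str, str] = {
    "grpc": "http://jaeger:4317",
    "http/protobuf": "http://jaeger:4318",
}


def jaeger_env(protocol: str = "grpc") -> Dict[str, str]:
    """Point the standard OTLP exporter variables at Jaeger."""
    if protocol not in JAEGER_OTLP:
        raise ValueError(f"unsupported protocol: {protocol!r}")
    return {
        "OTEL_EXPORTER_OTLP_ENDPOINT": JAEGER_OTLP[protocol],
        "OTEL_EXPORTER_OTLP_PROTOCOL": protocol,
    }


if __name__ == "__main__":
    for key, value in jaeger_env().items():
        print(f"-e {key}={value}")  # ready to paste into a docker run command
```

Traces then appear in the Jaeger UI (port 16686 by default) under the service name configured for qtap.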

Datadog APM

Honeycomb

Grafana Tempo
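Datadog, Honeycomb, and Grafana Tempo also accept OTLP, so the same environment variables apply with different endpoints:

- Datadog: traffic goes through a Datadog Agent with OTLP ingestion enabled.
- Tempo: receives OTLP on its distributor.
- Honeycomb: reached directly at `api.honeycomb.io`, with the API key in the `x-honeycomb-team` header.

In this sketch, the hostnames `datadog-agent` and `tempo` are hypothetical.

```python
from typing import Dict

# Standard OTLP/gRPC ports; hostnames other than api.honeycomb.io are hypothetical.
ENDPOINTS: Dict[str, str] = {
    "datadog": "http://datadog-agent:4317",       # Datadog Agent with OTLP ingestion on
    "honeycomb": "https://api.honeycomb.io:443",  # Honeycomb's OTLP endpoint
    "tempo": "http://tempo:4317",                 # Tempo distributor's OTLP receiver
}


def backend_env(backend: str, api_key: str = "") -> Dict[str, str]:
    """OTEL_* variables for sending qtap traces to one of the backends above."""
    env = {
        "OTEL_EXPORTER_OTLP_ENDPOINT": ENDPOINTS[backend],
        "OTEL_EXPORTER_OTLP_PROTOCOL": "grpc",
    }
    if backend == "honeycomb":
        if not api_key:
            raise ValueError("Honeycomb requires an API key")
        # OTEL_EXPORTER_OTLP_HEADERS takes comma-separated key=value pairs.
        env["OTEL_EXPORTER_OTLP_HEADERS"] = f"x-honeycomb-team={api_key}"
    return env
```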

Additional Resources
