Service Mesh Lite: Observability Without Sidecars

Service meshes like Istio and Linkerd provide powerful observability into microservice communication, but they come with significant operational overhead: complex deployments, sidecar proxies on every pod, added latency, and resource consumption.

What if you could get service mesh observability without the complexity?

This guide demonstrates how Qtap provides core service mesh observability features—service discovery, request tracing, error monitoring, and latency tracking—without sidecars, without control planes, and with no added request latency.


The Service Mesh Complexity Problem

Service meshes solve real problems but introduce significant complexity:

Deployment Complexity:

  • Control plane components (istiod; older Istio releases also ran separate pilot and mixer components)

  • Sidecar injection on every pod

  • Webhook configuration and RBAC setup

  • Certificate management for mTLS

Resource Overhead:

  • Each sidecar consumes CPU/memory (multiply by number of pods)

  • Control plane resource requirements

  • Significant cluster resource increase

Latency Impact:

  • Every request goes through a proxy (adds milliseconds per hop)

  • Data plane overhead on critical path

  • Performance degradation under load

Operational Burden:

  • Complex troubleshooting (is it my app or the mesh?)

  • Version upgrades affect entire cluster

  • Learning curve for operators

Most teams want observability, not traffic control. If you don't need advanced features like circuit breaking, traffic splitting, or mTLS enforcement, you're paying a high price for features you never use.


What Qtap Provides Instead

Qtap offers a "Service Mesh Lite" approach focused purely on observability:

✅ What You Get:

  • Service Discovery: Automatic mapping of service dependencies

  • Request Tracing: Follow requests across services

  • Error Monitoring: Track failures and error rates per service

  • Latency Metrics: Measure p50, p95, p99 response times

  • Traffic Volume: Connections per second, bytes transferred

  • Zero Instrumentation: No code changes or configuration needed

✅ How It's Different:

  • Out-of-band observation: Qtap watches traffic passively, doesn't proxy it

  • Zero added latency: no proxies in the request path

  • No sidecars: one agent per node via a single DaemonSet, not a proxy per pod

  • No control plane: Just install qtap and start observing

❌ What Qtap DOESN'T Do:

  • Traffic routing or load balancing

  • Circuit breaking or retries

  • Traffic splitting (canary deployments)

  • mTLS between services

  • Rate limiting or quotas

Qtap is observation-only. If you need traffic control, use a service mesh. If you just need visibility, Qtap is simpler.


Demo Architecture

We'll demonstrate service mesh observability with a multi-service application:

Traffic Flows:

  1. Client → API Gateway → User Service: Simple service-to-service call

  2. Client → API Gateway → Order Service → httpbin.org: Service calling external API

  3. Client → API Gateway → User Service + Order Service: Fan-out to multiple services

Qtap captures ALL of this:

  • Service A calling Service B (both egress from A and ingress to B)

  • External API calls from services

  • Error responses and status codes

  • Request/response timing


Prerequisites

  • Linux host with Docker

  • jq for JSON parsing

  • Basic understanding of microservices


Part 1: Deploy the Multi-Service Application

We'll create three microservices that communicate with each other.

API Gateway Service
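The guide doesn't reproduce the demo's source, so here is a minimal stand-in sketched with only the Python standard library. The /api/... routes, ports, and downstream service names are assumptions for illustration:

```python
# Minimal API Gateway sketch (standard library only; routes/ports are assumptions).
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

class GatewayHandler(BaseHTTPRequestHandler):
    # Downstream base URLs; on the Compose network these resolve by service name.
    # Class attributes so demos/tests can point them elsewhere.
    user_service = "http://user-service:5001"
    order_service = "http://order-service:5002"

    def do_GET(self):
        if self.path.startswith("/api/users/"):
            self._proxy(self.user_service + "/users/" + self.path.rsplit("/", 1)[-1])
        elif self.path.startswith("/api/orders/"):
            self._proxy(self.order_service + "/orders/" + self.path.rsplit("/", 1)[-1])
        else:
            self._reply(404, b'{"error": "unknown route"}')

    def _proxy(self, upstream_url):
        # qtap records this hop twice: egress from the gateway process and
        # ingress at the downstream service. No instrumentation needed here.
        try:
            with urllib.request.urlopen(upstream_url, timeout=5) as resp:
                self._reply(resp.status, resp.read())
        except urllib.error.HTTPError as exc:
            self._reply(exc.code, exc.read())

    def _reply(self, status, body):
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # keep demo output quiet
        pass

def serve(port=8000):
    """Run the gateway; the demo exposes it on port 8000."""
    HTTPServer(("0.0.0.0", port), GatewayHandler).serve_forever()
```

Note that the gateway needs no tracing library or SDK; qtap observes the proxied hops from outside the process.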

User Service
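A matching User Service sketch, again standard library only; the routes and sample data are illustrative assumptions:

```python
# Minimal User Service sketch (standard library only; data/routes are assumptions).
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# In-memory "database" standing in for the real demo's data.
USERS = {
    "1": {"id": "1", "name": "Alice"},
    "2": {"id": "2", "name": "Bob"},
}

class UserHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Routes like /users/<id>; unknown ids return a 404, which the
        # error rule described in Part 2 records at level: full.
        user_id = self.path.rsplit("/", 1)[-1]
        user = USERS.get(user_id)
        status = 200 if user else 404
        body = json.dumps(user or {"error": f"user {user_id} not found"}).encode()
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # keep demo output quiet
        pass

def serve(port=5001):
    """Run the User Service; the demo wires it up on port 5001."""
    HTTPServer(("0.0.0.0", port), UserHandler).serve_forever()
```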

Order Service
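And an Order Service sketch that makes the outbound httpbin.org call; the /uuid endpoint choice and the order data are assumptions:

```python
# Minimal Order Service sketch (standard library only; data/routes are assumptions).
import json
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

ORDERS = {"1": {"id": "1", "item": "widget", "user_id": "1"}}

class OrderHandler(BaseHTTPRequestHandler):
    # Class attribute so demos/tests can point at a different endpoint.
    external_api = "https://httpbin.org"

    def do_GET(self):
        order = ORDERS.get(self.path.rsplit("/", 1)[-1])
        if order is None:
            status, body = 404, b'{"error": "order not found"}'
        else:
            # Enrich the order via an external HTTPS API. qtap observes this
            # call pre-encryption, so no CA certificates are needed.
            with urllib.request.urlopen(self.external_api + "/uuid", timeout=10) as resp:
                order = dict(order, external=json.loads(resp.read()))
            status, body = 200, json.dumps(order).encode()
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # keep demo output quiet
        pass

def serve(port=5002):
    """Run the Order Service; the demo wires it up on port 5002."""
    HTTPServer(("0.0.0.0", port), OrderHandler).serve_forever()
```

The external hop is the one that later shows up as a slow dependency in the latency data.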


Part 2: Qtap Configuration for Service Mesh Observability

Create /tmp/qtap-mesh-lite.yaml:
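The original file isn't reproduced here. The sketch below only illustrates the shape implied by the key points that follow; treat the exact schema, plugin names, and rule syntax as assumptions and check the qtap documentation before using it:

```yaml
# Sketch only; verify field names against the qtap configuration reference.
version: 2

stacks:
  mesh_lite:
    plugins:
      - type: http_capture
        config:
          level: summary            # lightweight default: method, URL, status, timing
          rules:
            # Capture complete request/response when something fails
            - name: capture-errors
              expr: http.res.status >= 400
              level: full
          # Rulekit macros (syntax is an assumption) can tag traffic by
          # container name for filtering, e.g.:
          # macros:
          #   - name: is_gateway
          #     expr: container.name == "api-gateway"

tap:
  direction: all                    # capture BOTH egress and ingress
  ignore_loopback: false            # keep localhost traffic (Docker networking)
  http:
    stack: mesh_lite
```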

Key configuration points:

  1. direction: all: CRITICAL for service mesh observability. Captures both:

    • Egress: Requests leaving Service A

    • Ingress: Requests arriving at Service B

    • This gives you BOTH sides of every service-to-service call

  2. ignore_loopback: false: Required to capture localhost traffic in Docker

  3. level: summary by default: Lightweight capture (method, URL, status, timing)

  4. level: full for errors: Capture complete request/response for debugging

  5. Rulekit macros: Identify traffic by container name for filtering


Part 3: Deploy with Docker Compose

Create /tmp/docker-compose-mesh-lite.yaml:
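A sketch of what the Compose file might contain; the build contexts, published port, qtap image path, and command flags are all assumptions to adapt to your setup:

```yaml
# Sketch only; the qtap image path and flags are assumptions (see qtap docs).
services:
  api-gateway:
    build: ./api-gateway          # hypothetical build contexts for the three services
    ports:
      - "8000:8000"
  user-service:
    build: ./user-service
  order-service:
    build: ./order-service
  qtap:
    image: us-docker.pkg.dev/qpoint-edge/public/qtap:v0
    container_name: qtap
    privileged: true              # eBPF programs need elevated privileges
    pid: host                     # observe processes across the whole host
    volumes:
      - /tmp/qtap-mesh-lite.yaml:/app/config/qtap.yaml:ro
      - /sys:/sys:ro
    command: ["tap", "--config=/app/config/qtap.yaml"]
```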

Start the stack:

Wait for services to initialize:
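The two steps above might look like:

```shell
docker compose -f /tmp/docker-compose-mesh-lite.yaml up -d --build

# Crude wait; a health-check loop against the gateway would be more robust
sleep 10
```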


Part 4: Generate Traffic and Observe

Test 1: Simple Service-to-Service Call
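A request along these lines exercises the chain; the gateway port and route are assumptions, since the demo's real endpoints aren't shown in this guide:

```shell
# Hypothetical route: fetch a user through the gateway
curl -s http://localhost:8000/api/users/1 | jq
```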

Output:

What Qtap captures:

  • Client → API Gateway (egress from curl)

  • API Gateway → User Service (egress from gateway, ingress to user-service)

  • Complete request chain with timing


Test 2: Service Calling External API
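Again with an assumed route:

```shell
# Hypothetical route: fetch an order; the Order Service then calls httpbin.org
curl -s http://localhost:8000/api/orders/1 | jq
```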

Output:

What Qtap captures:

  • Client → API Gateway

  • API Gateway → Order Service

  • Order Service → httpbin.org (external API call)

  • TLS inspection: Qtap sees inside HTTPS to httpbin.org before encryption


Test 3: Fan-Out Request
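Assuming the gateway exposes a fan-out route (the path is hypothetical):

```shell
# Hypothetical fan-out route that queries both downstream services in parallel
curl -s http://localhost:8000/api/dashboard/1 | jq
```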

Output:

What Qtap captures:

  • API Gateway making parallel calls to User Service AND Order Service

  • Both service responses with timing

  • Service dependency mapping (gateway depends on both services)


Test 4: Error Scenario
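Trigger a failure with an id that doesn't exist (route assumed, as above):

```shell
# Request a user id that doesn't exist to provoke a 404 from the User Service
curl -s -i http://localhost:8000/api/users/999
```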

Output:

What Qtap captures:

  • 404 response captured with level: full (includes headers and body)

  • Error tracking per service

  • Useful for identifying which service is failing


Part 5: Analyze Service Mesh Observability

View Qtap Logs
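Tail the agent's output, which mixes INFO lines with JSON capture events:

```shell
docker logs -f qtap
```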

Example 1: Service-to-Service Call (Egress Side)

Interpretation:

  • Python process (API Gateway) made request

  • Destination: User Service (port 5001)

  • Direction: egress (leaving API Gateway)

  • Response time: 2ms


Example 2: Same Call (Ingress Side)

Interpretation:

  • Same request, different perspective

  • Direction: ingress (arriving at User Service)

  • Qtap captured BOTH sides of the call

  • This is the core of service mesh observability


Example 3: External API Call

Interpretation:

  • Order Service called external API

  • Qtap saw inside HTTPS (before TLS encryption)

  • Much slower (1949ms) than internal calls (1-2ms)

  • Useful for identifying slow external dependencies


Part 6: Service Mesh Features Demonstrated

✅ Service Discovery

Qtap automatically discovered:

  • API Gateway depends on User Service (port 5001)

  • API Gateway depends on Order Service (port 5002)

  • Order Service depends on httpbin.org (external)

No configuration required. Just observe traffic and map dependencies.


✅ Request Tracing

Follow a single request through the system:
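With the stack running, one ad-hoc way to trace is to filter the JSON events for a single path (the path and the event layout are assumptions):

```shell
# Pull every hop that touched order 1: gateway, Order Service, external call
docker logs qtap 2>&1 | grep '^{' | grep 'orders/1'
```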

Insight: Order Service is slow because it waits for the external API. Consider caching or async processing.


✅ Error Monitoring

Errors captured with full details:

Output shows:

  • Which service returned 404 (User Service)

  • Request that caused it (GET /users/999)

  • Full response body with error message

  • Captured at level: full due to rulekit rule


✅ Latency Tracking

Calculate latency percentiles from captured data:
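A self-contained version of that calculation. The printf stands in for `docker logs qtap 2>&1`, the duration_ms field name is an assumption about qtap's event schema, and python3 stands in for jq so the example runs anywhere:

```shell
# Sample qtap-style output: INFO lines mixed with JSON capture events.
printf '%s\n' \
  'INFO qtap starting capture' \
  '{"url":"http://user-service:5001/users/1","duration_ms":2}' \
  '{"url":"http://order-service:5002/orders/1","duration_ms":1949}' \
  '{"url":"http://user-service:5001/users/2","duration_ms":3}' \
  | grep '^{' \
  | python3 -c '
import json, statistics, sys
durations = [json.loads(line)["duration_ms"] for line in sys.stdin]
q = statistics.quantiles(durations, n=100)
print(f"p50={q[49]} p95={q[94]} p99={q[98]}")
'
```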

The grep '^{' filter keeps only the JSON capture lines, dropping the INFO log prefixes that would otherwise break JSON parsing.

Analyze:

  • p50 (median): ~2ms for internal calls

  • p95: ~10ms

  • p99: ~2000ms (external API calls)

Insight: Most internal calls are fast (<10ms), but external APIs add significant latency.


Part 7: Comparison - Service Mesh vs Qtap

| Feature | Service Mesh (Istio) | Qtap (Service Mesh Lite) |
| --- | --- | --- |
| Observability | | |
| Service discovery | ✅ Yes | ✅ Yes |
| Request tracing | ✅ Yes | ✅ Yes (both ingress/egress) |
| Error monitoring | ✅ Yes | ✅ Yes (with full capture) |
| Latency metrics | ✅ Yes | ✅ Yes (per request) |
| Traffic volume | ✅ Yes | ✅ Yes |
| TLS inspection | ✅ Yes (via mTLS) | ✅ Yes (pre-encryption) |
| Traffic Control | | |
| Load balancing | ✅ Yes | ❌ No |
| Circuit breaking | ✅ Yes | ❌ No |
| Retries/timeouts | ✅ Yes | ❌ No |
| Traffic splitting (canary) | ✅ Yes | ❌ No |
| Security | | |
| mTLS between services | ✅ Yes | ❌ No |
| Authorization policies | ✅ Yes | ❌ No |
| Operational | | |
| Deployment complexity | ⚠️ High | ✅ Low (DaemonSet) |
| Resource overhead | ⚠️ High (sidecars on every pod) | ✅ Minimal (one agent per node) |
| Latency impact | ⚠️ Adds ~2-5ms per hop | ✅ None (out-of-band) |
| Code changes required | ✅ None | ✅ None |
| Control plane | ⚠️ Required (istiod) | ✅ None |
| Configuration complexity | ⚠️ High (CRDs, policies) | ✅ Low (one YAML file) |


When to Use Which

Use Qtap (Service Mesh Lite) When:

✅ You need observability only

  • Understand service dependencies

  • Track errors and latency

  • Debug microservice issues

✅ You want zero latency overhead

  • Performance-critical applications

  • High-throughput services

  • Latency-sensitive workloads

✅ You want operational simplicity

  • Small team without mesh expertise

  • Don't want to manage control plane

  • Avoid sidecar complexity

✅ You have resource constraints

  • Limited cluster resources

  • Cost-sensitive deployments

  • Want to minimize overhead


Use a Service Mesh When:

⚠️ You need traffic control

  • Canary deployments and traffic splitting

  • Circuit breaking for resilience

  • Retry and timeout policies

  • Advanced load balancing

⚠️ You require mTLS

  • Zero-trust network requirements

  • Compliance mandates encryption

  • Need service-to-service authentication

⚠️ You need authorization

  • Fine-grained access control between services

  • Policy-based routing

⚠️ You have dedicated platform team

  • Team can manage mesh complexity

  • Already invested in service mesh skills

  • Have operational capacity


Part 8: Production Considerations

Scaling to Many Services

This demo uses 3 services. In production with 50+ services:

Qtap scales linearly:

  • One qtap per node (DaemonSet in Kubernetes)

  • Resource usage independent of service count

  • No per-service configuration needed

Service Mesh complexity grows:

  • One sidecar per pod (100 pods = 100 sidecars)

  • Control plane resource requirements increase

  • Configuration complexity multiplies


Integration with Observability Tools

Qtap outputs JSON. Pipe to your observability stack:

Send to S3 for long-term storage:
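A possible shape for an object-store sink in the qtap config; the schema and bucket details here are illustrative assumptions, so consult the qtap docs for the authoritative syntax:

```yaml
# Illustrative only; verify against the qtap services/sinks documentation.
services:
  event_stores:
    - type: stdout              # keep events in container logs as well
  object_stores:
    - type: s3
      bucket: qtap-captures     # hypothetical bucket name
      region: us-east-1
```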

Send events to ClickHouse/Elasticsearch:
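One ad-hoc approach is to replay the JSON event lines straight into a table; the table name and schema here are hypothetical:

```shell
# JSONEachRow ingestion of the captured events (hypothetical qtap_events table)
docker logs qtap 2>&1 | grep '^{' \
  | clickhouse-client --query "INSERT INTO qtap_events FORMAT JSONEachRow"
```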

Build dashboards:

  • Parse JSON logs to extract metrics

  • Visualize service dependencies

  • Alert on error rate increases


Kubernetes Deployment

For production Kubernetes, deploy qtap as a DaemonSet:
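A minimal sketch of such a DaemonSet; the image path, namespace, and mounts are assumptions, and the official qtap install docs or Helm chart should be treated as authoritative:

```yaml
# Minimal sketch; prefer the official qtap install method for production.
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: qtap
  namespace: qtap
spec:
  selector:
    matchLabels:
      app: qtap
  template:
    metadata:
      labels:
        app: qtap
    spec:
      hostPID: true                   # see processes on the node
      containers:
        - name: qtap
          image: us-docker.pkg.dev/qpoint-edge/public/qtap:v0   # image path is an assumption
          securityContext:
            privileged: true          # required to load eBPF programs
          volumeMounts:
            - name: sys
              mountPath: /sys
              readOnly: true
      volumes:
        - name: sys
          hostPath:
            path: /sys
```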


Use Pod Labels for Service Attribution

In Kubernetes, use pod labels to identify services:
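For example, standard labels on each workload's pod template let captured traffic be attributed to a logical service (label values here are illustrative):

```yaml
# Pod template metadata for the User Service deployment
metadata:
  labels:
    app.kubernetes.io/name: user-service
    app.kubernetes.io/part-of: mesh-lite-demo
```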


Part 9: Cleanup
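Tear down the demo stack and remove the temp files:

```shell
docker compose -f /tmp/docker-compose-mesh-lite.yaml down -v
rm -f /tmp/qtap-mesh-lite.yaml /tmp/docker-compose-mesh-lite.yaml
```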


Key Takeaways

  1. Service mesh observability without service mesh complexity: Qtap provides service discovery, request tracing, error monitoring, and latency tracking without sidecars or control planes.

  2. Zero latency impact: Out-of-band observation means no proxies in the request path. Your services run at full speed.

  3. BOTH sides of every call: direction: all captures egress (leaving Service A) and ingress (arriving at Service B), giving complete visibility.

  4. TLS inspection without certificates: Qtap sees inside HTTPS at the kernel level before encryption happens.

  5. Operational simplicity: one agent per node (a single DaemonSet) vs a sidecar on every pod. Simple YAML config vs complex CRDs and policies.

  6. Know when you need more: If you need traffic control (canary, circuit breaking) or mTLS, use a full service mesh. If you just need visibility, Qtap is simpler.

