Service Mesh Lite: Observability Without Sidecars

Service meshes like Istio and Linkerd provide powerful observability into microservice communication, but they come with significant operational overhead: complex deployments, sidecar proxies on every pod, added latency, and resource consumption.

What if you could get service mesh observability without the complexity?

This guide demonstrates how Qtap provides core service mesh observability features—service discovery, request tracing, error monitoring, and latency tracking—without sidecars, without control planes, and with zero latency impact.


The Service Mesh Complexity Problem

Service meshes solve real problems but introduce significant complexity:

Deployment Complexity:

  • Control plane components (istiod, which now bundles pilot and friends)

  • Sidecar injection on every pod

  • Webhook configuration and RBAC setup

  • Certificate management for mTLS

Resource Overhead:

  • Each sidecar consumes CPU/memory (multiply by number of pods)

  • Control plane resource requirements

  • Significant cluster resource increase

Latency Impact:

  • Every request goes through a proxy (adding milliseconds per hop)

  • Data plane overhead on critical path

  • Performance degradation under load

Operational Burden:

  • Complex troubleshooting (is it my app or the mesh?)

  • Version upgrades affect entire cluster

  • Learning curve for operators

Most teams want observability, not traffic control. If you don't need advanced features like circuit breaking, traffic splitting, or mTLS enforcement, you're paying the full complexity cost of a mesh for the small slice of features you actually use.


What Qtap Provides Instead

Qtap offers a "Service Mesh Lite" approach focused purely on observability:

✅ What You Get:

  • Service Discovery: Automatic mapping of service dependencies

  • Request Tracing: Follow requests across services

  • Error Monitoring: Track failures and error rates per service

  • Latency Metrics: Measure p50, p95, p99 response times

  • Traffic Volume: Connections per second, bytes transferred

  • Zero Instrumentation: No code changes or configuration needed

✅ How It's Different:

  • Out-of-band observation: Qtap watches traffic passively, doesn't proxy it

  • Zero latency: No proxies in the request path

  • No sidecars: One DaemonSet per node (not per pod)

  • No control plane: Just install qtap and start observing

❌ What Qtap DOESN'T Do:

  • Traffic routing or load balancing

  • Circuit breaking or retries

  • Traffic splitting (canary deployments)

  • mTLS between services

  • Rate limiting or quotas

Qtap is observation-only. If you need traffic control, use a service mesh. If you just need visibility, Qtap is simpler.


Demo Architecture

We'll demonstrate service mesh observability with a multi-service application:

                    ┌─────────────┐
         ┌─────────→│  Frontend   │
         │          │  (Client)   │
         │          └──────┬──────┘
         │                 │
         │                 ↓
    [External]      ┌─────────────┐
    [Request]       │     API     │ (Port 5000)
                    │   Gateway   │
                    └──────┬──────┘

                    ┌──────┴──────┐
                    ↓             ↓
             ┌─────────────┐  ┌─────────────┐
             │   User      │  │   Order     │
             │  Service    │  │  Service    │
             │ (Port 5001) │  │ (Port 5002) │
             └─────────────┘  └──────┬──────┘


                              ┌─────────────┐
                              │  httpbin.org│ (External API)
                              └─────────────┘

Traffic Flows:

  1. Client → API Gateway → User Service: Simple service-to-service call

  2. Client → API Gateway → Order Service → httpbin.org: Service calling external API

  3. Client → API Gateway → User Service + Order Service: Fan-out to multiple services

Qtap captures ALL of this:

  • Service A calling Service B (both egress from A and ingress to B)

  • External API calls from services

  • Error responses and status codes

  • Request/response timing


Prerequisites

  • Linux host with Docker

  • jq for JSON parsing

  • Basic understanding of microservices


Part 1: Deploy the Multi-Service Application

We'll create three microservices that communicate with each other.

API Gateway Service

#!/usr/bin/env python3
"""
API Gateway - Routes requests to backend services
"""
from flask import Flask, jsonify
import requests
import os

app = Flask(__name__)

USER_SERVICE_URL = os.environ.get('USER_SERVICE_URL', 'http://localhost:5001')
ORDER_SERVICE_URL = os.environ.get('ORDER_SERVICE_URL', 'http://localhost:5002')

@app.route('/health')
def health():
    return jsonify({'status': 'healthy', 'service': 'api-gateway'})

@app.route('/api/users/<user_id>')
def get_user(user_id):
    """Proxy request to user service"""
    response = requests.get(f'{USER_SERVICE_URL}/users/{user_id}', timeout=5)
    return jsonify(response.json()), response.status_code

@app.route('/api/orders/<user_id>')
def get_orders(user_id):
    """Proxy request to order service"""
    response = requests.get(f'{ORDER_SERVICE_URL}/orders/{user_id}', timeout=5)
    return jsonify(response.json()), response.status_code

@app.route('/api/user-with-orders/<user_id>')
def get_user_with_orders(user_id):
    """Fan-out request to both services"""
    user_response = requests.get(f'{USER_SERVICE_URL}/users/{user_id}', timeout=5)
    orders_response = requests.get(f'{ORDER_SERVICE_URL}/orders/{user_id}', timeout=5)

    return jsonify({
        'user': user_response.json(),
        'orders': orders_response.json()
    }), 200

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=5000)
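
Note that the fan-out handler above issues its two backend requests sequentially, so the endpoint's latency is the sum of both calls. If you want the fan-out to actually run in parallel, a small helper built on the standard library's ThreadPoolExecutor is enough (a sketch, not part of the demo code — `fan_out` is a name introduced here):

```python
from concurrent.futures import ThreadPoolExecutor

def fan_out(tasks):
    """Run independent zero-argument callables concurrently.

    Returns their results in the same order as `tasks`; any exception
    raised by a task is re-raised when its result is collected.
    """
    with ThreadPoolExecutor(max_workers=max(len(tasks), 1)) as pool:
        futures = [pool.submit(task) for task in tasks]
        return [future.result() for future in futures]
```

Inside get_user_with_orders this would look like `user, orders = fan_out([lambda: requests.get(...).json(), lambda: requests.get(...).json()])`, cutting the endpoint's latency to roughly the slower of the two calls.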

User Service

#!/usr/bin/env python3
"""
User Service - Returns user data
"""
from flask import Flask, jsonify

app = Flask(__name__)

USERS = {
    '1': {'id': '1', 'name': 'Alice', 'email': '[email protected]'},
    '2': {'id': '2', 'name': 'Bob', 'email': '[email protected]'},
    '3': {'id': '3', 'name': 'Charlie', 'email': '[email protected]'},
}

@app.route('/health')
def health():
    return jsonify({'status': 'healthy', 'service': 'user-service'})

@app.route('/users/<user_id>')
def get_user(user_id):
    if user_id == '999':  # Simulate error
        return jsonify({'error': 'User not found', 'service': 'user-service'}), 404

    user = USERS.get(user_id)
    if user:
        return jsonify(user), 200
    else:
        return jsonify({'error': 'User not found', 'service': 'user-service'}), 404

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=5001)

Order Service

#!/usr/bin/env python3
"""
Order Service - Returns orders and calls external payment API
"""
from flask import Flask, jsonify
import requests

app = Flask(__name__)

ORDERS = {
    '1': [
        {'id': 'order-101', 'user_id': '1', 'total': 99.99, 'status': 'completed'},
        {'id': 'order-102', 'user_id': '1', 'total': 49.99, 'status': 'pending'},
    ],
    '2': [
        {'id': 'order-201', 'user_id': '2', 'total': 199.99, 'status': 'completed'},
    ],
    '3': [],
}

@app.route('/health')
def health():
    return jsonify({'status': 'healthy', 'service': 'order-service'})

@app.route('/orders/<user_id>')
def get_orders(user_id):
    """Get orders and verify with external payment API"""
    try:
        # Call external API (httpbin for demo)
        payment_response = requests.get(
            'https://httpbin.org/json',
            timeout=5,
            headers={'X-User-ID': user_id}
        )
        payment_status = 'payment-api-ok' if payment_response.status_code == 200 else 'payment-api-error'
    except Exception as e:
        payment_status = f'payment-api-error: {str(e)}'

    orders = ORDERS.get(user_id, [])
    return jsonify({
        'user_id': user_id,
        'orders': orders,
        'payment_api_status': payment_status,
        'service': 'order-service'
    }), 200

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=5002)

Part 2: Qtap Configuration for Service Mesh Observability

Create /tmp/qtap-mesh-lite.yaml:

version: 2

# Service Mesh Lite Configuration
# Captures all service-to-service traffic for observability

services:
  event_stores:
    - type: stdout
  object_stores:
    - type: stdout

rulekit:
  macros:
    - name: is_error
      expr: http.res.status >= 400 && http.res.status < 600
    - name: is_api_gateway
      expr: src.container.name == "api-gateway"
    - name: is_user_service
      expr: src.container.name == "user-service"
    - name: is_order_service
      expr: src.container.name == "order-service"

stacks:
  mesh_observability:
    plugins:
      - type: http_capture
        config:
          level: summary  # (none|summary|details|full) - Lightweight by default
          format: json    # (json|text) - JSON for easy parsing
          rules:
            # Capture full details for errors to debug failures
            - name: "Service errors"
              expr: is_error()
              level: full

            # Capture details for all API gateway traffic
            - name: "API Gateway traffic"
              expr: is_api_gateway()
              level: details

tap:
  direction: all          # (egress|egress-external|egress-internal|ingress|all)
                          # CRITICAL: Use 'all' to see BOTH sides of service calls
  ignore_loopback: false  # (true|false) - MUST be false to see localhost traffic!
  audit_include_dns: false # (true|false) - Skip DNS noise

  http:
    stack: mesh_observability

  # Filter out qtap's own traffic
  filters:
    groups:
      - qpoint

Key configuration points:

  1. direction: all: CRITICAL for service mesh observability. Captures both:

    • Egress: Requests leaving Service A

    • Ingress: Requests arriving at Service B

    • This gives you BOTH sides of every service-to-service call

  2. ignore_loopback: false: Required to capture localhost traffic in Docker

  3. level: summary by default: Lightweight capture (method, URL, status, timing)

  4. level: full for errors: Capture complete request/response for debugging

  5. Rulekit macros: Identify traffic by container name for filtering


Part 3: Deploy with Docker Compose

Create /tmp/docker-compose-mesh-lite.yaml:

version: '3.9'

services:
  # API Gateway - Entry point
  api-gateway:
    image: python:3.11-slim
    container_name: api-gateway
    working_dir: /app
    network_mode: host  # Required for qtap to see localhost traffic
    command: >
      bash -c "pip install -q flask requests &&
               python api-gateway.py"
    volumes:
      - /tmp/api-gateway.py:/app/api-gateway.py
    environment:
      - USER_SERVICE_URL=http://localhost:5001
      - ORDER_SERVICE_URL=http://localhost:5002

  # User Service
  user-service:
    image: python:3.11-slim
    container_name: user-service
    working_dir: /app
    network_mode: host
    command: >
      bash -c "pip install -q flask &&
               python user-service.py"
    volumes:
      - /tmp/user-service.py:/app/user-service.py

  # Order Service
  order-service:
    image: python:3.11-slim
    container_name: order-service
    working_dir: /app
    network_mode: host
    command: >
      bash -c "pip install -q flask requests &&
               python order-service.py"
    volumes:
      - /tmp/order-service.py:/app/order-service.py

  # Qtap - Traffic capture
  qtap:
    image: us-docker.pkg.dev/qpoint-edge/public/qtap:v0
    container_name: qtap-mesh-lite
    privileged: true
    user: "0:0"
    cap_add:
      - CAP_BPF
      - CAP_SYS_ADMIN
    pid: host
    network_mode: host
    volumes:
      - /sys:/sys
      - /var/run/docker.sock:/var/run/docker.sock
      - /tmp/qtap-mesh-lite.yaml:/app/config/qtap.yaml
    environment:
      - TINI_SUBREAPER=1
    ulimits:
      memlock: -1
    command:
      - --log-level=info
      - --log-encoding=console
      - --config=/app/config/qtap.yaml

Start the stack:

docker compose -f /tmp/docker-compose-mesh-lite.yaml up -d

Wait for services to initialize:

sleep 10
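
A fixed sleep works but is fragile on slow machines. A small readiness poll is more reliable — this is a sketch (the `wait_for_healthy` helper is introduced here, and it assumes the demo's /health endpoints on localhost ports 5000-5002):

```python
import time
import urllib.request

def wait_for_healthy(urls, timeout=30.0, interval=1.0, probe=None):
    """Poll each URL until every one returns HTTP 200, or time out.

    `probe` can be swapped out for testing; the default does a real
    HTTP GET and treats any error as "not ready yet".
    """
    def default_probe(url):
        try:
            with urllib.request.urlopen(url, timeout=2) as resp:
                return resp.status == 200
        except OSError:
            return False

    probe = probe or default_probe
    deadline = time.monotonic() + timeout
    pending = set(urls)
    while pending:
        pending = {u for u in pending if not probe(u)}
        if not pending:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
    return True
```

Call it with e.g. `wait_for_healthy([f"http://localhost:{p}/health" for p in (5000, 5001, 5002)])` before generating traffic.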

Part 4: Generate Traffic and Observe

Test 1: Simple Service-to-Service Call

curl -s http://localhost:5000/api/users/1 | jq .

Output:

{
  "email": "[email protected]",
  "id": "1",
  "name": "Alice"
}

What Qtap captures:

  • Client → API Gateway (egress from curl)

  • API Gateway → User Service (egress from gateway, ingress to user-service)

  • Complete request chain with timing


Test 2: Service Calling External API

curl -s http://localhost:5000/api/orders/1 | jq .

Output:

{
  "orders": [
    {
      "id": "order-101",
      "status": "completed",
      "total": 99.99,
      "user_id": "1"
    },
    {
      "id": "order-102",
      "status": "pending",
      "total": 49.99,
      "user_id": "1"
    }
  ],
  "payment_api_status": "payment-api-ok",
  "service": "order-service",
  "user_id": "1"
}

What Qtap captures:

  • Client → API Gateway

  • API Gateway → Order Service

  • Order Service → httpbin.org (external API call)

  • TLS inspection: Qtap sees inside HTTPS to httpbin.org before encryption


Test 3: Fan-Out Request

curl -s http://localhost:5000/api/user-with-orders/2 | jq .

Output:

{
  "orders": {
    "orders": [
      {
        "id": "order-201",
        "status": "completed",
        "total": 199.99,
        "user_id": "2"
      }
    ],
    "payment_api_status": "payment-api-ok",
    "service": "order-service",
    "user_id": "2"
  },
  "user": {
    "email": "[email protected]",
    "id": "2",
    "name": "Bob"
  }
}

What Qtap captures:

  • API Gateway making parallel calls to User Service AND Order Service

  • Both service responses with timing

  • Service dependency mapping (gateway depends on both services)


Test 4: Error Scenario

curl -s http://localhost:5000/api/users/999 | jq .

Output:

{
  "error": "User not found",
  "service": "user-service"
}

What Qtap captures:

  • 404 response captured with level: full (includes headers and body)

  • Error tracking per service

  • Useful for identifying which service is failing


Part 5: Analyze Service Mesh Observability

View Qtap Logs

docker logs qtap-mesh-lite 2>&1 | grep '"method"' | jq .

Example 1: Service-to-Service Call (Egress Side)

{
  "metadata": {
    "process_exe": "/usr/local/bin/python3.11",
    "connection_id": "abc123"
  },
  "request": {
    "method": "GET",
    "url": "http://localhost:5001/users/1",
    "authority": "localhost:5001",
    "protocol": "http1",
    "user_agent": "python-requests/2.32.5"
  },
  "response": {
    "status": 200,
    "content_type": "application/json"
  },
  "direction": "egress-external",
  "duration_ms": 2
}

Interpretation:

  • Python process (API Gateway) made request

  • Destination: User Service (port 5001)

  • Direction: egress (leaving API Gateway)

  • Response time: 2ms


Example 2: Same Call (Ingress Side)

{
  "metadata": {
    "process_exe": "/usr/local/bin/python3.11",
    "connection_id": "abc124"
  },
  "request": {
    "method": "GET",
    "url": "http://localhost:5001/users/1",
    "authority": "localhost:5001",
    "protocol": "http1"
  },
  "response": {
    "status": 200
  },
  "direction": "ingress",
  "duration_ms": 1
}

Interpretation:

  • Same request, different perspective

  • Direction: ingress (arriving at User Service)

  • Qtap captured BOTH sides of the call

  • This is the core of service mesh observability
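
Because each internal call shows up twice — once as egress, once as ingress — you can correlate the two sides by method and URL. A minimal pairing sketch over qtap's JSON lines (field names as shown in the examples above; the `pair_directions` helper is introduced here):

```python
import json
from collections import defaultdict

def pair_directions(lines):
    """Group qtap capture events by (method, url) and collect the
    direction of each occurrence, lining up the egress/ingress sides."""
    calls = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line.startswith("{"):
            continue  # skip qtap's non-JSON console output
        event = json.loads(line)
        req = event.get("request", {})
        calls[(req.get("method"), req.get("url"))].append(event.get("direction"))
    return dict(calls)

# Two sides of the same call, as in the examples above:
sample = [
    '{"request": {"method": "GET", "url": "http://localhost:5001/users/1"}, "direction": "egress-external"}',
    '{"request": {"method": "GET", "url": "http://localhost:5001/users/1"}, "direction": "ingress"}',
]
```

`pair_directions(sample)` returns one key for the call with both directions listed — any key with only an egress entry is a request that never arrived, which is itself a useful debugging signal.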


Example 3: External API Call

{
  "metadata": {
    "process_exe": "/usr/local/bin/python3.11",
    "endpoint_id": "httpbin.org",
    "is_tls": true
  },
  "request": {
    "method": "GET",
    "url": "https://httpbin.org/json",
    "authority": "httpbin.org",
    "protocol": "http1"
  },
  "response": {
    "status": 200,
    "content_type": "application/json"
  },
  "direction": "egress-external",
  "duration_ms": 1949
}

Interpretation:

  • Order Service called external API

  • Qtap saw inside HTTPS (before TLS encryption)

  • Much slower (1949ms) than internal calls (1-2ms)

  • Useful for identifying slow external dependencies


Part 6: Service Mesh Features Demonstrated

✅ Service Discovery

Qtap automatically discovered:

  • API Gateway depends on User Service (port 5001)

  • API Gateway depends on Order Service (port 5002)

  • Order Service depends on httpbin.org (external)

No configuration required. Just observe traffic and map dependencies.
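
The mapping can be scripted directly from the JSON output. A sketch (the `dependency_edges` helper and the `container` metadata field in the sample are assumptions — use whatever source-attribution metadata your deployment actually emits):

```python
import json

def dependency_edges(lines, source_of):
    """Build (source service, destination authority) edges from egress
    events in qtap's JSON output. `source_of` maps an event to a service
    name; how you derive it (container name, pod label, process path)
    depends on the metadata available."""
    edges = set()
    for line in lines:
        line = line.strip()
        if not line.startswith("{"):
            continue
        event = json.loads(line)
        if str(event.get("direction", "")).startswith("egress"):
            src = source_of(event)
            dst = event.get("request", {}).get("authority")
            if src and dst:
                edges.add((src, dst))
    return edges

# Hypothetical events carrying a "container" field in metadata:
sample = [
    '{"metadata": {"container": "api-gateway"}, "request": {"authority": "localhost:5001"}, "direction": "egress-external"}',
    '{"metadata": {"container": "order-service"}, "request": {"authority": "httpbin.org"}, "direction": "egress-external"}',
    '{"metadata": {"container": "user-service"}, "request": {"authority": "localhost:5001"}, "direction": "ingress"}',
]
```

The resulting edge set is exactly the dependency graph above: gateway → user-service, order-service → httpbin.org.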


✅ Request Tracing

Follow a single request through the system:

1. Client → API Gateway [2ms]
   GET /api/user-with-orders/2

2. API Gateway → User Service [1ms]
   GET /users/2
   Response: {"id": "2", "name": "Bob"}

3. API Gateway → Order Service [5270ms]  ← Slow!
   GET /orders/2

4. Order Service → httpbin.org [1949ms]
   GET https://httpbin.org/json

Total: ~5.3 seconds (the 1949ms external call happens inside the Order Service's 5270ms, so it isn't added twice)

Insight: Order Service is slow because it waits for external API. Consider caching or async processing.


✅ Error Monitoring

Errors captured with full details:

docker logs qtap-mesh-lite 2>&1 | grep '"status":404' | jq .

Output shows:

  • Which service returned 404 (User Service)

  • Request that caused it (GET /users/999)

  • Full response body with error message

  • Captured at level: full due to rulekit rule
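
Per-service error rates fall out of the same JSON stream. A small sketch (the `error_rates` helper is a name introduced here), keyed by destination authority:

```python
import json
from collections import Counter

def error_rates(lines):
    """Per-authority (error count, total count) from qtap JSON lines,
    counting any 4xx/5xx response as an error."""
    totals, errors = Counter(), Counter()
    for line in lines:
        line = line.strip()
        if not line.startswith("{"):
            continue
        event = json.loads(line)
        authority = event.get("request", {}).get("authority", "unknown")
        status = event.get("response", {}).get("status", 0)
        totals[authority] += 1
        if 400 <= status < 600:
            errors[authority] += 1
    return {a: (errors[a], totals[a]) for a in totals}

sample = [
    '{"request": {"authority": "localhost:5001"}, "response": {"status": 200}}',
    '{"request": {"authority": "localhost:5001"}, "response": {"status": 404}}',
    '{"request": {"authority": "localhost:5002"}, "response": {"status": 200}}',
]
```

Here `error_rates(sample)` shows user-service (port 5001) erroring on half its requests while order-service (port 5002) is clean — enough to alert on.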


✅ Latency Tracking

Calculate latency percentiles from captured data:

docker logs qtap-mesh-lite 2>&1 | \
  grep '^{' | \
  jq -r 'select(has("duration_ms")) | .duration_ms' | \
  sort -n

The grep '^{' filter keeps only the JSON capture lines so jq doesn't choke on INFO prefixes.

Analyze:

  • p50 (median): ~2ms for internal calls

  • p95: ~10ms

  • p99: ~2000ms (external API calls)

Insight: Most internal calls are fast (<10ms), but external APIs add significant latency.
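
The percentile step can be scripted too, so the sort -n output doesn't need manual eyeballing. A nearest-rank sketch (the helper names are introduced here):

```python
import json

def percentile(values, pct):
    """Nearest-rank percentile (pct in 0-100) of a list of numbers."""
    ordered = sorted(values)
    rank = max(0, min(len(ordered) - 1, round(pct / 100 * len(ordered)) - 1))
    return ordered[rank]

def latency_summary(lines):
    """p50/p95/p99 of duration_ms across qtap JSON capture lines."""
    durations = []
    for line in lines:
        line = line.strip()
        if not line.startswith("{"):
            continue  # same filter as the grep '^{' above
        event = json.loads(line)
        if "duration_ms" in event:
            durations.append(event["duration_ms"])
    return {p: percentile(durations, p) for p in (50, 95, 99)} if durations else {}
```

Feed it the raw `docker logs` lines and it produces the p50/p95/p99 figures directly.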


Part 7: Comparison - Service Mesh vs Qtap

| Feature | Service Mesh (Istio) | Qtap (Service Mesh Lite) |
|---|---|---|
| **Observability** | | |
| Service discovery | ✅ Yes | ✅ Yes |
| Request tracing | ✅ Yes | ✅ Yes (both ingress/egress) |
| Error monitoring | ✅ Yes | ✅ Yes (with full capture) |
| Latency metrics | ✅ Yes | ✅ Yes (per request) |
| Traffic volume | ✅ Yes | ✅ Yes |
| TLS inspection | ✅ Yes (via mTLS) | ✅ Yes (pre-encryption) |
| **Traffic Control** | | |
| Load balancing | ✅ Yes | ❌ No |
| Circuit breaking | ✅ Yes | ❌ No |
| Retries/timeouts | ✅ Yes | ❌ No |
| Traffic splitting (canary) | ✅ Yes | ❌ No |
| **Security** | | |
| mTLS between services | ✅ Yes | ❌ No |
| Authorization policies | ✅ Yes | ❌ No |
| **Operational** | | |
| Deployment complexity | ⚠️ High | ✅ Low (DaemonSet) |
| Resource overhead | ⚠️ High (sidecars on every pod) | ✅ Minimal (one per node) |
| Latency impact | ⚠️ Yes (adds ~2-5ms per hop) | ✅ Zero (out-of-band) |
| Code changes required | ✅ None | ✅ None |
| Control plane | ⚠️ Required (istiod, pilot) | ✅ None |
| Configuration complexity | ⚠️ High (CRDs, policies) | ✅ Low (one YAML) |


When to Use Which

Use Qtap (Service Mesh Lite) When:

✅ You need observability only

  • Understand service dependencies

  • Track errors and latency

  • Debug microservice issues

✅ You want zero latency overhead

  • Performance-critical applications

  • High-throughput services

  • Latency-sensitive workloads

✅ You want operational simplicity

  • Small team without mesh expertise

  • Don't want to manage control plane

  • Avoid sidecar complexity

✅ You have resource constraints

  • Limited cluster resources

  • Cost-sensitive deployments

  • Want to minimize overhead


Use a Service Mesh When:

⚠️ You need traffic control

  • Canary deployments and traffic splitting

  • Circuit breaking for resilience

  • Retry and timeout policies

  • Advanced load balancing

⚠️ You require mTLS

  • Zero-trust network requirements

  • Compliance mandates encryption

  • Need service-to-service authentication

⚠️ You need authorization

  • Fine-grained access control between services

  • Policy-based routing

⚠️ You have dedicated platform team

  • Team can manage mesh complexity

  • Already invested in service mesh skills

  • Have operational capacity


Part 8: Production Considerations

Scaling to Many Services

This demo uses 3 services. In production with 50+ services:

Qtap scales linearly:

  • One qtap per node (DaemonSet in Kubernetes)

  • Resource usage independent of service count

  • No per-service configuration needed

Service Mesh complexity grows:

  • One sidecar per pod (100 pods = 100 sidecars)

  • Control plane resource requirements increase

  • Configuration complexity multiplies


Integration with Observability Tools

Qtap outputs JSON. Pipe to your observability stack:

Send to S3 for long-term storage:

services:
  object_stores:
    - type: s3
      bucket: my-mesh-observability
      region: us-east-1

Send events to ClickHouse/Elasticsearch:

services:
  event_stores:
    - type: axiom
      api_token: $AXIOM_TOKEN
      dataset: mesh-traffic

Build dashboards:

  • Parse JSON logs to extract metrics

  • Visualize service dependencies

  • Alert on error rate increases


Kubernetes Deployment

For production Kubernetes, deploy qtap as a DaemonSet:

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: qtap
  namespace: qpoint
spec:
  selector:
    matchLabels:
      app: qtap
  template:
    metadata:
      labels:
        app: qtap
    spec:
      hostNetwork: true
      hostPID: true
      containers:
      - name: qtap
        image: us-docker.pkg.dev/qpoint-edge/public/qtap:v0
        securityContext:
          privileged: true
        volumeMounts:
        - name: sys
          mountPath: /sys
        - name: docker-sock
          mountPath: /var/run/docker.sock
        - name: config
          mountPath: /app/config
        env:
        - name: TINI_SUBREAPER
          value: "1"
        args:
        - --log-level=info
        - --log-encoding=console
        - --config=/app/config/qtap.yaml
      volumes:
      - name: sys
        hostPath:
          path: /sys
      - name: docker-sock
        hostPath:
          path: /var/run/docker.sock
      - name: config
        configMap:
          name: qtap-config

Use Pod Labels for Service Attribution

In Kubernetes, use pod labels to identify services:

rulekit:
  macros:
    - name: is_payment_service
      expr: src.pod.labels.app == "payment-service"
    - name: is_auth_service
      expr: src.pod.labels.app == "auth-service"

stacks:
  mesh_observability:
    plugins:
      - type: http_capture
        config:
          level: summary
          rules:
            - name: "Payment service traffic"
              expr: is_payment_service()
              level: details

            - name: "Auth service traffic"
              expr: is_auth_service()
              level: details

Part 9: Cleanup

docker compose -f /tmp/docker-compose-mesh-lite.yaml down

Key Takeaways

  1. Service mesh observability without service mesh complexity: Qtap provides service discovery, request tracing, error monitoring, and latency tracking without sidecars or control planes.

  2. Zero latency impact: Out-of-band observation means no proxies in the request path. Your services run at full speed.

  3. BOTH sides of every call: direction: all captures egress (leaving Service A) and ingress (arriving at Service B), giving complete visibility.

  4. TLS inspection without certificates: Qtap sees inside HTTPS at the kernel level before encryption happens.

  5. Operational simplicity: One DaemonSet per node vs sidecars on every pod. Simple YAML config vs complex CRDs and policies.

  6. Know when you need more: If you need traffic control (canary, circuit breaking) or mTLS, use a full service mesh. If you just need visibility, Qtap is simpler.

