Alerting

The alerting system monitors your API infrastructure and sends notifications when metrics cross the thresholds you define, so you can respond to performance issues and outages early.

System Overview

The alerting system consists of four main components:

  1. Rules - Define monitoring conditions and thresholds

  2. Filters - Narrow down alert scope to specific conditions

  3. Integrations - Configure where alerts are sent

  4. Events - View alert history and patterns


1. Rules Configuration

Rules are the foundation of the alerting system. Each rule evaluates a metric at a fixed check interval and triggers a notification when its threshold is crossed.

Pre-Built Rule Templates

Performance Monitoring:

  • Critical Latency Alert - Triggers when the maximum response time exceeds a critical threshold

  • Slow Response Time - Detects when API response times degrade beyond acceptable thresholds

  • High Traffic Alert - Monitors for unusually high request volumes that might impact performance

  • Bandwidth Consumption - Alerts when data transfer exceeds expected thresholds

Reliability Monitoring:

  • High Error Rate - Alerts when the error rate exceeds its threshold, so API failures are detected quickly

  • Low Availability Warning - Early warning when availability starts to drop

  • Vendor Availability Drop - Monitors when vendor service availability falls below acceptable levels

  • Client-specific Error Rate - Monitors errors for specific client integrations

Infrastructure Monitoring:

  • Connection Spike - Alerts on unusual spikes in connection count that might indicate issues

Custom Rule Creation

Step 1: Choose Alert Type

  • Monitor a Metric Against a Threshold (currently available)

  • Check for New Vendors, Endpoints, or Clients (coming soon)

  • Look for Percentage Changes in a Metric (coming soon)

Step 2: Configure Check Interval

How frequently the rule evaluates your metrics:

  • 1 Minute (most responsive)

  • 5 Minutes

  • 15 Minutes

  • 30 Minutes

  • 1 Hour

  • 2, 4, 8, 12, 24 Hours (for less volatile metrics)

Step 3: Select Metric

Availability Metrics:

  • Availability - Overall service availability percentage

  • Average Availability - Mean availability across time period

  • P99/P95/P90 Availability - Percentile-based availability metrics

Performance Metrics:

  • Average Duration - Mean response time

  • P99/P95/P90/P50 Duration - Percentile response times

  • Maximum Duration - Worst-case response time

Traffic Metrics:

  • Total Connections - Active connection count

  • Connections per Second - Connection rate

  • Total Requests - Request volume

  • Requests per Second - Request rate

Error Metrics:

  • Errors - Total error count

  • Errors per Second - Error rate

Data Transfer Metrics:

  • Total Bytes In/Out - Combined data transfer

  • Total Bytes Sent/Received - Directional data transfer

  • Bytes Sent/Received per Second - Transfer rates

Step 4: Set Operator

  • Greater Than (>)

  • Less Than (<)

  • Equal (=)

  • Greater Than or Equal (≥)

  • Less Than or Equal (≤)
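
The five operators above correspond to ordinary numeric comparisons. As a minimal sketch of how a metric value might be checked against a threshold (the names OPERATORS and breaches are hypothetical and not part of the product):

```python
import operator

# Hypothetical mapping from the operator symbols listed above to comparisons.
OPERATORS = {
    ">": operator.gt,
    "<": operator.lt,
    "=": operator.eq,
    ">=": operator.ge,
    "<=": operator.le,
}


def breaches(value: float, op: str, threshold: float) -> bool:
    """Return True when `value <op> threshold` holds, i.e. the rule condition is met."""
    return OPERATORS[op](value, threshold)


# Example: availability percentages checked against "< 99.5".
print(breaches(99.2, "<", 99.5))  # True
print(breaches(99.7, "<", 99.5))  # False
```

For continuously varying metrics such as durations or rates, an exact Equal comparison will rarely match, so the range operators are usually the more practical choice.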

Step 5: Define Threshold

Enter the numeric value that triggers the alert.

Step 6: Group By (Optional)

Segment alerts by:

  • Vendor - Monitor each vendor separately

  • Endpoint - Track individual endpoints

  • Client - Monitor per-client metrics
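
To illustrate why grouping matters, the sketch below evaluates the same threshold once per vendor instead of once over all traffic. The sample data, vendor names, and function names are hypothetical, and the product's actual aggregation may differ.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical request samples: (vendor, duration in milliseconds).
samples = [
    ("vendor-a.example", 120.0),
    ("vendor-a.example", 180.0),
    ("vendor-b.example", 640.0),
    ("vendor-b.example", 720.0),
]


def average_by_group(rows):
    """Group samples by their first field and return the mean duration per group."""
    groups = defaultdict(list)
    for key, duration in rows:
        groups[key].append(duration)
    return {key: mean(values) for key, values in groups.items()}


def breaching_groups(rows, threshold_ms):
    """Return the groups whose average duration exceeds the threshold."""
    averages = average_by_group(rows)
    return sorted(key for key, avg in averages.items() if avg > threshold_ms)


print(breaching_groups(samples, 500.0))  # ['vendor-b.example']
```

In this sample the overall average is 415 ms, which stays under a 500 ms threshold, while the per-vendor view shows that one vendor averages 680 ms. Grouping surfaces the slow vendor that an ungrouped rule would miss.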

Step 7: Set Report Frequency

  • Don't Report - Evaluate only, no notifications

  • On State Change - Notify when alert state changes (recommended)

  • Always - Notify every time condition is met

  • Custom Schedule - Define specific intervals
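
The difference between the report frequencies can be sketched as follows. This is an illustration only: the enum and function names are hypothetical, Custom Schedule is left out, and the sketch assumes that On State Change also notifies when an alert clears.

```python
from enum import Enum


class ReportFrequency(Enum):
    DONT_REPORT = "dont_report"
    ON_STATE_CHANGE = "on_state_change"
    ALWAYS = "always"


def should_notify(frequency, was_firing, is_firing):
    """Decide whether a single evaluation should send a notification."""
    if frequency is ReportFrequency.DONT_REPORT:
        return False
    if frequency is ReportFrequency.ALWAYS:
        return is_firing
    # ON_STATE_CHANGE: notify only when the state flips (assumed: in either direction).
    return was_firing != is_firing


def notifications(frequency, states):
    """Count notifications over consecutive evaluations, starting from a non-firing state."""
    count, previous = 0, False
    for current in states:
        if should_notify(frequency, previous, current):
            count += 1
        previous = current
    return count


# Five consecutive evaluations: the condition holds for three checks, then clears.
history = [False, True, True, True, False]
print(notifications(ReportFrequency.ALWAYS, history))           # 3
print(notifications(ReportFrequency.ON_STATE_CHANGE, history))  # 2
print(notifications(ReportFrequency.DONT_REPORT, history))      # 0
```

With a 1 minute check interval, Always repeats the notification on every evaluation while the condition holds, which is why On State Change is usually the quieter option.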


2. Filters (Advanced Configuration)

Filters allow you to create highly specific alert conditions by narrowing the scope of monitored data.

Available Filter Dimensions:

Network & Infrastructure:

  • Source IP / Destination IP

  • Source Port / Destination Port

  • Protocol (HTTP/HTTPS, etc.)

  • Direction (inbound/outbound)

  • Host / Hostname

Geographic:

  • Country

  • Continent

  • Region

  • City

Application Layer:

  • Endpoint

  • HTTP Method

  • HTTP Status

  • Content Type

  • Error

  • TLS Version

Container/Kubernetes:

  • Container Image

  • Container Name

  • Pod Name

  • Pod Namespace

Business Logic:

  • Vendor

  • Environment

  • Strategy

  • Data Type

Security & Access:

  • Agent

  • System User

  • Bin

  • Executable

  • IP

How Filters Work:

  1. Add a filter by clicking the "+" button

  2. Select the dimension to filter on

  3. Choose the matching criteria

  4. The rule then evaluates only the matching data, which helps reduce false positives by targeting specific scenarios

Filter operators (AND/OR) are coming soon.
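
Conceptually, a filter restricts which records a rule sees before the metric is computed. The sketch below uses hypothetical records and dimension keys, and it assumes that when several filters are set, a record must satisfy all of them. The documentation above does not state how multiple filters combine, so treat that as an assumption.

```python
# Hypothetical traffic records keyed by a few of the dimensions listed above.
records = [
    {"environment": "production", "http_status": 503, "vendor": "vendor-a.example"},
    {"environment": "staging", "http_status": 500, "vendor": "vendor-a.example"},
    {"environment": "production", "http_status": 200, "vendor": "vendor-b.example"},
]


def matches(record, filters):
    """Return True when the record satisfies every filter (assumed AND semantics)."""
    return all(
        predicate(record.get(dimension))
        for dimension, predicate in filters.items()
    )


# Two illustrative filters: production traffic with a 5xx status.
filters = {
    "environment": lambda value: value == "production",
    "http_status": lambda value: value is not None and 500 <= value <= 599,
}

in_scope = [record for record in records if matches(record, filters)]
print(len(in_scope))  # 1
```

Only the production 503 record remains in scope, so a rule built on these filters would ignore staging errors and successful production requests.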


3. Integrations

Integrations define where and how alerts are delivered. The system supports any webhook-based integration.

Creating an Integration

  1. Name: Descriptive identifier for your integration

  2. Description: Optional details about purpose/destination

  3. Status: Enable/Disable toggle

  4. Configuration:

    • Method: HTTP method (typically POST)

    • URL: Webhook endpoint URL

    • Auth Token: Optional token sent in an authentication header
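
Before enabling an integration, it can help to send a test request to your endpoint yourself. The sketch below only builds such a request with the Python standard library. The URL and token are placeholders, and sending the token as an "Authorization: Bearer" header is an assumption, since the exact header format is not specified here.

```python
import json
import urllib.request


def build_test_request(url, payload, auth_token=None, method="POST"):
    """Build (but do not send) an HTTP request resembling an alert delivery."""
    headers = {"Content-Type": "application/json"}
    if auth_token:
        # Assumption: the token travels as a Bearer token; verify against your setup.
        headers["Authorization"] = f"Bearer {auth_token}"
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(url, data=body, headers=headers, method=method)


request = build_test_request(
    "https://hooks.example.com/alerts",  # hypothetical endpoint
    {"name": "Low Availability Warning", "locationType": "vendor"},
    auth_token="example-token",  # placeholder value
)
print(request.get_method(), request.full_url)
# To actually send it: urllib.request.urlopen(request, timeout=10)
```

Keeping the build step separate from the send step makes it easy to inspect the headers and body before any traffic reaches the endpoint.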

Alert Payload Format

When an alert fires, the following JSON payload is sent to your configured webhook:

{
  "orgId": "X0ysTipXUDzr9JPkY4W6",
  "contextId": "",
  "alertId": "cv4umbq9io6g00lgdpog",
  "name": "Low Availability Warning",
  "description": "Early warning when availability starts to drop",
  "message": "P99 Availability of 0.00% for cloudflare.com is below threshold of 99.50%",
  "locationId": "cloudflare.com",
  "locationType": "vendor",
  "timestamp": "2025-07-29T19:24:52.726139471Z"
}

Payload Fields:

  • orgId: Your organization identifier

  • contextId: Additional context identifier (if applicable)

  • alertId: Unique identifier for this alert instance

  • name: The rule name that triggered

  • description: The rule description

  • message: Human-readable alert message with current value, threshold, and location

  • locationId: The specific entity that triggered the alert (vendor, endpoint, or client)

  • locationType: Type of entity ("vendor", "endpoint", or "client")

  • timestamp: ISO 8601 timestamp in UTC of when the alert was triggered (the sample above carries nanosecond precision)
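
A receiving service will typically parse this payload into a typed structure. The following is a minimal sketch using the field names from the sample above. The Alert class and helper functions are hypothetical, and the timestamp parser only handles the Z-suffixed form shown in the sample.

```python
import json
import re
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class Alert:
    org_id: str
    context_id: str
    alert_id: str
    name: str
    description: str
    message: str
    location_id: str
    location_type: str
    timestamp: datetime


def parse_timestamp(value):
    """Parse a UTC timestamp such as 2025-07-29T19:24:52.726139471Z.

    Python datetimes hold microseconds, so extra fractional digits are dropped.
    """
    match = re.fullmatch(r"(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})(?:\.(\d+))?Z", value)
    if match is None:
        raise ValueError(f"unexpected timestamp format: {value!r}")
    base, fraction = match.groups()
    micros = int((fraction or "0")[:6].ljust(6, "0"))
    parsed = datetime.strptime(base, "%Y-%m-%dT%H:%M:%S")
    return parsed.replace(microsecond=micros, tzinfo=timezone.utc)


def parse_alert(raw):
    """Convert the webhook JSON body into an Alert; a missing field raises KeyError."""
    data = json.loads(raw)
    return Alert(
        org_id=data["orgId"],
        context_id=data["contextId"],
        alert_id=data["alertId"],
        name=data["name"],
        description=data["description"],
        message=data["message"],
        location_id=data["locationId"],
        location_type=data["locationType"],
        timestamp=parse_timestamp(data["timestamp"]),
    )


sample = """{
  "orgId": "X0ysTipXUDzr9JPkY4W6",
  "contextId": "",
  "alertId": "cv4umbq9io6g00lgdpog",
  "name": "Low Availability Warning",
  "description": "Early warning when availability starts to drop",
  "message": "P99 Availability of 0.00% for cloudflare.com is below threshold of 99.50%",
  "locationId": "cloudflare.com",
  "locationType": "vendor",
  "timestamp": "2025-07-29T19:24:52.726139471Z"
}"""

alert = parse_alert(sample)
print(alert.name, alert.location_type, alert.timestamp.isoformat())
```

Failing loudly on a missing field is a deliberate choice in this sketch: a silently defaulted locationId could route an alert to the wrong owner.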


4. Events Dashboard

The Events page shows your alert history, with occurrence totals and timelines for each rule.

Overview Features:

  • Time Range Selector: From the past 15 minutes up to the past 30 days

  • Summary Table: Total occurrences per rule

  • Activity Timeline: Visual representation of alert patterns

Per-Rule Analytics:

  • Occurrence count

  • Time-series graph showing when alerts fired

  • Pattern identification for recurring issues

  • Peak activity periods

Using Events Data:

  • Identify false positive patterns

  • Adjust thresholds based on historical data

  • Correlate alerts with known incidents

  • Track improvement over time
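
If you also archive the webhook payloads you receive, the same kind of analysis can be reproduced outside the dashboard. The sketch below tallies a hypothetical archive by rule and by hour of day. It illustrates the idea only and does not describe how the Events page computes its figures.

```python
from collections import Counter
from datetime import datetime

# Hypothetical archive of received alerts: (rule name, ISO 8601 timestamp).
received = [
    ("Low Availability Warning", "2025-07-29T19:24:52+00:00"),
    ("Low Availability Warning", "2025-07-29T19:41:10+00:00"),
    ("High Error Rate", "2025-07-29T19:43:05+00:00"),
    ("Low Availability Warning", "2025-07-30T02:15:30+00:00"),
]

# Occurrences per rule, and occurrences per hour of day (UTC).
per_rule = Counter(name for name, _ in received)
per_hour = Counter(datetime.fromisoformat(ts).hour for _, ts in received)

print(per_rule.most_common(1))  # [('Low Availability Warning', 3)]
print(per_hour.most_common(1))  # [(19, 3)]
```

A rule that dominates the per-rule count, or an hour that dominates the per-hour count, is a reasonable first candidate when reviewing thresholds for false positives.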


Common Alerting Scenarios

API Health Monitoring:

Rule: Availability < 99.5%
Filter: Environment = Production
Group By: Endpoint
Check Interval: 1 minute

Vendor SLA Tracking:

Rule: P99 Duration > 500ms
Filter: Vendor exists
Group By: Vendor
Check Interval: 5 minutes

Error Spike Detection:

Rule: Errors per Second > 10
Filter: HTTP Status = 5xx
Group By: Client
Check Interval: 1 minute

Geographic Performance:

Rule: Average Duration > 200ms
Filter: Country = "United States"
Group By: Region
Check Interval: 15 minutes
