Why use a token bucket instead of a fixed-window or leaky-bucket limiter for fault events?

A fixed window resets its allowance on a hard boundary, so a micro-burst straddling the boundary is admitted at double the intended rate. A pure leaky bucket paces rigidly and starves correlation in the first seconds of an outage. A token bucket gives both burst tolerance up to capacity and a strict long-term ceiling fixed by refill_rate, which matches how alarm storms actually arrive.

Why must the refill clock use time.monotonic_ns() rather than time.time()?

Wall-clock time can step backward when NTP corrects a containerized host, producing a negative elapsed interval and a corrupted token count. time.monotonic_ns() never goes backward and is unaffected by system clock updates, so the elapsed interval is always non-negative. It returns integer nanoseconds, but the effective resolution is platform-dependent; time.get_clock_info("monotonic") reports it for the running interpreter.

How do I size capacity and refill_rate?

Set capacity to absorb a 30-second burst at the peak per-node rate: for a 200 EPS peak, capacity = 200 × 30 = 6000. Set refill_rate to about 1.5x the downstream ticketing API's documented safe rate: if the API accepts 100 EPS, use refill_rate = 150. The 1.5x factor stays under the API limit only because correlation and deduplication collapse admitted events before tickets are created; if that reduction is not guaranteed to be at least one third, set refill_rate at or below the safe rate.

How does the limiter avoid dropping Critical alarms during a storm?

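The acquire() method checks severity against the CRITICAL_SEVERITIES set and admits those events unconditionally, even with an empty bucket. The bypass still debits the reservoir, but the debit is clamped at zero: Criticals use up allowance that non-critical events would otherwise receive, yet they never accumulate debt. The refill_rate ceiling therefore binds non-critical traffic only, and a flood of false Criticals has to be caught by severity mapping upstream.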

Setting Up Token Bucket Rate Limiters

When network element (NE) telemetry and alarm streams spike suddenly (a fiber cut, a BGP route flap, or an EMS polling timeout), the downstream ticket routing automation pipeline saturates almost instantly. That saturation introduces queuing latency that directly inflates mean time to resolution (MTTR). A correlation engine sized for a steady-state 200 events per second (EPS) that suddenly receives 8,000 EPS does not fail loudly; it fails slowly, stretching a sub-second mean time to acknowledge (MTTA) into tens of seconds while operators wait on tickets that have not yet been created. The operational gap this page closes is narrow and specific: how to admit a bursty, normalized fault stream at a bounded long-term rate without dropping a single Critical alarm during the burst.

Fixed-window counters and leaky-bucket implementations both fail in this domain. A fixed window resets its allowance on a hard boundary, so a micro-burst straddling the boundary is admitted at up to double the intended rate, while a legitimate sustained storm is shed wholesale once the window's allowance is spent. A pure leaky bucket enforces rigid pacing that starves the correlation engine during the first seconds of an outage, exactly when freshness matters most. A correctly configured token bucket gives you both halves of the requirement at once: short-term burst tolerance up to a configured capacity, and a strict long-term throughput ceiling fixed by refill_rate. That pairing is what makes it the default admission primitive for telecom fault pipelines.
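A small simulation can make the boundary problem concrete. The sketch below is illustrative only: fixed_window_admitted and token_bucket_admitted are hypothetical helpers, and the 100-per-second limit and the 200-event burst are made-up numbers rather than values from this pipeline.

```python
def fixed_window_admitted(timestamps, limit, window_s=1.0):
    """Count events admitted by a fixed-window counter of `limit` per window."""
    counts = {}
    admitted = 0
    for t in timestamps:
        window = int(t // window_s)  # index of the window this event falls in
        if counts.get(window, 0) < limit:
            counts[window] = counts.get(window, 0) + 1
            admitted += 1
    return admitted


def token_bucket_admitted(timestamps, capacity, refill_rate):
    """Count events admitted by a token bucket that starts full."""
    tokens = float(capacity)
    last = timestamps[0]
    admitted = 0
    for t in timestamps:
        # Refill for the time elapsed since the previous event, capped at capacity.
        tokens = min(capacity, tokens + (t - last) * refill_rate)
        last = t
        if tokens >= 1.0:
            tokens -= 1.0
            admitted += 1
    return admitted


# 200 events packed into about 100 ms, half before and half after t = 1.0 s.
burst = [0.95025 + i * 0.0005 for i in range(200)]

fixed = fixed_window_admitted(burst, limit=100)
bucket = token_bucket_admitted(burst, capacity=100, refill_rate=100.0)
```

With these inputs the fixed window admits all 200 events, because each half of the burst lands in a fresh window. The token bucket admits its capacity of 100 plus only the handful of tokens refilled while the burst is in flight.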

Schema Alignment and Taxonomy Anchor

This limiter is one admission primitive inside the Rate Limiting Strategies stage, which itself sits between normalization and correlation in the broader Ingestion & Parsing Workflows data plane. Its responsibility is deliberately thin: consume an already-normalized event, decide admit or defer, and stamp the decision. It does not parse, enrich, deduplicate, or correlate.

That thinness depends on the event already conforming to the canonical contract defined in Event Schema Design, so that asset_id, severity, vendor_alarm_code, and event_time are present and typed before any token is consumed. Quota is keyed on asset_id, one bucket per managed element, and severity is read against the bands defined in Defining Severity Levels for Telecom Faults so that a Critical alarm can bypass the quota check rather than being shed with the noise. An event that survived parsing but lacks a mappable asset_id must never silently consume token state: it is routed to quarantine so a schema-drift incident upstream cannot exhaust the budget of a healthy element.
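One way to picture the keying and quarantine rule is a small registry that hands out one bucket per asset_id and refuses to touch any bucket for an incomplete event. This is a sketch under assumptions: BucketRegistry, quarantine_reason, and REQUIRED_FIELDS are hypothetical names, the sample events are invented, and eviction of idle buckets is left out.

```python
from typing import Any, Callable, Dict, Optional

# Fields the canonical contract requires before a token may be consumed.
REQUIRED_FIELDS = ("asset_id", "severity", "vendor_alarm_code", "event_time")


def quarantine_reason(event: Dict[str, Any]) -> Optional[str]:
    """Return why an event must be quarantined, or None if it may reach a bucket."""
    for field in REQUIRED_FIELDS:
        if event.get(field) in (None, ""):
            return f"missing {field}"
    return None


class BucketRegistry:
    """Lazily creates one limiter per asset_id (hypothetical helper)."""

    def __init__(self, factory: Callable[[], Any]):
        self._factory = factory
        self._buckets: Dict[str, Any] = {}

    def __len__(self) -> int:
        return len(self._buckets)

    def for_event(self, event: Dict[str, Any]) -> Optional[Any]:
        """Return the asset's bucket, or None when the event is quarantined."""
        if quarantine_reason(event) is not None:
            return None  # no bucket is created or touched
        asset_id = event["asset_id"]
        if asset_id not in self._buckets:
            self._buckets[asset_id] = self._factory()
        return self._buckets[asset_id]


# `object` stands in for a real limiter factory in this demonstration.
registry = BucketRegistry(factory=object)
good = {"asset_id": "olt-17", "severity": "MAJOR",
        "vendor_alarm_code": "LOS", "event_time": "2024-01-01T00:00:00Z"}
drifted = {"severity": "MAJOR", "vendor_alarm_code": "LOS",
           "event_time": "2024-01-01T00:00:00Z"}  # asset_id lost upstream

bucket_a = registry.for_event(good)
bucket_b = registry.for_event(good)
quarantined = registry.for_event(drifted)
```

In a real worker the factory would build the limiter described in Production Implementation, and a None result would send the event to the quarantine path instead of calling acquire().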

Core Architecture and Clock Safety

The token bucket maintains a virtual reservoir that refills at a deterministic rate. Each admitted event consumes one token; when the bucket is empty the event is deferred, downgraded, or routed to a fallback handler rather than dropped. For a Python automation pipeline the implementation must be non-blocking, observable, and immune to clock anomalies.

Wall-clock time (time.time()) is unsafe for refill arithmetic: in containerized collectors an NTP step adjustment can move the clock backward between two acquire() calls, producing a negative elapsed interval and a corrupted token count. time.monotonic_ns() is the correct source. It never goes backward, is unaffected by system clock updates, and returns integer nanoseconds; its effective resolution is platform-dependent. Because the pipeline is asyncio-native, mutual exclusion over the refill-and-decrement critical section is provided by an asyncio.Lock rather than a thread lock: the section is short, CPU-bound, and contains no await, so it serializes cleanly on the single event loop and holds it for only a few microseconds. An asyncio.Lock is not thread-safe, so each limiter must be used from one event loop only.
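The failure mode is easy to reproduce on paper. The sketch below feeds a simulated backward clock step into the refill formula and then repeats the calculation with the monotonic clock; the refill helper and the wall-clock readings are invented for illustration.

```python
import time


def refill(tokens: float, capacity: float, rate: float, elapsed_s: float) -> float:
    """Token count after `elapsed_s` seconds of refill, capped at capacity."""
    return min(capacity, tokens + elapsed_s * rate)


# Simulated wall-clock readings: the second is 2 s earlier because the clock
# was stepped backward between two acquire() calls (values are made up).
wall_before, wall_after = 1_700_000_010.0, 1_700_000_008.0
corrupted = refill(tokens=1.0, capacity=100.0, rate=1.0,
                   elapsed_s=wall_after - wall_before)
# corrupted is now -1.0: the bucket owes tokens it never spent.

# Successive monotonic readings never decrease, so elapsed_s cannot be negative.
t0 = time.monotonic_ns()
t1 = time.monotonic_ns()
elapsed_s = (t1 - t0) / 1_000_000_000
safe = refill(tokens=1.0, capacity=100.0, rate=1.0, elapsed_s=elapsed_s)

# Properties of the clock as reported by the running interpreter,
# including its platform-dependent resolution.
info = time.get_clock_info("monotonic")
```

Inspecting info.resolution on the target host is a quick way to check how fine-grained the refill can actually be there.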

Production Implementation

The limiter below is asyncio-native and designed for single-node ingestion workers that feed a distributed ticket routing queue. It prioritizes atomicity, explicit typing, a Critical-alarm bypass, and observability hooks. No blocking call sits on the hot path.

import asyncio
import time
import logging
from dataclasses import dataclass
from typing import Tuple

logger = logging.getLogger(__name__)

CRITICAL_SEVERITIES = frozenset({"CRITICAL", "EMERGENCY"})


@dataclass(slots=True)
class _BucketState:
    tokens: float
    last_refill_ns: int
    capacity: int
    refill_rate: float  # tokens per second


class AsyncTokenBucketLimiter:
    """Asyncio-native token bucket for telecom fault-event admission control."""

    def __init__(self, capacity: int, refill_rate: float):
        if capacity <= 0 or refill_rate <= 0:
            raise ValueError("capacity and refill_rate must be positive")

        self._lock = asyncio.Lock()
        self._state = _BucketState(
            tokens=float(capacity),
            last_refill_ns=time.monotonic_ns(),
            capacity=capacity,
            refill_rate=refill_rate,
        )
        self._rejection_count = 0
        self._bypass_count = 0

    async def acquire(self, severity: str) -> Tuple[bool, float]:
        """Admit one event. Returns (admitted, tokens_remaining).

        Critical alarms bypass the quota so a storm of cleared/reraised
        flaps can never shed a genuine outage signal.
        """
        async with self._lock:
            now_ns = time.monotonic_ns()
            elapsed_s = (now_ns - self._state.last_refill_ns) / 1_000_000_000.0

            # Deterministic, monotonic refill — never negative.
            self._state.tokens = min(
                self._state.capacity,
                self._state.tokens + elapsed_s * self._state.refill_rate,
            )
            self._state.last_refill_ns = now_ns

            if severity.upper() in CRITICAL_SEVERITIES:
                # Bypass admits unconditionally but still debits the reservoir
                # so post-storm steady state reflects the spend.
                self._state.tokens = max(0.0, self._state.tokens - 1.0)
                self._bypass_count += 1
                return True, self._state.tokens

            if self._state.tokens >= 1.0:
                self._state.tokens -= 1.0
                return True, self._state.tokens

            self._rejection_count += 1
            return False, self._state.tokens

    def get_metrics(self) -> dict:
        """Snapshot for Prometheus / telemetry export (no await: read-only)."""
        return {
            "tokens_available": self._state.tokens,
            "capacity": self._state.capacity,
            "refill_rate_eps": self._state.refill_rate,
            "total_rejections": self._rejection_count,
            "critical_bypasses": self._bypass_count,
        }

Async Ingestion Hook

The limiter slots into the pipeline as a non-blocking gate between the normalization stage and the correlation queue. The consumer pulls normalized events off the inbound asyncio.Queue, calls acquire(), and either forwards the event or hands it to the deferral path. Because acquire() only ever holds the lock for a few microseconds of arithmetic, a single consumer coroutine sustains tens of thousands of decisions per second while the event loop keeps draining the socket and servicing other tasks. This mirrors the backpressure model used in Implementing Asyncio for High-Volume SNMP, where a bounded queue turns overflow into an explicit, counted decision rather than an out-of-memory kill.

async def admission_worker(
    limiter: AsyncTokenBucketLimiter,
    inbound: asyncio.Queue,   # normalized fault events
    correlation: asyncio.Queue,  # admitted events -> rule tier
    deferral: asyncio.Queue,     # deferred events -> priority DLQ
) -> None:
    while True:
        event = await inbound.get()
        try:
            admitted, remaining = await limiter.acquire(event["severity"])
            if admitted:
                event["rate_limit_action"] = "admit"
                await correlation.put(event)
            else:
                # acquire() refuses only when fewer than one token remains,
                # so a refusal always means the bucket is empty.
                event["rate_limit_action"] = "defer"
                await deferral.put(event)
                logger.warning(
                    "bucket empty for asset=%s sev=%s",
                    event["asset_id"], event["severity"],
                )
        finally:
            inbound.task_done()

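For multi-replica deployments behind a partitioned bus, the in-process reservoir is replaced by a shared key-value store: a per-asset_id token count in Redis, refilled with the same elapsed-time arithmetic inside an atomic Lua script so that consumer replicas coordinate on one authoritative quota instead of each enforcing its own. The timestamp must come from a single shared clock, for example the Redis server's TIME command, because time.monotonic_ns() readings have an undefined reference point and are not comparable across processes or hosts. The admission contract, (admitted, remaining), stays identical, so nothing downstream changes.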

Parameter Calibration and Sizing

Sizing the two parameters requires baseline telemetry from the Ingestion & Parsing Workflows layer. In steady state, NEs emit roughly 0.5 to 2 EPS per managed element; under fault conditions a single node can spike to 50–200 EPS.

capacity: size it to absorb a 30-second burst at the peak per-node rate. For a 200 EPS peak, capacity = 200 × 30 = 6000. This admits legitimate micro-bursts without immediate backpressure while bounding the worst-case memory footprint of the downstream queue.
refill_rate: set it to roughly 1.5× the ticketing API’s documented safe rate. If the ServiceNow or Remedy API safely accepts 100 EPS, configure refill_rate = 150. The extra 50% is headroom for correlation: it stays under the API’s own limits only because correlation and deduplication collapse admitted events before tickets are created. If that reduction is not guaranteed to be at least one third, set refill_rate at or below the safe rate.
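The sizing rules above reduce to a few lines of arithmetic, which also show how long a full bucket lasts under load. The helper names are hypothetical, and the 8,000 EPS figure is the storm rate from the introduction, used here only as an illustrative input.

```python
def size_bucket(peak_eps: float, burst_s: float,
                downstream_safe_eps: float, headroom: float = 1.5):
    """Apply the two sizing rules: burst-sized capacity, headroom-scaled refill."""
    capacity = int(peak_eps * burst_s)
    refill_rate = downstream_safe_eps * headroom
    return capacity, refill_rate


def seconds_to_empty(capacity: float, refill_rate: float, arrival_eps: float) -> float:
    """Time for a full bucket to drain under a constant arrival rate."""
    if arrival_eps <= refill_rate:
        return float("inf")  # refill keeps up, the bucket never empties
    return capacity / (arrival_eps - refill_rate)


def seconds_to_refill(capacity: float, refill_rate: float) -> float:
    """Time for an empty bucket to return to full once traffic stops."""
    return capacity / refill_rate


capacity, refill_rate = size_bucket(peak_eps=200, burst_s=30,
                                    downstream_safe_eps=100)
at_peak = seconds_to_empty(capacity, refill_rate, arrival_eps=200)
at_storm = seconds_to_empty(capacity, refill_rate, arrival_eps=8000)
recovery = seconds_to_refill(capacity, refill_rate)
```

With the page's numbers, capacity is 6000 and refill_rate is 150.0. A sustained 200 EPS peak drains the bucket in 120 seconds, an 8,000 EPS storm drains it in under a second, and an empty bucket takes 40 seconds to refill.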

Cross-reference these against the stage-level Rate Limiting Strategies defaults so upstream parser concurrency and downstream API quotas stay aligned. Misaligned sizing is a common cause of HTTP 429 cascades and ticket-deduplication failures: the limiter shields one stage only to overrun the next.

Mitigation and Hardening

When admission is refused, the event must be handled, not discarded. Concrete failure paths, in order of precedence:

Deferral, never silent drop. A refused event is tagged rate_limit_action="defer" and routed to a priority-aware dead-letter queue (DLQ) that preserves the original NE event_time and severity. Retry uses exponential backoff with full jitter so the recovered correlation API is not hit by a synchronized thundering herd.
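Critical bypass. The CRITICAL_SEVERITIES check guarantees a genuine outage signal is admitted even with an empty bucket. The bypass still debits the reservoir, but the debit is clamped at zero: Criticals use up allowance that non-critical events would otherwise receive, yet they never accumulate debt. The refill_rate ceiling therefore binds non-critical traffic only, and a flood of false Criticals has to be caught by severity mapping upstream.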
Quarantine for unmappable payloads. An event lacking a resolvable asset_id is routed to quarantine before acquire() is ever called, so upstream schema drift cannot exhaust a healthy element’s quota.
Backpressure escalation. When the rejection rate exceeds 5% over a rolling 60-second window, the worker emits a backpressure signal to the upstream collector and, for sustained storms, trips a circuit breaker that shifts non-critical telemetry (performance counters, debug syslog) to a lower-priority tier — reserving bucket capacity for fault alarms and state-change events.
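Two of the paths above lend themselves to a short sketch: the full-jitter retry delay for the DLQ and the rolling rejection rate that drives backpressure. Both are minimal illustrations under assumptions: full_jitter_delay and RollingRejectionRate are hypothetical names, and the base delay and cap are placeholder values, not tuned settings.

```python
import random
from collections import deque
from typing import Deque, Optional, Tuple


def full_jitter_delay(attempt: int, base_s: float = 0.5, cap_s: float = 60.0,
                      rng: Optional[random.Random] = None) -> float:
    """Delay before retry `attempt` (0-based): uniform in [0, min(cap, base * 2**attempt)]."""
    source = rng if rng is not None else random
    # Clamp the exponent so very large attempt numbers cannot overflow a float.
    ceiling = min(cap_s, base_s * (2.0 ** min(attempt, 32)))
    return source.uniform(0.0, ceiling)


class RollingRejectionRate:
    """Share of refused admission decisions over the trailing window."""

    def __init__(self, window_s: float = 60.0):
        self._window_s = window_s
        self._events: Deque[Tuple[float, bool]] = deque()  # (timestamp_s, refused)
        self._refused = 0

    def _evict(self, now_s: float) -> None:
        # Drop decisions that have aged out of the window.
        while self._events and self._events[0][0] <= now_s - self._window_s:
            _, was_refused = self._events.popleft()
            self._refused -= int(was_refused)

    def record(self, now_s: float, refused: bool) -> None:
        self._events.append((now_s, refused))
        self._refused += int(refused)
        self._evict(now_s)

    def rate(self, now_s: float) -> float:
        self._evict(now_s)
        return self._refused / len(self._events) if self._events else 0.0


monitor = RollingRejectionRate(window_s=60.0)
for i in range(95):
    monitor.record(now_s=i * 0.1, refused=False)
for _ in range(5):
    monitor.record(now_s=10.0, refused=True)

rate_now = monitor.rate(now_s=10.0)      # 5 refusals out of 100 decisions
rate_later = monitor.rate(now_s=75.0)    # every decision has aged out
signal_backpressure = rate_now > 0.05    # threshold is exceeded, not merely met
```

Timestamps passed to the monitor should come from time.monotonic() for the same reason the bucket uses a monotonic clock. Storing one tuple per decision is the simplest form; at storm rates a per-second counter would use far less memory.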

Operational Hardening Notes

A few tuning details separate a limiter that holds under a real storm from one that quietly corrupts its own state:

Always refill before you read. Compute elapsed and refill on every acquire(), not on a timer. A timer-driven refill races the decrement and lets the token count drift; the read-then-refill-then-decrement order under one lock is the only sequence immune to it.
Use slots=True on the state dataclass. Per-bucket footprint matters when you hold one bucket per asset_id across tens of thousands of managed elements; slots removes the per-instance __dict__ and meaningfully shrinks resident memory during a wide storm.
Keep the critical section arithmetic-only. No logging, no metric emission, and no await of any I/O inside the async with self._lock block. Emit metrics from the lock-free get_metrics() snapshot so observability never serializes the hot path.
Export the right gauges and counters. Surface token_bucket_tokens_available (gauge), token_bucket_rejections_total (counter), token_bucket_critical_bypasses_total (counter), and ticket_routing_api_429_rate (gauge). A rising 429 rate against a still-draining bucket is the canonical signal that refill_rate is set above the true downstream ceiling.
Bound dynamic adjustment. If you adapt refill_rate from downstream latency and 429 rates, clamp the change to ±20% per interval and drive it from an exponential moving average so the controller cannot oscillate the ceiling during a flapping outage.
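The bounded adjustment in the last note could look roughly like the controller below. It is a sketch, not a tuned policy: RefillRateController is a hypothetical name, it reacts to the 429 rate only (latency is left out for brevity), and alpha, gain, and target_429 are placeholder values.

```python
class RefillRateController:
    """Adapts refill_rate from a smoothed downstream 429 rate (hypothetical policy)."""

    def __init__(self, initial_rate: float, alpha: float = 0.2,
                 max_step: float = 0.20, target_429: float = 0.01,
                 gain: float = 5.0, floor: float = 1.0,
                 ceiling: float = float("inf")):
        self.rate = initial_rate
        self._alpha = alpha          # EMA smoothing factor
        self._max_step = max_step    # largest relative change per interval
        self._target = target_429    # tolerated share of 429 responses
        self._gain = gain
        self._floor = floor
        self._ceiling = ceiling
        self._ema = 0.0

    def update(self, observed_429_rate: float) -> float:
        """Feed one interval's 429 share (0.0 to 1.0) and return the new rate."""
        # Smooth the signal so a single bad interval cannot swing the rate.
        self._ema = self._alpha * observed_429_rate + (1 - self._alpha) * self._ema
        # Proportional proposal: shrink above target, grow slowly below it.
        proposed = self.rate * (1.0 + self._gain * (self._target - self._ema))
        # Clamp the change to +/- max_step of the current rate.
        low = self.rate * (1 - self._max_step)
        high = self.rate * (1 + self._max_step)
        stepped = max(low, min(high, proposed))
        self.rate = max(self._floor, min(self._ceiling, stepped))
        return self.rate


controller = RefillRateController(initial_rate=150.0)
after_429_spike = controller.update(observed_429_rate=1.0)

history = [after_429_spike]
for _ in range(20):
    history.append(controller.update(observed_429_rate=0.0))
```

Even when every downstream response in an interval is a 429, the first update only lowers the rate from 150 to about 120, and recovery afterwards is gradual because the smoothed signal has to decay below the target before the rate grows again.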

Related

Up to the parent stage: Rate Limiting Strategies — the admission-control layer this token bucket plugs into
Implementing Asyncio for High-Volume SNMP — the bounded-queue backpressure model the admission worker reuses
Event Schema Design — the canonical contract every event must satisfy before a token is consumed
Defining Severity Levels for Telecom Faults — the severity bands that drive the Critical-alarm bypass
