
Rate limiting

Each tenant is rate-limited by its own token bucket, held in a Cloudflare Durable Object. The check is strongly consistent, free of race conditions, and completes in under a millisecond.
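The bucket itself is a small piece of arithmetic. A tenant holds up to a fixed number of tokens, earns them back at a steady rate, and spends one per request. Below is a minimal TypeScript sketch. It assumes the capacity and refill rate are derived from the tenant's per-minute limit. TokenBucket and its methods are illustrative names, not the service's actual code.

```typescript
// Token bucket: holds up to `capacity` tokens, refills continuously at
// `refillPerSecond`, and spends one token per allowed request.
class TokenBucket {
  private readonly capacity: number;
  private readonly refillPerSecond: number;
  private tokens: number;
  private lastRefillMs: number;

  constructor(capacity: number, refillPerSecond: number, nowMs: number) {
    this.capacity = capacity;
    this.refillPerSecond = refillPerSecond;
    this.tokens = capacity; // start full, so a new tenant can burst immediately
    this.lastRefillMs = nowMs;
  }

  // Credit tokens earned since the last call, never exceeding capacity.
  private refill(nowMs: number): void {
    const elapsedSec = Math.max(0, nowMs - this.lastRefillMs) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSec * this.refillPerSecond);
    this.lastRefillMs = Math.max(this.lastRefillMs, nowMs); // ignore clock going backwards
  }

  // Spend one token if available; false means the request should get a 429.
  tryTake(nowMs: number): boolean {
    this.refill(nowMs);
    if (this.tokens < 1) return false;
    this.tokens -= 1;
    return true;
  }

  // Whole seconds until the next token is available (0 if one is available now).
  secondsUntilNextToken(nowMs: number): number {
    this.refill(nowMs);
    return this.tokens >= 1 ? 0 : Math.ceil((1 - this.tokens) / this.refillPerSecond);
  }
}
```

Under that assumption, a tenant with rate_limit_per_minute = 6000 maps to new TokenBucket(6000, 100, Date.now()). That tenant can burst up to 6000 requests, then sustain a steady 100 per second. The value from secondsUntilNextToken is the kind of number that would go into a Retry-After header.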

Defaults

Response when rate-limited

HTTP/2 429
Retry-After: 12
Content-Type: application/json

{
  "success": false,
  "code":    "RATE_LIMITED",
  "message": "Per-tenant rate limit exceeded",
  "requestId": "..."
}
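Clients should wait at least the number of seconds given in Retry-After before retrying. Below is a minimal client-side sketch using the standard Fetch API, which is available in Workers, browsers, and Node 18+. retryDelayMs and fetchWithRetry are hypothetical helpers. The one-second fallback and three-attempt cap are arbitrary choices, not values the API specifies.

```typescript
// Milliseconds to wait before retrying, or null if the response is not a 429.
// Retry-After may be delta-seconds ("12") or an HTTP date; both are handled.
function retryDelayMs(res: Response, fallbackMs = 1000): number | null {
  if (res.status !== 429) return null;
  const header = res.headers.get("Retry-After");
  if (header !== null && header.trim() !== "") {
    const seconds = Number(header);
    if (Number.isFinite(seconds) && seconds >= 0) return seconds * 1000;
    const dateMs = Date.parse(header);
    if (!Number.isNaN(dateMs)) return Math.max(0, dateMs - Date.now());
  }
  return fallbackMs; // header missing or unparseable
}

// Retry rate-limited requests up to maxAttempts times, honoring Retry-After.
// Assumes init.body (if any) is replayable, e.g. a string rather than a stream.
async function fetchWithRetry(url: string, init: RequestInit = {}, maxAttempts = 3): Promise<Response> {
  for (let attempt = 1; ; attempt++) {
    const res = await fetch(url, init);
    const delay = retryDelayMs(res);
    if (delay === null || attempt >= maxAttempts) return res;
    await res.body?.cancel(); // release the 429 body before waiting
    await new Promise<void>((resolve) => setTimeout(() => resolve(), delay));
  }
}
```

With the response above, retryDelayMs returns 12000. The final response is returned as-is once the attempts run out, so the caller still sees the RATE_LIMITED body.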

Why Durable Objects?

Token buckets need strong consistency: two concurrent requests must never spend the same token. Each tenant gets one Durable Object instance that holds its bucket state. Cloudflare routes all of that tenant's requests to that same instance, so the check runs against state held in the object itself and completes in under a millisecond.
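Below is a sketch of how the routing and the per-tenant object could fit together. It assumes a Durable Object namespace binding for the limiter and the classic fetch-based Durable Object interface. The small interfaces at the top mirror only the parts of the Workers API used here (idFromName, get, stub.fetch, storage.get and storage.put), so the example stands alone. The following are all hypothetical: TenantRateLimiter, isAllowed, the "bucket" storage key, and passing the limit as a query parameter.

```typescript
// Minimal local stand-ins for the parts of the Workers Durable Objects API used
// below; in a real Worker these types come from @cloudflare/workers-types.
interface ObjectStorageLike {
  get<T>(key: string): Promise<T | undefined>;
  put<T>(key: string, value: T): Promise<void>;
}
interface ObjectStateLike {
  storage: ObjectStorageLike;
}
interface StubLike {
  fetch(url: string): Promise<Response>;
}
interface NamespaceLike {
  idFromName(name: string): unknown;
  get(id: unknown): StubLike;
}

interface BucketState {
  tokens: number;
  lastRefillMs: number;
}

// Hypothetical Durable Object class: one instance holds one tenant's bucket.
class TenantRateLimiter {
  private readonly state: ObjectStateLike;

  constructor(state: ObjectStateLike) {
    this.state = state;
  }

  async fetch(request: Request): Promise<Response> {
    // The caller passes the tenant's per-minute limit (hypothetical protocol).
    const limitPerMinute = Number(new URL(request.url).searchParams.get("limit"));
    if (!(limitPerMinute > 0)) return new Response("invalid limit", { status: 400 });
    const refillPerMs = limitPerMinute / 60_000;

    const saved = await this.state.storage.get<BucketState>("bucket");
    const prev = saved ?? { tokens: limitPerMinute, lastRefillMs: Date.now() };
    // Credit tokens earned since the last request, capped at one minute's worth.
    const refilledAt = Math.max(Date.now(), prev.lastRefillMs);
    const tokens = Math.min(limitPerMinute, prev.tokens + (refilledAt - prev.lastRefillMs) * refillPerMs);

    if (tokens < 1) {
      await this.state.storage.put<BucketState>("bucket", { tokens, lastRefillMs: refilledAt });
      const retryAfterSec = Math.ceil((1 - tokens) / refillPerMs / 1000);
      return new Response(null, { status: 429, headers: { "Retry-After": String(retryAfterSec) } });
    }
    await this.state.storage.put<BucketState>("bucket", { tokens: tokens - 1, lastRefillMs: refilledAt });
    return new Response(null, { status: 204 });
  }
}

// Worker side: idFromName is deterministic, so every request for a tenant
// reaches the same object instance.
async function isAllowed(ns: NamespaceLike, tenantId: string, limitPerMinute: number): Promise<boolean> {
  const stub = ns.get(ns.idFromName(tenantId));
  const res = await stub.fetch(`https://rate-limiter/take?limit=${limitPerMinute}`);
  return res.ok;
}
```

The read-modify-write inside fetch needs no explicit lock. Durable Objects do not deliver new events to an object while one of its storage operations is in flight (input gates). Once the get completes, the code runs synchronously through to the put, so two requests for the same tenant cannot interleave between the two.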

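The alternative, KV with read-modify-write, is eventually consistent and racy under bursts: two requests can read the same token count before either writes it back, and both get through. Durable Objects give us linearizable semantics with a single call per check instead of a separate read and write.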
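The race is easy to reproduce in miniature. In this sketch, an in-memory store with async get and put stands in for KV, with no compare-and-swap. A promise-chain queue stands in for the serialized storage access a Durable Object gives each tenant. FakeKv, takeToken, and SerialQueue are hypothetical names for illustration.

```typescript
// Stand-in for an eventually consistent KV store: async get/put, no compare-and-swap.
class FakeKv {
  private readonly data = new Map<string, number>();
  async get(key: string): Promise<number | undefined> {
    return this.data.get(key);
  }
  async put(key: string, value: number): Promise<void> {
    this.data.set(key, value);
  }
}

// Unsynchronized read-modify-write: two concurrent callers can both read the
// same count before either writes, so both spend the same token.
async function takeToken(kv: FakeKv, key: string): Promise<boolean> {
  const tokens = (await kv.get(key)) ?? 0;
  if (tokens < 1) return false;
  await kv.put(key, tokens - 1);
  return true;
}

// Runs tasks strictly one after another, standing in for the serialized
// storage access a Durable Object gives each tenant.
class SerialQueue {
  private tail: Promise<unknown> = Promise.resolve();
  run<T>(task: () => Promise<T>): Promise<T> {
    const result = this.tail.then(task);
    this.tail = result.catch(() => undefined); // keep the chain alive after failures
    return result;
  }
}
```

Suppose one token is left. Two concurrent takeToken calls both return true, which is a double spend. The same two calls routed through a SerialQueue return true and then false.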

Tuning

To raise a single tenant's limit (for example, for a tenant on an Enterprise plan), update its row in the tenants table:

wrangler d1 execute inboxos --remote --command \
  "UPDATE tenants SET rate_limit_per_minute = 6000 WHERE id = '...'"

The middleware reads each tenant's config on every request through a KV cache (sub-millisecond reads), so changes apply within seconds.
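A cache-aside lookup is one way to get that behavior. This sketch assumes a cache with the same get/put shape as a Workers KV namespace: get returns a string or null, and put accepts an expirationTtl in seconds. A loader function stands in for the D1 query against the tenants table. getTenantConfig, the tenant-config: key prefix, and the 60-second default TTL are hypothetical.

```typescript
// Same get/put shape as a Workers KV namespace binding.
interface ConfigCache {
  get(key: string): Promise<string | null>;
  put(key: string, value: string, options: { expirationTtl: number }): Promise<void>;
}

interface TenantConfig {
  id: string;
  rateLimitPerMinute: number;
}

// Cache-aside: serve from the cache when possible, otherwise load from the
// database (D1 in this service) and cache the result for ttlSeconds.
async function getTenantConfig(
  cache: ConfigCache,
  loadFromDb: (tenantId: string) => Promise<TenantConfig | null>,
  tenantId: string,
  ttlSeconds = 60,
): Promise<TenantConfig | null> {
  const key = `tenant-config:${tenantId}`;
  const cached = await cache.get(key);
  if (cached !== null) return JSON.parse(cached) as TenantConfig;

  const config = await loadFromDb(tenantId);
  if (config !== null) {
    await cache.put(key, JSON.stringify(config), { expirationTtl: ttlSeconds });
  }
  return config;
}
```

With this pattern, staleness is bounded by the TTL. Workers KV does not accept an expirationTtl below 60 seconds. An implementation that needs changes to show up sooner than that would also delete the cached key after writing to D1.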

Batch endpoint

POST /api/emails/batch consumes a single rate-limit token but dispatches up to 100 messages. This is intentional: batch sends are explicitly opt-in and are limited separately by the AGENT_CONCURRENCY setting.
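Below is a sketch of that accounting. It assumes the handler receives a token-spending callback, backed by the tenant's Durable Object, and a send function per message. handleBatch, mapWithConcurrency, and the result strings are hypothetical, and AGENT_CONCURRENCY is passed in as a plain number. The batch rejects more than 100 messages before spending anything, spends exactly one token otherwise, and never has more than agentConcurrency sends in flight.

```typescript
const MAX_BATCH_SIZE = 100;

type BatchResult = "too_large" | "rate_limited" | "sent";

// Run worker over items with at most `limit` calls in flight at once.
// A rejected call rejects the returned promise; work already started is not cancelled.
async function mapWithConcurrency<T, R>(
  items: T[],
  limit: number,
  worker: (item: T) => Promise<R>,
): Promise<R[]> {
  const results: R[] = new Array(items.length);
  let next = 0;
  async function runner(): Promise<void> {
    while (next < items.length) {
      const i = next++; // safe: this increment runs synchronously on one thread
      results[i] = await worker(items[i]);
    }
  }
  const runnerCount = Math.min(Math.max(1, limit), items.length);
  await Promise.all(Array.from({ length: runnerCount }, () => runner()));
  return results;
}

// One batch request spends one rate-limit token, however many messages it holds.
async function handleBatch<M>(
  messages: M[],
  takeToken: () => Promise<boolean>,
  send: (message: M) => Promise<void>,
  agentConcurrency: number,
): Promise<BatchResult> {
  if (messages.length > MAX_BATCH_SIZE) return "too_large"; // checked before spending a token
  if (!(await takeToken())) return "rate_limited";
  await mapWithConcurrency(messages, agentConcurrency, send);
  return "sent";
}
```

The size check deliberately comes before takeToken, so an oversized request does not cost the tenant a token.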