Rate limiting
Per-tenant token bucket via Cloudflare Durable Objects. Strongly consistent, no race conditions, sub-millisecond check.
Defaults
- 600 requests / minute per tenant (refill rate 10/sec).
- Bucket size (burst capacity) is the same 600 tokens.
- Configurable via RATE_LIMIT_PER_MINUTE in wrangler.toml or per-tenant in tenants.rate_limit_per_minute.
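The defaults above can be sketched as a small token-bucket function. This is a minimal illustration, not the service's actual code: the names BucketState, Decision, and tryConsume are hypothetical, and the way Retry-After is derived here (whole seconds until enough tokens have refilled) is an assumption, since the source does not say how the real header value is computed.

```typescript
// Sketch of a token bucket using the documented defaults.
// All identifiers below are hypothetical, chosen for illustration.

interface BucketState {
  tokens: number;       // tokens currently available
  lastRefillMs: number; // timestamp of the last refill, in milliseconds
}

interface Decision {
  allowed: boolean;
  retryAfterSec: number; // 0 when the request is allowed
  state: BucketState;    // state to persist for the next check
}

const PER_MINUTE = 600;                 // default limit per tenant
const CAPACITY = PER_MINUTE;            // bucket size = burst capacity
const REFILL_PER_SEC = PER_MINUTE / 60; // 10 tokens per second

function tryConsume(state: BucketState, nowMs: number, cost: number = 1): Decision {
  // Credit tokens for the elapsed time, capped at the bucket capacity.
  const elapsedSec = Math.max(0, nowMs - state.lastRefillMs) / 1000;
  const tokens = Math.min(CAPACITY, state.tokens + elapsedSec * REFILL_PER_SEC);

  if (tokens >= cost) {
    return {
      allowed: true,
      retryAfterSec: 0,
      state: { tokens: tokens - cost, lastRefillMs: nowMs },
    };
  }

  // Assumed Retry-After rule: whole seconds until `cost` tokens are available.
  const retryAfterSec = Math.ceil((cost - tokens) / REFILL_PER_SEC);
  return { allowed: false, retryAfterSec, state: { tokens, lastRefillMs: nowMs } };
}
```

With a full bucket, 600 back-to-back calls are admitted and the next one is rejected until the refill catches up; one second later roughly 10 more requests fit.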
Response when rate-limited
HTTP/2 429
Retry-After: 12
Content-Type: application/json
{
"success": false,
"code": "RATE_LIMITED",
"message": "Per-tenant rate limit exceeded",
"requestId": "..."
}
Why Durable Objects?
Token buckets need strong consistency — you can't double-spend tokens. Each tenant gets one Durable Object instance that holds the bucket state. Cloudflare routes all that tenant's requests to the same instance, so the check is local and sub-millisecond.
The alternative — KV with read-modify-write — is eventually consistent and racy at burst. Durable Objects give us linearizable semantics with no extra round-trips.
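The double-spend problem can be shown with a deliberately simplified simulation. Nothing here calls a Cloudflare API: an in-memory Map stands in for a KV namespace, a plain class stands in for a Durable Object, and the function and class names are hypothetical. The sketch only illustrates why a single owner of the bucket state avoids the lost update.

```typescript
// Illustration only: no Cloudflare APIs are used in this sketch.

// Racy read-modify-write: both requests read before either one writes,
// so both see the same token count and both are admitted.
function racyInterleaving(initialTokens: number): number {
  const kv = new Map<string, number>([["tenant-a", initialTokens]]);
  const readA = kv.get("tenant-a") ?? 0; // request A reads
  const readB = kv.get("tenant-a") ?? 0; // request B reads the same value
  let admitted = 0;
  if (readA > 0) { kv.set("tenant-a", readA - 1); admitted++; }
  if (readB > 0) { kv.set("tenant-a", readB - 1); admitted++; }
  return admitted;
}

// Single owner: one object holds the count and handles requests in order,
// so the second request observes the first request's decrement.
class SingleOwnerBucket {
  private tokens: number;
  constructor(tokens: number) {
    this.tokens = tokens;
  }
  tryConsume(): boolean {
    if (this.tokens <= 0) return false;
    this.tokens -= 1;
    return true;
  }
}

function serialized(initialTokens: number): number {
  const bucket = new SingleOwnerBucket(initialTokens);
  const results = [bucket.tryConsume(), bucket.tryConsume()];
  return results.filter(Boolean).length;
}
```

Starting from one token, the racy interleaving admits both requests, while the single-owner version admits exactly one.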
Tuning
To raise a single tenant's limit (e.g. an Enterprise plan):
wrangler d1 execute inboxos --remote --command \
  "UPDATE tenants SET rate_limit_per_minute = 6000 WHERE id = '...'"
The middleware reads the tenant config on every request (via a KV cache, sub-ms), so changes apply within seconds.
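One way the middleware might pick the effective limit is sketched below. This rests on assumptions the source does not confirm: that a per-tenant value takes precedence over the global default, that a NULL column means "no override", and that RATE_LIMIT_PER_MINUTE reaches the Worker as a string variable. TenantRow, Env, and effectiveLimit are hypothetical names.

```typescript
// Sketch: resolve the per-minute limit for a tenant.
// Precedence (assumed): tenant override, then wrangler.toml var, then 600.

interface TenantRow {
  id: string;
  rate_limit_per_minute: number | null; // null is assumed to mean "no override"
}

interface Env {
  RATE_LIMIT_PER_MINUTE?: string; // assumed to arrive as a string variable
}

const FALLBACK_PER_MINUTE = 600; // documented default

function effectiveLimit(tenant: TenantRow, env: Env): number {
  const override = tenant.rate_limit_per_minute;
  if (override !== null && override > 0) {
    return override;
  }
  // Ignore a missing or malformed variable rather than rate-limiting at NaN.
  const parsed = Number.parseInt(env.RATE_LIMIT_PER_MINUTE ?? "", 10);
  return Number.isFinite(parsed) && parsed > 0 ? parsed : FALLBACK_PER_MINUTE;
}
```

Under these assumptions, a tenant row updated to 6000 resolves to 6000 regardless of the global setting, and tenants without an override follow the wrangler.toml value.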
Batch endpoint
POST /api/emails/batch counts as a single rate-limit token but dispatches up to 100 messages. This is intentional: batch sends are explicitly opt-in and are limited separately by the AGENT_CONCURRENCY setting.