Rate Limits

The Noxy relay enforces two layers of rate limits — per app and per connection — across both the Agent API and the Client API. Limits are pre-quota: they protect the relay before any decision counts against your monthly pool.

Why two layers

  • Per-app limits cap traffic for a whole application — every device, every connection, every region — keyed by your APP_ID. They protect the shared relay from a single tenant.
  • Per-connection limits cap traffic on a single open stream: one Agent SDK process, one device WebSocket, or one gRPC bidirectional stream. They protect against runaway loops and bursty clients without affecting the rest of your fleet.

A request that breaks either layer is rejected; a request that survives both is then metered against your monthly decision quota.

Current limits

Defaults are tuned for typical human-in-the-loop (HITL) workloads and apply to every app on the public relay.

Scope | Trigger | Messages | Bytes
Per app (authenticated) | Any request after a session is established | 200 msg/s | 10 MB/s
Per app (unauthenticated) | Authenticate and RegisterDevice only | 5 msg/s | 8 KB/s
Per connection | Each gRPC stream or WebSocket session | 20 msg/s | 1 MB/s

Both message and byte counts use a token-bucket implementation (governor) with a 1-second refill window — short bursts are allowed; sustained traffic above the rate is rejected.
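To make the burst-versus-sustained behavior concrete, here is a minimal token-bucket sketch in Python. It is a simplified illustration, not the relay's actual implementation: the TokenBucket class, its parameters, and the choice of a burst capacity equal to one second of tokens are all assumptions made for this example.

```python
class TokenBucket:
    """Simplified token bucket: `rate` tokens per second, bursts up to `capacity`."""

    def __init__(self, rate: float, capacity: float, now: float = 0.0):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity  # start full, so an initial burst is allowed
        self.last = now

    def allow(self, now: float, cost: float = 1.0) -> bool:
        # Refill in proportion to elapsed time, never above capacity.
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False  # the relay would answer RATE_LIMITED here


# Per-connection message limit from the table: 20 msg/s.
# Capacity of 20 (one second of tokens) is an assumption for this sketch.
bucket = TokenBucket(rate=20, capacity=20)

# 25 messages at the same instant: only the first 20 fit in the bucket.
accepted_in_burst = sum(bucket.allow(now=0.0) for _ in range(25))

# Half a second later 10 tokens have been refilled, so 10 of 15 pass.
accepted_after = sum(bucket.allow(now=0.5) for _ in range(15))
```

The same structure would apply to the byte limits by passing the frame size as `cost` instead of 1.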

Per-app limits are shared. They count against your APP_ID across all connections, agent backends, and devices. If your agent backend pushes 200 msg/s through RouteDecision, every device using the same app shares the remaining headroom.

What you see when limits are hit

The relay returns a structured error on the same channel as the original request. The exact surface depends on which API you are calling:

Surface | How the limit is reported
Agent API (gRPC) | gRPC status RESOURCE_EXHAUSTED on the unary call. SDKs raise the language's standard rate-limit error type.
Client API (gRPC stream) | A DeviceResponse with status = ERROR and error.code = RATE_LIMITED on the same stream. error.retryable is true.
Client API (WebSocket / JSON) | A JSON frame: {"status":"ERROR","error":{"code":"RATE_LIMITED","message":"Rate limit exceeded","retryable":true}}.

Error envelope

{
  "requestId": "req-1731700000000",
  "status": "ERROR",
  "error": {
    "code": "RATE_LIMITED",
    "message": "Rate limit exceeded",
    "retryable": true
  }
}
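
If you consume the WebSocket / JSON surface directly, a small parser for this envelope might look like the sketch below. The RelayError dataclass and parse_error function are hypothetical names, not part of any SDK, and the "UNKNOWN" fallback code and the conservative default for a missing retryable flag are assumptions of this example.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class RelayError:
    code: str
    message: str
    retryable: bool


def parse_error(frame: str) -> Optional[RelayError]:
    """Return a RelayError if the frame reports status ERROR, otherwise None."""
    payload = json.loads(frame)
    if payload.get("status") != "ERROR":
        return None
    err = payload.get("error") or {}
    return RelayError(
        code=err.get("code", "UNKNOWN"),  # fallback value is an assumption
        message=err.get("message", ""),
        # Treat a missing flag as non-retryable (conservative assumption).
        retryable=bool(err.get("retryable", False)),
    )


frame = (
    '{"requestId": "req-1731700000000", "status": "ERROR", '
    '"error": {"code": "RATE_LIMITED", "message": "Rate limit exceeded", '
    '"retryable": true}}'
)
error = parse_error(frame)
should_retry = error is not None and error.code == "RATE_LIMITED" and error.retryable
```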

Order of checks

The relay applies limits in a fixed order. Use this when you debug a 429-style failure:

  1. Connection rate limits — message count, then byte count of the inbound frame.
  2. App rate limits — using the unauthenticated bucket if the message is Authenticate or RegisterDevice, otherwise the authenticated bucket.
  3. App existence — unknown APP_ID short-circuits with APP_NOT_FOUND.
  4. Decision quota — only consumed on successful RouteDecision; APP_QUOTA_EXCEEDED if the monthly pool is empty.

The error you see therefore tells you which guard tripped — RATE_LIMITED means you exceeded one of the per-second buckets, not the monthly quota.
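
The ordering can be modeled as a short function that returns the first guard to fail. This is a mental model for debugging, not the relay's code: RelayState, first_failure, and the boolean flags are hypothetical, and restricting the quota check to RouteDecision is an assumption drawn from the list above.

```python
from dataclasses import dataclass
from typing import Optional

# Methods that use the unauthenticated per-app bucket.
UNAUTHENTICATED_METHODS = {"Authenticate", "RegisterDevice"}


@dataclass
class RelayState:
    """Hypothetical snapshot of each guard at the moment a message arrives."""
    conn_msgs_ok: bool = True
    conn_bytes_ok: bool = True
    app_unauth_ok: bool = True
    app_auth_ok: bool = True
    app_exists: bool = True
    quota_remaining: int = 1


def first_failure(method: str, s: RelayState) -> Optional[str]:
    """Return the error code of the first guard that trips, or None."""
    # 1. Connection limits: message count, then byte count.
    if not s.conn_msgs_ok or not s.conn_bytes_ok:
        return "RATE_LIMITED"
    # 2. App limits: bucket depends on the method being called.
    app_ok = s.app_unauth_ok if method in UNAUTHENTICATED_METHODS else s.app_auth_ok
    if not app_ok:
        return "RATE_LIMITED"
    # 3. App existence.
    if not s.app_exists:
        return "APP_NOT_FOUND"
    # 4. Monthly decision quota (assumed to apply to RouteDecision only).
    if method == "RouteDecision" and s.quota_remaining <= 0:
        return "APP_QUOTA_EXCEEDED"
    return None


# A throttled connection masks every later guard, even an unknown APP_ID.
masked = first_failure(
    "RouteDecision",
    RelayState(conn_msgs_ok=False, app_exists=False, quota_remaining=0),
)
```

One practical consequence of this model: seeing APP_QUOTA_EXCEEDED implies the request already passed both rate-limit layers.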

Handling rate limits

SDKs already implement most of this; if you call the API directly, follow the same patterns:

  • Back off exponentially. Retry on RATE_LIMITED / RESOURCE_EXHAUSTED with jitter. A starting delay of 250 ms doubling up to a 5 s cap works well at the default per-app rate.
  • Cap concurrency in your agent. If your agent fans out to many users in parallel, bound it to ~150 concurrent RouteDecision calls per app to leave headroom for retries and devices.
  • Reuse one Client SDK per device. The Client SDK keeps a single bi-di stream; opening parallel streams against the same identity does not help and uses up the shared per-app budget faster.
  • Batch where it makes sense. Group device fan-out for one decision into one logical attempt; the relay returns one DeliveryOutcome per device but it is still a single user-visible decision against your quota.
  • Watch retryable. The flag is true for RATE_LIMITED — safe to retry after a delay. Errors with retryable: false (e.g. UNAUTHORIZED) won't succeed on retry.
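
For callers that bypass the SDKs, the backoff guidance above could be sketched as follows. The RateLimited exception is a hypothetical stand-in for whatever your client raises on RATE_LIMITED or RESOURCE_EXHAUSTED, the function names are illustrative, and "full jitter" (sleeping a random fraction of the current ceiling) is one common jitter strategy chosen here as an assumption.

```python
import random
import time
from typing import Callable, Iterator, TypeVar

T = TypeVar("T")


class RateLimited(Exception):
    """Hypothetical stand-in for a RATE_LIMITED / RESOURCE_EXHAUSTED error."""


def backoff_ceilings(base: float = 0.25, cap: float = 5.0, attempts: int = 6) -> Iterator[float]:
    """Yield the delay ceiling for each retry: base, doubled each time, capped."""
    delay = base
    for _ in range(attempts):
        yield min(delay, cap)
        delay *= 2


def call_with_backoff(
    fn: Callable[[], T],
    *,
    base: float = 0.25,
    cap: float = 5.0,
    attempts: int = 6,
    sleep: Callable[[float], None] = time.sleep,
    jitter: Callable[[], float] = random.random,
) -> T:
    """Call fn, retrying on RateLimited up to `attempts` times with jittered backoff."""
    for ceiling in backoff_ceilings(base, cap, attempts):
        try:
            return fn()
        except RateLimited:
            # Full jitter: sleep a random fraction of the current ceiling.
            sleep(ceiling * jitter())
    return fn()  # last try; any error propagates to the caller


# With the defaults the ceilings are 0.25, 0.5, 1, 2, 4, then 5 seconds.
ceilings = list(backoff_ceilings())
```

The sleep and jitter parameters are injectable so the retry loop can be unit-tested without real delays. Errors other than RateLimited are not caught, which matches the advice not to retry when retryable is false.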

Planning headroom

For a quick capacity check:

  • One synchronous decision exchange (route + outcome) is roughly 4 messages on the relay: RouteDecision + DecisionEvent to the device + DecisionOutcome from the device + a final SDK poll. At 200 msg/s per app that is about 50 user decisions per second sustained (200 / 4), assuming no other traffic shares the app bucket.
  • Heavy RegisterDevice bursts on launch days are limited by the unauthenticated bucket (5 msg/s). If you onboard thousands of devices in minutes, consider staggering the rollout so registrations don't queue behind one another.
  • The per-connection ceiling matters for chatty bots — a Telegram bot delivering decisions to many users from one process should keep request rate per connection well under 20 msg/s.
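
The arithmetic behind these estimates fits in a few lines. The constants come from the limits table and the 4-message estimate above; the function names are illustrative, and the registration estimate assumes one RegisterDevice message per device with no Authenticate traffic competing for the same unauthenticated bucket.

```python
APP_MSGS_PER_SEC = 200       # per-app authenticated limit
UNAUTH_MSGS_PER_SEC = 5      # per-app unauthenticated limit
MESSAGES_PER_DECISION = 4    # RouteDecision + DecisionEvent + DecisionOutcome + SDK poll


def max_decisions_per_sec(
    app_rate: float = APP_MSGS_PER_SEC,
    per_decision: float = MESSAGES_PER_DECISION,
) -> float:
    """Upper bound on sustained decisions per second for one app."""
    return app_rate / per_decision


def min_registration_seconds(
    devices: int,
    unauth_rate: float = UNAUTH_MSGS_PER_SEC,
) -> float:
    """Shortest time in which `devices` registrations can pass the unauthenticated bucket."""
    return devices / unauth_rate


decisions = max_decisions_per_sec()             # 200 / 4 = 50.0
rollout = min_registration_seconds(3000) / 60   # 3000 / 5 = 600 s = 10.0 minutes
```

Both figures are ceilings: retries and any other traffic on the same APP_ID reduce the usable rate.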
