Rate Limits
The Noxy relay enforces two layers of rate limits — per app and per connection — across both the Agent API and the Client API. Limits are pre-quota: they protect the relay before any decision counts against your monthly pool.
Why two layers
- Per-app limits cap traffic for a whole application (every device, every connection, every region), keyed by your APP_ID. They protect the shared relay from a single tenant.
- Per-connection limits cap traffic on a single open stream: one Agent SDK process, one device WebSocket, or one gRPC bi-di stream. They protect against runaway loops and bursty clients without affecting the rest of your fleet.
A request that breaks either layer is rejected; a request that survives both is then metered against your monthly decision quota.
Current limits
Defaults are tuned for typical HITL workloads — they apply to every app on the public relay.
| Scope | Trigger | Messages | Bytes |
|---|---|---|---|
| Per app (authenticated) | Any request after a session is established | 200 msg/s | 10 MB/s |
| Per app (unauthenticated) | Authenticate and RegisterDevice only | 5 msg/s | 8 KB/s |
| Per connection | Each gRPC stream or WebSocket session | 20 msg/s | 1 MB/s |
Both message and byte counts use a token-bucket implementation (governor) with a 1-second refill window — short bursts are allowed; sustained traffic above the rate is rejected.
Per-app limits are shared. They count against your APP_ID across all connections, agent backends, and devices. If your agent backend alone pushes 200 msg/s through RouteDecision, it uses the entire per-app budget and leaves no headroom for devices on the same app.
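
To make the burst-versus-sustained behaviour concrete, here is a minimal token-bucket sketch. It is not the relay's implementation (the relay uses governor); the class name, the `now` parameter, and the choice of a burst capacity equal to one second of traffic are assumptions made for illustration.

```python
import time
from typing import Optional


class TokenBucket:
    """Illustrative token bucket: `rate` tokens refill per second, up to `capacity`."""

    def __init__(self, rate: float, capacity: float, now: Optional[float] = None) -> None:
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity  # start full so an initial burst is allowed
        self.updated = time.monotonic() if now is None else now

    def try_consume(self, amount: float = 1.0, now: Optional[float] = None) -> bool:
        """Take `amount` tokens if available; False means the request would be rejected."""
        now = time.monotonic() if now is None else now
        elapsed = max(0.0, now - self.updated)
        # Refill in proportion to the time that has passed, never above capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= amount:
            self.tokens -= amount
            return True
        return False


# Per-connection message budget from the table: 20 msg/s.
# Timestamps are passed explicitly so the example is deterministic.
bucket = TokenBucket(rate=20, capacity=20, now=0.0)
accepted_in_burst = sum(bucket.try_consume(now=0.0) for _ in range(25))
accepted_after_half_second = sum(bucket.try_consume(now=0.5) for _ in range(25))
```

In this sketch a burst of 25 messages at the same instant gets 20 through, and half a second later only the refilled half of the bucket is available. A client-side copy of such a bucket can be used to pace outbound traffic so the relay's limit is rarely reached.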
What you see when limits are hit
The relay returns a structured error on the same channel as the original request. The exact surface depends on which API you are calling:
| Surface | How the limit is reported |
|---|---|
| Agent API (gRPC) | gRPC status RESOURCE_EXHAUSTED on the unary call. SDKs raise the language's standard rate-limit error type. |
| Client API (gRPC stream) | A DeviceResponse with status = ERROR and error.code = RATE_LIMITED on the same stream. error.retryable is true. |
| Client API (WebSocket / JSON) | A JSON frame: {"status":"ERROR","error":{"code":"RATE_LIMITED","message":"Rate limit exceeded","retryable":true}}. |
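
If you consume the WebSocket / JSON surface directly, a frame like the one in the table can be recognised with a few lines of parsing. This is a sketch only: `parse_error` is a hypothetical helper, and treating a missing `retryable` flag as non-retryable is an assumption, not documented relay behaviour.

```python
import json
from typing import Optional, Tuple


def parse_error(frame: str) -> Optional[Tuple[str, bool]]:
    """Return (code, retryable) for an ERROR frame, or None for anything else."""
    try:
        payload = json.loads(frame)
    except json.JSONDecodeError:
        return None
    if not isinstance(payload, dict) or payload.get("status") != "ERROR":
        return None
    error = payload.get("error")
    if not isinstance(error, dict):
        return None
    # Assumption: a missing flag is treated as "do not retry".
    return str(error.get("code", "")), bool(error.get("retryable", False))


frame = (
    '{"status":"ERROR","error":{"code":"RATE_LIMITED",'
    '"message":"Rate limit exceeded","retryable":true}}'
)
result = parse_error(frame)  # ("RATE_LIMITED", True)
```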
Error envelope
{
  "requestId": "req-1731700000000",
  "status": "ERROR",
  "error": {
    "code": "RATE_LIMITED",
    "message": "Rate limit exceeded",
    "retryable": true
  }
}
Order of checks
The relay applies limits in a fixed order. Use this when you debug a 429-style failure:
- Connection rate limits: message count, then byte count of the inbound frame.
- App rate limits: the unauthenticated bucket if the message is Authenticate or RegisterDevice, otherwise the authenticated bucket.
- App existence: an unknown APP_ID short-circuits with APP_NOT_FOUND.
- Decision quota: only consumed on a successful RouteDecision; APP_QUOTA_EXCEEDED is returned if the monthly pool is empty.
The error you see therefore tells you which guard tripped — RATE_LIMITED means you exceeded one of the per-second buckets, not the monthly quota.
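
The ordering can be modelled as a chain of early returns, which is a handy mental model when debugging. The function below is a hypothetical simulation, not relay code; its name and parameters are invented, and applying the quota check only to RouteDecision is an assumption drawn from the list above.

```python
from typing import Optional


def first_rejection(
    connection_ok: bool,
    app_rate_ok: bool,
    app_exists: bool,
    quota_remaining: int,
    is_route_decision: bool,
) -> Optional[str]:
    """Return the error code of the first guard that trips, or None if all pass."""
    if not connection_ok:
        return "RATE_LIMITED"  # per-connection bucket (messages, then bytes)
    if not app_rate_ok:
        return "RATE_LIMITED"  # per-app bucket (authenticated or unauthenticated)
    if not app_exists:
        return "APP_NOT_FOUND"
    # Assumption: only RouteDecision is checked against the monthly pool.
    if is_route_decision and quota_remaining <= 0:
        return "APP_QUOTA_EXCEEDED"
    return None


# A request that is over the rate limit AND out of quota reports the rate limit,
# because the per-second buckets are checked first.
code = first_rejection(
    connection_ok=False,
    app_rate_ok=True,
    app_exists=True,
    quota_remaining=0,
    is_route_decision=True,
)
```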
Handling rate limits
SDKs already implement most of this; if you call the API directly, follow the same patterns:
- Back off exponentially. Retry on RATE_LIMITED / RESOURCE_EXHAUSTED with jitter. A starting delay of 250 ms, doubling up to a 5 s cap, works well at the default per-app rate.
- Cap concurrency in your agent. If your agent fans out to many users in parallel, bound it to ~150 concurrent RouteDecision calls per app to leave headroom for retries and devices.
- Reuse one Client SDK per device. The Client SDK keeps a single bi-di stream; opening parallel streams against the same identity does not help and uses up the per-connection budget faster.
- Batch where it makes sense. Group device fan-out for one decision into one logical attempt; the relay returns one DeliveryOutcome per device, but it is still a single user-visible decision against your quota.
- Watch retryable. The flag is true for RATE_LIMITED, so it is safe to retry after a delay. Errors with retryable: false (e.g. UNAUTHORIZED) won't succeed on retry.
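
For callers that bypass the SDKs, the backoff advice above might look like the following sketch. `RateLimited` is a hypothetical exception standing in for whatever your transport raises on RATE_LIMITED / RESOURCE_EXHAUSTED, and the "full jitter" strategy and six-attempt limit are choices made for this example rather than documented SDK behaviour.

```python
import random
import time
from typing import Callable, Iterator, Optional, TypeVar

T = TypeVar("T")


class RateLimited(Exception):
    """Hypothetical stand-in for RATE_LIMITED / RESOURCE_EXHAUSTED."""


def backoff_delays(base: float = 0.25, cap: float = 5.0) -> Iterator[float]:
    """Yield delay ceilings in seconds: 0.25, 0.5, 1, 2, 4, then 5 repeatedly."""
    delay = base
    while True:
        yield min(delay, cap)
        delay = min(delay * 2, cap)


def call_with_backoff(
    send: Callable[[], T],
    max_attempts: int = 6,
    sleep: Callable[[float], None] = time.sleep,
    rng: Optional[random.Random] = None,
) -> T:
    """Call `send`, retrying on RateLimited with exponential backoff and jitter."""
    rng = rng or random.Random()
    delays = backoff_delays()
    for attempt in range(1, max_attempts + 1):
        try:
            return send()
        except RateLimited:
            if attempt == max_attempts:
                raise
            # Full jitter: wait a random time between 0 and the current ceiling.
            sleep(rng.uniform(0, next(delays)))
    raise ValueError("max_attempts must be at least 1")


# Demo: a sender that is rejected twice, then succeeds.
attempts = {"n": 0}


def flaky_send() -> str:
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise RateLimited()
    return "delivered"


slept: list = []  # record delays instead of actually sleeping
result = call_with_backoff(flaky_send, sleep=slept.append)
```

Injecting `sleep` and `rng` keeps the retry loop testable without real waiting. Jitter matters here because per-app limits are shared: without it, many clients rejected in the same second would all retry in the same later second.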
Planning headroom
For a quick capacity check:
- One synchronous decision exchange (route + outcome) is roughly 4 messages on the relay: RouteDecision + DecisionEvent to the device + DecisionOutcome from the device + a final SDK poll. At 200 msg/s per app, that is about 50 user decisions per second sustained (200 / 4).
- Heavy RegisterDevice bursts on launch days are limited by the unauthenticated bucket (5 msg/s). If you onboard thousands of devices in minutes, consider staggering the rollout so registrations don't queue behind one another.
- The per-connection ceiling matters for chatty bots: a Telegram bot delivering decisions to many users from one process should keep its request rate per connection well under 20 msg/s.
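
The arithmetic behind these estimates is simple enough to script. The sketch below uses the limits from the table; the function names are hypothetical, the figure of one unauthenticated call per device is an assumption, and short-burst allowance is ignored.

```python
APP_MSG_PER_S = 200        # per-app authenticated limit
UNAUTH_MSG_PER_S = 5       # per-app unauthenticated limit
MESSAGES_PER_DECISION = 4  # rough cost of one synchronous exchange


def max_decisions_per_second(
    app_rate: float = APP_MSG_PER_S,
    per_decision: float = MESSAGES_PER_DECISION,
) -> float:
    """Sustained decisions per second if decisions were the only traffic."""
    return app_rate / per_decision


def registration_seconds(
    devices: int,
    calls_per_device: int = 1,  # assumption: one unauthenticated call per device
    rate: float = UNAUTH_MSG_PER_S,
) -> float:
    """Minimum time to push all registrations through the unauthenticated bucket."""
    return devices * calls_per_device / rate


decisions = max_decisions_per_second()     # 50.0
launch_window = registration_seconds(3000)  # 600.0 seconds, i.e. 10 minutes
```

Under these assumptions, onboarding 3,000 devices cannot complete in less than about 10 minutes, which gives a lower bound for how far to spread a staggered rollout.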