Error Codes

All Casola API errors use a consistent envelope:

{
  "error": {
    "code": "not_found",
    "message": "Resource not found"
  }
}

The error object always contains code (machine-readable) and message (human-readable). Some errors include a details object with additional context, such as validation issues.
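As an illustration of the envelope described above, here is a minimal Python sketch that parses a response body into a small error object. The names `ApiError` and `parse_error` are hypothetical helpers, not part of any Casola SDK, and the sketch assumes that `details`, when present, is nested inside the `error` object.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class ApiError:
    """Hypothetical container for the documented error envelope."""
    code: str                       # machine-readable, e.g. "not_found"
    message: str                    # human-readable description
    details: dict[str, Any] = field(default_factory=dict)  # optional extra context


def parse_error(body: str) -> Optional[ApiError]:
    """Return an ApiError if `body` matches the envelope, otherwise None."""
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return None
    error = payload.get("error") if isinstance(payload, dict) else None
    if not isinstance(error, dict):
        return None
    code, message = error.get("code"), error.get("message")
    if not isinstance(code, str) or not isinstance(message, str):
        return None
    # Assumption: "details" lives inside the "error" object when present.
    details = error.get("details")
    return ApiError(code, message, details if isinstance(details, dict) else {})
```

Branching on `code` rather than on `message` keeps client logic stable if the human-readable wording changes.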

| Code | HTTP Status | Description | Retryable |
| --- | --- | --- | --- |
| bad_request | 400 | Malformed request | No |
| validation_error | 400 | Request body failed schema validation | No; fix request body |
| invalid_json | 400 | JSON parse failure | No |
| unauthorized | 401 | Missing or invalid authentication token | No; check credentials |
| forbidden | 403 | Token lacks required scope or role | No |
| platform_invite_required | 403 | Platform invite needed to create an account | No |
| not_found | 404 | Resource does not exist | No |
| conflict | 409 | Resource already exists (duplicate) | No |
| gone | 410 | Resource has expired (e.g. invite link) | No |
| rate_limit | 429 | Too many requests | Yes; honor the Retry-After header |
| quota_exceeded | 429 | Plan usage limit reached | No; upgrade plan or wait for reset |
| backlog_full | 429 | Job queue is at capacity | Yes; back off and retry |
| internal_error | 500 | Unexpected server error | Yes; retry with backoff |
| bad_gateway | 502 | Upstream provider returned an error | Yes; retry |
| worker_error | 502 | GPU worker reported a failure | Yes; retry |
| no_capacity | 503 | No GPU workers available for this model | Yes; back off and retry |
| model_warming_up | 503 | Model is loading onto a GPU | Yes; wait 30 s to 5 min, then retry |
| not_configured | 503 | Model is not configured on the platform | No |
| timeout | 504 | Job exceeded its time limit | Yes; a retry may help |
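Because several status codes mix retryable and non-retryable errors (429 covers both rate_limit and quota_exceeded, and 503 covers both no_capacity and not_configured), a client probably wants to decide by `code` rather than by HTTP status. The following Python sketch transcribes the table's "Retryable" column into a lookup. `RETRYABLE_CODES` and `is_retryable` are hypothetical names, and treating unknown codes as non-retryable is an assumption.

```python
# Codes the table marks as retryable. Everything else, including any
# code not listed in the table, is treated as non-retryable (assumption).
RETRYABLE_CODES = frozenset({
    "rate_limit",
    "backlog_full",
    "internal_error",
    "bad_gateway",
    "worker_error",
    "no_capacity",
    "model_warming_up",
    "timeout",
})


def is_retryable(code: str) -> bool:
    """Return True if the error code is marked retryable in the table."""
    return code in RETRYABLE_CODES
```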

When you receive a 429 response with code rate_limit, the response includes headers to help you pace requests:

| Header | Description |
| --- | --- |
| X-RateLimit-Limit | Maximum requests allowed in the window |
| X-RateLimit-Remaining | Requests remaining (always 0 on a 429) |
| X-RateLimit-Reset | Unix timestamp (seconds) when the window resets |
| Retry-After | Seconds to wait before retrying |
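One way to turn these headers into a wait time is sketched below in Python. It prefers Retry-After and falls back to X-RateLimit-Reset. The function name `seconds_until_retry`, the fallback order, and the 1-second default when neither header can be parsed are assumptions for illustration, not documented behavior.

```python
import time
from typing import Mapping, Optional


def seconds_until_retry(headers: Mapping[str, str],
                        now: Optional[float] = None) -> float:
    """Seconds to wait before retrying a 429 rate_limit response."""
    # Header names are case-insensitive, so normalize before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}

    retry_after = lowered.get("retry-after")
    if retry_after is not None:
        try:
            return max(0.0, float(retry_after))
        except ValueError:
            pass  # unparseable value: fall through to the reset timestamp

    reset = lowered.get("x-ratelimit-reset")
    if reset is not None:
        try:
            current = time.time() if now is None else now
            return max(0.0, float(reset) - current)
        except ValueError:
            pass

    return 1.0  # assumed default when neither header is usable
```

The `now` parameter exists only so the function can be exercised with a fixed clock; in normal use it is omitted.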

Do retry (with exponential backoff):

  • 429 rate_limit: wait for the Retry-After duration
  • 429 backlog_full: the queue is temporarily full; retry after 5-10 seconds
  • 502 worker_error / 502 bad_gateway: transient upstream failure
  • 503 no_capacity: no workers are free; retry after 10-30 seconds
  • 503 model_warming_up: a GPU is loading the model; retry every 30-60 seconds (loading can take up to 5 minutes)
  • 504 timeout: the job took too long; retry or switch to async submission
  • 500 internal_error: unexpected failure; retry with backoff

Do not retry:

  • 400 errors: fix the request
  • 401 / 403: check your token and scopes
  • 404: resource does not exist
  • 409: duplicate resource
  • 410: resource has expired
  • 429 quota_exceeded: your plan's usage limit is reached
  • 503 not_configured: the model is not configured on the platform

Backoff strategy: Start with a 1-second delay, double it on each retry, and cap it at 60 seconds. Add jitter (a random 0-500 ms) so that many clients do not retry at the same instant and cause a thundering herd.
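The backoff strategy above can be sketched in Python roughly as follows. `RetryableError`, `backoff_delay`, and `call_with_retries` are hypothetical names, the limit of 5 attempts is an assumption, and adding the jitter after applying the 60-second cap is one possible reading of the rule.

```python
import random
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class RetryableError(Exception):
    """Hypothetical exception the caller raises for retryable error codes."""

    def __init__(self, code: str, retry_after: Optional[float] = None):
        super().__init__(code)
        self.code = code
        self.retry_after = retry_after  # server-suggested wait, if any


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 60.0,
                  jitter: float = 0.5) -> float:
    """Delay before retry `attempt` (0-based): 1 s, 2 s, 4 s, ... capped at 60 s."""
    # Jitter of 0-500 ms is added after the cap (an interpretation).
    return min(cap, base * (2 ** attempt)) + random.uniform(0.0, jitter)


def call_with_retries(fn: Callable[[], T], max_attempts: int = 5,
                      sleep: Callable[[float], None] = time.sleep) -> T:
    """Call `fn`, retrying on RetryableError with exponential backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    attempt = 0
    while True:
        try:
            return fn()
        except RetryableError as exc:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            delay = backoff_delay(attempt)
            if exc.retry_after is not None:
                # Never retry sooner than the server asked.
                delay = max(delay, exc.retry_after)
            sleep(delay)
            attempt += 1
```

Passing `sleep` as a parameter makes the loop testable without real waiting; production callers would leave the default `time.sleep`.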