
Troubleshooting

This guide covers common issues when using the Casola API. For the full list of error codes, see the Error Codes reference.

{ "error": { "code": "model_warming_up", "message": "Model is loading" } }

A GPU worker is loading the model weights into memory. This typically takes 30 seconds to 5 minutes depending on model size.

What to do:

  • Wait 30-60 seconds and retry.
  • For sync requests, the API holds the connection open if dispatch_wait_ms is configured for the model, so you may receive a delayed response instead of this error.
  • Use async submission for models with long warm-up times and poll for the result.

{ "error": { "code": "no_capacity", "message": "No workers available for this job type" } }

All GPU workers for this model are busy or none are provisioned. The scheduler auto-scales based on demand — new workers will be provisioned within 1-5 minutes.

What to do:

  • Retry after 10-30 seconds with exponential backoff.
  • For time-sensitive workloads, submit async requests that queue automatically.
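
One way to read "10-30 seconds with exponential backoff" is a delay that starts at 10 seconds, doubles on each attempt, and is capped at 30 seconds. The sketch below computes that schedule. The function name `backoff_delays` and the choice of 10 and 30 as base and cap are assumptions made for illustration, not values the API prescribes.

```python
def backoff_delays(attempts, base=10.0, cap=30.0):
    """Return the wait (in seconds) before each retry attempt.

    Assumed schedule: base, 2*base, 4*base, ... with every value capped at `cap`.
    """
    delays = []
    for attempt in range(attempts):
        # Double the delay on each attempt, but never exceed the cap.
        delays.append(min(cap, base * (2 ** attempt)))
    return delays


if __name__ == "__main__":
    print(backoff_delays(4))
```

Sleeping for each value in turn before resending the request keeps retries inside the suggested window while workers are being provisioned.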

{ "error": { "code": "rate_limit", "message": "Rate limit exceeded" } }

You’ve exceeded the request rate for your token or organization.

What to do:

  • Read the Retry-After header and wait that many seconds before retrying.
  • Spread requests across time or use batch submission for bulk workloads.
  • See Rate limit headers for all available headers.
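
As a minimal sketch of the first step, the helper below reads Retry-After from a dictionary of response headers and returns the number of seconds to wait. It assumes the header carries a plain number of seconds, as described above. The name `retry_after_seconds` and the 5-second fallback are hypothetical.

```python
def retry_after_seconds(headers, default=5.0):
    """Return the wait in seconds from Retry-After, or `default` if it is missing or invalid."""
    # HTTP header names are case-insensitive, so normalize before looking up.
    lowered = {name.lower(): value for name, value in headers.items()}
    raw = lowered.get("retry-after")
    if raw is None:
        return default
    try:
        seconds = float(raw.strip())
    except ValueError:
        # Not a number of seconds (for example an HTTP date); fall back.
        return default
    # Guard against negative values.
    return max(0.0, seconds)
```

For example, `retry_after_seconds({"Retry-After": "12"})` returns `12.0`, which you would pass to `time.sleep` before retrying.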

{ "error": { "code": "quota_exceeded", "message": "Monthly usage quota exceeded" } }

Your organization has reached its spend limit.

What to do:

  • Check your usage on the Billing & Usage page.
  • Contact your organization owner to adjust the spend limit.
  • Do not retry — requests will continue to fail until the limit is raised or the billing period resets.

{ "error": { "code": "timeout", "message": "Worker did not respond in time" } }

The GPU worker didn’t complete processing within the time limit. This is common for sync requests on long-running tasks (video generation, large batch inference).

What to do:

  • Switch to async submission — async jobs have longer time limits and are more resilient to transient failures.
  • Retry the request — a different worker may be faster.
  • Reduce generation parameters (fewer inference steps, lower resolution) if applicable.
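
The five error codes above call for different reactions, so a client can map each code to an action before deciding whether to retry. The sketch below parses an error body in the format shown in this guide. The action labels and the `action_for` name are hypothetical, and treating unknown or unparseable errors as non-retryable is a conservative assumption rather than documented behavior.

```python
import json

# Actions summarize the "What to do" lists above; the label strings are illustrative.
RETRY_POLICY = {
    "model_warming_up": "retry_after_wait",         # wait 30-60 seconds, then retry
    "no_capacity": "retry_with_backoff",            # exponential backoff
    "rate_limit": "wait_retry_after_header",        # honor the Retry-After header
    "quota_exceeded": "do_not_retry",               # fails until the limit changes
    "timeout": "retry_or_switch_to_async",          # retry, or resubmit as async
}


def action_for(body_text):
    """Return the suggested action for a JSON error body, defaulting to 'do_not_retry'."""
    try:
        code = json.loads(body_text)["error"]["code"]
        return RETRY_POLICY.get(code, "do_not_retry")
    except (ValueError, KeyError, TypeError):
        # Malformed JSON or an unexpected shape: do not retry blindly.
        return "do_not_retry"
```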

401 Unauthorized:

  • Missing or invalid Authorization: Bearer <token> header.
  • Token may have been deleted or expired.
  • Verify your token at the API Tokens page.

403 Forbidden:

  • Your token doesn’t have the required scope for this endpoint. See Scopes.
  • Your organization may restrict access to certain regions.
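
To rule out the most common cause of a 401, a client can refuse to send a request when the token is empty and always build the header in the `Authorization: Bearer <token>` form. This sketch uses Python's standard `urllib.request`. The `build_request` helper is a hypothetical name, and it only constructs the request without sending it.

```python
import urllib.request


def build_request(url, token):
    """Build a request carrying the bearer token, failing early if the token is empty."""
    if not token or not token.strip():
        raise ValueError("API token is empty; the request would be rejected with 401")
    req = urllib.request.Request(url)
    # Strip stray whitespace or newlines that often sneak in from env files.
    req.add_header("Authorization", f"Bearer {token.strip()}")
    return req
```

A token that passes this check can still be rejected if it was deleted, has expired, or lacks the required scope, so the checks listed above still apply.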

After submitting an async request (202 response), poll the status URL with exponential backoff:

  1. Wait 2 seconds after submission, then poll.
  2. If still IN_QUEUE or IN_PROGRESS, wait 5 seconds and poll again.
  3. Double the interval on each subsequent poll, up to 30 seconds.
  4. Stop polling after 30 minutes. A job that has not reached a terminal status by then has likely failed silently.
# Submit async request
REQUEST_ID=$(curl -s -X POST https://api.casola.ai/fal/queue/submit/... | jq -r '.request_id')
# Poll for result
curl -s "https://api.casola.ai/fal/requests/$REQUEST_ID/status"

The status response contains status (one of IN_QUEUE, IN_PROGRESS, COMPLETED, FAILED) and, once complete, the full result payload.
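
The polling schedule described above (2 seconds, then 5, then doubling up to 30, for at most 30 minutes) can be sketched as a small generator plus a loop. Here `fetch_status` is a hypothetical callable that performs the status request and returns the parsed JSON as a dictionary. It and the `sleep` parameter are injected so the loop can be exercised without network access or real waiting.

```python
import time


def poll_intervals(max_total=30 * 60):
    """Yield waits in seconds: 2, 5, then doubling up to 30, within max_total seconds."""
    elapsed = 0.0
    wait = 2.0
    first = True
    while elapsed + wait <= max_total:
        yield wait
        elapsed += wait
        # After the initial 2-second wait use 5 seconds, then double up to 30.
        wait = 5.0 if first else min(30.0, wait * 2)
        first = False


def wait_for_result(fetch_status, sleep=time.sleep, max_total=30 * 60):
    """Poll until the job is COMPLETED or FAILED, or raise TimeoutError."""
    for wait in poll_intervals(max_total):
        sleep(wait)
        response = fetch_status()
        if response.get("status") in ("COMPLETED", "FAILED"):
            return response
    raise TimeoutError("job did not reach a terminal status within the time limit")
```

The elapsed time counts only the waits, not the duration of each status request, so the real cutoff lands slightly after the nominal limit.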