Troubleshooting

This guide covers common issues when using the Casola API. For the full list of error codes, see the Error Codes reference.

Model warming up (503)

{ "error": { "code": "model_warming_up", "message": "Model is loading" } }

A GPU worker is loading the model weights into memory. This typically takes 30 seconds to 5 minutes depending on model size.

What to do:

Wait 30-60 seconds and retry. Large models can take several minutes to load, so wait again if the error persists.
For sync requests, the API will hold the connection open if dispatch_wait_ms is configured for the model — you may simply get a delayed response.
Use async submission for models with long warm-up times and poll for the result.

No capacity (503)

{ "error": { "code": "no_capacity", "message": "No workers available for this job type" } }

Either all GPU workers for this model are busy, or none are provisioned. The scheduler auto-scales based on demand, so new workers will be provisioned within 1-5 minutes.

What to do:

Retry after 10-30 seconds with exponential backoff.
For time-sensitive workloads, submit async requests that queue automatically.
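
One way to read the backoff advice above is a delay that starts at 10 seconds, doubles on each retry, and is capped at 30 seconds. The sketch below assumes that reading. backoff_delay and retry_with_backoff are illustrative names, not part of the Casola API, and the loop retries on any command failure rather than only on no_capacity.

```sh
# Illustrative helper (not part of the Casola API).
# Prints the delay in seconds before retry number $1 (1-based):
# 10 s, then doubling, capped at 30 s. The 10/30 bounds are an assumed
# reading of "retry after 10-30 seconds".
backoff_delay() (
  attempt=$1
  delay=10
  i=1
  while [ "$i" -lt "$attempt" ]; do
    delay=$((delay * 2))
    if [ "$delay" -ge 30 ]; then
      delay=30
      break
    fi
    i=$((i + 1))
  done
  echo "$delay"
)

# retry_with_backoff MAX_ATTEMPTS COMMAND [ARGS...]
# Runs COMMAND until it exits 0 or MAX_ATTEMPTS is reached,
# sleeping for backoff_delay seconds between attempts.
retry_with_backoff() {
  max_attempts=$1
  shift
  attempt_no=1
  while ! "$@"; do
    if [ "$attempt_no" -ge "$max_attempts" ]; then
      return 1
    fi
    sleep "$(backoff_delay "$attempt_no")"
    attempt_no=$((attempt_no + 1))
  done
  return 0
}
```

A call might look like retry_with_backoff 5 my_request, where my_request is a hypothetical function that exits non-zero when the API returns a 503.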

Rate limited (429)

{ "error": { "code": "rate_limit", "message": "Rate limit exceeded" } }

You’ve exceeded the request rate limit for your token or organization.

What to do:

Read the Retry-After header and wait that many seconds before retrying.
Spread requests out over time, or use batch submission for bulk workloads.
See Rate limit headers for all available headers.
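
As a rough sketch of the first step, the helper below pulls the Retry-After value out of raw response headers, for example headers saved with curl -D. It assumes the header carries a whole number of seconds, as described above. retry_after_seconds and its fallback argument are illustrative names, not part of the API.

```sh
# Illustrative helper (not part of the Casola API).
# Reads raw HTTP response headers on stdin and prints the Retry-After
# value in seconds. Prints the fallback ($1, default 1) when the header
# is missing or is not a plain integer.
retry_after_seconds() (
  fallback=${1:-1}
  value=$(tr -d '\r' | awk -F': *' '
    tolower($1) == "retry-after" {
      v = $2
      gsub(/[ \t]+$/, "", v)   # drop trailing whitespace
      print v
      exit
    }')
  case $value in
    ''|*[!0-9]*) echo "$fallback" ;;
    *) echo "$value" ;;
  esac
)
```

A call might look like sleep "$(retry_after_seconds 5 < headers.txt)", where headers.txt is a hypothetical file written by curl -D headers.txt on the rate-limited request.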

Quota exceeded (429)

{ "error": { "code": "quota_exceeded", "message": "Monthly usage quota exceeded" } }

Your organization has reached its monthly spend limit.

What to do:

Check your usage on the Billing & Usage page.
Contact your organization owner to adjust the spend limit.
Do not retry — requests will continue to fail until the limit is raised or the billing period resets.
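
Because some of these errors should be retried and others should not, a client may want to branch on the error code before looping. The sketch below maps the codes shown in this guide to an action. retry_action is an illustrative name, and treating unknown codes as "stop" is an assumption, not documented behavior.

```sh
# Illustrative helper (not part of the Casola API).
# Maps an error code from the response body to one of:
#   retry - transient, retry after a delay
#   wait  - rate limited, honor the Retry-After header first
#   stop  - retrying will not help
retry_action() {
  case $1 in
    model_warming_up|no_capacity|timeout) echo retry ;;
    rate_limit) echo wait ;;
    quota_exceeded) echo stop ;;
    *) echo stop ;;   # assumption: do not retry unknown codes
  esac
}

# Example: read the code from a saved error body (body.json is hypothetical).
# code=$(jq -r '.error.code' < body.json)
# retry_action "$code"
```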

Job timeout (504)

{ "error": { "code": "timeout", "message": "Worker did not respond in time" } }

The GPU worker didn’t complete processing within the time limit. This is common for sync requests on long-running tasks (video generation, large batch inference).

What to do:

Switch to async submission — async jobs have longer time limits and are more resilient to transient failures.
Retry the request — a different worker may be faster.
Reduce generation parameters (fewer inference steps, lower resolution) if applicable.

Authentication errors (401 / 403)

401 Unauthorized:

Missing or invalid Authorization: Bearer <token> header.
The token may have been deleted or may have expired.
Verify your token on the API Tokens page.

403 Forbidden:

Your token doesn’t have the required scope for this endpoint. See Scopes.
Your organization may restrict access to certain regions.
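
To tell the two cases apart from a script, it can help to capture only the HTTP status code. The sketch below turns a status code into the matching hint from this section. auth_hint, CASOLA_TOKEN, and URL are illustrative placeholders, not names defined by the API.

```sh
# Illustrative helper (not part of the Casola API).
# Prints a troubleshooting hint for an HTTP status code.
auth_hint() {
  case $1 in
    401) echo "401: check the Authorization: Bearer header and that the token still exists" ;;
    403) echo "403: check the token scopes and any organization region restrictions" ;;
    *) echo "$1: not an authentication error" ;;
  esac
}

# Example (URL and CASOLA_TOKEN are placeholders you must set yourself):
# code=$(curl -s -o /dev/null -w '%{http_code}' \
#   -H "Authorization: Bearer $CASOLA_TOKEN" "$URL")
# auth_hint "$code"
```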

Polling async jobs

After submitting an async request (202 response), poll the status URL with exponential backoff:

Wait 2 seconds after submission, then poll.
If the status is still IN_QUEUE or IN_PROGRESS, wait 5 seconds and poll again.
Double the interval on each subsequent poll, up to 30 seconds.
Stop after 30 minutes — the job has likely failed silently.

# Submit async request
REQUEST_ID=$(curl -s -X POST https://api.casola.ai/fal/queue/submit/... | jq -r '.request_id')

# Poll for result
curl -s "https://api.casola.ai/fal/requests/$REQUEST_ID/status"

The status response contains status (one of IN_QUEUE, IN_PROGRESS, COMPLETED, FAILED) and, once complete, the full result payload.
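
The polling schedule above can be sketched as a small loop. This is a minimal sketch and not an official client: poll_delay and poll_job are illustrative names, the 30-minute budget counts only time spent sleeping, and authentication headers are omitted, as in the snippet above.

```sh
# Illustrative helpers (not part of the Casola API).
# Prints the wait in seconds before poll number $1 (1-based):
# 2 s, 5 s, then doubling, capped at 30 s.
poll_delay() (
  n=$1
  if [ "$n" -le 1 ]; then echo 2; exit; fi
  delay=5
  i=2
  while [ "$i" -lt "$n" ]; do
    delay=$((delay * 2))
    if [ "$delay" -ge 30 ]; then
      delay=30
      break
    fi
    i=$((i + 1))
  done
  echo "$delay"
)

# poll_job STATUS_URL
# Returns 0 on COMPLETED, 1 on FAILED, 2 after 30 minutes (1800 s) of waiting.
poll_job() {
  status_url=$1
  waited=0
  poll_no=1
  while [ "$waited" -lt 1800 ]; do
    pause=$(poll_delay "$poll_no")
    sleep "$pause"
    waited=$((waited + pause))
    job_status=$(curl -s "$status_url" | jq -r '.status')
    case $job_status in
      COMPLETED) return 0 ;;
      FAILED) return 1 ;;
    esac
    poll_no=$((poll_no + 1))
  done
  return 2
}
```

With REQUEST_ID set as in the snippet above, a call might look like poll_job "https://api.casola.ai/fal/requests/$REQUEST_ID/status". A return value of 2 means the time budget ran out without a terminal status.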