# API Reference

Overview of the Casola API. For the full interactive reference with request/response schemas, see the API docs (served from `/docs` on the API).
## Base URL

```
https://api.casola.ai
```

## Authentication

All API requests require a Bearer token in the `Authorization` header:

```
Authorization: Bearer csl_your_token_here
```

Tokens are created in the dashboard under Settings → API Tokens, or via the API itself. Each token has scopes that control what it can access.
## OpenAI-Compatible Endpoints

These endpoints follow the OpenAI API format. Most OpenAI client libraries work out of the box — just change the base URL.
| Method | Path | Description |
|---|---|---|
| POST | /openai/v1/chat/completions | Chat completion (streaming supported) |
| POST | /openai/v1/embeddings | Text embeddings |
| POST | /openai/v1/audio/speech | Text-to-speech |
| POST | /openai/v1/audio/transcriptions | Speech-to-text (file upload or URL) |
| POST | /openai/v1/images/generations | Image generation |
| POST | /openai/v1/images/edits | Image editing / inpainting |
| POST | /openai/v1/rerank | Document reranking |
| POST | /openai/v1/score | Text pair scoring |
| GET | /openai/v1/models | List available models |
| POST | /openai/v1/files | Upload a file |
| GET | /openai/v1/files | List files |
| POST | /openai/v1/batches | Create a batch job |
| GET | /openai/v1/batches | List batches |
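The table above maps directly onto plain HTTP calls. As an illustration (not an official SDK), a request to the chat completions endpoint can be assembled with nothing but the Python standard library; the token value below is a placeholder:

```python
import json
import urllib.request

BASE_URL = "https://api.casola.ai"

def build_chat_request(api_key: str, model: str, messages: list) -> urllib.request.Request:
    """Assemble a POST request for /openai/v1/chat/completions."""
    body = json.dumps({"model": model, "messages": messages}).encode()
    return urllib.request.Request(
        f"{BASE_URL}/openai/v1/chat/completions",
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_chat_request("csl_your_token_here", "Qwen/Qwen3.5-4B",
                         [{"role": "user", "content": "Hello!"}])
# urllib.request.urlopen(req) would actually send it; omitted here.
print(req.full_url)
```

Any HTTP client works the same way; only the base URL and the `Authorization` header are Casola-specific.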
### Example: Chat Completion

```bash
curl https://api.casola.ai/openai/v1/chat/completions \
  -H "Authorization: Bearer $CASOLA_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Qwen/Qwen3.5-4B",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
```

Response:
```json
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1711234567,
  "model": "Qwen/Qwen3.5-4B",
  "choices": [
    {
      "index": 0,
      "message": {"role": "assistant", "content": "Hello! How can I help you today?"},
      "finish_reason": "stop"
    }
  ],
  "usage": {"prompt_tokens": 10, "completion_tokens": 9, "total_tokens": 19}
}
```

### Example: Text-to-Speech
```bash
curl https://api.casola.ai/openai/v1/audio/speech \
  -H "Authorization: Bearer $CASOLA_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Qwen3-TTS",
    "input": "Hello, welcome to Casola!",
    "voice": "alloy",
    "response_format": "mp3"
  }' \
  --output speech.mp3
```

The response is the binary audio file directly.
### Example: Speech-to-Text

```bash
curl https://api.casola.ai/openai/v1/audio/transcriptions \
  -H "Authorization: Bearer $CASOLA_API_KEY" \
  -F model="whisper-large-v3" \
  -F file=@recording.mp3 \
  -F response_format="verbose_json"
```

Response:
```json
{
  "task": "transcribe",
  "language": "en",
  "duration": 45.2,
  "text": "Hello, this is a sample recording...",
  "segments": [
    {"start": 0.0, "end": 2.5, "text": "Hello, this is a sample recording..."}
  ]
}
```

### Example: Embeddings
```bash
curl https://api.casola.ai/openai/v1/embeddings \
  -H "Authorization: Bearer $CASOLA_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "BAAI/bge-large-en-v1.5",
    "input": "The quick brown fox jumps over the lazy dog"
  }'
```

Response:
```json
{
  "object": "list",
  "data": [
    {"object": "embedding", "index": 0, "embedding": [0.0123, -0.0456, 0.0789, "..."]}
  ],
  "model": "BAAI/bge-large-en-v1.5",
  "usage": {"prompt_tokens": 10, "total_tokens": 10}
}
```

### Example: Image Generation
```bash
curl https://api.casola.ai/openai/v1/images/generations \
  -H "Authorization: Bearer $CASOLA_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "flux",
    "prompt": "a neon-lit alley in the rain",
    "size": "1024x1024"
  }'
```

Response:
```json
{
  "created": 1711234567,
  "data": [{"url": "https://cdn.casola.ai/outputs/img_abc123.png"}]
}
```

### Example: Document Reranking
```bash
curl https://api.casola.ai/openai/v1/rerank \
  -H "Authorization: Bearer $CASOLA_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "BAAI/bge-reranker-v2-m3",
    "query": "What is machine learning?",
    "documents": [
      "Machine learning is a subset of artificial intelligence.",
      "The weather today is sunny.",
      "Deep learning uses neural networks with many layers."
    ],
    "top_n": 2,
    "return_documents": true
  }'
```

Response:
```json
{
  "object": "list",
  "data": [
    {"index": 0, "relevance_score": 0.95, "document": "Machine learning is a subset of artificial intelligence."},
    {"index": 2, "relevance_score": 0.82, "document": "Deep learning uses neural networks with many layers."}
  ]
}
```

### Example: List Models
```bash
curl https://api.casola.ai/openai/v1/models \
  -H "Authorization: Bearer $CASOLA_API_KEY"
```

Response:
```json
{
  "object": "list",
  "data": [
    {"id": "Qwen/Qwen3.5-4B", "object": "model", "created": 0, "owned_by": "casola"},
    {"id": "flux", "object": "model", "created": 0, "owned_by": "casola"}
  ]
}
```

## Fal-Compatible Endpoints
Slug-based endpoints for image, video, and audio generation. These follow the Fal.ai request format.
| Method | Path | Description |
|---|---|---|
| POST | /fal/{slug} | Submit a job (sync or async) |
| GET | /fal/requests/{requestId} | Poll job status / get result |
| POST | /fal/requests/batch | Batch status query (max 50 IDs) |
Fal slugs are model-specific — check `GET /api/model-status` for the slug mapping, or see the Models reference.
### Sync vs Async

**Sync mode** (`sync_mode: true`): The request blocks until the result is ready (up to 120s). Best for fast tasks like image generation.

```bash
curl https://api.casola.ai/fal/fal-ai/flux1-schnell-nunchaku \
  -H "Authorization: Bearer $CASOLA_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"prompt": "A cat in space", "sync_mode": true}'
```

**Async mode** (default): Returns immediately with a `request_id`. Poll for the result.
```bash
# Submit
curl -X POST https://api.casola.ai/fal/fal-ai/wan/v2.2-5b/text-to-video \
  -H "Authorization: Bearer $CASOLA_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"prompt": "A sunset over the ocean"}'
```
```bash
# Poll
curl https://api.casola.ai/fal/requests/{requestId} \
  -H "Authorization: Bearer $CASOLA_API_KEY"
```
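The submit/poll pair above is easy to wrap in a small polling loop. A sketch, with the HTTP call injected as a function so it works with any client; the terminal status names are an assumption modeled on the job statuses documented further down:

```python
import time

def poll_request(fetch_status, request_id, interval=2.0, timeout=300.0):
    """Poll GET /fal/requests/{requestId} (via the injected fetch_status)
    until the request reaches a terminal state or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        result = fetch_status(request_id)
        if result.get("status") in {"completed", "failed", "cancelled"}:
            return result
        time.sleep(interval)
    raise TimeoutError(f"request {request_id} still running after {timeout}s")

# Demo with a fake fetcher that completes on the second poll:
replies = iter([{"status": "processing"}, {"status": "completed"}])
final = poll_request(lambda _id: next(replies), "req_123", interval=0.01)
print(final["status"])  # completed
```

Injecting `fetch_status` keeps the loop testable and lets the same helper serve both the Fal endpoints and the job API.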
## Async Job Flow

For long-running tasks (video generation, batches), use the async job API:
```
POST /api/jobs              → { id, queue_id }          # 1. Create job
GET  /api/jobs/{id}         → { status, result, ... }   # 2. Poll until completed
POST /api/jobs/{id}/cancel                              # 3. Cancel if needed
```

### Job Statuses
| Status | Meaning |
|---|---|
| pending | Queued, waiting for a worker |
| processing | Worker is executing the job |
| completed | Result is ready |
| failed | Job failed (check error field) |
| cancelled | Cancelled by the user |
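Of the statuses above, the last three are terminal: once a job reports one of them its state will not change, so a client can stop polling. A trivial helper (status strings taken verbatim from the table):

```python
TERMINAL_STATUSES = {"completed", "failed", "cancelled"}

def is_terminal(status: str) -> bool:
    """True once a job's status will no longer change, so polling can stop."""
    return status in TERMINAL_STATUSES

print(is_terminal("processing"))  # False
print(is_terminal("failed"))      # True
```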
## Core API Endpoints

| Method | Path | Description | Scope |
|---|---|---|---|
| GET | /api/model-status | Model availability and status | user:read |
| GET | /api/voice/models | List voice models with available voices | user:read |
| POST | /api/workflows | Create a workflow | user:write |
| POST | /api/workflows/{id}/execute | Execute a workflow | user:write |
| POST | /api/prompt-rewrite | AI-assisted prompt enhancement | user:write |
| GET | /api/organizations/{orgId}/usage | Usage aggregates | admin:read |
| POST | /api/organizations/{orgId}/tokens | Create an API token | admin:write |
### Example: Model Status

```bash
curl https://api.casola.ai/api/model-status \
  -H "Authorization: Bearer $CASOLA_API_KEY"
```

Response:
```json
{
  "models": [
    {
      "model_id": "Qwen/Qwen3.5-4B",
      "spec_id": "spec_abc",
      "enabled": true,
      "tasks": ["openai/chat-completion"]
    }
  ]
}
```

### Example: Create a Workflow
```bash
curl -X POST https://api.casola.ai/api/workflows \
  -H "Authorization: Bearer $CASOLA_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Image Generator",
    "dag": {
      "nodes": {
        "gen": {
          "model_id": "flux",
          "task": "fal/text-to-image",
          "inputs": {"prompt": "${input.prompt}"},
          "outputs": ["images[0].url"]
        }
      },
      "edges": []
    }
  }'
```

### Example: Execute a Workflow
```bash
curl -X POST https://api.casola.ai/api/workflows/wf_abc123/execute \
  -H "Authorization: Bearer $CASOLA_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"input_params": {"prompt": "a sunset over the ocean"}}'
```

Response:
```json
{
  "execution": {
    "id": "exec_def456",
    "workflow_id": "wf_abc123",
    "status": "pending",
    "input_params": {"prompt": "a sunset over the ocean"},
    "created_at": 1711234567
  }
}
```

### Example: Create an API Token
```bash
curl -X POST https://api.casola.ai/api/organizations/org_xyz/tokens \
  -H "Authorization: Bearer $CASOLA_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "CI Pipeline Token",
    "scopes": ["user:read", "user:write"]
  }'
```

Response:
```json
{
  "token": {
    "id": "tok_abc123",
    "name": "CI Pipeline Token",
    "scopes": ["user:read", "user:write"],
    "status": "active"
  },
  "secret": "csl_sk_abc123..."
}
```

The secret is only returned once — store it securely.
See the interactive API docs for the full endpoint list.
## Rate Limiting

Requests are rate-limited per organization based on the plan:
| Plan | API Requests | Job Submissions |
|---|---|---|
| Free | 60/min | 10/min |
| Pro | 600/min | 100/min |
| Enterprise | 6,000/min | 1,000/min |
Job submissions are POST/PUT requests to /openai/v1/*, /fal/*, /api/jobs, or workflow execution endpoints. Everything else counts as an API request.
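When a burst of submissions trips the limit, the usual remedy is jittered exponential backoff on 429 responses. A sketch (the `send` callable stands in for any HTTP client and returns a status code plus body; the retry count and delays are illustrative defaults, not documented limits):

```python
import random
import time

def call_with_backoff(send, max_retries=5, base_delay=1.0):
    """Invoke send() until it returns a non-429 status, sleeping with
    jittered exponential backoff between rate-limited attempts."""
    for attempt in range(max_retries):
        status, body = send()
        if status != 429:
            return status, body
        delay = min(base_delay * 2 ** attempt, 30.0)
        time.sleep(delay * (0.5 + random.random() / 2))  # jitter: 50-100% of delay
    raise RuntimeError(f"still rate-limited after {max_retries} attempts")

# Demo: fake sender that is rate-limited twice, then succeeds.
codes = iter([429, 429, 200])
status, body = call_with_backoff(lambda: (next(codes), "ok"), base_delay=0.01)
print(status)  # 200
```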
When rate-limited, the API returns 429 Too Many Requests:
```json
{
  "error": {
    "code": "rate_limit",
    "message": "Rate limit exceeded",
    "type": "rate_limit_error"
  }
}
```

## Error Format
All errors use a consistent envelope:

```json
{
  "error": {
    "code": "not_found",
    "message": "Job not found",
    "type": "not_found_error"
  }
}
```

OpenAI-compatible endpoints return errors in the OpenAI error format for client library compatibility.
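Since every error shares this envelope, a client can centralize handling by parsing it once. A sketch (the exception class is illustrative, not part of any SDK):

```python
import json

class CasolaApiError(Exception):
    """Carries the code/message/type fields from the error envelope."""
    def __init__(self, code, message, error_type):
        super().__init__(f"{code}: {message}")
        self.code = code
        self.type = error_type

def raise_for_error(body: str) -> dict:
    """Parse a response body; raise if it contains the error envelope."""
    payload = json.loads(body)
    if "error" in payload:
        err = payload["error"]
        raise CasolaApiError(err.get("code"), err.get("message"), err.get("type"))
    return payload

try:
    raise_for_error('{"error": {"code": "not_found", '
                    '"message": "Job not found", "type": "not_found_error"}}')
except CasolaApiError as exc:
    print(exc.code)  # not_found
```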