
API Reference

Overview of the Casola API. For the full interactive reference with request/response schemas, see the interactive API docs served at /docs on the API host.

Base URL: https://api.casola.ai

All API requests require a Bearer token in the Authorization header:

Authorization: Bearer csl_your_token_here

Tokens are created in the dashboard under Settings → API Tokens, or via the API itself. Each token has scopes that control what it can access.

These endpoints follow the OpenAI API format. Most OpenAI client libraries work out of the box — just change the base URL.

| Method | Path | Description |
| --- | --- | --- |
| POST | /openai/v1/chat/completions | Chat completion (streaming supported) |
| POST | /openai/v1/embeddings | Text embeddings |
| POST | /openai/v1/audio/speech | Text-to-speech |
| POST | /openai/v1/audio/transcriptions | Speech-to-text (file upload or URL) |
| POST | /openai/v1/images/generations | Image generation |
| POST | /openai/v1/images/edits | Image editing / inpainting |
| POST | /openai/v1/rerank | Document reranking |
| POST | /openai/v1/score | Text pair scoring |
| GET | /openai/v1/models | List available models |
| POST | /openai/v1/files | Upload a file |
| GET | /openai/v1/files | List files |
| POST | /openai/v1/batches | Create a batch job |
| GET | /openai/v1/batches | List batches |
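As a sketch of what a raw request to the chat endpoint looks like without any client library, it can be built with Python's standard library alone (the token and model name below are placeholders):

```python
import json
import urllib.request

API_BASE = "https://api.casola.ai"

def build_chat_request(api_key: str, model: str, messages: list) -> urllib.request.Request:
    """Build a chat completion request; pass the result to urllib.request.urlopen."""
    body = json.dumps({"model": model, "messages": messages}).encode("utf-8")
    req = urllib.request.Request(
        f"{API_BASE}/openai/v1/chat/completions", data=body, method="POST")
    req.add_header("Authorization", f"Bearer {api_key}")
    req.add_header("Content-Type", "application/json")
    return req

# Example usage (performs a live call, so it needs a valid token):
# with urllib.request.urlopen(build_chat_request(key, "Qwen/Qwen3.5-4B", msgs)) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```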
curl https://api.casola.ai/openai/v1/chat/completions \
  -H "Authorization: Bearer $CASOLA_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Qwen/Qwen3.5-4B",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'

Response:

{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1711234567,
  "model": "Qwen/Qwen3.5-4B",
  "choices": [
    {
      "index": 0,
      "message": {"role": "assistant", "content": "Hello! How can I help you today?"},
      "finish_reason": "stop"
    }
  ],
  "usage": {"prompt_tokens": 10, "completion_tokens": 9, "total_tokens": 19}
}
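With "stream": true, the endpoint returns Server-Sent Events instead of one JSON body. A minimal parser sketch for the data: lines, assuming the chunks follow the OpenAI streaming format:

```python
import json

def iter_stream_chunks(lines):
    """Yield parsed JSON chunks from an SSE response, stopping at the [DONE] sentinel."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank keep-alive lines between events
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            return
        yield json.loads(payload)

# Assembling the streamed text (delta format assumed to match OpenAI's):
# text = "".join(c["choices"][0]["delta"].get("content", "")
#                for c in iter_stream_chunks(response_lines))
```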
curl https://api.casola.ai/openai/v1/audio/speech \
  -H "Authorization: Bearer $CASOLA_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Qwen3-TTS",
    "input": "Hello, welcome to Casola!",
    "voice": "alloy",
    "response_format": "mp3"
  }' \
  --output speech.mp3

The response body is the binary audio file itself.

curl https://api.casola.ai/openai/v1/audio/transcriptions \
  -H "Authorization: Bearer $CASOLA_API_KEY" \
  -F model="whisper-large-v3" \
  -F file=@recording.mp3 \
  -F response_format="verbose_json"

Response:

{
  "task": "transcribe",
  "language": "en",
  "duration": 45.2,
  "text": "Hello, this is a sample recording...",
  "segments": [
    {"start": 0.0, "end": 2.5, "text": "Hello, this is a sample recording..."}
  ]
}
curl https://api.casola.ai/openai/v1/embeddings \
  -H "Authorization: Bearer $CASOLA_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "BAAI/bge-large-en-v1.5",
    "input": "The quick brown fox jumps over the lazy dog"
  }'

Response:

{
  "object": "list",
  "data": [
    {"object": "embedding", "index": 0, "embedding": [0.0123, -0.0456, 0.0789, "..."]}
  ],
  "model": "BAAI/bge-large-en-v1.5",
  "usage": {"prompt_tokens": 10, "total_tokens": 10}
}
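Embedding vectors are typically compared with cosine similarity; a small pure-stdlib helper:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors (1.0 = identical direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)
```

Embed two texts, then compare their "embedding" arrays with this function to get a relatedness score.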
curl https://api.casola.ai/openai/v1/images/generations \
  -H "Authorization: Bearer $CASOLA_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "flux",
    "prompt": "a neon-lit alley in the rain",
    "size": "1024x1024"
  }'

Response:

{
  "created": 1711234567,
  "data": [{"url": "https://cdn.casola.ai/outputs/img_abc123.png"}]
}
curl https://api.casola.ai/openai/v1/rerank \
  -H "Authorization: Bearer $CASOLA_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "BAAI/bge-reranker-v2-m3",
    "query": "What is machine learning?",
    "documents": [
      "Machine learning is a subset of artificial intelligence.",
      "The weather today is sunny.",
      "Deep learning uses neural networks with many layers."
    ],
    "top_n": 2,
    "return_documents": true
  }'

Response:

{
  "object": "list",
  "data": [
    {"index": 0, "relevance_score": 0.95, "document": "Machine learning is a subset of artificial intelligence."},
    {"index": 2, "relevance_score": 0.82, "document": "Deep learning uses neural networks with many layers."}
  ]
}
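The index field refers back to positions in the submitted documents array, so the original list can be reordered locally even when return_documents is false; a sketch:

```python
def reorder_by_relevance(rerank_response, documents):
    """Return the submitted documents sorted best-first using the reranker's scores."""
    ranked = sorted(rerank_response["data"],
                    key=lambda r: r["relevance_score"], reverse=True)
    return [documents[r["index"]] for r in ranked]
```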
curl https://api.casola.ai/openai/v1/models \
  -H "Authorization: Bearer $CASOLA_API_KEY"

Response:

{
  "object": "list",
  "data": [
    {"id": "Qwen/Qwen3.5-4B", "object": "model", "created": 0, "owned_by": "casola"},
    {"id": "flux", "object": "model", "created": 0, "owned_by": "casola"}
  ]
}

Slug-based endpoints for image, video, and audio generation. These follow the Fal.ai request format.

| Method | Path | Description |
| --- | --- | --- |
| POST | /fal/{slug} | Submit a job (sync or async) |
| GET | /fal/requests/{requestId} | Poll job status / get result |
| POST | /fal/requests/batch | Batch status query (max 50 IDs) |

Fal slugs are model-specific — check GET /api/model-status for the slug mapping, or see the Models reference.
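Because the batch status endpoint accepts at most 50 request IDs per call, larger ID sets have to be split into multiple requests; a chunking sketch:

```python
def chunk_ids(request_ids, size=50):
    """Split request IDs into lists no longer than `size` (the batch endpoint's cap)."""
    return [request_ids[i:i + size] for i in range(0, len(request_ids), size)]

# Each chunk is then sent as one POST /fal/requests/batch call.
```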

Sync mode (sync_mode: true): The request blocks until the result is ready (up to 120s). Best for fast tasks like image generation.

curl https://api.casola.ai/fal/fal-ai/flux1-schnell-nunchaku \
  -H "Authorization: Bearer $CASOLA_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"prompt": "A cat in space", "sync_mode": true}'

Async mode (default): Returns immediately with a request_id. Poll for the result.

# Submit
curl -X POST https://api.casola.ai/fal/fal-ai/wan/v2.2-5b/text-to-video \
  -H "Authorization: Bearer $CASOLA_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"prompt": "A sunset over the ocean"}'

# Poll
curl https://api.casola.ai/fal/requests/{requestId} \
  -H "Authorization: Bearer $CASOLA_API_KEY"

For long-running tasks (video generation, batches), use the async job API:

POST /api/jobs               → { id, queue_id }         # 1. Create job
GET  /api/jobs/{id}          → { status, result, ... }  # 2. Poll until completed
POST /api/jobs/{id}/cancel                              # 3. Cancel if needed
| Status | Meaning |
| --- | --- |
| pending | Queued, waiting for a worker |
| processing | Worker is executing the job |
| completed | Result is ready |
| failed | Job failed (check the error field) |
| cancelled | Cancelled by the user |
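The create-then-poll flow can be sketched as a loop that stops on the terminal statuses from the table. Here fetch_job stands in for whatever function performs GET /api/jobs/{id} with your HTTP client:

```python
import time

TERMINAL_STATUSES = {"completed", "failed", "cancelled"}

def poll_job(fetch_job, job_id, timeout=600.0, interval=2.0, sleep=time.sleep):
    """Poll until the job reaches a terminal status or the timeout elapses.

    `fetch_job(job_id)` must return the parsed JSON from GET /api/jobs/{id}.
    """
    deadline = time.monotonic() + timeout
    while True:
        job = fetch_job(job_id)
        if job.get("status") in TERMINAL_STATUSES:
            return job
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job_id} still {job.get('status')} after {timeout}s")
        sleep(interval)
```

Injecting sleep makes the loop testable; in production the defaults apply.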
| Method | Path | Description | Scope |
| --- | --- | --- | --- |
| GET | /api/model-status | Model availability and status | user:read |
| GET | /api/voice/models | List voice models with available voices | user:read |
| POST | /api/workflows | Create a workflow | user:write |
| POST | /api/workflows/{id}/execute | Execute a workflow | user:write |
| POST | /api/prompt-rewrite | AI-assisted prompt enhancement | user:write |
| GET | /api/organizations/{orgId}/usage | Usage aggregates | admin:read |
| POST | /api/organizations/{orgId}/tokens | Create an API token | admin:write |
curl https://api.casola.ai/api/model-status \
  -H "Authorization: Bearer $CASOLA_API_KEY"

Response:

{
  "models": [
    {
      "model_id": "Qwen/Qwen3.5-4B",
      "spec_id": "spec_abc",
      "enabled": true,
      "tasks": ["openai/chat-completion"]
    }
  ]
}
curl -X POST https://api.casola.ai/api/workflows \
  -H "Authorization: Bearer $CASOLA_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Image Generator",
    "dag": {
      "nodes": {
        "gen": {
          "model_id": "flux",
          "task": "fal/text-to-image",
          "inputs": {"prompt": "${input.prompt}"},
          "outputs": ["images[0].url"]
        }
      },
      "edges": []
    }
  }'
curl -X POST https://api.casola.ai/api/workflows/wf_abc123/execute \
  -H "Authorization: Bearer $CASOLA_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"input_params": {"prompt": "a sunset over the ocean"}}'

Response:

{
  "execution": {
    "id": "exec_def456",
    "workflow_id": "wf_abc123",
    "status": "pending",
    "input_params": {"prompt": "a sunset over the ocean"},
    "created_at": 1711234567
  }
}
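The ${input.prompt} placeholder in the workflow definition is filled from input_params at execution time. The exact template semantics are not specified here; assuming plain string substitution, resolution might look like this (purely for illustration):

```python
import re

_PLACEHOLDER = re.compile(r"\$\{input\.(\w+)\}")

def resolve_inputs(node_inputs, input_params):
    """Substitute ${input.<name>} placeholders in a node's inputs (assumed semantics)."""
    def resolve(value):
        if isinstance(value, str):
            # Unknown placeholders are left untouched rather than raising.
            return _PLACEHOLDER.sub(
                lambda m: str(input_params.get(m.group(1), m.group(0))), value)
        return value
    return {key: resolve(value) for key, value in node_inputs.items()}
```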
curl -X POST https://api.casola.ai/api/organizations/org_xyz/tokens \
  -H "Authorization: Bearer $CASOLA_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "CI Pipeline Token",
    "scopes": ["user:read", "user:write"]
  }'

Response:

{
  "token": {
    "id": "tok_abc123",
    "name": "CI Pipeline Token",
    "scopes": ["user:read", "user:write"],
    "status": "active"
  },
  "secret": "csl_sk_abc123..."
}

The secret is only returned once — store it securely.

See the interactive API docs for the full endpoint list.

Requests are rate-limited per organization based on the plan:

| Plan | API Requests | Job Submissions |
| --- | --- | --- |
| Free | 60/min | 10/min |
| Pro | 600/min | 100/min |
| Enterprise | 6,000/min | 1,000/min |

Job submissions are POST/PUT requests to /openai/v1/*, /fal/*, /api/jobs, or workflow execution endpoints. Everything else counts as an API request.
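That classification rule can be written down directly; the workflow-execution path pattern below is an assumption based on the /execute endpoint shown earlier:

```python
def is_job_submission(method: str, path: str) -> bool:
    """True if a request counts against the job-submission limit; otherwise it is a plain API request."""
    if method.upper() not in ("POST", "PUT"):
        return False
    return (path.startswith("/openai/v1/")
            or path.startswith("/fal/")
            or path.startswith("/api/jobs")
            or (path.startswith("/api/workflows/") and path.endswith("/execute")))
```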

When rate-limited, the API returns 429 Too Many Requests:

{
  "error": {
    "code": "rate_limit",
    "message": "Rate limit exceeded",
    "type": "rate_limit_error"
  }
}
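A client can recover from 429s with exponential backoff. Here send stands in for whatever function performs the HTTP call and returns a (status_code, body) pair:

```python
import time

def call_with_backoff(send, max_retries=5, base_delay=1.0, sleep=time.sleep):
    """Retry `send()` on HTTP 429 with exponential backoff; give up after max_retries."""
    for attempt in range(max_retries + 1):
        status, body = send()
        if status != 429:
            return status, body
        if attempt < max_retries:
            sleep(base_delay * (2 ** attempt))  # 1s, 2s, 4s, ...
    return status, body
```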

All errors use a consistent envelope:

{
  "error": {
    "code": "not_found",
    "message": "Job not found",
    "type": "not_found_error"
  }
}

OpenAI-compatible endpoints return errors in the OpenAI error format for client library compatibility.
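Given the consistent envelope, client-side error handling can key off the error field; a sketch:

```python
class CasolaAPIError(Exception):
    """Raised when a response body carries the standard error envelope."""

    def __init__(self, code, message, error_type):
        super().__init__(f"{code}: {message}")
        self.code = code
        self.error_type = error_type

def raise_for_error(body):
    """Raise CasolaAPIError if `body` is an error envelope; otherwise return it unchanged."""
    err = body.get("error")
    if err is not None:
        raise CasolaAPIError(err.get("code"), err.get("message"), err.get("type"))
    return body
```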