# API Reference

Overview of the Casola API. For the full interactive reference with request/response schemas, see the API docs (served from `/docs` on the API).
## Base URL

```
https://api.casola.ai
```

## Authentication

All API requests require a Bearer token in the `Authorization` header:

```
Authorization: Bearer csl_your_token_here
```

Tokens are created in the dashboard under Settings → API Tokens, or via the API itself. Each token has scopes that control what it can access.
## OpenAI-Compatible Endpoints

These endpoints follow the OpenAI API format. Most OpenAI client libraries work out of the box — just change the base URL.
| Method | Path | Description |
|---|---|---|
| POST | /openai/v1/chat/completions | Chat completion (streaming supported) |
| POST | /openai/v1/embeddings | Text embeddings |
| POST | /openai/v1/audio/speech | Text-to-speech |
| POST | /openai/v1/audio/transcriptions | Speech-to-text (file upload or URL) |
| POST | /openai/v1/images/generations | Image generation |
| POST | /openai/v1/images/edits | Image editing / inpainting |
| POST | /openai/v1/rerank | Document reranking |
| POST | /openai/v1/score | Text pair scoring |
| GET | /openai/v1/models | List available models |
| POST | /openai/v1/files | Upload a file |
| GET | /openai/v1/files | List files |
| POST | /openai/v1/batches | Create a batch job |
| GET | /openai/v1/batches | List batches |
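The table above maps directly onto plain HTTP calls. As an illustration (not an official SDK), a request to the chat completions endpoint can be assembled with nothing but the Python standard library; the token value below is a placeholder:

```python
import json
import urllib.request

BASE_URL = "https://api.casola.ai"

def build_chat_request(api_key: str, model: str, messages: list) -> urllib.request.Request:
    """Assemble a POST request for /openai/v1/chat/completions."""
    body = json.dumps({"model": model, "messages": messages}).encode()
    return urllib.request.Request(
        f"{BASE_URL}/openai/v1/chat/completions",
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_chat_request("csl_your_token_here", "Qwen/Qwen3.5-4B",
                         [{"role": "user", "content": "Hello!"}])
# urllib.request.urlopen(req) would actually send it; omitted here.
print(req.full_url)
```

Any HTTP client works the same way; only the base URL and the `Authorization` header are Casola-specific.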
### Example: Chat Completion

```bash
curl https://api.casola.ai/openai/v1/chat/completions \
  -H "Authorization: Bearer $CASOLA_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Qwen/Qwen3.5-4B",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
```

Response:
```json
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1711234567,
  "model": "Qwen/Qwen3.5-4B",
  "choices": [
    {
      "index": 0,
      "message": {"role": "assistant", "content": "Hello! How can I help you today?"},
      "finish_reason": "stop"
    }
  ],
  "usage": {"prompt_tokens": 10, "completion_tokens": 9, "total_tokens": 19}
}
```

### Example: Text-to-Speech
```bash
curl https://api.casola.ai/openai/v1/audio/speech \
  -H "Authorization: Bearer $CASOLA_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Qwen3-TTS",
    "input": "Hello, welcome to Casola!",
    "voice": "alloy",
    "response_format": "mp3"
  }' \
  --output speech.mp3
```

The response is the binary audio file directly.
### Example: Speech-to-Text

```bash
curl https://api.casola.ai/openai/v1/audio/transcriptions \
  -H "Authorization: Bearer $CASOLA_API_KEY" \
  -F model="whisper-large-v3" \
  -F file=@recording.mp3 \
  -F response_format="verbose_json"
```

Response:
```json
{
  "task": "transcribe",
  "language": "en",
  "duration": 45.2,
  "text": "Hello, this is a sample recording...",
  "segments": [
    {"start": 0.0, "end": 2.5, "text": "Hello, this is a sample recording..."}
  ]
}
```

### Example: Embeddings
```bash
curl https://api.casola.ai/openai/v1/embeddings \
  -H "Authorization: Bearer $CASOLA_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "BAAI/bge-large-en-v1.5",
    "input": "The quick brown fox jumps over the lazy dog"
  }'
```

Response:
```json
{
  "object": "list",
  "data": [
    {"object": "embedding", "index": 0, "embedding": [0.0123, -0.0456, 0.0789, "..."]}
  ],
  "model": "BAAI/bge-large-en-v1.5",
  "usage": {"prompt_tokens": 10, "total_tokens": 10}
}
```

### Example: Image Generation
```bash
curl https://api.casola.ai/openai/v1/images/generations \
  -H "Authorization: Bearer $CASOLA_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "flux",
    "prompt": "a neon-lit alley in the rain",
    "size": "1024x1024"
  }'
```

Response:
```json
{
  "created": 1711234567,
  "data": [{"url": "https://cdn.casola.ai/outputs/img_abc123.png"}]
}
```

### Example: Document Reranking
```bash
curl https://api.casola.ai/openai/v1/rerank \
  -H "Authorization: Bearer $CASOLA_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "BAAI/bge-reranker-v2-m3",
    "query": "What is machine learning?",
    "documents": [
      "Machine learning is a subset of artificial intelligence.",
      "The weather today is sunny.",
      "Deep learning uses neural networks with many layers."
    ],
    "top_n": 2,
    "return_documents": true
  }'
```

Response:
```json
{
  "object": "list",
  "data": [
    {"index": 0, "relevance_score": 0.95, "document": "Machine learning is a subset of artificial intelligence."},
    {"index": 2, "relevance_score": 0.82, "document": "Deep learning uses neural networks with many layers."}
  ]
}
```

### Example: List Models
```bash
curl https://api.casola.ai/openai/v1/models \
  -H "Authorization: Bearer $CASOLA_API_KEY"
```

Response:
```json
{
  "object": "list",
  "data": [
    {"id": "Qwen/Qwen3.5-4B", "object": "model", "created": 0, "owned_by": "casola"},
    {"id": "flux", "object": "model", "created": 0, "owned_by": "casola"}
  ]
}
```

## Fal-Compatible Endpoints
Slug-based endpoints for image, video, and audio generation. These follow the Fal.ai request format.
| Method | Path | Description |
|---|---|---|
| POST | /fal/{slug} | Submit a job (sync or async) |
| GET | /fal/requests/{requestId} | Poll job status / get result |
| POST | /fal/requests/batch | Batch status query (max 50 IDs) |
Fal slugs are model-specific — check `GET /api/model-status` for the slug mapping, or see the Models reference.
### Sync vs Async

**Sync mode** (`sync_mode: true`): The request blocks until the result is ready (up to 120s). Best for fast tasks like image generation.

```bash
curl https://api.casola.ai/fal/fal-ai/flux1-schnell-nunchaku \
  -H "Authorization: Bearer $CASOLA_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"prompt": "A cat in space", "sync_mode": true}'
```

**Async mode** (default): Returns immediately with a `request_id`. Poll for the result.
```bash
# Submit
curl -X POST https://api.casola.ai/fal/fal-ai/wan/v2.2-5b/text-to-video \
  -H "Authorization: Bearer $CASOLA_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"prompt": "A sunset over the ocean"}'
```
```bash
# Poll
curl https://api.casola.ai/fal/requests/{requestId} \
  -H "Authorization: Bearer $CASOLA_API_KEY"
```
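The submit/poll pair above is easy to wrap in a small polling loop. A sketch, with the HTTP call injected as a function so it works with any client; the terminal status names are an assumption modeled on the job statuses documented further down:

```python
import time

def poll_request(fetch_status, request_id, interval=2.0, timeout=300.0):
    """Poll GET /fal/requests/{requestId} (via the injected fetch_status)
    until the request reaches a terminal state or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        result = fetch_status(request_id)
        if result.get("status") in {"completed", "failed", "cancelled"}:
            return result
        time.sleep(interval)
    raise TimeoutError(f"request {request_id} still running after {timeout}s")

# Demo with a fake fetcher that completes on the second poll:
replies = iter([{"status": "processing"}, {"status": "completed"}])
final = poll_request(lambda _id: next(replies), "req_123", interval=0.01)
print(final["status"])  # completed
```

Injecting `fetch_status` keeps the loop testable and lets the same helper serve both the Fal endpoints and the job API.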
## Async Job Flow

For long-running tasks (video generation, batches), use the async job API:
```
POST /api/jobs              → { id, queue_id }          # 1. Create job
GET  /api/jobs/{id}         → { status, result, ... }   # 2. Poll until completed
POST /api/jobs/{id}/cancel                              # 3. Cancel if needed
```

### Job Statuses
| Status | Meaning |
|---|---|
| pending | Queued, waiting for a worker |
| processing | Worker is executing the job |
| completed | Result is ready |
| failed | Job failed (check error field) |
| cancelled | Cancelled by the user |
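Of the statuses above, the last three are terminal: once a job reports one of them its state will not change, so a client can stop polling. A trivial helper (status strings taken verbatim from the table):

```python
TERMINAL_STATUSES = {"completed", "failed", "cancelled"}

def is_terminal(status: str) -> bool:
    """True once a job's status will no longer change, so polling can stop."""
    return status in TERMINAL_STATUSES

print(is_terminal("processing"))  # False
print(is_terminal("failed"))      # True
```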
## Core API Endpoints

| Method | Path | Description | Scope |
|---|---|---|---|
| GET | /api/model-status | Model availability and status | user:read |
| GET | /api/voice/models | List voice models with available voices | user:read |
| POST | /api/workflows | Create a workflow | user:write |
| POST | /api/workflows/{id}/execute | Execute a workflow | user:write |
| POST | /api/prompt-rewrite | AI-assisted prompt enhancement | user:write |
| GET | /api/organizations/{orgId}/usage | Usage aggregates | admin:read |
| POST | /api/organizations/{orgId}/tokens | Create an API token | admin:write |
### Example: Model Status

```bash
curl https://api.casola.ai/api/model-status \
  -H "Authorization: Bearer $CASOLA_API_KEY"
```

Response:
```json
{
  "models": [
    {
      "model_id": "Qwen/Qwen3.5-4B",
      "spec_id": "spec_abc",
      "enabled": true,
      "tasks": ["openai/chat-completion"]
    }
  ]
}
```

### Example: Create a Workflow
```bash
curl -X POST https://api.casola.ai/api/workflows \
  -H "Authorization: Bearer $CASOLA_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Image Generator",
    "dag": {
      "nodes": {
        "gen": {
          "model_id": "flux",
          "task": "fal/text-to-image",
          "inputs": {"prompt": "${input.prompt}"},
          "outputs": ["images[0].url"]
        }
      },
      "edges": []
    }
  }'
```

### Example: Execute a Workflow
```bash
curl -X POST https://api.casola.ai/api/workflows/wf_abc123/execute \
  -H "Authorization: Bearer $CASOLA_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"input_params": {"prompt": "a sunset over the ocean"}}'
```

Response:
```json
{
  "execution": {
    "id": "exec_def456",
    "workflow_id": "wf_abc123",
    "status": "pending",
    "input_params": {"prompt": "a sunset over the ocean"},
    "created_at": 1711234567
  }
}
```

### Example: Create an API Token
```bash
curl -X POST https://api.casola.ai/api/organizations/org_xyz/tokens \
  -H "Authorization: Bearer $CASOLA_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "CI Pipeline Token",
    "scopes": ["user:read", "user:write"]
  }'
```

Response:
```json
{
  "token": {
    "id": "tok_abc123",
    "name": "CI Pipeline Token",
    "scopes": ["user:read", "user:write"],
    "status": "active"
  },
  "secret": "csl_sk_abc123..."
}
```

The secret is only returned once — store it securely.
See the interactive API docs for the full endpoint list.
## Rate Limiting

Requests are rate-limited per organization based on the plan:
| Plan | API Requests | Job Submissions |
|---|---|---|
| Free | 60/min | 10/min |
| Pro | 600/min | 100/min |
| Enterprise | 6,000/min | 1,000/min |
Job submissions are POST/PUT requests to /openai/v1/*, /fal/*, /api/jobs, or workflow execution endpoints. Everything else counts as an API request.
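When a burst of submissions trips the limit, the usual remedy is jittered exponential backoff on 429 responses. A sketch (the `send` callable stands in for any HTTP client and returns a status code plus body; the retry count and delays are illustrative defaults, not documented limits):

```python
import random
import time

def call_with_backoff(send, max_retries=5, base_delay=1.0):
    """Invoke send() until it returns a non-429 status, sleeping with
    jittered exponential backoff between rate-limited attempts."""
    for attempt in range(max_retries):
        status, body = send()
        if status != 429:
            return status, body
        delay = min(base_delay * 2 ** attempt, 30.0)
        time.sleep(delay * (0.5 + random.random() / 2))  # jitter: 50-100% of delay
    raise RuntimeError(f"still rate-limited after {max_retries} attempts")

# Demo: fake sender that is rate-limited twice, then succeeds.
codes = iter([429, 429, 200])
status, body = call_with_backoff(lambda: (next(codes), "ok"), base_delay=0.01)
print(status)  # 200
```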
When rate-limited, the API returns 429 Too Many Requests:
```json
{
  "error": {
    "code": "rate_limit",
    "message": "Rate limit exceeded",
    "type": "rate_limit_error"
  }
}
```

## Error Format
All errors use a consistent envelope:

```json
{
  "error": {
    "code": "not_found",
    "message": "Job not found",
    "type": "not_found_error"
  }
}
```

OpenAI-compatible endpoints return errors in the OpenAI error format for client library compatibility.
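Since every error shares this envelope, a client can centralize handling by parsing it once. A sketch (the exception class is illustrative, not part of any SDK):

```python
import json

class CasolaApiError(Exception):
    """Carries the code/message/type fields from the error envelope."""
    def __init__(self, code, message, error_type):
        super().__init__(f"{code}: {message}")
        self.code = code
        self.type = error_type

def raise_for_error(body: str) -> dict:
    """Parse a response body; raise if it contains the error envelope."""
    payload = json.loads(body)
    if "error" in payload:
        err = payload["error"]
        raise CasolaApiError(err.get("code"), err.get("message"), err.get("type"))
    return payload

try:
    raise_for_error('{"error": {"code": "not_found", '
                    '"message": "Job not found", "type": "not_found_error"}}')
except CasolaApiError as exc:
    print(exc.code)  # not_found
```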