Complete catalog of models available on Casola. Models scale from zero — instances launch on demand and shut down when idle.
| Model | Spec ID | Task Types | GPU | Quantization | Concurrency | Warm-up |
|---|
| Qwen 3-0.6B | qwen3-0.6b | openai/chat-completion | RTX 4090 | FP8 | 32 | ~30s |
| Qwen 3.5-4B | qwen3.5-4b | openai/chat-completion, openai/chat-completion/vision | RTX 4090 | — | 16 | ~30s |
| Qwen 3.5-9B | qwen3.5-9b | openai/chat-completion, openai/chat-completion/vision | RTX 4090 | — | 8 | ~20 min |
| GPT-OSS 20B | gpt-oss-20b | openai/chat-completion | RTX 4090 / 5090 | MXFP4 | 32 | ~30s |
Qwen 3.5 models support vision — pass images in message content using the OpenAI vision format.
| Model | Spec ID | Task Types | GPU | Concurrency | Warm-up |
|---|
| Qwen 3-0.6B Embed | qwen3-0.6b-embed | openai/embeddings | RTX 4090 | 32 | ~30s |
| GPT-OSS 20B Embed | gpt-oss-20b-embed | openai/embeddings | RTX 4090 / 5090 | 32 | ~30s |
| Model | Spec ID | Task Types | GPU | Concurrency | Warm-up |
|---|
| Qwen 3-0.6B Score | qwen3-0.6b-score | openai/rerank, openai/score | RTX 4090 | 32 | ~30s |
| GPT-OSS 20B Score | gpt-oss-20b-score | openai/rerank, openai/score | RTX 4090 / 5090 | 32 | ~30s |
| Model | Spec ID | Task Types | GPU | Quantization | Concurrency | Warm-up |
|---|
| FLUX.1-schnell | nunchaku-flux1-schnell | fal/text-to-image, fal/image-edit | RTX 4090 / 5090 | INT4 | 1 | ~10 min |
| FLUX.2-Klein 4B | sglang-diffusion-flux2-klein-4b | fal/text-to-image, fal/image-edit | RTX 4090 | FP8 | 1 | ~10 min |
| Qwen Image 2512 | qwen-image-2512 | fal/text-to-image, openai/image-generation | L40S / H100 / H200 | — | 1 | ~5 min |
| Qwen Image 2512 Lightning | qwen-image-2512-lightx2v-fp8 | fal/text-to-image | L40S | FP8 | 1 | ~5 min |
| Qwen Image 2512 FP8 | sglang-diffusion-qwen-image-2512-fp8 | fal/text-to-image | L40S / RTX PRO 6000 | FP8 | 1 | ~5 min |
| Model | Spec ID | Task Types | GPU | Quantization | Concurrency | Warm-up |
|---|
| Qwen Image Edit 2511 | qwen-image-edit-2511 | fal/image-edit | L40S / H100 / H200 | — | 1 | ~5 min |
| Qwen Image Edit 2511 Lightning | qwen-image-edit-2511-lightx2v-fp8 | fal/image-edit | L40S | FP8 | 1 | ~5 min |
| Qwen Image Edit 2511 FP8 | sglang-diffusion-qwen-image-edit-2511-fp8 | fal/image-edit | L40S / RTX PRO 6000 | FP8 | 1 | ~5 min |
FLUX.1-schnell and FLUX.2-Klein also support image editing — see the Image Generation table above.
| Model | Spec ID | Task Types | GPU | Quantization | Concurrency | Warm-up |
|---|
| Wan 2.2 TI2V 5B | wan22-ti2v | fal/text-to-video, fal/image-to-video | RTX 5090 | FP8 | 1 | ~10 min |
| Wan 2.2 T2V A14B | sglang-diffusion-wan22-t2v-a14b-fp8 | fal/text-to-video | RTX 5090 | FP8 | 1 | ~10 min |
| Wan 2.2 S2V 14B | wan22-s2v | fal/speech-to-video | H100 / H200 | — | 1 | ~10 min |
| LTX-2.3 Distilled | ltx2-distilled | fal/text-to-video, fal/image-to-video | RTX 5090 | FP8 | 1 | ~40 min |
All video generation is async only — submit a request and poll for the result.
| Model | Spec ID | Task Types | GPU | Concurrency | Warm-up | Notes |
|---|
| Qwen3 TTS | qwen3-tts | openai/audio-speech | RTX 4090 | — | ~30s | 9 built-in voices (multilingual) |
| Fox TTS | fox-tts | openai/audio-speech | External | — | — | 150+ voices, always-on |
| Whisper Large v3 | whisper-large-v3 | openai/audio-transcription | RTX 4090 | — | ~30s | OpenAI Whisper, transcription + translation |
| Voice | Language | Description |
|---|
| Vivian | Chinese | Bright, slightly edgy young female |
| Serena | Chinese | Warm, gentle young female |
| Uncle Fu | Chinese | Seasoned male, low mellow timbre |
| Dylan | Chinese | Youthful Beijing male, clear natural timbre |
| Eric | Chinese | Lively Chengdu male, slightly husky |
| Ryan | English | Dynamic male with strong rhythmic drive |
| Aiden | English | Sunny American male, clear midrange |
| Ono Anna | Japanese | Playful female, light nimble timbre |
| Sohee | Korean | Warm female with rich emotion |
Fox TTS provides 150+ character and celebrity voices — query GET /api/voice/models for the full list.
| Model | Spec ID | Task Types | GPU | Concurrency | Warm-up |
|---|
| DeepSeek OCR v1 | deepseek-ocr-v1 | openai/chat-completion, openai/chat-completion/ocr | RTX 4090 | 2 | ~3 min |
| DeepSeek OCR v2 | deepseek-ocr-v2 | openai/chat-completion, openai/chat-completion/ocr | RTX 4090 | 2 | ~3 min |
OCR models accept images via the Fal-compatible POST /fal/{slug} endpoint or the OpenAI vision format. See the OCR guide for details.