Skip to content

Models

Complete catalog of models available on Casola. Models scale from zero — instances launch on demand and shut down when idle.

ModelSpec IDTask TypesGPUQuantizationConcurrencyWarm-up
Qwen 3-0.6Bqwen3-0.6bopenai/chat-completionRTX 4090FP832~30s
Qwen 3.5-4Bqwen3.5-4bopenai/chat-completion, openai/chat-completion/visionRTX 409016~30s
Qwen 3.5-9Bqwen3.5-9bopenai/chat-completion, openai/chat-completion/visionRTX 40908~20 min
GPT-OSS 20Bgpt-oss-20bopenai/chat-completionRTX 4090 / 5090MXFP432~30s

Qwen 3.5 models support vision — pass images in message content using the OpenAI vision format.

ModelSpec IDTask TypesGPUConcurrencyWarm-up
Qwen 3-0.6B Embedqwen3-0.6b-embedopenai/embeddingsRTX 409032~30s
GPT-OSS 20B Embedgpt-oss-20b-embedopenai/embeddingsRTX 4090 / 509032~30s
ModelSpec IDTask TypesGPUConcurrencyWarm-up
Qwen 3-0.6B Scoreqwen3-0.6b-scoreopenai/rerank, openai/scoreRTX 409032~30s
GPT-OSS 20B Scoregpt-oss-20b-scoreopenai/rerank, openai/scoreRTX 4090 / 509032~30s
ModelSpec IDTask TypesGPUQuantizationConcurrencyWarm-up
FLUX.1-schnellnunchaku-flux1-schnellfal/text-to-image, fal/image-editRTX 4090 / 5090INT41~10 min
FLUX.2-Klein 4Bsglang-diffusion-flux2-klein-4bfal/text-to-image, fal/image-editRTX 4090FP81~10 min
Qwen Image 2512qwen-image-2512fal/text-to-image, openai/image-generationL40S / H100 / H2001~5 min
Qwen Image 2512 Lightningqwen-image-2512-lightx2v-fp8fal/text-to-imageL40SFP81~5 min
Qwen Image 2512 FP8sglang-diffusion-qwen-image-2512-fp8fal/text-to-imageL40S / RTX PRO 6000FP81~5 min
ModelSpec IDTask TypesGPUQuantizationConcurrencyWarm-up
Qwen Image Edit 2511qwen-image-edit-2511fal/image-editL40S / H100 / H2001~5 min
Qwen Image Edit 2511 Lightningqwen-image-edit-2511-lightx2v-fp8fal/image-editL40SFP81~5 min
Qwen Image Edit 2511 FP8sglang-diffusion-qwen-image-edit-2511-fp8fal/image-editL40S / RTX PRO 6000FP81~5 min

FLUX.1-schnell and FLUX.2-Klein also support image editing — see the Image Generation table above.

ModelSpec IDTask TypesGPUQuantizationConcurrencyWarm-up
Wan 2.2 TI2V 5Bwan22-ti2vfal/text-to-video, fal/image-to-videoRTX 5090FP81~10 min
Wan 2.2 T2V A14Bsglang-diffusion-wan22-t2v-a14b-fp8fal/text-to-videoRTX 5090FP81~10 min
Wan 2.2 S2V 14Bwan22-s2vfal/speech-to-videoH100 / H2001~10 min
LTX-2.3 Distilledltx2-distilledfal/text-to-video, fal/image-to-videoRTX 5090FP81~40 min

All video generation is async only — submit a request and poll for the result.

ModelSpec IDTask TypesGPUConcurrencyWarm-upNotes
Qwen3 TTSqwen3-ttsopenai/audio-speechRTX 4090~30s9 built-in voices (multilingual)
Fox TTSfox-ttsopenai/audio-speechExternal150+ voices, always-on
Whisper Large v3whisper-large-v3openai/audio-transcriptionRTX 4090~30sOpenAI Whisper, transcription + translation
VoiceLanguageDescription
VivianChineseBright, slightly edgy young female
SerenaChineseWarm, gentle young female
Uncle FuChineseSeasoned male, low mellow timbre
DylanChineseYouthful Beijing male, clear natural timbre
EricChineseLively Chengdu male, slightly husky
RyanEnglishDynamic male with strong rhythmic drive
AidenEnglishSunny American male, clear midrange
Ono AnnaJapanesePlayful female, light nimble timbre
SoheeKoreanWarm female with rich emotion

Fox TTS provides 150+ character and celebrity voices — query GET /api/voice/models for the full list.

ModelSpec IDTask TypesGPUConcurrencyWarm-up
DeepSeek OCR v1deepseek-ocr-v1openai/chat-completion, openai/chat-completion/ocrRTX 40902~3 min
DeepSeek OCR v2deepseek-ocr-v2openai/chat-completion, openai/chat-completion/ocrRTX 40902~3 min

OCR models accept images via the Fal-compatible POST /fal/{slug} endpoint or the OpenAI vision format. See the OCR guide for details.