Models

Complete catalog of models available on Casola. Models scale from zero — instances launch on demand and shut down when idle.

Chat & Text

Model	Spec ID	Task Types	GPU	Quantization	Concurrency	Warm-up
Qwen 3-0.6B	`qwen3-0.6b`	`openai/chat-completion`	RTX 4090	FP8	32	~30s
Qwen 3.5-4B	`qwen3.5-4b`	`openai/chat-completion`, `openai/chat-completion/vision`	RTX 4090	—	16	~30s
Qwen 3.5-9B	`qwen3.5-9b`	`openai/chat-completion`, `openai/chat-completion/vision`	RTX 4090	—	8	~20 min
GPT-OSS 20B	`gpt-oss-20b`	`openai/chat-completion`	RTX 4090 / 5090	MXFP4	32	~30s

Qwen 3.5 models support vision — pass images in message content using the OpenAI vision format.

Model	Spec ID	Task Types	GPU	Concurrency	Warm-up
Qwen 3-0.6B Embed	`qwen3-0.6b-embed`	`openai/embeddings`	RTX 4090	32	~30s
GPT-OSS 20B Embed	`gpt-oss-20b-embed`	`openai/embeddings`	RTX 4090 / 5090	32	~30s

Model	Spec ID	Task Types	GPU	Concurrency	Warm-up
Qwen 3-0.6B Score	`qwen3-0.6b-score`	`openai/rerank`, `openai/score`	RTX 4090	32	~30s
GPT-OSS 20B Score	`gpt-oss-20b-score`	`openai/rerank`, `openai/score`	RTX 4090 / 5090	32	~30s

Image Generation

Model	Spec ID	Task Types	GPU	Quantization	Concurrency	Warm-up
FLUX.1-schnell	`nunchaku-flux1-schnell`	`fal/text-to-image`, `fal/image-edit`	RTX 4090 / 5090	INT4	1	~10 min
FLUX.2-Klein 4B	`sglang-diffusion-flux2-klein-4b`	`fal/text-to-image`, `fal/image-edit`	RTX 4090	FP8	1	~10 min
Qwen Image 2512	`qwen-image-2512`	`fal/text-to-image`, `openai/image-generation`	L40S / H100 / H200	—	1	~5 min
Qwen Image 2512 Lightning	`qwen-image-2512-lightx2v-fp8`	`fal/text-to-image`	L40S	FP8	1	~5 min
Qwen Image 2512 FP8	`sglang-diffusion-qwen-image-2512-fp8`	`fal/text-to-image`	L40S / RTX PRO 6000	FP8	1	~5 min

Image Editing

Model	Spec ID	Task Types	GPU	Quantization	Concurrency	Warm-up
Qwen Image Edit 2511	`qwen-image-edit-2511`	`fal/image-edit`	L40S / H100 / H200	—	1	~5 min
Qwen Image Edit 2511 Lightning	`qwen-image-edit-2511-lightx2v-fp8`	`fal/image-edit`	L40S	FP8	1	~5 min
Qwen Image Edit 2511 FP8	`sglang-diffusion-qwen-image-edit-2511-fp8`	`fal/image-edit`	L40S / RTX PRO 6000	FP8	1	~5 min

FLUX.1-schnell and FLUX.2-Klein also support image editing — see the Image Generation table above.

Video Generation

Model	Spec ID	Task Types	GPU	Quantization	Concurrency	Warm-up
Wan 2.2 TI2V 5B	`wan22-ti2v`	`fal/text-to-video`, `fal/image-to-video`	RTX 5090	FP8	1	~10 min
Wan 2.2 T2V A14B	`sglang-diffusion-wan22-t2v-a14b-fp8`	`fal/text-to-video`	RTX 5090	FP8	1	~10 min
Wan 2.2 S2V 14B	`wan22-s2v`	`fal/speech-to-video`	H100 / H200	—	1	~10 min
LTX-2.3 Distilled	`ltx2-distilled`	`fal/text-to-video`, `fal/image-to-video`	RTX 5090	FP8	1	~40 min

All video generation is async only — submit a request and poll for the result.

Audio

Model	Spec ID	Task Types	GPU	Concurrency	Warm-up	Notes
Qwen3 TTS	`qwen3-tts`	`openai/audio-speech`	RTX 4090	—	~30s	9 built-in voices (multilingual)
Fox TTS	`fox-tts`	`openai/audio-speech`	External	—	—	150+ voices, always-on
Whisper Large v3	`whisper-large-v3`	`openai/audio-transcription`	RTX 4090	—	~30s	OpenAI Whisper, transcription + translation

Voice	Language	Description
Vivian	Chinese	Bright, slightly edgy young female
Serena	Chinese	Warm, gentle young female
Uncle Fu	Chinese	Seasoned male, low mellow timbre
Dylan	Chinese	Youthful Beijing male, clear natural timbre
Eric	Chinese	Lively Chengdu male, slightly husky
Ryan	English	Dynamic male with strong rhythmic drive
Aiden	English	Sunny American male, clear midrange
Ono Anna	Japanese	Playful female, light nimble timbre
Sohee	Korean	Warm female with rich emotion

Fox TTS provides 150+ character and celebrity voices — query GET /api/voice/models for the full list.

Model	Spec ID	Task Types	GPU	Concurrency	Warm-up
DeepSeek OCR v1	`deepseek-ocr-v1`	`openai/chat-completion`, `openai/chat-completion/ocr`	RTX 4090	2	~3 min
DeepSeek OCR v2	`deepseek-ocr-v2`	`openai/chat-completion`, `openai/chat-completion/ocr`	RTX 4090	2	~3 min

OCR models accept images via the Fal-compatible POST /fal/{slug} endpoint or the OpenAI vision format. See the OCR guide for details.