Video Generation

Casola can generate short video clips from text prompts or reference images. Navigate to /video in Studio to get started.

Text-to-video — Describe the scene you want and Casola generates a video from scratch. Write specific, descriptive prompts for best results (e.g. “A golden retriever running through autumn leaves in slow motion, cinematic lighting”).

Image-to-video — Upload a reference image (JPG, PNG; max 10 MB) and Casola animates it. Great for bringing still photos or illustrations to life.
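The format and size limits above can be checked locally before uploading. The following is a minimal sketch, not part of Casola: `check_reference_image` and `MAX_BYTES` are hypothetical names, the check looks only at the file extension rather than the file contents, and it assumes "10 MB" means 10 × 1024 × 1024 bytes.

```shell
# Hypothetical pre-flight check for a reference image.
# Assumption: "10 MB" means 10 * 1024 * 1024 bytes.
MAX_BYTES=$((10 * 1024 * 1024))

check_reference_image() {
  file=$1
  # Accept only JPG/PNG, judged by extension (contents are not inspected).
  case "$file" in
    *.jpg|*.jpeg|*.png|*.JPG|*.JPEG|*.PNG) ;;
    *) echo "unsupported format"; return 1 ;;
  esac
  [ -f "$file" ] || { echo "file not found"; return 1; }
  # Arithmetic expansion strips any padding that wc adds on some systems.
  size=$(($(wc -c < "$file")))
  if [ "$size" -gt "$MAX_BYTES" ]; then
    echo "too large"
    return 1
  fi
  echo "ok"
}
```

Calling `check_reference_image photo.jpg` prints `ok` when the file passes both checks, or a short reason when it does not.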

Three presets control the speed/quality trade-off:

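| Preset       | Steps | Frames | FPS | Best for              |
|--------------|-------|--------|-----|-----------------------|
| Fast Draft   | 10    | 41     | 16  | Quick idea validation |
| Balanced     | 30    | 81     | 16  | General use           |
| High Quality | 50    | 161    | 24  | Final output          |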
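
The clip length each preset produces follows from dividing frames by FPS. The sketch below assumes playback runs at the stated FPS. `clip_duration_ms` is a hypothetical helper, not a Casola command, and it truncates to whole milliseconds because shell arithmetic is integer-only.

```shell
# Hypothetical helper: clip length in milliseconds = frames * 1000 / fps.
# Integer division, so the result is truncated to whole milliseconds.
clip_duration_ms() {
  echo $(( $1 * 1000 / $2 ))
}

clip_duration_ms 41 16    # Fast Draft   -> 2562
clip_duration_ms 81 16    # Balanced     -> 5062
clip_duration_ms 161 24   # High Quality -> 6708
```

The Balanced figure of roughly 5.06 seconds agrees with the `duration_seconds` value shown for an 81-frame, 16 FPS clip in the API response further down this page.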

You can also fine-tune individual settings under Advanced:

  • Resolution — 480p or 720p
  • Aspect ratio — 16:9 (landscape), 9:16 (portrait), or 1:1 (square)
  • Quality (inference steps) — 1–50; more steps = sharper detail
  • Prompt strength (guidance scale) — 1–20; higher values follow your prompt more closely
  • Frames — 1–300
  • FPS — 1–60
  • Seed — Set a specific seed to reproduce the same output
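
If you script your requests, the ranges listed above can be checked before anything is submitted. This is a sketch under one assumption: that the API accepts the same ranges as the Advanced settings in Studio, which this page does not state explicitly. `in_range` and `validate_settings` are hypothetical helpers.

```shell
# Hypothetical helper: succeed if number $1 lies within [$2, $3].
# awk is used so that fractional values such as 7.5 compare correctly.
in_range() {
  awk -v v="$1" -v lo="$2" -v hi="$3" \
    'BEGIN { exit !(v + 0 >= lo + 0 && v + 0 <= hi + 0) }'
}

# Check steps, guidance scale, frames, and FPS against the ranges above.
# Assumption: the API enforces the same limits as the Studio controls.
validate_settings() {
  steps=$1; guidance=$2; frames=$3; fps=$4
  in_range "$steps" 1 50    || { echo "steps must be 1-50"; return 1; }
  in_range "$guidance" 1 20 || { echo "guidance scale must be 1-20"; return 1; }
  in_range "$frames" 1 300  || { echo "frames must be 1-300"; return 1; }
  in_range "$fps" 1 60      || { echo "fps must be 1-60"; return 1; }
  echo "ok"
}

validate_settings 30 7.5 81 16                       # within range
validate_settings 60 7.5 81 16 || echo "rejected"    # steps out of range
```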

Click Generate All Formats to produce three videos simultaneously — one in each aspect ratio (16:9, 9:16, 1:1). This is useful when you need content for different platforms (e.g. YouTube, Instagram Reels, and social feeds) from a single prompt.

Some models support automatic prompt enhancement. When available, a toggle appears above the prompt field. Enable it to let the model expand your short prompt into a more detailed description, which often improves output quality.

Casola currently supports WAN 2.1 T2V/I2V for video generation. Check the Models reference for the latest availability and capabilities.

Video generation takes significantly longer than image generation — expect 1–5 minutes depending on the quality preset, resolution, and current demand. Higher quality settings and more frames increase processing time.

Each completed video shows its dimensions, duration, frame count, FPS, seed, and inference time. You can:

  • Play the video directly in Studio
  • Download the file
  • Reuse settings to generate a new video with the same parameters
  • Copy the prompt for iteration

All generated videos are automatically saved to your Library for later access.

Through the API, video generation always uses async mode: submit a request, then poll for the result.

# Submit
curl -X POST https://api.casola.ai/fal/fal-ai/wan/v2.2-5b/text-to-video \
  -H "Authorization: Bearer YOUR_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "A golden retriever running through autumn leaves in slow motion, cinematic lighting",
    "num_frames": 81,
    "fps": 16,
    "num_inference_steps": 30,
    "guidance_scale": 7.5
  }'

Response (202):

{
  "request_id": "req_abc123",
  "status": "processing"
}

# Poll for the result, using the request_id from the submit response
curl https://api.casola.ai/fal/requests/req_abc123 \
  -H "Authorization: Bearer YOUR_API_TOKEN"

Completed response:

{
  "request_id": "req_abc123",
  "status": "completed",
  "video": {"url": "https://cdn.casola.ai/outputs/vid_abc123.mp4"},
  "duration_seconds": 5.06,
  "width": 1280,
  "height": 720,
  "num_frames": 81,
  "fps": 16,
  "seed_used": 98765
}

Animate a reference image by providing image_url:

curl -X POST https://api.casola.ai/fal/fal-ai/wan/v2.2-5b/image-to-video \
  -H "Authorization: Bearer YOUR_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "the subject slowly turns to face the camera",
    "image_url": "https://example.com/photo.jpg",
    "num_frames": 81,
    "fps": 16
  }'

# Poll every 5 seconds until the job completes or fails
REQUEST_ID="req_abc123"
while true; do
  RESPONSE=$(curl -s "https://api.casola.ai/fal/requests/$REQUEST_ID" \
    -H "Authorization: Bearer YOUR_API_TOKEN")
  STATUS=$(echo "$RESPONSE" | jq -r '.status')
  if [ "$STATUS" = "completed" ]; then
    # -r prints the URL without the surrounding JSON quotes
    echo "$RESPONSE" | jq -r '.video.url'
    break
  elif [ "$STATUS" = "failed" ]; then
    echo "Job failed:" && echo "$RESPONSE" | jq '.error'
    break
  fi
  sleep 5
done
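
The loop above keeps polling for as long as the job stays unfinished. Since generation is expected to take 1–5 minutes, a script may want to give up after a fixed number of attempts. The following is one possible sketch, not an official client: `poll_until` and `get_status` are hypothetical names, and `get_status` assumes the same status endpoint and `.status` field used in the loop above.

```shell
# Hypothetical helper: run a status command repeatedly until it prints
# "completed" or "failed", or until max_attempts is reached.
# Usage: poll_until <max_attempts> <delay_seconds> <command> [args...]
# Returns 0 on completed, 1 on failed, 2 on timeout.
poll_until() {
  max_attempts=$1; delay=$2; shift 2
  attempt=1
  while [ "$attempt" -le "$max_attempts" ]; do
    state=$("$@")
    case "$state" in
      completed) echo "completed"; return 0 ;;
      failed)    echo "failed";    return 1 ;;
    esac
    sleep "$delay"
    attempt=$((attempt + 1))
  done
  echo "timeout"
  return 2
}

# Assumed status command, reusing the endpoint from the loop above.
get_status() {
  curl -s "https://api.casola.ai/fal/requests/$1" \
    -H "Authorization: Bearer YOUR_API_TOKEN" | jq -r '.status'
}

# Example: 120 attempts x 5 seconds gives up after about 10 minutes.
# poll_until 120 5 get_status req_abc123
```

Passing the status command as an argument keeps the retry logic separate from the HTTP call, so the same helper can be exercised with a stub command when testing a script offline.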

A few tips for better results:

  • Start with Fast Draft to iterate on your prompt, then switch to High Quality for the final version.
  • For image-to-video, choose source images with a clear subject and simple background for the most coherent animation.
  • Use a fixed seed when you want to compare the effect of changing other parameters.
  • Keep prompts descriptive but concise — mention the subject, action, camera angle, and mood.