Video Generation

Casola can generate short video clips from text prompts or reference images. Navigate to /video in Studio to get started.

Input modes

Text-to-video — Describe the scene you want and Casola generates a video from scratch. Write specific, descriptive prompts for best results (e.g. “A golden retriever running through autumn leaves in slow motion, cinematic lighting”).

Image-to-video — Upload a reference image (JPG, PNG; max 10 MB) and Casola animates it. Great for bringing still photos or illustrations to life.

Quality presets

Three presets control the speed/quality trade-off:

Preset	Steps	Frames	FPS	Best for
Fast Draft	10	41	16	Quick idea validation
Balanced	30	81	16	General use
High Quality	50	161	24	Final output
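
Clip length follows from the frame count and frame rate (duration = frames / FPS). The sketch below works through that arithmetic for the three presets. The helper name clip_seconds is illustrative only and is not part of Casola.

```shell
# clip_seconds FRAMES FPS: print the clip length in seconds (frames / fps).
# Hypothetical helper for illustration; not part of the Casola API or Studio.
clip_seconds() {
  awk -v frames="$1" -v fps="$2" 'BEGIN { printf "%.2f\n", frames / fps }'
}

clip_seconds 41 16    # Fast Draft
clip_seconds 81 16    # Balanced
clip_seconds 161 24   # High Quality
```

By this arithmetic, Fast Draft yields roughly 2.6 seconds of video, Balanced roughly 5.1 seconds, and High Quality roughly 6.7 seconds.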

You can also fine-tune individual settings under Advanced:

Resolution — 480p or 720p
Aspect ratio — 16:9 (landscape), 9:16 (portrait), or 1:1 (square)
Quality (inference steps) — 1–50; more steps = sharper detail
Prompt strength (guidance scale) — 1–20; higher values follow your prompt more closely
Frames — 1–300
FPS — 1–60
Seed — Set a specific seed to reproduce the same output
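
The ranges listed above can be checked client-side before a job is submitted. The following is a minimal sketch under stated assumptions: the function names are hypothetical, and it assumes the API accepts the same ranges as the Studio controls, which this guide does not confirm.

```shell
# in_range VALUE MIN MAX: succeed if VALUE is a plain number (decimals
# allowed) and MIN <= VALUE <= MAX.
in_range() {
  awk -v v="$1" -v lo="$2" -v hi="$3" \
    'BEGIN { exit !(v ~ /^[0-9]+(\.[0-9]+)?$/ && v + 0 >= lo && v + 0 <= hi) }'
}

# validate_video_params STEPS GUIDANCE FRAMES FPS
# Hypothetical helper; ranges are copied from the Advanced settings list.
validate_video_params() {
  in_range "$1" 1 50  || { echo "inference steps must be 1-50" >&2; return 1; }
  in_range "$2" 1 20  || { echo "guidance scale must be 1-20" >&2; return 1; }
  in_range "$3" 1 300 || { echo "frames must be 1-300" >&2; return 1; }
  in_range "$4" 1 60  || { echo "FPS must be 1-60" >&2; return 1; }
}
```

For example, `validate_video_params 30 7.5 81 16` succeeds for the Balanced preset values, while a frame count of 301 is rejected with a message on stderr.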

Generate all formats

Click Generate All Formats to produce three videos simultaneously — one in each aspect ratio (16:9, 9:16, 1:1). This is useful when you need content for different platforms (e.g. YouTube, Instagram Reels, and social feeds) from a single prompt.

Prompt rewriting

Some models support automatic prompt enhancement. When available, a toggle appears above the prompt field. Enable it to let the model expand your short prompt into a more detailed description, which often improves output quality.

Available models

Casola currently supports WAN 2.1 for video generation, in both text-to-video (T2V) and image-to-video (I2V) variants. Check the Models reference for the latest availability and capabilities.

Processing times

Video generation takes significantly longer than image generation — expect 1–5 minutes depending on the quality preset, resolution, and current demand. Higher quality settings and more frames increase processing time.

Working with results

Each completed video shows its dimensions, duration, frame count, FPS, seed, and inference time. You can:

Play the video directly in Studio
Download the file
Reuse settings to generate a new video with the same parameters
Copy the prompt for iteration

All generated videos are automatically saved to your Library for later access.

API usage

Video generation always runs asynchronously: submit a request, then poll for the result using the request_id returned in the response.

Text-to-video

# Submit
curl -X POST https://api.casola.ai/fal/fal-ai/wan/v2.2-5b/text-to-video \
  -H "Authorization: Bearer YOUR_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "A golden retriever running through autumn leaves in slow motion, cinematic lighting",
    "num_frames": 81,
    "fps": 16,
    "num_inference_steps": 30,
    "guidance_scale": 7.5
  }'

Response (202):

{
  "request_id": "req_abc123",
  "status": "processing"
}
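
To chain submission and polling in a script, the request_id can be pulled out of the 202 response with jq, which the polling script in this guide already relies on. A small sketch follows. The function name extract_request_id is hypothetical, and the response shape is taken from the example above.

```shell
# extract_request_id: read a submit response (JSON) on stdin and print its
# request_id. With -e, jq exits non-zero if the field is missing or null,
# so a malformed response is not passed on to the polling step.
# Hypothetical helper; requires jq.
extract_request_id() {
  jq -er '.request_id'
}

# Example using the sample response shown above (no network call):
SUBMIT_RESPONSE='{"request_id": "req_abc123", "status": "processing"}'
REQUEST_ID=$(printf '%s' "$SUBMIT_RESPONSE" | extract_request_id)
echo "$REQUEST_ID"
```

In a real script, SUBMIT_RESPONSE would hold the output of the submit curl command (run with -s so only the JSON body is captured).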

Polling for the result

curl https://api.casola.ai/fal/requests/req_abc123 \
  -H "Authorization: Bearer YOUR_API_TOKEN"

Completed response:

{
  "request_id": "req_abc123",
  "status": "completed",
  "video": {"url": "https://cdn.casola.ai/outputs/vid_abc123.mp4"},
  "duration_seconds": 5.06,
  "width": 1280,
  "height": 720,
  "num_frames": 81,
  "fps": 16,
  "seed_used": 98765
}
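
Once a job is completed, the file at video.url can be saved locally. This is a minimal sketch, assuming the URL can be fetched without an Authorization header (this guide does not say either way). The names filename_from_url and download_video are hypothetical.

```shell
# filename_from_url URL: print the last path segment, dropping any query string.
filename_from_url() {
  local name="${1##*/}"          # strip everything up to the final slash
  printf '%s\n' "${name%%[?]*}"  # strip a trailing ?query, if present
}

# download_video URL [DIR]: save the video into DIR (default: current directory).
# -f makes curl fail on HTTP errors; -L follows redirects.
download_video() {
  local url="$1" dir="${2:-.}"
  curl -fL -o "$dir/$(filename_from_url "$url")" "$url"
}
```

With the sample response above, `download_video "https://cdn.casola.ai/outputs/vid_abc123.mp4"` would write vid_abc123.mp4 to the current directory.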

Image-to-video

Animate a reference image by providing image_url:

curl -X POST https://api.casola.ai/fal/fal-ai/wan/v2.2-5b/image-to-video \
  -H "Authorization: Bearer YOUR_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "the subject slowly turns to face the camera",
    "image_url": "https://example.com/photo.jpg",
    "num_frames": 81,
    "fps": 16
  }'

Polling with a script

REQUEST_ID="req_abc123"

# Poll every 5 seconds until the job completes or fails.
while true; do
  RESPONSE=$(curl -s "https://api.casola.ai/fal/requests/$REQUEST_ID" \
    -H "Authorization: Bearer YOUR_API_TOKEN")
  STATUS=$(echo "$RESPONSE" | jq -r '.status')
  if [ "$STATUS" = "completed" ]; then
    # -r prints the URL without surrounding quotes
    echo "$RESPONSE" | jq -r '.video.url'
    break
  elif [ "$STATUS" = "failed" ]; then
    echo "Job failed:" >&2
    echo "$RESPONSE" | jq '.error' >&2
    break
  fi
  sleep 5
done
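
The loop above runs until the job reaches a terminal state. For unattended scripts, a cap on the number of attempts may be preferable. The sketch below shows one way to do that and is not presented as Casola's recommended client. The names fetch_status and poll_until_done are hypothetical, and the status values (completed, failed) are the ones shown in this guide.

```shell
REQUEST_ID="${REQUEST_ID:-req_abc123}"

# fetch_status: print the job's current status string from the polling endpoint.
fetch_status() {
  curl -s "https://api.casola.ai/fal/requests/$REQUEST_ID" \
    -H "Authorization: Bearer YOUR_API_TOKEN" | jq -r '.status'
}

# poll_until_done MAX_ATTEMPTS DELAY_SECONDS
# Returns 0 on completed, 1 on failed, 3 if the attempts run out.
poll_until_done() {
  local max_attempts="$1" delay="$2" attempt=1 state
  while [ "$attempt" -le "$max_attempts" ]; do
    state=$(fetch_status)
    case "$state" in
      completed) return 0 ;;
      failed)    return 1 ;;
    esac
    sleep "$delay"
    attempt=$((attempt + 1))
  done
  return 3
}
```

For instance, `poll_until_done 60 5` checks every 5 seconds for up to about five minutes, in line with the 1–5 minute processing times noted earlier.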

Tips

Start with Fast Draft to iterate on your prompt, then switch to High Quality for the final version.
For image-to-video, choose source images with a clear subject and simple background for the most coherent animation.
Use a fixed seed when you want to compare the effect of changing other parameters.
Keep prompts descriptive but concise — mention the subject, action, camera angle, and mood.