
# Task Types

Task types define the input/output contract for jobs and workflow nodes. Each model supports one or more task types — see the Models reference for which models support which tasks.

| Task Type | Category | Sync | Description |
|---|---|---|---|
| `openai/chat-completion` | Text | Yes | Chat completion (LLM) |
| `openai/chat-completion/vision` | Text | Yes | Chat completion with image input |
| `openai/chat-completion/ocr` | OCR | Yes | OCR via vision model |
| `openai/embeddings` | Text | Yes | Text embeddings |
| `openai/rerank` | Text | Yes | Document reranking |
| `openai/score` | Text | Yes | Text pair similarity scoring |
| `openai/audio-speech` | Audio | Yes | Text-to-speech |
| `openai/audio-transcription` | Audio | Yes | Speech-to-text |
| `openai/image-generation` | Image | Yes | Image generation (OpenAI format) |
| `fal/text-to-image` | Image | Yes | Image generation (Fal format) |
| `fal/image-edit` | Image | Yes | Image editing / inpainting |
| `fal/text-to-video` | Video | No | Text-to-video generation |
| `fal/image-to-video` | Video | No | Image-to-video animation |
| `fal/speech-to-video` | Video | No | Speech-driven video (talking head) |
| `fal/audio-transcription` | Audio | Yes | Speech-to-text (Fal format) |
| `fal/video-interpolate` | Video | No | Frame interpolation (slow motion) |
| `fal/video-upscale` | Video | No | Video super-resolution |

Sync = the task can be dispatched synchronously (the result is returned inline). Async tasks return a `request_id` for polling.

## openai/chat-completion

Standard chat completion following the OpenAI API format.

| Field | Type | Required | Description |
|---|---|---|---|
| `messages` | array | Yes | Array of `{role, content}` message objects |
| `model` | string | No | Model ID (set automatically when using a specific endpoint) |
| `temperature` | number | No | Sampling temperature (0-2) |
| `top_p` | number | No | Nucleus sampling |
| `max_tokens` | number | No | Maximum tokens to generate |
| `stream` | boolean | No | Enable streaming response |
| `stop` | string/array | No | Stop sequences |
| `tools` | array | No | Tool/function definitions |

Output: choices[0].message.content (text)

Models: qwen3-0.6b, qwen3.5-4b, qwen3.5-9b, gpt-oss-20b, deepseek-ocr-v1, deepseek-ocr-v2
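The contract above can be exercised with a small payload builder plus an extractor for the documented output path. The role set and range checks below are illustrative assumptions, not rules the server is known to enforce:

```python
VALID_ROLES = {"system", "user", "assistant", "tool"}  # assumed role set

def build_chat_payload(messages, temperature=None, max_tokens=None, stream=False):
    """Assemble an openai/chat-completion request body.

    Only `messages` is required; optional sampling fields are included
    when set, following the parameter table above.
    """
    for m in messages:
        if m.get("role") not in VALID_ROLES:
            raise ValueError(f"unexpected role: {m.get('role')!r}")
    payload = {"messages": messages, "stream": stream}
    if temperature is not None:
        if not 0 <= temperature <= 2:
            raise ValueError("temperature must be in [0, 2]")
        payload["temperature"] = temperature
    if max_tokens is not None:
        payload["max_tokens"] = max_tokens
    return payload

def extract_text(response):
    """Pull the generated text from the documented choices[0].message.content path."""
    return response["choices"][0]["message"]["content"]
```

POST the resulting body to your deployment's chat-completions endpoint with your usual auth headers.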

## openai/chat-completion/vision

Chat completion with image input. Same parameters as `openai/chat-completion`, but messages can include image content:

```json
{
  "messages": [{
    "role": "user",
    "content": [
      {"type": "text", "text": "What's in this image?"},
      {"type": "image_url", "image_url": {"url": "https://..."}}
    ]
  }]
}
```
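If your deployment accepts base64 data URLs in `image_url` (many OpenAI-compatible servers do, but confirm for yours), a local image can be inlined instead of hosted. A small helper sketch:

```python
import base64

def image_data_url(image_bytes: bytes, mime: str = "image/png") -> str:
    """Encode raw image bytes as a data URL for the image_url content part."""
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime};base64,{b64}"

def image_part(image_bytes: bytes, mime: str = "image/png") -> dict:
    """Build the image content part used in a vision message."""
    return {"type": "image_url",
            "image_url": {"url": image_data_url(image_bytes, mime)}}
```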

Models: qwen3.5-4b, qwen3.5-9b

## openai/chat-completion/ocr

OCR via a vision-capable model. Uses the same message format as vision, optimized for text extraction from images.

Models: deepseek-ocr-v1, deepseek-ocr-v2

## openai/embeddings

Generate vector embeddings for text.

| Field | Type | Required | Description |
|---|---|---|---|
| `input` | string/array | Yes | Text or array of texts to embed |
| `encoding_format` | string | No | `float` or `base64` |
| `dimensions` | number | No | Output dimensions |

Output: data[0].embedding (float array)

Models: qwen3-0.6b-embed, gpt-oss-20b-embed
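Embedding vectors come back under `data[i].embedding`; a common follow-up is cosine similarity between two embedded texts. A dependency-free sketch (the `index` field used for ordering is standard in OpenAI-format responses, but verify it is present in yours):

```python
import math

def extract_embeddings(response):
    """Collect vectors from the documented data[i].embedding path,
    restoring input order via the index field when present."""
    items = sorted(response["data"], key=lambda d: d.get("index", 0))
    return [d["embedding"] for d in items]

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length float vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)
```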

## openai/rerank

Rerank documents by relevance to a query.

| Field | Type | Required | Description |
|---|---|---|---|
| `model` | string | Yes | Model ID |
| `query` | string | Yes | Search query |
| `documents` | array | Yes | Array of strings or `{text}` objects |
| `top_n` | number | No | Number of top results to return |
| `return_documents` | boolean | No | Include document text in results |

Output: Ranked documents with relevance scores

Models: qwen3-0.6b-score, gpt-oss-20b-score
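Mapping ranked results back to the original documents might look like the sketch below. It assumes a response shape of `results: [{index, relevance_score}]`, which is typical for rerank APIs but worth confirming against your deployment:

```python
def top_documents(documents, response, top_n=None):
    """Pair ranked results with the original documents.

    Assumes each result carries the original `index` into `documents`
    and a `relevance_score`; results are sorted defensively in case
    the server does not return them pre-ranked.
    """
    results = sorted(response["results"],
                     key=lambda r: r["relevance_score"], reverse=True)
    if top_n is not None:
        results = results[:top_n]
    return [(documents[r["index"]], r["relevance_score"]) for r in results]
```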

## openai/score

Compute similarity scores between text pairs.

| Field | Type | Required | Description |
|---|---|---|---|
| `model` | string | Yes | Model ID |
| `text_1` | string/array | Yes | First text(s) |
| `text_2` | string/array | Yes | Second text(s) |

Output: Similarity scores

Models: qwen3-0.6b-score, gpt-oss-20b-score

## openai/audio-speech

Convert text to spoken audio.

| Field | Type | Required | Description |
|---|---|---|---|
| `input` | string | Yes | Text to speak |
| `voice` | string | No | Voice ID (model-specific) |
| `response_format` | string | No | `mp3`, `opus`, `aac`, `flac`, `wav`, `pcm` |
| `speed` | number | No | Playback speed multiplier |

Output: audio_url (audio file URL)

Models: qwen3-tts (9 voices), fox-tts (150+ voices)

## openai/audio-transcription

Transcribe audio to text.

| Field | Type | Required | Description |
|---|---|---|---|
| `audio_url` | string | Yes | URL to the audio file |
| `language` | string | No | Language code (ISO 639-1) |
| `task` | string | No | `transcribe` or `translate` |

Also supports multipart file upload with a file field.

Output: text (transcribed text)

Models: whisper-large-v3
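For the multipart variant, a sketch of assembling the upload pieces in the shape the `requests` library expects (`requests.post(url, files=files, data=data, ...)`). The field names follow the table above; the endpoint path is deployment-specific and not shown:

```python
import mimetypes

def transcription_form(audio_bytes, filename="audio.mp3",
                       language=None, task="transcribe"):
    """Build the (files, data) pair for a multipart transcription upload.

    The MIME type is guessed from the filename; optional fields are
    only included when set.
    """
    mime = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    files = {"file": (filename, audio_bytes, mime)}
    data = {"task": task}
    if language:
        data["language"] = language
    return files, data
```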

## fal/audio-transcription

Same as `openai/audio-transcription`, but using the Fal request format.

| Field | Type | Required | Description |
|---|---|---|---|
| `audio_url` | string | Yes | URL to the audio file |
| `language` | string | No | Language code |
| `task` | string | No | `transcribe` or `translate` |

Output: text

Models: whisper-large-v3

## openai/image-generation

Generate images using the OpenAI-compatible format.

| Field | Type | Required | Description |
|---|---|---|---|
| `prompt` | string | Yes | Image description |
| `n` | number | No | Number of images |
| `size` | string | No | Image dimensions |
| `quality` | string | No | `standard` or `hd` |
| `response_format` | string | No | `url` or `b64_json` |

Output: data[0].url (image URL)

Models: qwen-image-2512

## fal/text-to-image

Generate images using the Fal format, which exposes more parameters than the OpenAI format.

| Field | Type | Required | Description |
|---|---|---|---|
| `prompt` | string | Yes | Image description |
| `negative_prompt` | string | No | What to avoid |
| `image_size` | string | No | `square_hd`, `square`, `portrait_4_3`, `portrait_16_9`, `landscape_4_3`, `landscape_16_9` |
| `num_inference_steps` | number | No | Denoising steps |
| `guidance_scale` | number | No | Prompt adherence strength |
| `num_images` | number | No | Number of images (max 10) |
| `seed` | number | No | Reproducibility seed |
| `loras` | array | No | LoRA configs `[{path, scale}]` (scale 0-4) |
| `enable_safety_checker` | boolean | No | Content safety filter |
| `output_format` | string | No | Output image format |

Output: images[0].url (image URL)

Models: nunchaku-flux1-schnell, sglang-diffusion-flux2-klein-4b, qwen-image-2512, qwen-image-2512-lightx2v-fp8, sglang-diffusion-qwen-image-2512-fp8
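A payload builder that enforces the documented limits (`num_images` at most 10, LoRA `scale` in 0-4, `image_size` drawn from the listed enum) can catch mistakes client-side. The validation is a convenience sketch, not behavior the server is known to implement:

```python
IMAGE_SIZES = {
    "square_hd", "square", "portrait_4_3", "portrait_16_9",
    "landscape_4_3", "landscape_16_9",
}

def build_t2i_payload(prompt, image_size="square_hd", num_images=1,
                      loras=(), seed=None):
    """Assemble a fal/text-to-image request body, checking the
    documented constraints before sending anything."""
    if image_size not in IMAGE_SIZES:
        raise ValueError(f"unknown image_size: {image_size!r}")
    if not 1 <= num_images <= 10:
        raise ValueError("num_images must be between 1 and 10")
    for lora in loras:
        if not 0 <= lora.get("scale", 1) <= 4:
            raise ValueError("LoRA scale must be in [0, 4]")
    payload = {"prompt": prompt, "image_size": image_size,
               "num_images": num_images}
    if loras:
        payload["loras"] = list(loras)
    if seed is not None:
        payload["seed"] = seed
    return payload
```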

## fal/image-edit

Edit or inpaint images.

| Field | Type | Required | Description |
|---|---|---|---|
| `prompt` | string | Yes | Edit instruction |
| `image_url` | string | Yes | Source image URL |
| `mask_url` | string | No | Mask for inpainting |
| `strength` | number | No | Edit strength (0-1) |
| `num_inference_steps` | number | No | Denoising steps |
| `guidance_scale` | number | No | Prompt adherence |
| `num_images` | number | No | Number of results (max 10) |
| `seed` | number | No | Reproducibility seed |
| `loras` | array | No | LoRA configs |

Output: images[0].url (image URL)

Models: nunchaku-flux1-schnell, sglang-diffusion-flux2-klein-4b, qwen-image-edit-2511, qwen-image-edit-2511-lightx2v-fp8, sglang-diffusion-qwen-image-edit-2511-fp8

All video tasks are async only — they return a request_id for polling.

## fal/text-to-video

Generate video from a text prompt.

| Field | Type | Required | Description |
|---|---|---|---|
| `prompt` | string | Yes | Video description |
| `resolution` | string | No | `480p`, `720p` |
| `aspect_ratio` | string | No | `16:9`, `9:16`, `1:1` |
| `num_inference_steps` | number | No | Denoising steps |
| `guidance_scale` | number | No | Prompt adherence |
| `num_frames` | number | No | Number of frames |
| `fps` | number | No | Frames per second |
| `seed` | number | No | Reproducibility seed |
| `output_format` | string | No | Video format |

Output: video.url (video URL)

Models: wan22-ti2v, sglang-diffusion-wan22-t2v-a14b-fp8, ltx2-distilled
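Since video tasks return a `request_id`, callers need a polling loop. The sketch below is transport-agnostic: it takes a `fetch_status` callable so you can plug in whatever status endpoint your deployment exposes, and the terminal status names (`completed`, `failed`) are assumptions to confirm:

```python
import time

def poll_until_done(request_id, fetch_status, interval=2.0, timeout=600.0,
                    sleep=time.sleep, clock=time.monotonic):
    """Poll an async job until it completes or fails.

    fetch_status(request_id) must return a dict with at least a
    `status` key. `sleep` and `clock` are injectable for testing.
    """
    deadline = clock() + timeout
    while True:
        status = fetch_status(request_id)
        state = status.get("status")
        if state == "completed":
            return status
        if state == "failed":
            raise RuntimeError(f"job {request_id} failed: {status}")
        if clock() >= deadline:
            raise TimeoutError(f"job {request_id} still {state!r} after {timeout}s")
        sleep(interval)
```

On success, the returned status dict carries the task's output, e.g. `video.url` for the video tasks above.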

## fal/image-to-video

Animate a still image into video.

| Field | Type | Required | Description |
|---|---|---|---|
| `prompt` | string | Yes | Motion description |
| `image_url` | string | Yes | Source image URL |
| `resolution` | string | No | `480p`, `720p` |
| `aspect_ratio` | string | No | `16:9`, `9:16`, `1:1` |
| `num_inference_steps` | number | No | Denoising steps |
| `guidance_scale` | number | No | Prompt adherence |
| `num_frames` | number | No | Number of frames |
| `fps` | number | No | Frames per second |
| `seed` | number | No | Reproducibility seed |

Output: video.url (video URL)

Models: wan22-ti2v, ltx2-distilled

## fal/speech-to-video

Generate talking-head video driven by audio.

| Field | Type | Required | Description |
|---|---|---|---|
| `audio_url` | string | Yes | Audio file URL |
| `image_url` | string | Yes | Face/character image URL |
| `prompt` | string | No | Additional scene description |
| `num_inference_steps` | number | No | Denoising steps |
| `guidance_scale` | number | No | Prompt adherence |
| `seed` | number | No | Reproducibility seed |

Output: video.url (video URL)

Models: wan22-s2v

## fal/video-interpolate

Increase video frame rate (slow motion effect).

| Field | Type | Required | Description |
|---|---|---|---|
| `video_url` | string | Yes | Source video URL |
| `multiplier` | number | No | Frame rate multiplier (2-8, default 2) |
| `scene_detect` | boolean | No | Detect scene changes (default true) |

Output: video.url (video URL)

## fal/video-upscale

Upscale video resolution.

| Field | Type | Required | Description |
|---|---|---|---|
| `video_url` | string | Yes | Source video URL |
| `upscale_factor` | number | No | Scale factor (1-10) |
| `seed` | number | No | Reproducibility seed |

Output: video.url (video URL)

## Task types in workflows

Task types are used as the `task` field in workflow DAG nodes. Each node specifies a task type and a model, and can wire outputs from upstream nodes into its inputs.

```json
{
  "nodes": [
    {
      "id": "generate",
      "task": "fal/text-to-image",
      "model": "nunchaku-flux1-schnell",
      "payload": {
        "prompt": "{{input.prompt}}"
      }
    }
  ]
}
```
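The `{{input.prompt}}` placeholder is resolved by the workflow engine at run time. As an illustration only (the real engine's template syntax may be richer), a dotted-path substitution over a node payload could look like:

```python
import re

def resolve_placeholders(payload, context):
    """Recursively substitute {{dotted.path}} placeholders in a node
    payload from a context dict, e.g. {"input": {"prompt": "a cat"}}."""
    pattern = re.compile(r"\{\{\s*([\w.]+)\s*\}\}")

    def lookup(path):
        value = context
        for key in path.split("."):
            value = value[key]
        return value

    def resolve(value):
        if isinstance(value, str):
            return pattern.sub(lambda m: str(lookup(m.group(1))), value)
        if isinstance(value, dict):
            return {k: resolve(v) for k, v in value.items()}
        if isinstance(value, list):
            return [resolve(v) for v in value]
        return value

    return resolve(payload)
```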

See the Workflows guide for details on building multi-step pipelines.