image-to-video

stylized

transform

lipsync

Seedance 2 Image to Video

ByteDance's most advanced video generation model. Cinematic output with native audio, real-world physics, and director-level camera control. Accepts text, image, audio, and video inputs.

text-to-image

Nano Banana 2

Nano Banana 2 is Google's new state-of-the-art fast image generation and editing model

image-to-video

Kling Video v3 Image to Video [Pro]

Kling 3.0 Pro: Top-tier image-to-video with cinematic visuals, fluid motion, and native audio generation, with custom element support.

image-to-video

video

happy-horse

Happy Horse

Alibaba's #1-ranked Happy Horse 1.0 — generate 1080p video with synchronized native audio and multilingual lip-sync from text prompts or images.

image-to-video

PixVerse V6

PixVerse V6 delivers lifelike physics and striking visuals to elevate your video creation.

Model Labs

Explore the AI labs powering models on fal

Seedance 2.0

The new SOTA video model by ByteDance. Access the new stunning video generation model today.

bytedance/seedance-2.0/text-to-video

text-to-video

ByteDance's most advanced text-to-video model. Cinematic output with native audio, multi-shot editing, real-world physics, and director-level camera control.

stylized

transform

lipsync

bytedance/seedance-2.0/reference-to-video

image-to-video

ByteDance's most advanced reference-to-video model. Generate video from up to 9 images, 3 videos, and 3 audio clips with native audio and cinematic camera control.

stylized

transform

lipsync
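The reference-to-video card above caps inputs at 9 images, 3 videos, and 3 audio clips. A minimal sketch of checking those counts before sending a request follows. The function name and the `images` / `videos` / `audio` keys are hypothetical and are not taken from the endpoint's schema.

```python
# Limits quoted from the card above; the dictionary keys are illustrative names,
# not confirmed field names of the real API.
MAX_REFERENCES = {"images": 9, "videos": 3, "audio": 3}


def check_reference_counts(images, videos, audio):
    """Return a list of problems; an empty list means the counts fit the limits."""
    counts = {"images": len(images), "videos": len(videos), "audio": len(audio)}
    return [
        f"too many {kind}: {count} > {MAX_REFERENCES[kind]}"
        for kind, count in counts.items()
        if count > MAX_REFERENCES[kind]
    ]


if __name__ == "__main__":
    # Ten images exceed the stated cap of nine; two videos and no audio are fine.
    print(check_reference_counts(["img"] * 10, ["vid"] * 2, []))
```

Checking on the client side only saves a round trip; the service remains the authority on what it accepts.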

bytedance/seedance-2.0/fast/text-to-video

text-to-video

ByteDance's most advanced text-to-video model, fast tier. Lower latency and cost with cinematic output, native audio, multi-shot editing, and director-level camera control.

stylized

transform

lipsync

bytedance/seedance-2.0/image-to-video

image-to-video

ByteDance's most advanced image-to-video model. Animate still images into cinematic video with synchronized audio, start and end frame control, and motion prompts.

stylized

transform

lipsync

bytedance/seedance-2.0/fast/reference-to-video

image-to-video

ByteDance's most advanced reference-to-video model, fast tier. Lower latency and cost with up to 9 images, 3 videos, and 3 audio clips as inputs.

stylized

transform

lipsync

bytedance/seedance-2.0/fast/image-to-video

image-to-video

ByteDance's most advanced image-to-video model, fast tier. Lower latency and cost with synchronized audio, start and end frame control, and motion prompts.

stylized

transform

lipsync
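The six Seedance 2.0 endpoint IDs listed in this section appear to share one pattern: a common prefix, an optional `fast` segment, then the task name. The sketch below builds an ID from those parts. The pattern is inferred from the IDs shown here rather than from a documented rule, and `seedance_endpoint` is a hypothetical helper name.

```python
SEEDANCE_PREFIX = "bytedance/seedance-2.0"
# Task names exactly as they appear in the endpoint IDs listed in this section.
SEEDANCE_TASKS = ("text-to-video", "image-to-video", "reference-to-video")


def seedance_endpoint(task, fast=False):
    """Build a Seedance 2.0 endpoint ID, inserting 'fast' for the fast tier."""
    if task not in SEEDANCE_TASKS:
        raise ValueError(f"unknown task {task!r}; expected one of {SEEDANCE_TASKS}")
    parts = [SEEDANCE_PREFIX]
    if fast:
        parts.append("fast")
    parts.append(task)
    return "/".join(parts)


if __name__ == "__main__":
    print(seedance_endpoint("text-to-video"))
    print(seedance_endpoint("image-to-video", fast=True))
```

Keeping the tier as a flag makes it easy to switch between the standard endpoints and the lower-latency, lower-cost fast tier without retyping IDs.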

Grok Imagine

xai/grok-imagine-image/quality/edit

image-to-image

Grok Imagine Pro is an advanced AI model from xAI that creates high-quality visuals from text prompts and allows you to edit or analyze existing images.

stylized

transform

typography

xai/grok-imagine-image/quality/text-to-image

text-to-image

Grok Imagine Pro is an advanced AI model from xAI that creates high-quality visuals from text prompts and allows you to edit or analyze existing images.

stylized

transform

typography

xai/grok-imagine-video/extend-video

video-to-video

Extend videos with xAI's Grok Imagine video model

xai/grok-imagine-video/edit-video

video-to-video

Edit videos using xAI's Grok Imagine

video-edit

v2v

grok

xai/grok-imagine-video/reference-to-video

image-to-video

Generate videos using multiple reference images with xAI's Grok Imagine video model

video-edit

v2v

grok

xai/grok-imagine-video/text-to-video

text-to-video

Generate videos with audio from text using Grok Imagine Video.

xai

grok

t2v

xai/grok-imagine-video/image-to-video

image-to-video

Generate videos from images with audio using xAI's Grok Imagine Video model.

grok

xai

i2v

xai/grok-imagine-image

text-to-image

Generate highly aesthetic images with xAI's Grok Imagine Image generation model.

xai

grok

xai/grok-imagine-image/edit

image-to-image

Edit images precisely with xAI's Grok Imagine model

xai/grok-imagine-video/v1.5/image-to-video

image-to-video

Generate videos from images with audio using xAI's Grok Imagine 1.5 Video model.

stylized

transform

lipsync

New and Noteworthy

State-of-the-art models we think you'll love!

bytedance/seedream/v5/lite/edit

image-to-image

Image editing endpoint for the fast Lite version of Seedream 5.0, supporting high quality intelligent image editing with multiple inputs.

Nano Banana 2 is Google's new state-of-the-art image generation and editing model

kling-video/v3/pro/image-to-video

image-to-video

Kling 3.0 Pro: Top-tier image-to-video with cinematic visuals, fluid motion, and native audio generation, with custom element support.

Recently Added

Newly added models across image, video, audio, and more.

new

alibaba/happy-horse/v1.1/image-to-video

image-to-video

Happy Horse 1.1 is Alibaba's #1-ranked video model. This image-to-video endpoint animates a still image into 1080p video with synchronized native audio and multilingual lip-sync

alibaba/happy-horse/v1.1/reference-to-video

image-to-video

Happy Horse 1.1 is Alibaba's #1-ranked video model. This reference-to-video endpoint turns up to 9 reference images into 1080p video with synchronized native audio and multilingual lip-sync for consistent characters.

alibaba/happy-horse/v1.1/text-to-video

text-to-video

Happy Horse 1.1 is Alibaba's #1-ranked video model. This text-to-video endpoint generates 1080p video with synchronized native audio and multilingual lip-sync from a text prompt alone.

hyper3d/rodin/v2.5/text-to-3d/fast

text-to-3d

Rodin V2.5 by Hyper3D generates realistic, production-ready 3D models from text or images. Use the fast model for rapid prototyping.

new

hyper3d/rodin/v2.5/fast

image-to-3d

Rodin V2.5 by Hyper3D generates realistic, production-ready 3D models from text or images. Use the fast model for rapid prototyping.

new

ltx-2.3-quality/ingredient

image-to-video

Generate high-quality video with audio from a reference, character sheet, or storyboard using LTX-2.3

SCAIL-2 is an end-to-end character animation model that drives a reference character from a source video without relying on intermediate pose representations like skeleton maps.

Generate professional-quality voiceovers in seconds with the Async TTS Pro model, which offers text-based control over pauses, emphasis, and timing. Voice IDs can be found at https://async.com/developer/voice-library

voice-clone

lipsync

new

ltx-2.3-quality/inpaint/lora

video-to-video

Inpaint high-quality video using LTX-2.3 with LoRA

inpaint

new

ltx-2.3-quality/inpaint

video-to-video

Inpaint high-quality video using LTX-2.3

new

boogu-image

text-to-image

Text To Image Model using Boogu-Image

new

boogu-image/edit

image-to-image

Image To Image Model using Boogu-Image

new

sensenova-u1-infographic

text-to-image

Generate Infographic Image with Sensenova U1

new

ltx23-trainer-v2/v2v-masked

training

Train a LoRA that regenerates only the masked region of a video, guided by both the kept pixels and a separate reference/control video.

new

ltx23-trainer-v2/v2v

training

Train a LoRA that learns a video-to-video transformation from paired before/after clips, steered at inference by a reference (control) video.

new

ltx23-trainer-v2/v2a

training

Train a LoRA that generates audio (foley / sound design) for a silent video, learning a soundtrack that matches the on-screen action.

new

ltx23-trainer-v2/t2v

training

Fine-tune LTX 2.3 on your own clips to teach it a new subject, character, object, or visual style, then generate full videos from a text prompt.

new

ltx23-trainer-v2/t2a

training

Train a LoRA that generates audio from a text prompt — the audio counterpart of text-to-video — learning a sound or style from your clips.

new

ltx23-trainer-v2/outpaint

training

Train a LoRA that expands the video frame outward, keeping an inner rectangle fixed and generating the surrounding region.

new

ltx23-trainer-v2/interpolate

training

Train a LoRA that generates the video between keyframes — supply first/last (and optional middle) frames at inference and the model fills the in-between motion.

new

ltx23-trainer-v2/inpaint

training

Train a LoRA that regenerates a masked region of a video while keeping the rest unchanged, blending the new content with its surroundings.

new

ltx23-trainer-v2/ic-lora/v2v-masked

training

Train an IC-LoRA that regenerates only the masked region of a video, guided by the kept pixels and a separate reference/control video.

new

ltx23-trainer-v2/ic-lora/v2v

training

Train an IC-LoRA that learns a video-to-video transformation from paired before/after clips, conditioned at inference on a reference (control) video.

new

ltx23-trainer-v2/ic-lora/av2av-masked

training

Train an IC-LoRA that regenerates a masked video region (guided by kept pixels and a video reference) while jointly generating audio from an audio reference.

new

ltx23-trainer-v2/ic-lora/av2av

training

Train an IC-LoRA for a joint audio+video transformation, conditioned on a reference clip's video and audio to produce a matching target.

new

ltx23-trainer-v2/ic-lora/a2a

training

Train an IC-LoRA that transforms one audio clip into another, conditioned at inference on a reference audio clip.

new

ltx23-trainer-v2/i2v

training

Fine-tune LTX 2.3 to animate a starting image — supply a still plus a prompt at inference and the model generates a video that begins from that frame.

new

ltx23-trainer-v2/extend-suffix

training

Train a LoRA that generates the lead-in to a video, extending a clip backward in time from its ending.

Best AI Image Generators

Unlock the future of creativity with these text to image, AI image generator models.

flux/schnell

text-to-image

FLUX.1 [schnell] is a 12 billion parameter flow transformer that generates high-quality images from text in 1 to 4 steps, suitable for personal and commercial use.

nano-banana-2

text-to-image

Nano Banana 2 is Google's new state-of-the-art fast image generation and editing model

openai/gpt-image-2

text-to-image

GPT Image 2, OpenAI's latest image model, is capable of creating extremely detailed images with fine typography.

FLUX.1 [dev] is a 12 billion parameter flow transformer that generates high-quality images from text. It is suitable for personal and commercial use.

nano-banana-pro

text-to-image

Nano Banana Pro is Google's new state-of-the-art image generation and editing model

Image editing with FLUX.2 [pro] from Black Forest Labs. Ideal for high-quality image manipulation, style transfer, and sequential editing workflows

flux-pro/v1.1

text-to-image

FLUX1.1 [pro] is an enhanced version of FLUX.1 [pro] with improved image generation capabilities, delivering superior composition, detail, and artistic fidelity compared to its predecessor.

nano-banana

text-to-image

Google's famous original image generation and editing model

image-generation

flux-pro/v1.1-ultra

text-to-image

FLUX1.1 [pro] ultra is the newest version of FLUX1.1 [pro], maintaining professional-grade image quality while delivering up to 2K resolution with improved photo realism.

xai/grok-imagine-image

text-to-image

Generate highly aesthetic images with xAI's Grok Imagine Image generation model.

Text-to-image generation with FLUX.2 [klein] 9B from Black Forest Labs. Enhanced realism, crisper text generation, and native editing capabilities.

bytedance/seedream/v4.5/text-to-image

text-to-image

A new-generation image creation model from ByteDance, Seedream 4.5 integrates image generation and image editing capabilities into a single, unified architecture.

Z-Image Turbo is a super fast text-to-image model of 6B parameters developed by Tongyi-MAI.

Generate high-quality images, posters, and logos with Ideogram V3. Features exceptional typography handling and realistic outputs optimized for commercial and creative use.

realism

typography

bytedance/seedream/v4/text-to-image

text-to-image

A new-generation image creation model from ByteDance, Seedream 4.0 integrates image generation and image editing capabilities into a single, unified architecture.

Text-to-image generation with FLUX.2 [dev] from Black Forest Labs. Enhanced realism, crisper text generation, and native editing capabilities.

bytedance/seedream/v5/lite/text-to-image

text-to-image

Text to Image endpoint for the fast Lite version of Seedream 5.0, supporting high quality intelligent text-to-image generation.

GPT Image 1.5 generates high-fidelity images with strong prompt adherence, preserving composition, lighting, and fine-grained detail.

Run SDXL at the speed of light

Text-to-image generation with FLUX.2 [dev] from Black Forest Labs. Enhanced realism, crisper text generation, and native editing capabilities—all at turbo speed.

new

krea/v2/large/text-to-image

text-to-image

Generate high-fidelity images from text with Krea 2 Large, supporting aspect ratio, creativity, seed controls, and optional style references.

image-generation

style-reference

krea

recraft/v3/text-to-image

text-to-image

Recraft V3 is a text-to-image model with the ability to generate long texts, vector art, images in brand style, and much more. As of today, it is SOTA in image generation, proven by Hugging Face's industry-leading Text-to-Image Benchmark by Artificial Analysis.

Generate high-quality images, posters, and logos with Ideogram's latest V4.0q — producing crisp visuals with accurate text rendering, fine detail, and full creative control for polished, ready-to-use designs.

realism

typography

stylized

gemini-3-pro-image-preview

text-to-image

Gemini 3 Pro Image (a.k.a Nano Banana Pro) is Google's state-of-the-art high-fidelity image generation and editing model

Text-to-image generation with FLUX.2 [dev] from Black Forest Labs. Enhanced realism, crisper text generation, and native editing capabilities— in a flash.

flux-1/schnell

text-to-image

Fastest inference in the world for the 12 billion parameter FLUX.1 [schnell] text-to-image model.

gemini-25-flash-image

text-to-image

Google's famous original image generation and editing model, a.k.a Nano Banana

Best Image Editing Models

The fan favorite best image editing models on the market

nano-banana-pro/edit

image-to-image

Nano Banana Pro is Google's new state-of-the-art image generation and editing model

Reve’s edit model lets you upload an existing image and then transform it via a text prompt

bria/fibo-edit/edit

image-to-image

High-fidelity image editing model with state-of-the-art controllability. Combines JSON + Mask + Image for precise, fine-grained edits ideal for production and enterprise workflows. Trained on licensed data - safe for commercial use.

bria

fibo-edit

image-editing

bytedance/seedream/v4/edit

image-to-image

A new-generation image creation model from ByteDance, Seedream 4.0 integrates image generation and image editing capabilities into a single, unified architecture.

FLUX.1 Kontext [pro] handles both text and reference images as inputs, seamlessly enabling targeted, local edits and complex transformations of entire scenes.

flux-kontext-lora

image-to-image

Fast endpoint for the FLUX.1 Kontext [dev] model with LoRA support, enabling rapid and high-quality image editing using pre-trained LoRA adaptations for specific styles, brand identities, and product-specific outputs.

image-editing

bria/fibo/generate

text-to-image

SOTA open-source text-to-image model delivering high-fidelity outputs with accurate typography. JSON-structured prompts provide production-ready controllability for enterprise and agentic workflows. Trained exclusively on licensed data.

bria

fibo

prompt-adherence

Best of Open Source

Some of our favorite open source media models

flux-kontext-trainer

training

LoRA trainer for FLUX.1 Kontext [dev]

ltx-video-13b-distilled/image-to-video

image-to-video

Generate videos from prompts and images using LTX Video-0.9.7 13B Distilled and custom LoRA

Wan 2.2 text to image LoRA trainer. Fine-tune Wan 2.2 for subjects and styles with unprecedented detail.

wan/v2.2-a14b/image-to-video/lora

image-to-video

Wan-2.2 image-to-video is a video model that generates high-quality videos with high visual quality and motion diversity from text prompts and images. This endpoint supports LoRAs made for Wan 2.2

Qwen-Image is an image generation foundation model in the Qwen series that achieves significant advances in complex text rendering and precise image editing.

flux-krea-lora/stream

text-to-image

lora

personalization

Text To Speech APIs

Create lifelike speech with our AI text to speech APIs

gemini-3.1-flash-tts

text-to-speech

Newest audio model from Google introduces granular audio tags that give you precise control to direct AI speech for expressive audio generation.

lipsync

avatar

minimax/speech-2.8-hd

text-to-speech

Generate speech from text prompts and different voices using the MiniMax Speech-2.8 HD model, which leverages advanced AI techniques to create high-quality text-to-speech.

minimax/speech-2.8-turbo

text-to-speech

Generate speech from text prompts and different voices using the MiniMax Speech-2.8 Turbo model, which leverages advanced AI techniques to create high-quality text-to-speech.

xai/tts/v1

text-to-speech

Generate speech with expressive and realistic voices from xAI

qwen-3-tts/text-to-speech/1.7b

text-to-speech

Turn your text into speech using the Qwen3-TTS Custom-Voice model with pre-trained voices, or use your own voice with the Qwen3-TTS Clone Voice model

inworld-tts

text-to-speech

Text to Speech Endpoint for Inworld's TTS-1.5 Max.

High-quality voice cloning TTS model that generates 48kHz speech from text and a reference audio. Distilled to 4 steps for fast inference.

tts

voice-cloning

speech-synthesis

elevenlabs/tts/turbo-v2.5

text-to-speech

Generate high-speed text-to-speech audio using ElevenLabs TTS Turbo v2.5.

audio

chatterbox/text-to-speech

text-to-speech

Whether you're working on memes, videos, games, or AI agents, Chatterbox brings your content to life. Use the first TTS model from Resemble AI.

maya/batch

text-to-speech

Maya1 is a state-of-the-art speech model by Maya Research for expressive voice generation, built to capture real human emotion and precise voice design.

tts

minimax/speech-02-hd

text-to-speech

Generate speech from text prompts and different voices using the MiniMax Speech-02 HD model, which leverages advanced AI techniques to create high-quality text-to-speech.

speech

minimax/voice-clone

text-to-speech

Clone a voice from a sample audio and generate speech from text prompts using the MiniMax model, which leverages advanced AI techniques to create high-quality text-to-speech.

speech

minimax/speech-02-turbo

text-to-speech

Generate fast speech from text prompts and different voices using the MiniMax Speech-02 Turbo model, which leverages advanced AI techniques to create high-quality text-to-speech.

speech

minimax/speech-2.6-hd

text-to-speech

Generate speech from text prompts and different voices using the MiniMax Speech-2.6 HD model, which leverages advanced AI techniques to create high-quality text-to-speech.

index-tts-2/text-to-speech

text-to-speech

Generate natural, clear speech using Index TTS 2.0 from IndexTeam

minimax/speech-2.6-turbo

text-to-speech

Generate speech from text prompts and different voices using the MiniMax Speech-2.6 Turbo model, which leverages advanced AI techniques to create high-quality text-to-speech.

vibevoice/7b

text-to-speech

Generate long, expressive multi-voice speech using Microsoft's powerful TTS

Generate speech from text prompts and different voices using the Kling TTS model, which leverages advanced AI techniques to create high-quality text-to-speech.

audio

qwen-3-tts/voice-design/1.7b

text-to-speech

Create custom voices using the Qwen3-TTS Voice Design model, then use the Clone Voice model to create your own voices!

voice-design

qwen-3-tts/text-to-speech/0.6b

text-to-speech

Turn your text into speech using the Qwen3-TTS Custom-Voice model with pre-trained voices, or use your own voice with the Qwen3-TTS Clone Voice model

AI Image Generator APIs

Generate a variety of stunning images using our AI Image Generator APIs

flux/schnell

text-to-image

FLUX.1 [schnell] is a 12 billion parameter flow transformer that generates high-quality images from text in 1 to 4 steps, suitable for personal and commercial use.

nano-banana-2

text-to-image

Nano Banana 2 is Google's new state-of-the-art fast image generation and editing model

openai/gpt-image-2

text-to-image

GPT Image 2, OpenAI's latest image model, is capable of creating extremely detailed images with fine typography.

FLUX.1 [dev] is a 12 billion parameter flow transformer that generates high-quality images from text. It is suitable for personal and commercial use.

nano-banana-pro

text-to-image

Nano Banana Pro is Google's new state-of-the-art image generation and editing model

Image editing with FLUX.2 [pro] from Black Forest Labs. Ideal for high-quality image manipulation, style transfer, and sequential editing workflows

flux-pro/v1.1

text-to-image

FLUX1.1 [pro] is an enhanced version of FLUX.1 [pro] with improved image generation capabilities, delivering superior composition, detail, and artistic fidelity compared to its predecessor.

nano-banana

text-to-image

Google's famous original image generation and editing model

image-generation

flux-pro/v1.1-ultra

text-to-image

FLUX1.1 [pro] ultra is the newest version of FLUX1.1 [pro], maintaining professional-grade image quality while delivering up to 2K resolution with improved photo realism.

xai/grok-imagine-image

text-to-image

Generate highly aesthetic images with xAI's Grok Imagine Image generation model.

Text-to-image generation with FLUX.2 [klein] 9B from Black Forest Labs. Enhanced realism, crisper text generation, and native editing capabilities.

bytedance/seedream/v4.5/text-to-image

text-to-image

A new-generation image creation model from ByteDance, Seedream 4.5 integrates image generation and image editing capabilities into a single, unified architecture.

Z-Image Turbo is a super fast text-to-image model of 6B parameters developed by Tongyi-MAI.

Generate high-quality images, posters, and logos with Ideogram V3. Features exceptional typography handling and realistic outputs optimized for commercial and creative use.

realism

typography

bytedance/seedream/v4/text-to-image

text-to-image

A new-generation image creation model from ByteDance, Seedream 4.0 integrates image generation and image editing capabilities into a single, unified architecture.

Text-to-image generation with FLUX.2 [dev] from Black Forest Labs. Enhanced realism, crisper text generation, and native editing capabilities.

bytedance/seedream/v5/lite/text-to-image

text-to-image

Text to Image endpoint for the fast Lite version of Seedream 5.0, supporting high quality intelligent text-to-image generation.

GPT Image 1.5 generates high-fidelity images with strong prompt adherence, preserving composition, lighting, and fine-grained detail.

Run SDXL at the speed of light

Text-to-image generation with FLUX.2 [dev] from Black Forest Labs. Enhanced realism, crisper text generation, and native editing capabilities—all at turbo speed.

new

krea/v2/large/text-to-image

text-to-image

Generate high-fidelity images from text with Krea 2 Large, supporting aspect ratio, creativity, seed controls, and optional style references.

image-generation

style-reference

krea

recraft/v3/text-to-image

gemini-3-pro-image-preview

text-to-image

Gemini 3 Pro Image (a.k.a Nano Banana Pro) is Google's state-of-the-art high-fidelity image generation and editing model

Text-to-image generation with FLUX.2 [dev] from Black Forest Labs. Enhanced realism, crisper text generation, and native editing capabilities— in a flash.

flux-1/schnell

text-to-image

Fastest inference in the world for the 12 billion parameter FLUX.1 [schnell] text-to-image model.

gemini-25-flash-image

text-to-image

Google's famous original image generation and editing model, a.k.a Nano Banana

Text to Video APIs

Access the top Text to Video APIs with lightning fast inference speeds

bytedance/seedance-2.0/text-to-video

text-to-video

ByteDance's most advanced text-to-video model. Cinematic output with native audio, multi-shot editing, real-world physics, and director-level camera control.

stylized

transform

lipsync

kling-video/v3/pro/text-to-video

text-to-video

Kling 3.0 Pro: Top-tier text-to-video with cinematic visuals, fluid motion, and native audio generation, with multi-shot support.

bytedance/seedance-2.0/fast/text-to-video

text-to-video

ByteDance's most advanced text-to-video model, fast tier. Lower latency and cost with cinematic output, native audio, multi-shot editing, and director-level camera control.

Veo 3.1 by Google, the most advanced AI video generation model in the world. With sound on!

xai/grok-imagine-video/text-to-video

text-to-video

Generate videos with audio from text using Grok Imagine Video.

Faster and more cost effective version of Google's Veo 3.1!

kling-video/v2.5-turbo/pro/text-to-video

text-to-video

Kling 2.5 Turbo Pro: Top-tier text-to-video generation with unparalleled motion fluidity, cinematic visuals, and exceptional prompt precision.

animation

stylized

kling-video/v3/standard/text-to-video

text-to-video

Kling 3.0 Standard: Top-tier text-to-video with cinematic visuals, fluid motion, and native audio generation, with multi-shot support.

sora-2/text-to-video

text-to-video

Text-to-video endpoint for Sora 2, OpenAI's state-of-the-art video model capable of creating richly detailed, dynamic clips with audio from natural language or images.

text to video

audio

sora

kling-video/v2.6/pro/text-to-video

text-to-video

Kling 2.6 Pro: Top-tier text-to-video with cinematic visuals, fluid motion, and native audio generation.

bytedance/seedance/v1.5/pro/text-to-video

text-to-video

Generate videos with audio with Seedance 1.5

bytedance

seedance

audio

kling-video/v1.6/standard/text-to-video

text-to-video

Generate video clips from your prompts using Kling 1.6 (std)

veo3.1/lite

text-to-video

Veo 3.1 Lite balances practical utility with professional capabilities, supporting Text-to-Video and Image-to-Video

stylized

transform

lipsync

alibaba/happy-horse/text-to-video

text-to-video

Generate 1080p video with synchronized native audio from a text prompt. Aspect ratios: 16:9, 9:16, 1:1, 4:3, 3:4. Duration: 3–15s.

happy-horse

veo3

text-to-video

Veo 3 by Google, the most advanced AI video generation model in the world. With sound on!

veo3/fast

text-to-video

Faster and more cost effective version of Google's Veo 3!

kling-video/o3/pro/text-to-video

text-to-video

Generate realistic videos using Kling O3 from Kling Team!

bytedance/seedance/v1/pro/text-to-video

text-to-video

Seedance 1.0 Pro, a high-quality video generation model developed by ByteDance.

sora-2/text-to-video/pro

text-to-video

Text-to-video endpoint for Sora 2 Pro, OpenAI's state-of-the-art video model capable of creating richly detailed, dynamic clips with audio from natural language or images.

audio

sora-2-pro

wan/v2.7/text-to-video

text-to-video

Wan 2.7 is the latest generation AI video model, delivering enhanced motion smoothness, superior scene fidelity, and greater visual coherence.

stylized

transform

lipsync

kling-video/lipsync/audio-to-video

text-to-video

Kling LipSync is an audio-to-video model that generates realistic lip movements from audio input.

audio to video

lipsync

kling-video/o3/standard/text-to-video

text-to-video

Generate realistic videos using Kling O3 from Kling Team!

pixverse/v6/text-to-video

text-to-video

PixVerse's latest V6 model.

minimax/hailuo-02/standard/text-to-video

text-to-video

MiniMax Hailuo-02 Text To Video API (Standard, 768p): Advanced video generation model with 768p resolution

wan-25-preview/text-to-video

text-to-video

Wan 2.5 text-to-video model.

ltx-2.3/text-to-video/fast

text-to-video

LTX-2.3 is a high-quality, fast AI video model available in Pro and Fast variants for text-to-video, image-to-video, and audio-to-video.

stylized

transform

lipsync

wan/v2.2-a14b/text-to-video

text-to-video

Wan-2.2 text-to-video is a video model that generates videos with high visual quality and motion diversity from text prompts.

Generate videos from prompts using LTX Video

Image to Video APIs

kling-video/v3/pro/image-to-video

image-to-video

Kling 3.0 Pro: Top-tier image-to-video with cinematic visuals, fluid motion, and native audio generation, with custom element support.

bytedance/seedance-2.0/image-to-video

image-to-video

ByteDance's most advanced image-to-video model. Animate still images into cinematic video with synchronized audio, start and end frame control, and motion prompts.

stylized

transform

lipsync

bytedance/seedance-2.0/reference-to-video

image-to-video

ByteDance's most advanced reference-to-video model. Generate video from up to 9 images, 3 videos, and 3 audio clips with native audio and cinematic camera control.

stylized

transform

lipsync

kling-video/v2.5-turbo/pro/image-to-video

image-to-video

Kling 2.5 Turbo Pro: Top-tier image-to-video generation with unparalleled motion fluidity, cinematic visuals, and exceptional prompt precision.

stylized

transform

kling-video/v3/standard/image-to-video

image-to-video

Kling 3.0 Standard: Top-tier image-to-video with cinematic visuals, fluid motion, and native audio generation, with custom element support.

kling-video/v2.6/pro/image-to-video

image-to-video

Kling 2.6 Pro: Top-tier image-to-video with cinematic visuals, fluid motion, and native audio generation.

xai/grok-imagine-video/image-to-video

image-to-video

Generate videos from images with audio using xAI's Grok Imagine Video model.

grok

xai

i2v

veo3.1/fast/image-to-video

image-to-video

Generate videos from your image prompts using Veo 3.1 fast.

bytedance/seedance-2.0/fast/image-to-video

image-to-video

ByteDance's most advanced image-to-video model, fast tier. Lower latency and cost with synchronized audio, start and end frame control, and motion prompts.

stylized

transform

lipsync

bytedance/seedance-2.0/fast/reference-to-video

image-to-video

ByteDance's most advanced reference-to-video model, fast tier. Lower latency and cost with up to 9 images, 3 videos, and 3 audio clips as inputs.

stylized

transform

lipsync

bytedance/seedance/v1.5/pro/image-to-video

image-to-video

Generate videos with audio using Seedance 1.5 (supports start and end frames)

bytedance

seedance

audio

veo3.1/image-to-video

image-to-video

Veo 3.1 is the latest state-of-the-art video generation model from Google DeepMind

kling-video/v2.1/standard/image-to-video

image-to-video

Kling 2.1 Standard is a cost-efficient endpoint for the Kling 2.1 model, delivering high-quality image-to-video generation

bytedance/seedance/v1/pro/image-to-video

image-to-video

Seedance 1.0 Pro, a high-quality video generation model developed by ByteDance.

new

xai/grok-imagine-video/v1.5/image-to-video

image-to-video

Generate videos from images with audio using xAI's Grok Imagine 1.5 Video model.

stylized

transform

lipsync

kling-video/o3/pro/image-to-video

image-to-video

Generate a video by taking a start frame and an end frame, animating the transition between them while following text-driven style and scene guidance.

veo3.1/lite/image-to-video

image-to-video

Veo 3.1 Lite balances practical utility with professional capabilities, supporting Text-to-Video and Image-to-Video

stylized

transform

lipsync

kling-video/v1.6/standard/image-to-video

image-to-video

Generate video clips from your images using Kling 1.6 (std)

kling-video/o3/standard/image-to-video

image-to-video

Generate a video by taking a start frame and an end frame, animating the transition between them while following text-driven style and scene guidance.

minimax/hailuo-02/standard/image-to-video

image-to-video

MiniMax Hailuo-02 Image To Video API (Standard, 768p, 512p): Advanced image-to-video generation model with 768p and 512p resolutions

alibaba/happy-horse/image-to-video

image-to-video

Alibaba's #1-ranked Happy Horse 1.0 — generate 1080p video with synchronized native audio and multilingual lip-sync from text prompts or images.

video

happy-horse

kling-video/o3/pro/reference-to-video

image-to-video

Transform images, elements, and text into consistent, high-quality video scenes, ensuring stable character identity, object details, and environments.

reference-to-video

kling-video/v2.1/pro/image-to-video

image-to-video

Kling 2.1 Pro is an advanced endpoint for the Kling 2.1 model, offering professional-grade videos with enhanced visual fidelity, precise camera movements, and dynamic motion control, perfect for cinematic storytelling.

sora-2/image-to-video

image-to-video

Image-to-video endpoint for Sora 2, OpenAI's state-of-the-art video model capable of creating richly detailed, dynamic clips with audio from natural language or images.

VEED Fabric 1.0 is an image-to-video API that turns any image into a talking video

lipsync

avatar

wan/v2.7/image-to-video

image-to-video

Wan 2.7 is the latest generation AI video model, delivering enhanced motion smoothness, superior scene fidelity, and greater visual coherence.

stylized

transform

lipsync

kling-video/v2.5-turbo/standard/image-to-video

image-to-video

Kling 2.5 Turbo Standard: Top-tier image-to-video generation with unparalleled motion fluidity, cinematic visuals, and exceptional prompt precision.

stylized

transform

bytedance/omnihuman/v1.5

image-to-video

Omnihuman v1.5 is a new and improved version of Omnihuman. It generates video using an image of a human figure paired with an audio file. It produces vivid, high-quality videos where the character’s emotions and movements maintain a strong correlation with the audio.

lipsync

Best Image Models

Top-performing models for high-quality image generation and editing.

openai/gpt-image-2

text-to-image

GPT Image 2, OpenAI's latest image model, is capable of creating extremely detailed images with fine typography.

Nano Banana Pro is Google's new state-of-the-art image generation and editing model

Nano Banana 2 is Google's new state-of-the-art fast image generation and editing model

recraft/v4/pro/text-to-image

text-to-image

Recraft V4 was developed with designers to bring true visual taste to AI image generation. Built for brand systems and production-ready workflows, it goes beyond prompt accuracy — delivering stronger composition, refined lighting, realistic materials, and a cohesive aesthetic. The result is imagery shaped by professional design judgment, ready for immediate real-world use without additional post-processing.

imagineart/imagineart-2.0-preview/text-to-image

text-to-image

ImagineArt 2.0 is ImagineArt's latest state-of-the-art visual reasoning text-to-image model, generating high-fidelity, professional-grade visuals with lifelike realism, cinematic effects, and strong aesthetic quality.

FLUX.1 Kontext [pro] handles both text and reference images as inputs, seamlessly enabling targeted, local edits and complex transformations of entire scenes.

flux-krea-lora/stream

text-to-image

lora

personalization

recraft/v3/text-to-image

Background Remover APIs

Find the API of your choice to remove a background from your image or video

pixelcut/background-removal

image-to-image

Pixelcut’s Background Remover enables fast, ultra high-quality removal of backgrounds from images. Perfect for e-commerce and image editing workflows. Powered by advanced AI for clean, perfect cutouts every time.

background removal

utility

remove background

ideogram/remove-background

image-to-image

Remove backgrounds from existing images with Ideogram's remove background feature. Isolate subjects cleanly for compositing and creative reuse.

birefnet/v2

image-to-image

Bilateral reference framework (BiRefNet) for high-resolution dichotomous image segmentation (DIS)

bria/video/background-removal/v3

video-to-video

Remove backgrounds from any video with Bria's VRMBG 3.0. Fast, accurate background removal across talking heads, podcasts, product videos, commercials, and cinematic footage.

imageutils/rembg

image-to-image

Remove the background from an image.

background removal

utility

editing

bria/background/remove

Bilateral reference framework (BiRefNet) for high-resolution dichotomous image segmentation (DIS)

background removal

segmentation

high-res

veed/video-background-removal/green-screen

video-to-video

Remove background from videos filmed using chromakey, with automatic green spill suppression for clean, professional edges.

veed/video-background-removal/fast

video-to-video

Remove background from any video with people and objects. No green screen needed.

veed/video-background-removal

video-to-video

Remove background from any video with people and objects. No green screen needed.

Veo 3.1

veo3.1/fast/first-last-frame-to-video

image-to-video

Generate videos from a first/last frame using Google's Veo 3.1 Fast

veo3.1/first-last-frame-to-video

image-to-video

Generate videos from a first and last frame using Google's Veo 3.1

veo3.1/fast

text-to-video

Faster and more cost effective version of Google's Veo 3.1!

veo3.1/fast/image-to-video

image-to-video

Generate videos from your image prompts using Veo 3.1 fast.

veo3.1/image-to-video

image-to-video

Veo 3.1 is the latest state-of-the-art video generation model from Google DeepMind

veo3.1

text-to-video

Veo 3.1 by Google, the most advanced AI video generation model in the world. With sound on!

veo3.1/reference-to-video

image-to-video

Generate videos from images using Google's Veo 3.1

Marquee Video Models

Flagship video generation models known for top-tier quality, motion control, and cinematic results.

kling-video/o3/standard/image-to-video

image-to-video

Generate a video by taking a start frame and an end frame, animating the transition between them while following text-driven style and scene guidance.

kling-video/v2.5-turbo/pro/image-to-video

image-to-video

Kling 2.5 Turbo Pro: Top-tier image-to-video generation with unparalleled motion fluidity, cinematic visuals, and exceptional prompt precision.

stylized

transform

pixverse/v6/image-to-video

image-to-video

PixVerse's latest V6 model.

kling-video/v2.5-turbo/pro/text-to-video

text-to-video

Kling 2.5 Turbo Pro: Top-tier text-to-video generation with unparalleled motion fluidity, cinematic visuals, and exceptional prompt precision.

animation

stylized

decart/lucy-14b/image-to-video

image-to-video

deprecated

Lucy-14B delivers lightning fast performance that redefines what's possible with image-to-video AI

kling-video/v2.1/pro/image-to-video

image-to-video

minimax/hailuo-02/standard/image-to-video

image-to-video

MiniMax Hailuo-02 Image To Video API (Standard, 768p, 512p): Advanced image-to-video generation model with 768p and 512p resolutions

wan/v2.2-a14b/image-to-video

image-to-video

fal-ai/wan/v2.2-A14B/image-to-video

veo3.1/image-to-video

image-to-video

Veo 3.1 is the latest state-of-the-art video generation model from Google DeepMind

ltx-2-19b/image-to-video

image-to-video

Generate video with audio from images using LTX-2

Best Avatar Models

Top models for generating talking avatars, lip-sync videos, and expressive character performances.

creatify/aurora

image-to-video

Generate high-fidelity, studio-quality videos of your avatar speaking or singing using Aurora from the Creatify team!

lipsync

veed/fabric-1.0

image-to-video

VEED Fabric 1.0 is an image-to-video API that turns any image into a talking video

lipsync

avatar

heygen/avatar5/digital-twin

text-to-video

Create natural HeyGen Avatar V digital twin videos from text or audio, with lip-sync, optional backgrounds, captions, and MP4/WebM output.

sync-3 is the most powerful lipsync model yet, featuring native visual intelligence for professional-quality video.

stylized

transform

lipsync

bytedance/omnihuman/v1.5

image-to-video

lipsync

ai-avatar/single-text

image-to-video

MultiTalk model generates a talking avatar video from an image and text. Converts text to speech automatically, then generates the avatar speaking with lip-sync.

Generate realistic lipsync animations from audio using advanced algorithms for high-quality synchronization with the Sync Lipsync 2.0 model

animation

lip sync

kling-video/v2.1/master/image-to-video

image-to-video

Kling 2.1 Master: The premium endpoint for Kling 2.1, designed for top-tier image-to-video generation with unparalleled motion fluidity, cinematic visuals, and exceptional prompt precision.

pixverse/lipsync

video-to-video

Generate realistic lipsync animations from audio using advanced algorithms for high-quality synchronization with the PixVerse Lipsync model

animation

lip sync

kling-video/v1/pro/ai-avatar

image-to-video

Kling AI Avatar Pro: The premium endpoint for creating avatar videos with realistic humans, animals, cartoons, or stylized characters

stylized

transform

Audio Models

Models for speech, music, sound effects, and audio generation across a wide range of use cases.

inworld-tts

text-to-speech

Text to Speech Endpoint for Inworld's TTS-1.5 Max.

inworld

tts

chatterbox/text-to-speech

text-to-speech

Whether you're working on memes, videos, games, or AI agents, Chatterbox brings your content to life. Use the first TTS model from Resemble AI.

playai/tts/dialog

text-to-audio

deprecated

Generate natural-sounding multi-speaker dialogues and audio. Perfect for expressive outputs, storytelling, games, animations, and interactive media.

audio

minimax/speech-02-hd

text-to-speech

Generate speech from text prompts and different voices using the MiniMax Speech-02 HD model, which leverages advanced AI techniques to create high-quality text-to-speech.

speech

dia-tts/voice-clone

audio-to-audio

Clone dialog voices from an audio sample and generate dialogs from text prompts using Dia TTS, which leverages advanced AI techniques to create high-quality text-to-speech.

speech

mirelo-ai/sfx-v1/video-to-audio

video-to-audio

Generate synced sounds for any video, and return the new sound track (like MMAudio)

sfx

mirelo-ai/sfx-v1/video-to-video

video-to-video

Generate synced sounds for any video, and return it with its new sound track (like MMAudio)

sfx

beatoven/music-generation

text-to-audio

deprecated

Generate royalty-free instrumental music from electronic, hip hop, and indie rock to cinematic and classical genres. Perfect for games, films, social content, podcasts, and more.

speech

audio

music

beatoven/sound-effect-generation

text-to-audio

Create professional-grade sound effects from animal and vehicle to nature, sci-fi, and otherworldly sounds. Perfect for films, games, and digital content.

sfx

audio

effects

minimax/speech-2.8-hd

text-to-speech

Generate speech from text prompts and different voices using the MiniMax Speech-2.8 HD model, which leverages advanced AI techniques to create high-quality text-to-speech.

minimax/speech-2.8-turbo

text-to-speech

Generate speech from text prompts and different voices using the MiniMax Speech-2.8 Turbo model, which leverages advanced AI techniques to create high-quality text-to-speech.

qwen-3-tts/text-to-speech/1.7b

text-to-speech

Bring speech to your text using the Qwen3-TTS Custom-Voice model with pre-trained voices, or use your own custom voice with the Qwen3-TTS Clone Voice model

minimax/speech-02-turbo

text-to-speech

Generate fast speech from text prompts and different voices using the MiniMax Speech-02 Turbo model, which leverages advanced AI techniques to create high-quality text-to-speech.

speech

minimax/speech-2.6-hd

text-to-speech

Generate speech from text prompts and different voices using the MiniMax Speech-2.6 HD model, which leverages advanced AI techniques to create high-quality text-to-speech.

index-tts-2/text-to-speech

text-to-speech

Generate natural, clear speech using Index TTS 2.0 from IndexTeam

minimax/speech-2.6-turbo

text-to-speech

Generate speech from text prompts and different voices using the MiniMax Speech-2.6 Turbo model, which leverages advanced AI techniques to create high-quality text-to-speech.

minimax/preview/speech-2.5-hd

text-to-speech

Generate speech from text prompts and different voices using the MiniMax Speech-2.5 HD model, which leverages advanced AI techniques to create high-quality text-to-speech.

speech

chatterbox/text-to-speech/multilingual

text-to-speech

Whether you're working on memes, videos, games, or AI agents, Chatterbox brings your content to life. Use the first TTS model from Resemble AI.

multilingual

resemble-ai/chatterboxhd/text-to-speech

text-to-speech

Generate expressive, natural speech with Resemble AI's Chatterbox. Features unique emotion control, instant voice cloning from short audio, and built-in watermarking.

qwen-3-tts/text-to-speech/0.6b

text-to-speech

Bring speech to your text using the Qwen3-TTS Custom-Voice model with pre-trained voices, or use your own custom voice with the Qwen3-TTS Clone Voice model

minimax/preview/speech-2.5-turbo

text-to-speech

Generate fast speech from text prompts and different voices using the MiniMax Speech-2.5 Turbo model, which leverages advanced AI techniques to create high-quality text-to-speech.

minimax-music/v2

text-to-audio

Generate music from text prompts using the MiniMax Music 2.0 model, which leverages advanced AI techniques to create high-quality, diverse musical compositions.

Generate high-quality, realistic music with fine controls using ElevenLabs Music!

music

text-to-music

cassetteai/music-generator

text-to-audio

CassetteAI’s model generates a 30-second sample in under 2 seconds and a full 3-minute track in under 10 seconds. At 44.1 kHz stereo audio, expect a level of professional consistency with no breaks, no squeaks, and no random interruptions in your creations.

MiniMax Music 2.6 creates complete tracks with singing, backing music, and detailed arrangements from lyrics and a style description.

MiniMax Music 2.5 creates complete tracks with singing, backing music, and detailed arrangements from lyrics and a style description.

music

minimax-music/v1.5

text-to-audio

Generate music from text prompts using the MiniMax model, which leverages advanced AI techniques to create high-quality, diverse musical compositions.

music

Text to Music APIs

Everything you need to start making music with AI

lyria2

text-to-audio

Lyria 2 is Google's latest music generation model. It can generate any type of music.

Generate high-quality, realistic music with fine controls using ElevenLabs Music!

MiniMax Music 2.6 creates complete tracks with singing, backing music, and detailed arrangements from lyrics and a style description.

MiniMax Music 2.5 creates complete tracks with singing, backing music, and detailed arrangements from lyrics and a style description.

Generate music from text prompts using the MiniMax model, which leverages advanced AI techniques to create high-quality, diverse musical compositions.

music

minimax-music/v2

text-to-audio

Generate music from text prompts using the MiniMax Music 2.0 model, which leverages advanced AI techniques to create high-quality, diverse musical compositions.

music

audio

cassetteai/music-generator

Generate music from text prompts using the MiniMax model, which leverages advanced AI techniques to create high-quality, diverse musical compositions.

music

sonauto/v2/text-to-music

text-to-audio

Create full songs in any style

music

text-to-music

stable-audio-25/text-to-audio

text-to-audio

Generate high-quality music and sound effects using Stable Audio 2.5 from Stability AI

audio

ace-step

text-to-audio

Generate music with lyrics from text using ACE-Step

text-to-music

ace-step/prompt-to-audio

text-to-audio

Generate music from a simple prompt using ACE-Step

YuE is a groundbreaking series of open-source foundation models designed for music generation, specifically for transforming lyrics into full songs.

music

stable-audio

text-to-audio

Open source text-to-audio model.

music

Best LoRA Trainers

Training endpoints for creating and fine-tuning custom LoRA models for personalization and style adaptation.

flux-lora-portrait-trainer

training

FLUX LoRA training optimized for portrait generation, with bright highlights, excellent prompt following and highly detailed results.

LoRA trainer for FLUX.1 Kontext [dev]

flux-lora-fast-training

training

Train styles, people and other subjects at blazing speeds.

Train custom LoRAs for Wan-2.1 T2V 14B

Qwen Image LoRA training

lora

personalization

flux-2-klein-4b-base-trainer

training

deprecated

Fine-tune FLUX.2 [klein] 4B from Black Forest Labs with custom datasets. Create specialized LoRA adaptations for specific styles and domains.

flux-2-klein-9b-base-trainer

training

Fine-tune FLUX.2 [klein] 9B from Black Forest Labs with custom datasets. Create specialized LoRA adaptations for specific editing tasks.

flux-2-trainer-v2

training

Fine-tune FLUX.2 [dev] from Black Forest Labs with custom datasets. Create specialized LoRA adaptations for specific styles and domains.

z-image-trainer

training

Train LoRAs on Z-Image Turbo, a super fast text-to-image model of 6B parameters developed by Tongyi-MAI.

turbo

z-image

fast

qwen-image-edit-2511-trainer

training

deprecated

LoRA trainer for Qwen Image Edit 2511

Virtual Try On APIs

Virtually try on different outfits and character styles with our collection of APIs.

decart/lucy2-vton/realtime

video-to-video

Realtime Try On experience with Decart Lucy 2.1 VTON

image-apps-v2/virtual-try-on

image-to-image

Try on clothes virtually by combining person and clothing images.

fashion

try-on

virtual-try-on

kling/v1-5/kolors-virtual-try-on

image-to-image

Kling Kolors Virtual TryOn v1.5 is a high-quality, image-based try-on endpoint that can be used for commercial try-on.

FASHN v1.6 delivers precise virtual try-on capabilities, accurately rendering garment details like text and patterns at 864x1296 resolution from both on-model and flat-lay photo references.

FASHN v1.5 delivers precise virtual try-on capabilities, accurately rendering garment details like text and patterns at 576x864 resolution from both on-model and flat-lay photo references.

try-on

fashion

clothing

flux-2-lora-gallery/virtual-tryon

image-to-image

Virtual clothing try-on (2 images: person + garment)

Leffa Virtual TryOn is a high-quality, image-based try-on endpoint that can be used for commercial try-on.

High-quality, image-based virtual try-on

try-on

fashion

clothing

Image to 3D Model APIs

Run the best image-to-3D models on fal

hunyuan-3d/v3.1/pro/image-to-3d

image-to-3d

Generate 3D models from images with Hunyuan 3D Pro

hunyuan

trellis-2

image-to-3d

Generate 3D models from your images using Trellis 2. A native 3D generative model enabling versatile and high-quality 3D asset creation.

image-to-3d

trellis

image-to-3d

Generate 3D models from your images using Trellis. A native 3D generative model enabling versatile and high-quality 3D asset creation.

stylized

hunyuan3d-v3/image-to-3d

image-to-3d

Transform your photos into ultra-high-resolution 3D models in seconds. Film-quality geometry with PBR textures, ready for games, e-commerce, and 3D printing.

tripo3d/h3.1/image-to-3d

image-to-3d

Generate high-quality 3D models from a single image using Tripo H3.1.

Meshy-6 is the latest model from Meshy. It generates realistic and production ready 3D models.

sam-3/3d-objects

image-to-3d

SAM 3D enables precise 3D reconstruction of objects from real images, while accurately reconstructing their geometry and texture.

object

tripo3d/tripo/v2.5/image-to-3d

image-to-3d

State-of-the-art image-to-3D object generation. Generate a 3D model from a single image!

Rodin V2.5 by Hyper3D generates realistic and production ready 3D models from text or images.

hyper3d/rodin

image-to-3d

Rodin by Hyper3D generates realistic and production ready 3D models from text or images.

stylized

hunyuan3d/v2

image-to-3d

Generate 3D models from your images using Hunyuan 3D. A native 3D generative model enabling versatile and high-quality 3D asset creation.

stylized

meshy/v6-preview/image-to-3d

image-to-3d

Meshy-6-Preview is the latest model from Meshy. It generates realistic and production ready 3D models.

new

tripo3d/triposplat

image-to-3d

TripoSplat is an open-source model from TripoAI / VAST AI Research that converts a single 2D image into high-quality 3D Gaussians using a novel learned density-control approach

gaussian-splat

hyper3d/rodin/v2

image-to-3d

Rodin by Hyper3D generates realistic, production-ready 3D models from text or images.

text-to-3d

tripo3d/h3.1/multiview-to-3d

image-to-3d

Generate 3D models from multiple view images using Tripo H3.1.

SAM 3D allows for accurate 3D reconstruction of human body shape and position from a single image.

human

pose

meshy/v6/multi-image-to-3d

image-to-3d

Meshy-6 is the latest model from Meshy. It generates realistic, production-ready 3D models.

trellis/multi

image-to-3d

Generate 3D models from multiple images using Trellis. A native 3D generative model enabling versatile and high-quality 3D asset creation.

stylized

hunyuan3d/v2/multi-view

image-to-3d

Generate 3D models from your images using Hunyuan 3D. A native 3D generative model enabling versatile and high-quality 3D asset creation.

stylized

hunyuan-3d/v3.1/rapid/image-to-3d

image-to-3d

Rapidly generate 3D models from images using Hunyuan 3D.

hunyuan

tripo3d/p1/image-to-3d

image-to-3d

Generate 3D models from a single image using Tripo P1.

3d-generation

tripo

tripo3d/tripo/v2.5/multiview-to-3d

image-to-3d

State-of-the-art multiview-to-3D object generation. Generate 3D models from multiple images!

Pixal3D turns a single image into a high-fidelity 3D model with detailed geometry and realistic textures.

State-of-the-art image-to-3D object generation.

new

hyper3d/rodin/v2.5/fast

image-to-3d

Rodin V2.5 by Hyper3D generates realistic, production-ready 3D models from text or images. Use the fast model for rapid prototyping.

meshy/v5/multi-image-to-3d

image-to-3d

Meshy-5 Multi Image generates realistic, production-ready 3D models from multiple images.

multi-image-to-3d

hunyuan_world/image-to-world

image-to-3d

Hunyuan World 1.0 turns a single image into a panorama or a 3D world. It creates realistic scenes from the image, allowing you to explore and view it from different angles.

hunyuan3d/v2/mini/turbo

image-to-3d

Generate 3D models from your images using Hunyuan 3D. A native 3D generative model enabling versatile and high-quality 3D asset creation.

stylized

Text to 3D Model APIs

This is our collection of the best text-to-3D model APIs available on fal.

hunyuan-3d/v3.1/pro/text-to-3d

text-to-3d

Generate 3D models from text prompts with Hunyuan 3D Pro

hunyuan

meshy/v6/text-to-3d

text-to-3d

Meshy-6 is the latest model from Meshy. It generates realistic, production-ready 3D models.

meshy/v6-preview/text-to-3d

text-to-3d

Meshy-6-Preview is the latest model from Meshy. It generates realistic, production-ready 3D models.

tripo3d/h3.1/text-to-3d

text-to-3d

Generate 3D models from text descriptions using Tripo H3.1.

3d-generation

tripo

hunyuan-3d/v3.1/rapid/text-to-3d

text-to-3d

Create detailed, fully-textured 3D models with text

hunyuan-motion

text-to-3d

Generate 3D human motions from text prompts with Hunyuan Motion!

motion

new

hyper3d/rodin/v2.5/text-to-3d/fast

text-to-3d

Rodin V2.5 by Hyper3D generates realistic, production-ready 3D models from text or images. Use the fast model for rapid prototyping.

hunyuan3d-v3/text-to-3d

text-to-3d

Turn simple sketches into detailed, fully-textured 3D models. Instantly convert your concept designs into formats ready for Unity, Unreal, and Blender.

new

hyper3d/rodin/v2.5/text-to-3d

text-to-3d

Rodin V2.5 by Hyper3D generates realistic, production-ready 3D models from text or images.

hunyuan-motion/fast

text-to-3d

Generate 3D human motions from text prompts with Hunyuan Motion!

motion

tripo3d/p1/text-to-3d

text-to-3d

Generate 3D models from text descriptions using Tripo P1.

3d-generation

tripo

Best Utility Models

Specialized models for supporting tasks such as background removal, NSFW detection, upscaling, and more.

x-ailab/nsfw

vision

Predict whether an image is NSFW or SFW.

bria/video/background-removal/v3

video-to-video

Remove backgrounds from any video with Bria's VRMBG 3.0. Fast, accurate background removal across talking heads, podcasts, product videos, commercials, and cinematic footage.

topaz/upscale/image

image-to-image

Use the powerful and accurate Topaz image enhancer to enhance your images.

bria/video/background-removal

video-to-video

Automatically remove backgrounds from videos, perfect for creating clean, professional content without a green screen.

background-removal

bria/background/remove

Professional-grade video upscaling using Topaz technology. Enhance your videos with high-quality upscaling.

upscaling

high-res

new

bria/video/background-removal/realtime

video-to-video

Remove video backgrounds in real time with Bria’s VRMBG 3.0 model. Built for live streaming, real-time video apps, content creation, and low-latency workflows that need fast, accurate background removal.

bria

video

background-removal

Text To Image APIs

Use the latest state-of-the-art text-to-image model APIs.

flux/schnell

text-to-image

FLUX.1 [schnell] is a 12 billion parameter flow transformer that generates high-quality images from text in 1 to 4 steps, suitable for personal and commercial use.

nano-banana-2

text-to-image

Nano Banana 2 is Google's new state-of-the-art fast image generation and editing model

openai/gpt-image-2

text-to-image

GPT Image 2, OpenAI's latest image model, is capable of creating extremely detailed images with fine typography.

FLUX.1 [dev] is a 12 billion parameter flow transformer that generates high-quality images from text. It is suitable for personal and commercial use.

nano-banana-pro

text-to-image

Nano Banana Pro is Google's new state-of-the-art image generation and editing model

Image editing with FLUX.2 [pro] from Black Forest Labs. Ideal for high-quality image manipulation, style transfer, and sequential editing workflows

flux-pro/v1.1

text-to-image

FLUX1.1 [pro] is an enhanced version of FLUX.1 [pro] with improved image generation capabilities, delivering superior composition, detail, and artistic fidelity compared to its predecessor.

nano-banana

text-to-image

Google's famous original image generation and editing model

image-generation

flux-pro/v1.1-ultra

text-to-image

FLUX1.1 [pro] ultra is the newest version of FLUX1.1 [pro], maintaining professional-grade image quality while delivering up to 2K resolution with improved photorealism.

xai/grok-imagine-image

text-to-image

Generate highly aesthetic images with xAI's Grok Imagine Image generation model.

Text-to-image generation with FLUX.2 [klein] 9B from Black Forest Labs. Enhanced realism, crisper text generation, and native editing capabilities.

bytedance/seedream/v4.5/text-to-image

text-to-image

A new-generation image creation model from ByteDance, Seedream 4.5 integrates image generation and image editing capabilities into a single, unified architecture.

Z-Image Turbo is a super-fast 6B-parameter text-to-image model developed by Tongyi-MAI.

Generate high-quality images, posters, and logos with Ideogram V3. Features exceptional typography handling and realistic outputs optimized for commercial and creative use.

realism

typography

bytedance/seedream/v4/text-to-image

text-to-image

A new-generation image creation model from ByteDance, Seedream 4.0 integrates image generation and image editing capabilities into a single, unified architecture.

Text-to-image generation with FLUX.2 [dev] from Black Forest Labs. Enhanced realism, crisper text generation, and native editing capabilities.

bytedance/seedream/v5/lite/text-to-image

text-to-image

Text to Image endpoint for the fast Lite version of Seedream 5.0, supporting high quality intelligent text-to-image generation.

GPT Image 1.5 generates high-fidelity images with strong prompt adherence, preserving composition, lighting, and fine-grained detail.

Run SDXL at the speed of light

Text-to-image generation with FLUX.2 [dev] from Black Forest Labs. Enhanced realism, crisper text generation, and native editing capabilities—all at turbo speed.

new

krea/v2/large/text-to-image

text-to-image

Generate high-fidelity images from text with Krea 2 Large, supporting aspect ratio, creativity, seed controls, and optional style references.

image-generation

style-reference

krea

recraft/v3/text-to-image

gemini-3-pro-image-preview

text-to-image

Gemini 3 Pro Image (a.k.a. Nano Banana Pro) is Google's state-of-the-art high-fidelity image generation and editing model.

Text-to-image generation with FLUX.2 [dev] from Black Forest Labs. Enhanced realism, crisper text generation, and native editing capabilities, in a flash.

flux-1/schnell

text-to-image

Fastest inference in the world for the 12 billion parameter FLUX.1 [schnell] text-to-image model.

gemini-25-flash-image

text-to-image

Google's famous original image generation and editing model, a.k.a. Nano Banana.
