Models

Last updated April 5, 2026

Choose the right model for the agent job.

AnyCap exposes multimodal models through one capability runtime and one CLI. This page helps teams choose the right model for a given agent workflow instead of treating every image or video request the same way.

Answer-first summary

The current public AnyCap model catalog includes image generation models for first-pass output and revision loops, video generation models for premium or production-friendly motion work, and prompt-based music models for soundtrack and song drafts. The right choice usually depends on three things: whether the job starts from a blank prompt or an existing asset, how much polish the first pass needs, and how much speed or cost efficiency matters in the workflow.

How to choose the right model

Start with the output type: image, video, or music.
Then decide whether the task needs a polished first pass, faster iteration, or revision from an existing asset.
Use the model guide pages when the choice depends on motion style, editing workflow, or cost tradeoffs.
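
The steps above can be sketched as a small lookup. This is only an illustration and not part of AnyCap: the function name `suggest_model`, the priority labels, and the table layout are assumptions made for this sketch, while the model names and their fits come from the comparison and FAQ on this page.

```python
# Hypothetical lookup for this page's guidance; not an AnyCap API.
# Keys are (output_type, priority); the priority labels are made up for this sketch.
SUGGESTIONS = {
    ("image", "polished_first_pass"): "Seedream 5",
    ("image", "from_existing_asset"): "Nano Banana Pro",
    ("image", "fast_iteration"): "Nano Banana 2",
    ("video", "polished_first_pass"): "Veo 3.1",
    ("video", "from_existing_asset"): "Kling 3.0",
    ("video", "fast_iteration"): "Veo 3.1 Fast",
}


def suggest_model(output_type: str, priority: str = "polished_first_pass") -> str:
    """Step 1: pick the output type. Step 2: pick what the task needs most."""
    if output_type == "music":
        # The page features one music model for soundtrack drafts.
        return "ElevenLabs Music"
    try:
        return SUGGESTIONS[(output_type, priority)]
    except KeyError:
        raise ValueError(
            f"no suggestion for {output_type!r} with priority {priority!r}"
        ) from None


print(suggest_model("image", "from_existing_asset"))  # Nano Banana Pro
```

When the choice turns on motion style, editing workflow, or cost, a lookup like this is too coarse and the model guide pages are the better source.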

Visual guide

Illustrated overview of image, video, and music model categories inside the AnyCap model hub.

This illustration is a quick visual map of the current catalog: image models on one side, video models on another, and music generation as a separate capability lane inside the same agent runtime. It was generated with Nano Banana 2 to keep the page's visual language aligned with the model catalog itself.

Current model comparison

These are the current public models exposed through AnyCap. Credit ranges come from the same pricing inventory used on the pricing page, so the hub and pricing page stay aligned.

Image generation

Charged per call. Supports text-to-image and image-to-image modes.

Model	Modes	Credits / call	Best for
FLUX.1 Kontext Max	text-to-image, image-to-image	varies	Design-heavy image generation and contextual edits where prompt adherence, visual richness, and iterative refinement matter.
GPT Image 2	text-to-image, image-to-image	varies	General-purpose image generation and image edits when the workflow benefits from OpenAI's multimodal image model family.
Nano Banana Pro	text-to-image, image-to-image	~7	Targeted image editing and revision loops from an existing visual.
Nano Banana 2	text-to-image, image-to-image	~4	Fast, scalable image generation and high-volume iteration.
Qwen Image	text-to-image, image-to-image	varies	Bilingual or instruction-heavy visual work, especially when an agent needs a model associated with the Qwen multimodal family.
Seedream 4.5	text-to-image, image-to-image	varies	Everyday image generation, image transformation, and iterative editing where stable structure preservation matters.
Seedream 5	text-to-image, image-to-image	~2	Polished first-pass image generation from a text prompt.
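
Because image models are charged per call, a rough budget is the number of calls multiplied by the credits per call. The sketch below uses the approximate figures from the table; the function name and the example plan are assumptions for illustration, and actual charges come from the pricing page.

```python
# Approximate credits per call, copied from the table above.
# Models listed as "varies" are left out because the page gives no figure for them.
APPROX_CREDITS_PER_CALL = {
    "Nano Banana Pro": 7,
    "Nano Banana 2": 4,
    "Seedream 5": 2,
}


def estimate_image_credits(calls_by_model: dict[str, int]) -> int:
    """Sum approximate credits for a plan of {model name: number of calls}."""
    total = 0
    for model, calls in calls_by_model.items():
        if calls < 0:
            raise ValueError("call counts must be non-negative")
        total += APPROX_CREDITS_PER_CALL[model] * calls
    return total


# Hypothetical plan: one first pass with Seedream 5, then three revisions with Nano Banana Pro.
plan = {"Seedream 5": 1, "Nano Banana Pro": 3}
print(estimate_image_credits(plan))  # 1*2 + 3*7 = 23
```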

Video generation

Charged per second of generated output. Most models support both text-to-video and image-to-video modes; Kling O1 is image-to-video only.

Model	Modes	Credits / sec	Best for
Hailuo 2.3	text-to-video, image-to-video	varies	Short narrative clips, expressive character motion, visual storytelling, and reference-image animation.
Veo 3.1	text-to-video, image-to-video	~20	Premium text-to-video output when the first pass needs to look stronger.
Veo 3.1 Fast	text-to-video, image-to-video	varies	Rapid creative iteration and preview generation when an agent wants the Veo family with faster turnaround.
Sora 2 Pro	text-to-video, image-to-video	varies	High-end narrative, cinematic, product, and realistic video generation when teams want an OpenAI video model through the same CLI.
Seedance 1.5 Pro	text-to-video, image-to-video	~14	Steady production-friendly video workflows and repeatable image-to-video jobs.
Seedance 2.0	text-to-video, image-to-video	varies	High-quality cinematic and product video workflows where agents need the newer Seedance model entry.
Seedance 2.0 Fast	text-to-video, image-to-video	varies	Previewing, ideation, and high-volume video iteration when an agent needs faster turnaround.
Kling 3.0	text-to-video, image-to-video	~9	Cinematic motion and flexible image-to-video workflows.
Kling O1	image-to-video	varies	Product demos, stylized motion design, and image-conditioned clips where the source frame should drive the video.
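
Since video is charged per second of output, clip length drives cost directly. The sketch below compares the three models with published approximate rates for a clip of a given length. The function name and the 8-second duration are assumptions for illustration; models listed as "varies" are omitted.

```python
# Approximate credits per second of generated video, copied from the table above.
APPROX_CREDITS_PER_SECOND = {
    "Veo 3.1": 20,
    "Seedance 1.5 Pro": 14,
    "Kling 3.0": 9,
}


def compare_clip_cost(seconds: float) -> list[tuple[str, float]]:
    """Return (model, approximate credits) pairs for one clip, cheapest first."""
    if seconds < 0:
        raise ValueError("seconds must be non-negative")
    costs = [(model, rate * seconds) for model, rate in APPROX_CREDITS_PER_SECOND.items()]
    return sorted(costs, key=lambda pair: pair[1])


# Hypothetical 8-second clip.
for model, credits in compare_clip_cost(8):
    print(f"{model}: ~{credits} credits")
# Kling 3.0: ~72 credits
# Seedance 1.5 Pro: ~112 credits
# Veo 3.1: ~160 credits
```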

Music generation

Charged per second of generated audio.

Model	Modes	Credits / sec	Best for
ElevenLabs Music	text-to-music	~1	Prompt-based soundtrack drafts inside the same agent runtime.
Mureka V8	text-to-music	varies	Songwriting, vocal-oriented drafts, and audio content production when an agent needs an alternative to Suno or ElevenLabs Music.
Suno V5	text-to-music	varies	Structured songs, vocal demos, and full-track concepts that need lyrics, mood, and arrangement guidance.
Suno V5.5	text-to-music	varies	Current Suno music generation workflows, complete track drafts, vocal concepts, and high-iteration song ideas.
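
Per-second pricing also makes it easy to work backward from a budget. As a rough sketch, assume an agent pairs a video clip with an ElevenLabs Music soundtrack draft of the same length; the pairing, the function name, and the 100-credit budget are all assumptions for illustration, and the rates are the approximate figures from the tables on this page.

```python
# Approximate rate for ElevenLabs Music, copied from the table above.
ELEVENLABS_MUSIC_CREDITS_PER_SECOND = 1


def max_scored_clip_seconds(
    credit_budget: int,
    video_rate: int,
    music_rate: int = ELEVENLABS_MUSIC_CREDITS_PER_SECOND,
) -> int:
    """Longest whole-second clip (video plus same-length soundtrack) that fits the budget."""
    if credit_budget < 0:
        raise ValueError("credit_budget must be non-negative")
    if video_rate <= 0 or music_rate <= 0:
        raise ValueError("rates must be positive")
    return credit_budget // (video_rate + music_rate)


# Hypothetical budget of 100 credits with Kling 3.0 video (~9 credits/sec).
print(max_scored_clip_seconds(100, video_rate=9))  # 100 // (9 + 1) = 10
```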

Image generation

Seedream 5

A strong default for polished first-pass image generation tasks.

Nano Banana Pro

A better fit for revision loops and prompt-based image editing.

Nano Banana 2

A faster fit for scalable image generation and high-volume iteration loops.

Video generation

Veo 3.1

A premium option for text-to-video workflows through AnyCap when the first pass needs to look stronger.

Kling 3.0

A strong fit for realistic motion and cinematic image-to-video workflows.

Seedance 1.5 Pro

A dependable default for production-friendly text-to-video and image-to-video work.

Music generation

ElevenLabs Music

A prompt-based music model for soundtrack drafts inside the same agent runtime.

FAQ

How do I choose between Seedream 5, Nano Banana Pro, and Nano Banana 2?

Use Seedream 5 when the workflow needs a stronger first-pass image from a prompt, Nano Banana Pro when the job starts from an existing image and needs revisions, and Nano Banana 2 when speed, throughput, or repeated iteration matters more.

How do I choose between Veo 3.1, Kling 3.0, and Seedance 1.5 Pro?

Use Veo 3.1 when the first video pass needs to look more premium from a text brief, Kling 3.0 when the workflow leans more on cinematic motion or flexible image-to-video work, and Seedance 1.5 Pro when the team wants a steadier production-oriented default.

Do all AnyCap models use the same CLI and auth flow?

Yes. AnyCap exposes these models through the same capability runtime, CLI, and auth flow, so teams do not need a separate provider integration path for each model listed here.

Any Capability Context Guide

