Question 1

Pipeline overview

Accepted Answer

Analyze: an LLM reads your text, extracts the characters, scenes, and visual style, and plans 4-6 shots.
Per shot: (a) text-to-image (t2i) for a character's first appearance, image-to-image (i2i) with a character reference for repeat appearances; (b) image-to-video (i2v) creates a ~3-4s video; (c) vision QA checks for clipping artifacts, character, and style issues; (d) CF Workers AI TTS generates the narration.
The pipeline is resumable: job state is stored in Cloudflare KV (7-day TTL). Close the tab, re-open with the same jobId, and the job resumes.
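The resume behavior and the t2i/i2i choice described above can be sketched roughly as follows. This is an illustrative sketch only, not the app's actual code: `InMemoryKV`, `run_job`, and `render_shot` are hypothetical names, the in-memory store merely stands in for Cloudflare KV, and each shot is simplified to a single character name.

```python
import json
import time

SEVEN_DAYS_S = 7 * 24 * 60 * 60  # 7-day TTL expressed in seconds


class InMemoryKV:
    """Hypothetical stand-in for a key-value store with per-key TTL."""

    def __init__(self, clock=time.time):
        self._data = {}
        self._clock = clock

    def put(self, key, value, ttl_s):
        # Store a JSON string plus its expiry time, like a string-valued KV.
        self._data[key] = (json.dumps(value), self._clock() + ttl_s)

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        raw, expires_at = entry
        if self._clock() >= expires_at:
            del self._data[key]  # expired entries behave as missing
            return None
        return json.loads(raw)


def run_job(kv, job_id, shots, render_shot):
    """Render shots in order, saving progress after every shot.

    shots: list of character names, one per shot (a simplification).
    render_shot: assumed callable (index, character, mode) -> result.
    """
    state = kv.get(job_id) or {"done": [], "seen": []}
    # Skip shots that a previous run already finished.
    for index in range(len(state["done"]), len(shots)):
        character = shots[index]
        # First appearance: text-to-image. Repeat: image-to-image with a reference.
        mode = "i2i" if character in state["seen"] else "t2i"
        state["done"].append(render_shot(index, character, mode))
        if character not in state["seen"]:
            state["seen"].append(character)
        kv.put(job_id, state, SEVEN_DAYS_S)
    return state["done"]
```

Calling `run_job` again with the same `job_id` after an interruption picks up at the first unfinished shot, because progress is written after each shot rather than once at the end.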

Question 2

Why so slow?

Accepted Answer

The Agnes free tier limits video generation to 1 video per minute (server-enforced). A 6-shot story therefore takes about 6 minutes of queuing plus about 5 minutes of generation per shot (30 minutes), roughly 35 minutes in total. The orchestrator automatically waits 60s between video calls.
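The timing arithmetic and the 60-second spacing can be illustrated with a small sketch. The names `estimate_minutes` and `VideoThrottle` are hypothetical, and the sketch assumes the wait applies only between consecutive video calls (five gaps for six shots) and that each shot takes about 5 minutes to generate.

```python
import time


def estimate_minutes(num_shots, wait_seconds=60.0, generation_minutes=5.0):
    """Rough total time: waits between calls plus per-shot generation."""
    if num_shots <= 0:
        return 0.0
    gaps = num_shots - 1  # assumption: no wait before the first call
    return gaps * wait_seconds / 60.0 + num_shots * generation_minutes


class VideoThrottle:
    """Keeps consecutive calls at least min_interval_s apart."""

    def __init__(self, min_interval_s=60.0, clock=time.monotonic, sleep=time.sleep):
        self._min_interval_s = min_interval_s
        self._clock = clock
        self._sleep = sleep
        self._last_call = None

    def wait(self):
        """Block until the next call is allowed; return the seconds slept."""
        now = self._clock()
        slept = 0.0
        if self._last_call is not None:
            remaining = self._min_interval_s - (now - self._last_call)
            if remaining > 0:
                self._sleep(remaining)
                slept = remaining
                now = self._clock()
        self._last_call = now
        return slept


print(estimate_minutes(6))  # 5 minutes of waits + 30 minutes of generation
```

Under these assumptions a 6-shot story comes out at 35 minutes. If the service also makes the first call wait a full minute, the total would be closer to 36.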

Question 3

What's missing vs. my full request?

Accepted Answer

❌ Character dialogue lip-sync (we use voiceover instead)
❌ Frame-by-frame consistency check across shots (only first-frame QA per shot)
✅ Video native audio (SFX/ambient/music auto-generated by Agnes)
✅ Visual QA (anatomy, prompt adherence) via vision LLM
✅ Character consistency via i2i reference image

Question 4

Privacy & limits

Accepted Answer

The Agnes free tier may use submitted text and images for model improvement, so don't submit sensitive material. The job state (text, plan, URLs) is stored in Cloudflare KV for 7 days. Videos are stored on Agnes's CDN with a short TTL, so download them promptly after generation.

AI Story Animator
