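# SANA-WM: A 2.6B-Parameter Open-Source World Model for Generating One-Minute 720p Video

- Source: Hacker News top stories (via the buzzing.cc Chinese translation)
- Author: mjgil
- Published: 2026-05-16 23:25
- AIHOT score: 73
- AIHOT tag: Featured
- AIHOT link: https://aihot.virxact.com/items/cmp8i7zhb0isoslnzyvufvdtc
- Original link: https://nvlabs.github.io/Sana/WM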

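## Why it was featured

It is open source and can generate one-minute 720p video. NVIDIA's 2.6B world model takes a big step forward in physical consistency, and teams working on video generation and physics simulation should take notice.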

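## AI summary

NVIDIA researchers have released SANA-WM, a 2.6-billion-parameter open-source world model built to generate videos up to one minute long at 720p resolution. The model is open source and its project page is hosted on GitHub. The release aims to advance research on high-quality long-video generation. The post reached 107 points on Hacker News, which reflects industry interest in the work.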

## Main text

### SANA-WM: Efficient Minute-Scale World Modeling with Hybrid Linear Diffusion Transformer

A 2.6B open-source world model that turns one image and a camera trajectory into 720p, minute-long, controllable video on a single GPU.

Haoyi Zhu\*, Haozhe Liu\*, Yuyang Zhao\*, Tian Ye\*, Junsong Chen\*, Jincheng Yu, Tong He, Song Han, Enze Xie

NVIDIA

#### Key features

- **Long horizon: minute-scale rollouts.** Hybrid linear attention pairs frame-wise Gated DeltaNet with periodic softmax attention to keep the world coherent for a full minute.
- **Action control: precise 6-DoF camera trajectories.** A coarse global pose branch and a fine, pixel-aligned geometric branch jointly follow metric camera paths with high fidelity.
- **Two-stage fidelity: second-stage long-video refiner.** A dedicated 17B long-video refiner sharpens texture, motion, and late-window quality on top of the long-rollout backbone.
- **Lean compute: 64 GPUs to train, one to deploy.** Training takes 15 days on 64 H100s. At inference, a single H100 generates a one-minute 720p video.
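The "Long horizon" feature relies on Gated DeltaNet, a linear-attention recurrence. Below is a minimal NumPy sketch of the generic gated delta rule described in the Gated DeltaNet paper. It assumes a plain per-token sequential update. SANA-WM applies the rule frame-wise with its own kernels, and those are not reproduced here. The function name `gated_delta_rule` and its argument layout are hypothetical. The sketch shows one property: the state `S` keeps a fixed shape no matter how long the sequence grows.

```python
import numpy as np


def gated_delta_rule(q, k, v, alpha, beta):
    """Sequential gated delta rule (generic form, not SANA-WM's kernel).

    q, k: (T, d_k) queries and keys; v: (T, d_v) values;
    alpha: (T,) decay gates in (0, 1]; beta: (T,) write strengths in (0, 1].
    Returns per-step outputs (T, d_v) and the final state (d_v, d_k).
    """
    T, d_k = k.shape
    d_v = v.shape[1]
    S = np.zeros((d_v, d_k))  # fixed-size memory, independent of T
    out = np.empty((T, d_v))
    for t in range(T):
        k_t = k[t] / np.linalg.norm(k[t])  # unit-norm key keeps the update stable
        # S_t = alpha_t * S_{t-1} (I - beta_t k k^T) + beta_t v k^T:
        # decay old memory, erase what is stored under k_t, then write v_t.
        S = alpha[t] * (S - beta[t] * np.outer(S @ k_t, k_t)) + beta[t] * np.outer(v[t], k_t)
        out[t] = S @ q[t]  # read the memory with the query
    return out, S
```

Setting `alpha` to 1 recovers the ungated delta rule. With `alpha` and `beta` both equal to 1, writing two values under the same unit key and then reading with that key returns only the second value. The delta rule overwrites stale associations instead of piling them up, which helps a fixed-size state stay useful over long rollouts.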

#### Abstract

We introduce SANA-WM, an efficient 2.6B-parameter open-source world model natively trained for one-minute generation, synthesizing high-fidelity, 720p, minute-scale videos with precise camera control. SANA-WM achieves visual quality comparable to large-scale industrial baselines such as LingBot-World and HY-WorldPlay, while significantly improving efficiency. Four core designs drive our architecture: (1) Hybrid Linear Attention combines frame-wise Gated DeltaNet with softmax attention for memory-efficient long-context modeling; (2) Dual-Branch Camera Control ensures precise 6-DoF trajectory adherence; (3) Two-Stage Generation Pipeline applies a long-video refiner to stage-1 outputs, improving quality and consistency across sequences; and (4) Robust Annotation Pipeline extracts accurate metric-scale 6-DoF camera poses from public videos to yield high-quality, spatiotemporally consistent action labels. Driven by these designs, SANA-WM uses only ~213K public video clips with metric-scale pose supervision, completes training in 15 days on 64 H100s, and generates each 60-second clip on a single GPU; its distilled variant runs on a single RTX 5090 with NVFP4 quantization to denoise a 60s 720p clip in 34s. On our one-minute world-model benchmark, SANA-WM demonstrates stronger action-following accuracy than prior open-source baselines and achieves comparable visual quality at 36x higher throughput.
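The budget figures in the abstract yield two derived quantities: total training compute and how fast generation runs compared with playback. This is a minimal sketch with two assumptions. First, all 64 H100s stayed busy for the full 15 days, so the GPU-hour figure is an upper bound. Second, the 34 s quoted for the distilled NVFP4 model on an RTX 5090 is denoising time only, as the abstract says, so VAE decoding (a separate stage in Figure 1) is excluded. The helper names are hypothetical.

```python
def gpu_hours(days: float, num_gpus: int) -> float:
    """Upper bound on accelerator-hours: wall-clock days x 24 h x GPU count."""
    return days * 24 * num_gpus


def real_time_factor(clip_seconds: float, wall_seconds: float) -> float:
    """Seconds of video produced per second of compute (>1 is faster than playback)."""
    return clip_seconds / wall_seconds


train_budget = gpu_hours(days=15, num_gpus=64)              # 23040 H100-hours
rtf = real_time_factor(clip_seconds=60, wall_seconds=34)    # about 1.76
print(f"{train_budget:.0f} H100-hours, {rtf:.2f}x real time")
```

This prints `23040 H100-hours, 1.76x real time`. The 36x throughput claim is measured against baselines whose timings are not given here, so it is not recomputed.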

#### Efficiency at a glance

Figure 1. Efficiency ablation and scaling. (a) 60 s single-GPU VAE/DiT latency by stage; bars are scaled for readability. (b) H100 latency and memory scaling: recurrent variants grow compactly, while all-softmax OOMs at 60 s.
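Figure 1(b) shows all-softmax attention running out of memory at 60 s while the recurrent variants grow compactly. A back-of-the-envelope model suggests why. It assumes each softmax layer must hold keys and values for every token in its context, as a KV cache does. It also assumes each Gated DeltaNet-style layer holds one fixed-size state matrix per head. Every configuration number below (tokens per second, layer count, widths) is hypothetical and chosen only to show the trend. None of them comes from the SANA-WM paper.

```python
def kv_cache_bytes(num_tokens: int, num_layers: int, d_model: int, bytes_per_elem: int = 2) -> int:
    """Softmax attention: one key and one value vector per token per layer."""
    return 2 * num_tokens * num_layers * d_model * bytes_per_elem


def recurrent_state_bytes(num_layers: int, num_heads: int, d_head: int, bytes_per_elem: int = 2) -> int:
    """Linear/delta-rule attention: one d_head x d_head state per head per layer, no token term."""
    return num_layers * num_heads * d_head * d_head * bytes_per_elem


# Hypothetical configuration (not SANA-WM's), chosen only to illustrate scaling.
TOKENS_PER_SECOND = 8_000
NUM_LAYERS, D_MODEL, NUM_HEADS, D_HEAD = 20, 2048, 16, 128
GIB = 1024 ** 3

for seconds in (10, 30, 60):
    tokens = seconds * TOKENS_PER_SECOND
    kv = kv_cache_bytes(tokens, NUM_LAYERS, D_MODEL) / GIB
    state = recurrent_state_bytes(NUM_LAYERS, NUM_HEADS, D_HEAD) / GIB
    print(f"{seconds:>2} s: softmax KV {kv:6.2f} GiB, recurrent state {state:.3f} GiB")
```

The softmax term grows linearly with clip length, but the recurrent term does not depend on length at all. A hybrid stack still grows, but only in its periodic softmax layers, which is consistent with the compact growth the caption describes.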

#### Minute-long worlds

Prompt A first-person view from a strictly stationary observation point on a snowbound alpine trail beside a sheer rocky cliff. The spatial layout leads across a narrow white path toward a dark cave mouth recessed into the mountain wall, its entrance rimmed with long hanging icicles and flanked by wind-bent pine trees, with distant jagged peaks fading into the left background. The environment is built from rough granite faces, compacted snow, powdery drifts, boot-marked tracks, frosted bark, and translucent blue ice. A lone mountaineer in an orange jacket stands ahead as part of the scene, contrasting against the cold terrain. Diffuse gray daylight and a faint pulsing blue glow from within the cave create a tense survival-exploration mood. There is no dynamic camera movement and no action taken by the observer; meanwhile, snow particles stream through the air, fog curls around the cave entrance, pine branches sway, and loose powder skims over the trail.

Prompt A first-person view from a strictly stationary observation point at the mouth of a dark limestone cave embedded in dense forest. The spatial layout leads from wet foreground stones and shallow pooled water into a tunnel-like cavern, where a narrow trail of blue fireflies marks a readable route toward a faint golden chamber deeper inside. The scene is textured with slick black rock, mossy cave edges, exposed roots, flat stepping stones, glossy puddles, and mineral-streaked walls reflecting scattered blue points of light. Dim forest daylight fades at the entrance while warm gold illumination glows from within, creating a mysterious natural exploration mood. There is no dynamic camera movement and no action taken by the person filming; the perspective remains fully fixed. Autonomous motion continues as bats flutter near the ceiling, fireflies drift and blink, water ripples softly, and hanging roots sway in the damp air.

Prompt A first-person view from a strictly stationary observation point on a winding dirt path inside an enormous magical mushroom forest. Colossal mushroom stems rise like tree trunks on both sides, their broad ribbed caps forming layered canopies above a lantern-lit village nestled deep in the midground. A small round robot stands on the path ahead, surrounded by oversized leaves, beadlike dew drops, tiny mushrooms, vines, mossy soil, and scattered stones. The surfaces mix soft fungal textures, rough barklike stems, damp earth, glossy leaves, and warm metal lanterns. Purple and golden twilight fills the space, creating a whimsical yet physically grounded atmosphere with strong depth and scale. The observer's perspective remains fixed, with no dynamic camera movement and no actions taken by the person recording. Autonomous motion animates the world: glowing spores drift, giant butterflies flap overhead, lantern flames flicker, dew trembles on leaves, and distant village lights shimmer.

Prompt A first-person view from a strictly stationary observation point on a narrow forest path at night. Towering trunks and dense undergrowth form a dark natural corridor, with the trail bending deeper into mist toward a faint warm light hidden among distant branches. A young red fox stands ahead on the wet path, facing a chain of glowing blue fireflies that mark the route between trees. The ground is layered with slick mud, wet leaves, mossy stones, shallow puddles, and exposed roots, while a nearby massive trunk bears fresh claw marks cut into rough bark. Cool moonlight filters through the canopy, creating silver reflections and a mysterious but inviting atmosphere. The observer's perspective remains fixed, with no dynamic camera movement and no actions taken by the person recording. Autonomous motion continues: fireflies drift and pulse, fog curls through the trees, leaves tremble, puddles ripple, and the fox's ears and tail shift subtly.

Prompt A first-person view from a strictly stationary observation point at a surreal open crossroads. Three dirt paths split across a rugged meadow: the left route enters a luminous blue forest, the center route climbs toward a ruined stone tower beneath dark storm clouds, and the right route curves into a sunlit village built among giant tree roots and cliffside houses. A young explorer stands ahead with a backpack and walking stick, framed by grass, scattered rocks, twisted roots, crumbling masonry, and distant wooden rooftops. The lighting sharply contrasts cool magical blues, storm-gray shadows, and warm golden sunlight, creating a clear sense of separated destinations. The observer's perspective remains fixed, with no dynamic camera movement and no actions taken by the person recording. Autonomous motion continues as birds cross the sky, grass and cloak fabric ripple, storm clouds churn, blue forest light pulses, and village leaves sway.

Prompt A first-person view from a strictly stationary observation point overlooks a steep golden sand dune descending between rust-red canyon walls toward a half-buried crashed spacecraft. The spatial layout is broad and sloped, with scattered alien metal panels, rocks, and wreckage fragments forming obstacles along the sandy basin, while the ship's broken fuselage anchors the valley floor. A small round robot rides a battered metal panel in the foreground as part of the scene. Materials are gritty and sun-baked: rippled dune sand, sharp canyon stone, dented hull plating, scorched mechanical parts, dusty glass, and scratched robot casing. Warm sunset light floods the desert with an adventurous, high-energy mood, contrasted by blinking blue lights inside the wreck. There is no dynamic camera movement and no action by the unseen observer; sand sprays, dust plumes curl, loose grains cascade, heat haze shimmers, and the wreck lights pulse softly.

Prompt A first-person view from a strictly stationary observation point inside a submerged ancient temple. Massive stone columns frame a deep underwater corridor, leading across a sand-dusted floor toward a cracked wall emitting bright green light in the distance. Broken statues, carved blocks, eroded steps, coral clusters, seaweed, and barnacle-covered pillars create a layered ruin with strong spatial depth. The materials feel ancient and weathered, with rough stone masonry softened by marine growth, scattered shells, suspended silt, and worn sculptural fragments half-buried in sand. Filtered sunlight descends through blue water, forming shifting caustic patterns and a calm, mysterious atmosphere. The observer's perspective remains fixed, with no dynamic camera movement and no actions taken by the person recording. Autonomous motion fills the scene: small fish drift between columns, bubbles rise, seaweed sways gently, particles float in the current, and the green fissure glows steadily.

Prompt A first-person view from a strictly stationary observation point on a moss-covered stone path within a dense ancient rainforest. The spatial layout draws from a wet foreground rock holding a brass compass toward a narrow ruin-lined trail, broken statues, hanging vines, and a distant temple partially hidden by trees and mist. Surfaces are richly textured with slick moss, cracked carved stone, wet leaves, damp bark, polished brass, rain-speckled glass, and puddled stone slabs. Soft shafts of sunlight pierce the canopy, mixing with humid haze to create a mysterious archaeological adventure mood. There is no dynamic camera movement and no action taken by the person filming; the perspective remains fully fixed. Autonomous motion animates the environment as butterflies and small insects flutter near the ground, droplets slide across the compass glass, vines sway lightly, mist drifts around the ruins, and the blue compass needle glows steadily toward the temple.

Prompt A first-person view of a high-altitude alpine ridge where sun-drenched snow blankets undulating terrain, casting long shadows across rocky outcrops and sparse, frost-laden shrubs. Jagged peaks rise in the distance under a pale blue sky streaked with wispy clouds, their slopes etched with exposed granite and wind-sculpted drifts. The foreground reveals textured snowfields interrupted by dark boulders and low-lying evergreen bushes dusted white, while the horizon dissolves into layered mountain ranges fading through atmospheric haze. Sunlight glints off icy surfaces near the observer's fixed vantage point, illuminating fine particulate snow suspended midair, suggesting recent wind activity without implying motion beyond what is visually present.

Prompt A first-person view of a deep, rugged canyon where a turbulent river flows rapidly between towering rock walls. The water churns white as it rushes over submerged boulders, carving a path through the steep, craggy terrain. Massive cliffs rise vertically on both sides, composed of fractured brown and gray stone with patches of green moss and sparse coniferous trees clinging to ledges. The scene is illuminated by bright daylight, casting sharp shadows that accentuate the rough textures of the rock faces. In the immediate foreground, jagged rocks frame the bottom of the view, anchoring the perspective high above the rushing current.

Prompt A first-person view of a serene library interior featuring a large wooden table in the immediate foreground, upon which rests an open hardcover book, a spiral notebook, and a black pen. The perspective is fixed, looking past the study surface toward rows of tall, dark wood bookshelves densely packed with colorful volumes. Natural light streams through high arched windows in the background, illuminating the scene with a soft, warm glow that highlights the texture of the wooden furniture and the orderly arrangement of the collection. The atmosphere is quiet and studious, defined by the rich browns of the cabinetry and the muted tones of the book spines filling the architectural space.

Prompt A first-person view of a narrow, muddy trail winding through a dense, rain-drenched tropical forest, where broad palm fronds and ferns crowd the path on both sides. Rain falls visibly in vertical streaks, glistening on wet leaves and saturating the earthy ground littered with fallen branches and decaying foliage. The canopy overhead filters soft, diffused light, creating a misty, humid atmosphere that blurs distant tree trunks into a green-gray haze. The scene is static except for the continuous descent of raindrops, emphasizing the stillness of the observer anchored within this lush, water-heavy environment.

Prompt A first-person view from a strictly stationary observation point on a cracked futuristic highway overtaken by weeds. The spatial layout centers on a broad, damaged roadway lined with abandoned hover vehicles, broken guardrails, decayed traffic lights, and a distant ruined megacity crowned by a vertical blue energy beam. A small maintenance robot sits in the central lane as a scale marker, while a damaged robot head near the curb projects a faint holographic arrow toward the skyline. Surfaces are textured with fractured asphalt, rusted metal shells, tangled vines, grime-stained concrete, exposed cables, and rain-darkened debris. Heavy storm clouds and muted gray light create a bleak post-apocalyptic sci-fi mood. There is no dynamic camera movement and no action taken by the person filming; the perspective remains fixed. Autonomous motion continues as sparks drip from old wires, birds shift on traffic lights, the hologram flickers, and clouds churn above the city.

Prompt A first-person view from a strictly stationary observation point at a T-shaped intersection inside an abandoned underground sci-fi research facility. The spatial layout splits into three distinct routes: a left corridor submerged in shallow water, a right corridor filled with warm steam, and a heavy circular metal door straight ahead, half-open onto deep darkness. The architecture is industrial and bunker-like, with ribbed metal tunnel frames, stained concrete walls, corroded panels, loose floor cables, broken glass, and wet reflective plating. Cold blue emergency lights wash the flooded passage, while orange warning lamps burn through haze on the opposite side, creating a tense, cinematic sci-fi horror atmosphere. There is no dynamic camera movement and no action taken by the unseen observer; the world moves independently as ceiling lights flicker, water ripples across the floor, steam drifts through the right corridor, and dust motes float in the stale air.

#### Twenty-second worlds

Prompt A first-person view from a strictly stationary observation point inside a small abandoned mountain cabin at dusk. The spatial layout places a rough wooden table in the foreground, holding an old hand-drawn map pinned by stones, a rusty key, and a still-warm lantern, while an open plank door frames a snowy forest path leading to a distant cave with a faint golden glow. The cabin is built from dark weathered logs, uneven floorboards, soot-blackened stone around the fireplace, worn shelves, barrels, and damp muddy footprints across the wood. Warm amber firelight and lantern glow contrast with the cold blue exterior dusk, creating a cozy but ominous survival-exploration mood. There is no dynamic camera movement and no action by the unseen observer; the world moves independently as embers smoke, the lantern flame flickers, snow gusts through the doorway, and tree branches stir outside.

Prompt A first-person view from a strictly fixed observation point on a narrow muddy forest trail after rainfall. The spatial layout is enclosed by dense tree trunks, low broken branches, and wet undergrowth, with a chain of enormous fresh footprints leading into thick white mist as the main path marker. A nearby trunk bears deep claw marks, while the foreground is packed with slick mud, rain-filled impressions, crushed leaves, snapped twigs, mossy bark, and glossy foliage. Soft gray daylight diffuses through the fog, creating a suspenseful, damp exploration mood with limited visibility ahead. There is no dynamic camera movement and no action taken by the person filming; the perspective remains entirely static. Independent ambient motion animates the space: small birds scatter in the distance, fog drifts between the trees, leaves tremble under residual rain, droplets fall into puddles, and muddy water ripples inside the footprints.

Prompt A first-person view from a strictly stationary observation point on a quiet suburban street after rain. Small modern houses, parked cars, lawns, sidewalks, and curbside trees frame a long wet roadway, while a trail of muddy footprints leads toward a massive medieval castle rising impossibly beyond the neighborhood. The castle's stone towers, crenellated walls, narrow windows, and mist-wrapped base dominate the far end of the street, contrasting with asphalt, roof shingles, painted siding, glass windows, and reflective puddles. Overcast gray light creates a cold, magical realism atmosphere, softened by warm interior house lights and torchlike glows within the castle. The observer's perspective remains fixed, with no dynamic camera movement and no actions taken by the person recording. Autonomous motion continues in the environment: crows circle above the towers, fog drifts across the battlements, leaves slide along the wet road, and puddles ripple faintly.

Prompt A first-person view from a strictly stationary observation point within an overgrown jungle ruin. A mossy stone path leads across shallow puddles and uneven steps toward a massive circular sealed door set into an ancient wall, framed by broken arches, carved blocks, hanging vines, and glowing green symbols. A small round exploration robot stands on the path ahead, its eye lights reflected in the wet stone. The environment is rich with slick moss, weathered masonry, cracked steps, glossy leaves, damp roots, and water-darkened stonework, blending ancient craftsmanship with a gentle sci-fi presence. Sunbeams cut through the dense canopy, creating a humid, mysterious atmosphere with warm highlights and cool green reflections. The observer's perspective remains fixed, with no dynamic camera movement and no actions taken by the person recording. Autonomous motion continues as vines sway, insects crawl, butterflies flutter, puddles ripple, and the door symbols pulse faintly.

Prompt A first-person view from a strictly stationary observation point inside a grand old library hall. The spatial layout is broad and symmetrical, with towering wooden bookshelves rising along both walls, arched windows at the rear, twin staircases framing the central reading area, and open aisles around desks, globes, and side tables. At the center, an old map table anchors the room beneath a forming circular portal of blue and gold energy. Polished marble floor tiles, carved wood railings, brass instruments, leather-bound volumes, wax candles, scattered parchment, and dust-softened surfaces create a richly crafted scholarly interior. Warm candlelight mixes with cold magical glow, producing an adventurous, uncanny atmosphere. There is no dynamic camera movement and no action taken by the person filming; the view remains fixed. Autonomous motion fills the space as papers spiral upward, candles flicker violently, dust motes shimmer, and the portal churns above the table.

Prompt A first-person view from a strictly stationary observation point inside a dark abandoned laboratory corridor. The layout forms a long, clinical passage with a shattered glass containment chamber on the left, cracked observation windows and lab benches on the right, and a half-open metal door at the far end. Wet humanoid footprints cut through the glossy tiled floor, leading away from the broken chamber through scattered equipment, cables, overturned furniture, and puddles. Surfaces are slick, grimy, and damaged, with fractured glass, stained metal cabinets, peeling wall panels, and waterlogged ceiling tiles. Flickering fluorescent tubes and a faint green emergency glow create a cold sci-fi thriller atmosphere. The observer's perspective remains fixed, with no dynamic camera movement and no actions taken by the person recording. The environment moves autonomously: water drips from the ceiling, mist hangs and curls in the hallway, lights pulse irregularly, and loose cables sway faintly.

Prompt A first-person view from a strictly stationary observation point within a narrow rainforest trail after heavy rain. The layout centers on a muddy path receding between dense tropical vegetation toward a partially hidden ancient stone gate, with one carved pillar and an upper arch emerging through vines and foliage. In the foreground, a brass compass rests on a moss-covered rock, while oversized footprints, snapped branches, and crushed leaves mark the wet ground. Surfaces are slick and tactile: muddy puddles, glossy leaves, damp bark, rough stone, and thick green moss. Soft sunlight filters through the canopy, mixing with drifting mist and faint blue light behind the gate to create an inviting, mysterious atmosphere. There is no dynamic camera movement and no action by the unseen observer; the world moves independently as birds scatter overhead, mist curls between trees, leaves tremble, and puddles shimmer with subtle ripples.

Prompt A first-person view from a strictly stationary observation point on a rugged desert rover, overlooking a tire-rutted sandy route that cuts through rolling golden dunes toward a colossal crashed spacecraft. The wreck forms the dominant landmark: a broken circular engine faces the route, while torn metallic hull plates trail half-buried across the sand. The foreground contains dust-coated rover plating, scattered alien debris, jagged metal shards, rippled dunes, and dry granular sand shaped by wind. Harsh afternoon sunlight bleaches the landscape, producing a hot, cinematic expedition mood with shimmering heat haze on the horizon. The observer's perspective remains fixed, with no dynamic camera movement and no actions taken by the person filming. Autonomous world motion continues around the static view: sand streams across the track, black smoke rises from the wreck, blue interior lights blink weakly, and distant dust curls over the dunes.

Prompt A first-person view from a strictly stationary observation point low on a worn wooden floor inside a cluttered, sunlit room. A miniature red race car sits on a cracked plank path surrounded by scattered marbles, colorful blocks, buttons, books, and small toy pieces, with table legs forming a looming overhead structure and an open doorway leading into the dim back of the room. The space is textured with splintered floorboards, peeling wall paint, dusty furniture, glassy marble reflections, and soft fabric shadows. Slatted afternoon sunlight streams through the window blinds, creating warm bands of light and a nostalgic, slightly mysterious atmosphere. The observer's perspective remains fixed, with no dynamic camera movement and no actions taken by the person recording. Autonomous motion animates the room: dust motes drift through sunbeams, loose marbles wobble faintly, and a cat's shadow passes silently near the doorway.

Prompt A first-person view from a strictly stationary observation point across a vast dry mountain steppe beneath a clear blue sky. The spatial layout stretches over open tawny grassland dotted with low scrub and scattered stones, leading toward layered beige hills and snow-capped mountain ranges across the horizon. A silver sports car occupies the near foreground as part of the scene, emphasizing scale against the immense plain and distant peaks. Materials are sunlit and tactile: brushed metallic bodywork, dark glass, rubber tires, brittle grass, dusty soil, pale gravel, and wind-worn rocky slopes. Bright midday light creates crisp shadows and a clean, expansive atmosphere. There is no dynamic camera movement and no action by the unseen observer; the world moves independently as clouds drift slowly, dry grass ripples, dust lifts near the ground, and heat haze shimmers across the distant flats.

Prompt A first-person view from a strictly stationary observation point inside a submerged ancient temple. A clear stone walkway runs between coral-encrusted columns toward a cracked ceremonial wall, where bright green light leaks through the central fracture and aligns with glowing symbols on the floor. A small spherical diving robot hovers ahead, its blue front light illuminating broken statues, scattered masonry, sand patches, seaweed, purple coral, and barnacle-covered pillars. The textures are eroded and waterworn, with carved stone softened by algae, silt, and marine growth. Shafts of sunlight descend from the surface, mixing with the green fissure glow to create a calm, mysterious underwater atmosphere. The observer's perspective remains fixed, with no dynamic camera movement and no actions taken by the person recording. Autonomous motion continues as fish swim, bubbles rise, particles drift, seaweed sways, caustics shimmer, and the green symbols pulse faintly.

Prompt A first-person view from a strictly fixed observation point at a three-way fork in an ancient forest trail. The layout opens into three readable routes: a left path descending toward a misty river with blue fireflies above the water, a central stone path climbing toward an ivy-covered ruined tower, and a right path disappearing into a darker pine grove hung with orange lanterns. Moss-covered boulders, exposed roots, damp soil, rough bark, and weathered stonework create a richly textured fantasy wilderness. Soft morning fog filters through tall conifers, giving the scene a mysterious, exploratory mood. There is no dynamic camera movement and no action taken by the person filming; the view remains entirely stationary. Independent ambient motion animates the world: birds circle above the tower, river water flows below, lanterns sway faintly, mist drifts between trees, and glowing fireflies hover near the riverbank.

Prompt A first-person view from a strictly stationary miniature-scale observation point on a dusty wooden floor inside an abandoned room. The spatial layout stretches across cracked floorboards toward a towering Victorian dollhouse mansion, its open front door reached by a narrow ramp of old stacked books, with buttons, marbles, and scattered debris forming small obstacles along the route. A tiny red toy car sits in the foreground as part of the scene, emphasizing the oversized scale of furniture legs, gaps in the planks, and the decayed mansion facade. Materials are dry and worn: splintered wood grain, chipped painted walls, dusty glass, brittle paper, dull plastic buttons, polished marbles, and sagging spiderwebs. Pale sunlight beams through broken windows, creating a playful yet mysterious atmosphere. There is no dynamic camera movement and no action by the unseen observer; dust motes drift, spiderweb strands tremble, and loose paper edges flutter faintly.

Prompt A first-person view from a strictly stationary observation point overlooks an abandoned winter campsite in a snowy pine basin beneath a rocky mountain slope. The spatial layout places a glowing campfire and scattered gear in the left foreground, a lone mountaineer in an orange jacket near center, and a footprint trail crossing open snow toward a cave cut into the cliff. Distant peaks and dark conifers frame the depth. Materials are cold and tactile: powdery snow, ice-dusted rocks, singed firewood, metal cups, canvas supplies, frosted backpack fabric, and rough bark. Purple dusk light cools the landscape, while fire embers and golden cave light add uneasy warmth. There is no dynamic camera movement and no action by the unseen observer; the world moves independently as snow falls, pine branches sway, embers flicker, steam rises from the cup, and the cave glow pulses softly.

Prompt A first-person view from a strictly stationary observation point on an elevated forest ledge before a massive ancient doorway carved into a cliff. Stone steps form a clear approach from mossy path to half-open doors, flanked by carved pillars, guardian statues, ivy-covered walls, and a sweeping mountain valley to the left. A cloaked traveler stands near the entrance as a scale marker. Surfaces are textured with cracked stone slabs, engraved masonry, damp moss, tangled vines, fallen leaves, weathered statues, and glowing turquoise runes etched across the ground. Warm late-day sunlight mixes with intense cyan light spilling from the doorway, creating a tense fantasy atmosphere of discovery. There is no dynamic camera movement and no action taken by the person filming; the perspective remains fixed. Autonomous motion continues as leaves swirl, birds scatter from the cliff, vines tremble, and the portal light pulses softly.

Prompt A first-person view from a strictly stationary observation point inside a sunlit desert canyon facing an ancient ruin entrance. The spatial layout forms a clear sandy approach between red rock walls, leading toward a monumental stone doorway carved with worn geometric reliefs, with a small friendly robot positioned at the threshold and a backpacked explorer standing nearby as part of the scene. Materials combine ancient and futuristic details: weathered sandstone blocks, chipped carvings, loose gravel, dry scrub, dusty metal cables, and the robot's scuffed white shell with a glowing face. Warm sunlight casts long shadows across the canyon floor, while faint green light inside the ruin adds a mysterious sci-fi atmosphere. There is no dynamic camera movement and no action by the unseen observer; the world moves independently as wind pushes sand through the doorway, dust swirls around the robot, its raised arm makes subtle servo adjustments, and the interior glow flickers softly.

Prompt A first-person view from a strictly stationary observation point on a grassy alpine trail overlooking a bright mountain valley. The spatial layout follows a dirt path bordered by split-rail fences, wildflowers, and scattered rocks, passing a rustic log cabin on the left before descending toward a hidden blue lake beneath distant snow-capped peaks. A small shiba inu wearing a red scarf stands ahead on the trail as part of the scene, while a wooden post with a softly glowing green object marks the right side. Materials are crisp and natural: rough cabin logs, weathered shingles, bark-textured fence rails, gravelly soil, soft grass, delicate petals, and realistic fur. Morning sunlight creates a cozy adventure atmosphere. There is no dynamic camera movement and no action by the unseen observer; the world moves independently as chimney smoke rises, butterflies flutter, birds circle, grass ripples, the scarf stirs, and the hanging glow sways faintly.

Prompt: A first-person view from a strictly stationary observation point overlooking a vast desert canyon road at sunset. The spatial layout divides ahead into two clear routes: one sandy track climbs toward a red rock arch, while another descends into a shadowed canyon, both leading visually toward a colossal crashed spaceship half-buried in the distant dunes. A rugged off-road vehicle occupies the foreground as a scale marker amid tire tracks, dry shrubs, scattered alien metal fragments, and eroded sandstone ridges. Surfaces are textured with dusty gravel, wind-cut rock, dented metallic wreckage, and rippled sand. Warm orange sunlight casts long shadows, creating an epic sci-fi expedition mood. There is no dynamic camera movement and no action taken by the person filming; the view remains fixed. Autonomous motion continues as dust streams across the road, black smoke rises from wreckage, blue engine lights blink, and heat haze shimmers over the canyon.

Same first frame, different paths
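This gallery renders each first frame along several camera trajectories. Apart from the project's mention of metric-scale 6-DoF camera poses, this page does not document SANA-WM's trajectory input format. The following is therefore only a hypothetical sketch of what "same first frame, different paths" can mean numerically. Each path is a sequence of per-frame 4×4 camera-to-world matrices in meters, using an assumed OpenCV-style convention (+z forward, +x right, +y down), and every path starts from the same identity pose. The helpers `yaw_matrix` and `make_path`, the 240-frame length, and the speeds and turn rates are illustrative assumptions, not the project's API.

```python
import numpy as np


def yaw_matrix(theta: float) -> np.ndarray:
    """3x3 rotation about the camera's vertical (y) axis by `theta` radians.

    With the assumed OpenCV-style axes (+x right, +y down, +z forward),
    a positive angle swings the forward axis toward +x, i.e. to the right.
    """
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, 0.0, s],
                     [0.0, 1.0, 0.0],
                     [-s, 0.0, c]])


def make_path(num_frames: int, speed_m_per_frame: float,
              yaw_rate_rad_per_frame: float) -> np.ndarray:
    """Hypothetical helper: (num_frames, 4, 4) camera-to-world poses.

    Frame 0 is the identity pose shared by every path (the "same first
    frame"). Each later frame turns by a fixed yaw, then advances along
    the new forward axis by a fixed metric distance.
    """
    poses = np.empty((num_frames, 4, 4))
    rotation = np.eye(3)
    position = np.zeros(3)
    for i in range(num_frames):
        pose = np.eye(4)
        pose[:3, :3] = rotation
        pose[:3, 3] = position
        poses[i] = pose
        # Update the state for the next frame: turn first, then move forward.
        rotation = rotation @ yaw_matrix(yaw_rate_rad_per_frame)
        position = position + rotation[:, 2] * speed_m_per_frame
    return poses


# Illustrative values only: 240 frames, 2 cm per frame, 0.1 degree of yaw per frame.
NUM_FRAMES = 240
paths = {
    "straight": make_path(NUM_FRAMES, 0.02, 0.0),
    "turn_left": make_path(NUM_FRAMES, 0.02, -np.deg2rad(0.1)),
    "turn_right": make_path(NUM_FRAMES, 0.02, np.deg2rad(0.1)),
}

for name, poses in paths.items():
    x, _, z = poses[-1][:3, 3]
    print(f"{name}: final position x={x:+.2f} m, z={z:+.2f} m")
```

Each path uses a simple turn-then-advance model. The camera rotates by a fixed angle per frame and then moves along its new forward axis, so a nonzero yaw rate places the camera positions on a circular arc. If a model expected poses relative to the first frame, `np.linalg.inv(poses[0]) @ poses` would re-base them. Here that step changes nothing, because every path already starts at the identity.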

Prompt: A first-person view from a strictly stationary observation point across a vast pale salt flat bordered by distant mountain ranges. The spatial layout is broad, open, and unobstructed, stretching from the central foreground sports car across a textured white basin toward layered gray ridgelines under a wide sky. Surfaces include rough salt crust, powdery ground streaks, smooth reflective black bodywork, dark glass, rubber tires, and glowing red rear lights. Bright low sunlight from the right washes the scene in cool white and soft gold, with atmospheric haze flattening the far mountains into a calm, high-speed desert mood. There is no dynamic camera movement and no action taken by the person filming; the perspective remains fixed. Autonomous motion appears as pale dust and salt spray trailing around the car, thin clouds drifting overhead, heat shimmer near the horizon, and faint wind lines brushing across the flat ground.

Prompt: A first-person view from a strictly stationary observation point within a misty alien swamp at dawn. The spatial layout spreads across shallow reflective water, muddy stepping stones, and mossy islets toward a luminous purple forest, with a half-submerged crashed spacecraft on the left and a glowing rock on the right holding a small translucent-finned alien creature. An astronaut in a white suit stands in the midground as part of the environment, giving scale to the wreckage and strange wetlands. Materials are slick and otherworldly: wet mud, rippled water, scorched metal hull plating, cracked cockpit glass, bioluminescent plants, and gelatinous fins catching the light. Soft sunrise colors mix with violet plant glow and blinking red emergency lights, creating a survival-mystery atmosphere. There is no dynamic camera movement and no action by the unseen observer; dragonfly-like creatures hover, fog drifts, plants pulse, water ripples, and the creature's fins twitch subtly.

Prompt: A first-person view from a strictly stationary observation point on the snowy shore of a frozen alpine lake at twilight. The spatial layout opens across a broad sheet of ice toward a solitary red wooden door centered far out on the lake, with a trail of footprints leading across the frozen surface and thin cracks radiating back toward the shore. Snow-laden pine trees frame both sides, while distant mountain ridges reflect across the dark, glassy ice. Materials feel crisp and tactile: packed snowbanks, frosted rocks, scratched translucent ice, powder-coated branches, and painted wood glowing at the base of the door. Lavender-blue dusk light blends with warm golden light leaking from beneath the door, creating a quiet, surreal mystery. There is no dynamic camera movement and no action by the unseen observer; the world moves independently as snowflakes fall, clouds drift, pine boughs tremble, and reflections shimmer faintly across the ice.

Prompt: A first-person view from a strictly stationary observation point inside a vast jungle canyon. The spatial layout forms a deep natural corridor between steep vine-covered cliffs, opening toward a massive waterfall that veils an ancient stone temple barely visible behind the falling water. A small folded paper airplane occupies the foreground as part of the scene, framed by colorful birds and drifting leaves. Materials feel vivid and layered: crisp paper folds, wet stone walls, tangled vines, glossy foliage, suspended droplets, and weathered carved masonry softened by spray. Bright daylight filters through the canyon, creating mist, rainbow light, and a whimsical adventure atmosphere. There is no dynamic camera movement and no action by the unseen observer; the world moves independently as the waterfall crashes, mist billows, birds flap across the canyon, leaves tumble, droplets sparkle, and the paper plane quivers in the air currents.

Prompt: A first-person view from a strictly stationary observation point inside a vast abandoned shopping mall atrium reclaimed by vegetation. Multi-level balconies, broken storefronts, rusted railings, escalators, and a cracked glass roof frame a deep central hall, where a quadcopter hovers above a debris-strewn floor and a red signal glows on the far balcony. The space is textured with peeling concrete, stained walls, shattered skylight panels, hanging vines, moss-covered steps, torn banners, puddles, and scattered rubble. Sunbeams pour through the damaged roof, illuminating dust and humid greenery with a warm, post-collapse atmosphere. The observer's perspective remains fixed, with no dynamic camera movement and no actions taken by the person recording. Autonomous motion animates the environment: drone rotors spin, birds cross the atrium, vines sway gently, water drips into puddles, dust motes drift through the light, and the red beacon pulses faintly.

Prompt: A first-person view from a strictly stationary observation point on an abandoned underground subway platform. The spatial layout extends along a wet platform and parallel train tracks into a deep tunnel, where a faint red glow pulses at the vanishing point; a slightly open maintenance door, discarded suitcase, old newspaper, tiled walls, and concrete support columns define the side spaces. Materials are grimy and tactile: cracked concrete, stained ceramic tile, rusted rail metal, puddled floor seams, dangling black cables, peeling paint, and damp soot on the ceiling. Flickering fluorescent fixtures cast weak reflections across the platform, creating a tense urban mystery atmosphere. There is no dynamic camera movement and no action taken by the unseen observer; the world moves independently as steam drifts from vents, ceiling lights stutter, the red tunnel glow throbs softly, water trembles in shallow puddles, and loose papers flutter faintly in the stale air.

Prompt: A first-person view from a strictly stationary observation point on a deserted tropical beach at sunrise. The spatial layout stretches from the foamy shoreline on the left across wet reflective sand toward a dense wall of jungle and leaning palms on the right, with a long mysterious drag mark carving a dark path from the water's edge into the vegetation. The beach is textured with rippled sand, sea foam, broken crate planks, torn black fabric, scattered debris, and small glowing blue shells embedded along the trail. Warm pink-orange dawn light reflects across the shallow water and slick sand, creating a cinematic mystery-adventure mood. There is no dynamic camera movement and no action taken by the unseen observer; the world moves independently as waves wash over the first part of the mark, palm fronds sway, birds circle over the ocean, and clouds drift through the glowing sky.

Prompt: A first-person view from a strictly stationary observation point at a T-shaped intersection inside an abandoned underground sci-fi research facility. The spatial layout splits into three distinct routes: a left corridor submerged in shallow water, a right corridor filled with warm steam, and a heavy circular metal door straight ahead, half-open onto deep darkness. The architecture is industrial and bunker-like, with ribbed metal tunnel frames, stained concrete walls, corroded panels, loose floor cables, broken glass, and wet reflective plating. Cold blue emergency lights wash the flooded passage, while orange warning lamps burn through haze on the opposite side, creating a tense, cinematic sci-fi horror atmosphere. There is no dynamic camera movement and no action taken by the unseen observer; the world moves independently as ceiling lights flicker, water ripples across the floor, steam drifts through the right corridor, and dust motes float in the stale air.

Refiner effect

Stage 1 vs. Refined

Prompt:

A first-person view from a strictly stationary observation point inside a vast jungle canyon. The spatial layout forms a deep natural corridor between steep vine-covered cliffs, opening toward a massive waterfall that veils an ancient stone temple barely visible behind the falling water. A small folded paper airplane occupies the foreground as part of the scene, framed by colorful birds and drifting leaves. Materials feel vivid and layered: crisp paper folds, wet stone walls, tangled vines, glossy foliage, suspended droplets, and weathered carved masonry softened by spray. Bright daylight filters through the canyon, creating mist, rainbow light, and a whimsical adventure atmosphere. There is no dynamic camera movement and no action by the unseen observer; the world moves independently as the waterfall crashes, mist billows, birds flap across the canyon, leaves tumble, droplets sparkle, and the paper plane quivers in the air currents.

Stage 1 vs. Refined

Prompt:

A first-person view from a strictly stationary observation point on an elevated forest ledge before a massive ancient doorway carved into a cliff. Stone steps form a clear approach from mossy path to half-open doors, flanked by carved pillars, guardian statues, ivy-covered walls, and a sweeping mountain valley to the left. A cloaked traveler stands near the entrance as a scale marker. Surfaces are textured with cracked stone slabs, engraved masonry, damp moss, tangled vines, fallen leaves, weathered statues, and glowing turquoise runes etched across the ground. Warm late-day sunlight mixes with intense cyan light spilling from the doorway, creating a tense fantasy atmosphere of discovery. There is no dynamic camera movement and no action taken by the person filming; the perspective remains fixed. Autonomous motion continues as leaves swirl, birds scatter from the cliff, vines tremble, and the portal light pulses softly.

Stage 1 vs. Refined

Prompt:

A first-person view from a strictly stationary observation point overlooks an abandoned winter campsite in a snowy pine basin beneath a rocky mountain slope. The spatial layout places a glowing campfire and scattered gear in the left foreground, a lone mountaineer in an orange jacket near center, and a footprint trail crossing open snow toward a cave cut into the cliff. Distant peaks and dark conifers frame the depth. Materials are cold and tactile: powdery snow, ice-dusted rocks, singed firewood, metal cups, canvas supplies, frosted backpack fabric, and rough bark. Purple dusk light cools the landscape, while fire embers and golden cave light add uneasy warmth. There is no dynamic camera movement and no action by the unseen observer; the world moves independently as snow falls, pine branches sway, embers flicker, steam rises from the cup, and the cave glow pulses softly.

Stage 1 vs. Refined

Prompt:

A first-person view from a strictly stationary observation point inside a submerged ancient temple. A clear stone walkway runs between coral-encrusted columns toward a cracked ceremonial wall, where bright green light leaks through the central fracture and aligns with glowing symbols on the floor. A small spherical diving robot hovers ahead, its blue front light illuminating broken statues, scattered masonry, sand patches, seaweed, purple coral, and barnacle-covered pillars. The textures are eroded and waterworn, with carved stone softened by algae, silt, and marine growth. Shafts of sunlight descend from the surface, mixing with the green fissure glow to create a calm, mysterious underwater atmosphere. The observer's perspective remains fixed, with no dynamic camera movement and no actions taken by the person recording. Autonomous motion continues as fish swim, bubbles rise, particles drift, seaweed sways, caustics shimmer, and the green symbols pulse faintly.

Citation

```bibtex
@article{zhu2026sanawm,
  title   = {{SANA-WM}: Efficient Minute-Scale World Modeling with Hybrid Linear Diffusion Transformer},
  author  = {Zhu, Haoyi and Liu, Haozhe and Zhao, Yuyang and Ye, Tian and Chen, Junsong and Yu, Jincheng and He, Tong and Han, Song and Xie, Enze},
  journal = {arXiv preprint arXiv:2605.15178},
  year    = {2026},
}
```

All videos on this page are produced by the bidirectional variant of SANA-WM followed by the second-stage long-video refiner.

First-frame images for all demo videos in this gallery were generated with OpenAI’s GPT Image 2 and Google’s Nano Banana Pro; SANA-WM animates the still into a minute-long video.

Hero reel music: “Immersed” by Kevin MacLeod (incompetech.com), licensed under CC BY 4.0. The on-page player is muted; the downloadable MP4 includes the track.
