Interactive world models are advancing rapidly, yet no unified standard exists for evaluating them systematically. WBench fills this gap with 289 multi-turn cases across 5 dimensions, evaluating 20 models on 22 metrics validated against human judgments. We find that no single model dominates all dimensions.
Multi-turn: navigation + action + event + perspective switch
Navigation: W/A/S/D/left/right/up/down
Subject Action: character action
Event Editing: environment change
Perspective Switch: first-person (FP) ↔ third-person (TP)
Key Findings
1. No model dominates all dimensions. Among 20 evaluated models, including commercial APIs (Kling 3.0, Seedance 1.5), open-source models (Wan 2.7, HY-Video, Cosmos), and closed-source beta world models (Genie 3, Happy Oyster, HY-World), each excels in different aspects. Kling 3.0 leads overall but lags in Consistency; HY-Video ranks 1st in Consistency among text-conditioned models but struggles with Interaction; world models like Happy Oyster and HY-World dominate Navigation yet underperform in Video Quality.
2. Navigation is largely independent of other capabilities. Among text-conditioned models, YUME 1.5 achieves the highest navigation score (72.0) yet scores only 57.8 on event editing and 16.7 on perspective switching. Conversely, Wan 2.7 leads in event editing (84.0) and is a close second in subject action (83.4, behind Kling 3.0 at 85.6) but scores only 66.0 on navigation. This suggests that navigation and semantic interaction require fundamentally different internal representations.
3. Camera control does not imply subject control. Camera-conditioned world models (InSpatio, LingBot, HY-World) achieve high perspective consistency and navigation scores, but action-conditioned models (Genie 3, Happy Oyster, MatrixGame) handle perspective switching better. The two control paradigms remain orthogonal.
4. Physical correctness follows rendering quality, not control ability. Models with higher video quality tend to produce more physically plausible outputs (correlation ρ=0.82), while control ability (navigation, interaction) shows near-zero correlation with physics scores, suggesting physics emerges from visual fidelity rather than world understanding.
5. Multi-turn interactions compound errors. Navigation accuracy drops 21 points from turn 1 to turn 4 as errors accumulate across steps. Dedicated world models (HY-World) degrade much less than text-conditioned models (Kling 3.0), suggesting explicit geometric control preserves spatial state better than text-based prompting.
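The quality-versus-physics relationship in finding 4 can be explored with a rank correlation over the leaderboard. The sketch below is a rough check, not the authors' analysis: it assumes a Spearman correlation over the Quality and Physical columns of the 20-model leaderboard on this page, whereas the page does not say which correlation or which data produced ρ=0.82, so the printed value need not match that figure. The helper names (`average_ranks`, `pearson`, `spearman`) are illustrative.

```python
from statistics import mean

# (Quality, Physical) pairs copied from the 20-model leaderboard on this page.
SCORES = {
    "Kling 3.0": (83.0, 69.3), "LingBot-World": (81.5, 71.2),
    "Wan 2.7": (82.6, 71.8), "HY-World 1.5": (80.2, 66.3),
    "HY-Video 1.5": (79.7, 67.4), "Happy Oyster": (79.3, 63.5),
    "Seedance 1.5": (83.2, 68.4), "Cosmos 2.5": (75.6, 67.4),
    "LTX 2.3": (78.7, 64.9), "InSpatio-World": (74.9, 65.2),
    "Fantasy-World": (75.5, 66.8), "Genie 3": (77.4, 65.7),
    "LongCat-Video": (78.2, 68.9), "YUME 1.5": (79.5, 65.2),
    "Infinite-World": (78.7, 62.1), "MatrixGame3": (76.9, 59.3),
    "Kairos 3.0": (76.4, 60.4), "HY-GameCraft": (74.9, 62.4),
    "MatrixGame2": (75.7, 57.2), "Astra": (69.7, 51.4),
}


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def spearman(xs, ys):
    """Spearman rho: Pearson correlation computed on tie-averaged ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


quality = [q for q, _ in SCORES.values()]
physical = [p for _, p in SCORES.values()]
rho = spearman(quality, physical)
print(f"Spearman rho, Quality vs Physical (20 models): {rho:.2f}")
```

Swapping the `quality` list for the Interaction column gives the corresponding check for the control-ability half of the claim.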
Leaderboard
Scores range from 0 to 100; higher is better. # = per-metric rank.
| # | Model | Average | Quality | Setting | Interaction | Consistency | Physical |
|---|---|---|---|---|---|---|---|
| 🥇 | Kling 3.0 (Kling AI · API) | 79.2 | 83.0 | 91.0 | 70.3 | 82.5 | 69.3 |
| 🥈 | LingBot-World (Ant Group · Open Source) | 78.8 | 81.5 | 72.6 | 79.8 | 88.9 | 71.2 |
| 🥉 | Wan 2.7 (Alibaba · Open Source) | 78.5 | 82.6 | 91.4 | 66.0 | 80.5 | 71.8 |
| 4 | HY-World 1.5 (Tencent · Open Source) | 78.4 | 80.2 | 72.2 | 87.5 | 86.0 | 66.3 |
| 5 | HY-Video 1.5 (Tencent · Open Source) | 78.2 | 79.7 | 85.6 | 71.8 | 86.7 | 67.4 |
| 6 | Happy Oyster (Alibaba · Web) | 77.1 | 79.3 | 74.2 | 85.1 | 83.3 | 63.5 |
| 7 | Seedance 1.5 (ByteDance · API) | 76.5 | 83.2 | 82.9 | 68.0 | 80.2 | 68.4 |
| 8 | Cosmos 2.5 (NVIDIA · Open Source) | 75.2 | 75.6 | 83.3 | 64.1 | 85.6 | 67.4 |
| 9 | LTX 2.3 (Lightricks · Open Source) | 74.4 | 78.7 | 85.2 | 67.6 | 75.6 | 64.9 |
| 10 | InSpatio-World (InSpatio · Open Source) | 74.3 | 74.9 | 71.4 | 72.8 | 87.4 | 65.2 |
| 11 | Fantasy-World (Amap · Open Source) | 74.2 | 75.5 | 71.3 | 72.1 | 85.3 | 66.8 |
| 12 | Genie 3 (Google · Web) | 74.1 | 77.4 | 72.5 | 73.3 | 81.4 | 65.7 |
| 13 | LongCat-Video (Meituan · Open Source) | 73.7 | 78.2 | 72.3 | 63.1 | 85.9 | 68.9 |
| 14 | YUME 1.5 (Shanghai AI Lab · Open Source) | 73.5 | 79.5 | 72.4 | 72.0 | 78.6 | 65.2 |
| 15 | Infinite-World (Nankai · Open Source) | 72.9 | 78.7 | 69.3 | 75.9 | 78.7 | 62.1 |
| 16 | MatrixGame3 (Skywork · Open Source) | 71.2 | 76.9 | 63.6 | 83.5 | 72.9 | 59.3 |
| 17 | Kairos 3.0 (SenseTime · Open Source) | 70.7 | 76.4 | 70.3 | 65.1 | 81.4 | 60.4 |
| 18 | HY-GameCraft (Tencent · Open Source) | 68.5 | 74.9 | 66.6 | 67.8 | 70.6 | 62.4 |
| 19 | MatrixGame2 (Skywork · Open Source) | 68.5 | 75.7 | 67.1 | 80.6 | 62.0 | 57.2 |
| 20 | Astra (Tsinghua · Open Source) | 64.0 | 69.7 | 59.6 | 67.7 | 71.6 | 51.4 |
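The Average column in the 20-model leaderboard appears to be the unweighted mean of the five dimension scores. This is an inference from the displayed numbers and is not stated on the page. The sketch below checks it on four rows; `overall_average` is an illustrative helper name, and the 0.05 tolerance allows for the one-decimal rounding of the published values.

```python
# Dimension order: Quality, Setting, Interaction, Consistency, Physical.
# Each entry pairs the five dimension scores with the reported Average.
ROWS = {
    "Kling 3.0": ((83.0, 91.0, 70.3, 82.5, 69.3), 79.2),
    "LingBot-World": ((81.5, 72.6, 79.8, 88.9, 71.2), 78.8),
    "Wan 2.7": ((82.6, 91.4, 66.0, 80.5, 71.8), 78.5),
    "HY-World 1.5": ((80.2, 72.2, 87.5, 86.0, 66.3), 78.4),
}


def overall_average(dimension_scores):
    """Unweighted mean of the dimension scores (assumed aggregation rule)."""
    return sum(dimension_scores) / len(dimension_scores)


# Absolute gap between the recomputed mean and the reported Average.
gaps = {
    name: abs(overall_average(dims) - reported)
    for name, (dims, reported) in ROWS.items()
}

for name, gap in gaps.items():
    status = "consistent" if gap <= 0.05 else "differs"
    print(f"{name}: gap {gap:.3f} ({status})")
```

Because every dimension carries equal weight under this reading, a model with one very low dimension cannot offset it with a small lead elsewhere.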
| # | Model | Average | Quality | Setting | Interaction | Consistency | Physical |
|---|---|---|---|---|---|---|---|
| 🥇 | Kling 3.0 (Kling AI · API) | 79.4 | 80.0 | 91.0 | 73.1 | 83.9 | 69.2 |
| 🥈 | Wan 2.7 (Alibaba · Open Source) | 78.4 | 81.0 | 91.4 | 72.1 | 75.8 | 71.6 |
| 🥉 | Seedance 1.5 (ByteDance · API) | 76.2 | 81.9 | 82.9 | 68.3 | 79.9 | 68.2 |
| 4 | HY-Video 1.5 (Tencent · Open Source) | 74.3 | 76.6 | 85.6 | 54.7 | 87.5 | 67.1 |
| 5 | LTX 2.3 (Lightricks · Open Source) | 70.9 | 77.0 | 85.2 | 49.4 | 78.0 | 65.1 |
| 6 | Cosmos 2.5 (NVIDIA · Open Source) | 70.4 | 71.7 | 83.3 | 43.5 | 86.3 | 67.0 |
| 7 | LongCat-Video (Meituan · Open Source) | 69.9 | 77.2 | 72.3 | 45.1 | 86.6 | 68.4 |
| 8 | YUME 1.5 (Shanghai AI Lab · Open Source) | 68.9 | 77.6 | 72.4 | 48.4 | 80.9 | 65.4 |
| 9 | Kairos 3.0 (SenseTime · Open Source) | 65.7 | 73.1 | 70.3 | 41.6 | 83.2 | 60.5 |
| # | Model | Aesth | Imag | BgCon | Flick | Dyn | Smooth | HPSv3 | Scene | Subj | Navi | Spat | GSp | Persp | Seg | Geo | Photo | SubjC | VPlaus | CFid |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 🥇 | HY-Video 1.5 (Tencent · Open Source) | 63.4 | 67.4 | 92.1 | 94.2 | 73.9 | 98.7 | 68.0 | 77.5 | 93.6 | 71.8 | 79.2 | 75.1 | 86.6 | 99.4 | 94.6 | 80.3 | 91.6 | 59.7 | 75.0 |
| 🥈 | Kling 3.0 (Kling AI · API) | 63.0 | 68.1 | 92.3 | 93.2 | 97.5 | 97.6 | 69.1 | 89.0 | 92.9 | 70.3 | 75.2 | 75.1 | 76.8 | 93.0 | 88.9 | 79.9 | 88.5 | 60.7 | 78.0 |
| 🥉 | Cosmos 2.5 (NVIDIA · Open Source) | 61.8 | 66.9 | 92.3 | 94.8 | 49.0 | 98.2 | 66.5 | 72.4 | 94.2 | 64.1 | 78.1 | 74.3 | 84.3 | 94.3 | 94.6 | 81.6 | 92.3 | 60.1 | 74.7 |
| 4 | LTX 2.3 (Lightricks · Open Source) | 57.9 | 61.0 | 88.3 | 93.2 | 98.1 | 96.4 | 56.1 | 81.3 | 89.2 | 67.6 | 70.2 | 70.2 | 69.8 | 75.8 | 76.9 | 79.2 | 87.2 | 55.7 | 74.0 |
| 5 | Seedance 1.5 (ByteDance · API) | 61.0 | 69.3 | 89.6 | 92.4 | 99.4 | 97.5 | 73.0 | 71.6 | 94.2 | 68.0 | 72.7 | 72.4 | 70.5 | 96.2 | 82.4 | 76.8 | 90.1 | 60.7 | 76.0 |
| 6 | Wan 2.7 (Alibaba · Open Source) | 61.4 | 68.0 | 89.4 | 92.2 | 100.0 | 96.3 | 71.1 | 88.3 | 94.6 | 66.0 | 71.0 | 71.0 | 78.2 | 92.4 | 83.7 | 76.4 | 90.7 | 60.3 | 83.3 |
| 7 | Kairos 3.0 (SenseTime · Open Source) | 59.9 | 62.7 | 91.1 | 95.4 | 70.1 | 97.5 | 58.5 | 52.2 | 88.5 | 65.1 | 76.8 | 62.0 | 76.3 | 94.3 | 89.0 | 80.8 | 90.8 | 58.0 | 62.7 |
| 8 | LongCat-Video (Meituan · Open Source) | 66.5 | 69.6 | 95.1 | 94.8 | 45.9 | 97.9 | 77.6 | 53.1 | 91.5 | 63.1 | 83.3 | 66.2 | 81.5 | 99.4 | 95.4 | 82.2 | 93.4 | 61.8 | 76.0 |
| 9 | YUME 1.5 (Shanghai AI Lab · Open Source) | 58.7 | 63.3 | 90.3 | 93.0 | 96.8 | 97.0 | 57.0 | 53.1 | 91.7 | 72.0 | 71.5 | 71.4 | 48.0 | 99.4 | 88.0 | 83.3 | 88.8 | 57.7 | 72.7 |
| 10 | Astra (Tsinghua · Open Source) | 48.6 | 52.5 | 85.3 | 96.0 | 79.6 | 97.7 | 28.0 | 43.4 | 75.9 | 67.7 | 64.7 | 63.3 | 30.0 | 86.6 | 85.6 | 87.5 | 83.5 | 54.6 | 48.3 |
| 11 | Fantasy-World (Amap · Open Source) | 63.0 | 62.8 | 94.2 | 95.8 | 49.0 | 97.9 | 65.8 | 52.4 | 90.1 | 72.1 | 80.6 | 64.2 | 79.8 | 100.0 | 95.3 | 84.8 | 92.5 | 59.7 | 74.0 |
| 12 | HY-GameCraft (Tencent · Open Source) | 52.6 | 58.7 | 86.5 | 93.7 | 96.8 | 97.6 | 38.3 | 50.6 | 82.5 | 67.8 | 60.5 | 60.5 | 17.9 | 99.4 | 88.3 | 85.0 | 82.6 | 56.5 | 68.3 |
| 13 | Genie 3 (Google · Web) | 51.6 | 59.3 | 90.7 | 95.0 | 92.4 | 97.8 | 55.2 | 61.1 | 83.8 | 73.3 | 79.9 | 78.4 | 54.5 | 93.6 | 88.6 | 84.5 | 90.4 | 59.7 | 71.7 |
| 14 | Happy Oyster (Alibaba · Web) | 56.6 | 63.9 | 91.4 | 94.0 | 94.2 | 97.0 | 58.3 | 57.4 | 91.1 | 85.1 | 77.7 | 75.8 | 75.0 | 96.2 | 87.2 | 79.8 | 91.5 | 57.6 | 69.3 |
| 15 | HY-World 1.5 (Tencent · Open Source) | 60.1 | 65.4 | 92.7 | 93.5 | 91.1 | 98.1 | 60.5 | 53.5 | 90.8 | 87.5 | 90.6 | 84.9 | 62.5 | 100.0 | 92.0 | 83.1 | 89.1 | 58.6 | 74.0 |
| 16 | Infinite-World (Nankai · Open Source) | 58.7 | 66.1 | 88.8 | 94.1 | 82.8 | 98.0 | 62.3 | 54.0 | 84.5 | 75.9 | 74.9 | 74.4 | 33.8 | 100.0 | 94.3 | 85.1 | 88.4 | 57.2 | 67.0 |
| 17 | InSpatio-World (InSpatio · Open Source) | 64.4 | 67.6 | 95.0 | 96.0 | 26.1 | 98.8 | 76.1 | 51.7 | 91.1 | 72.8 | 93.8 | 66.5 | 72.5 | 100.0 | 97.3 | 87.4 | 94.4 | 63.1 | 67.3 |
| 18 | LingBot-World (Ant Group · Open Source) | 66.9 | 67.9 | 96.9 | 94.1 | 66.2 | 96.9 | 81.4 | 51.6 | 93.6 | 79.8 | 92.7 | 67.1 | 90.9 | 99.4 | 95.4 | 83.3 | 93.5 | 64.8 | 77.7 |
| 19 | MatrixGame2 (Skywork · Open Source) | 54.0 | 60.3 | 86.9 | 94.6 | 94.9 | 98.2 | 41.0 | 49.4 | 84.9 | 80.6 | 64.5 | 64.5 | 29.2 | 21.0 | 86.1 | 81.3 | 87.2 | 55.0 | 59.3 |
| 20 | MatrixGame3 (Skywork · Open Source) | 46.4 | 70.0 | 85.7 | 86.3 | 97.5 | 95.4 | 57.1 | 48.9 | 78.4 | 83.5 | 81.0 | 80.4 | 13.3 | 89.8 | 87.6 | 75.3 | 83.0 | 54.0 | 64.7 |
| # | Model | Aesth | Imag | BgCon | Flick | Dyn | Smooth | HPSv3 | Scene | Subj | Navi | EE | SA | PS | Spat | GSp | Persp | Seg | Geo | Photo | SubjC | VPlaus | CFid |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 🥇 | HY-Video 1.5 (Tencent · Open Source) | 61.9 | 67.4 | 92.4 | 95.5 | 68.8 | 98.8 | 67.5 | 77.5 | 93.6 | 71.8 | 63.8 | 55.6 | 27.6 | 79.2 | 75.1 | 86.6 | 99.3 | 94.4 | 81.4 | 91.5 | 59.3 | 75.0 |
| 🥈 | Kling 3.0 (Kling AI · API) | 61.3 | 67.7 | 92.7 | 94.5 | 89.9 | 97.9 | 68.8 | 89.0 | 92.9 | 70.3 | 81.4 | 85.6 | 55.0 | 75.2 | 75.1 | 76.8 | 92.7 | 89.4 | 80.4 | 88.5 | 60.4 | 78.0 |
| 🥉 | Cosmos 2.5 (NVIDIA · Open Source) | 60.1 | 67.2 | 92.3 | 96.0 | 42.4 | 98.3 | 65.9 | 72.4 | 94.2 | 64.1 | 48.2 | 41.6 | 20.0 | 78.1 | 74.3 | 84.3 | 93.1 | 94.2 | 82.1 | 91.8 | 59.3 | 74.7 |
| 4 | LTX 2.3 (Lightricks · Open Source) | 56.9 | 62.3 | 89.3 | 94.1 | 94.4 | 96.8 | 57.7 | 81.3 | 89.2 | 67.6 | 53.0 | 51.8 | 25.0 | 70.2 | 70.2 | 69.8 | 77.8 | 81.1 | 79.4 | 86.7 | 56.2 | 74.0 |
| 5 | Seedance 1.5 (ByteDance · API) | 59.7 | 69.8 | 89.6 | 93.4 | 98.3 | 97.6 | 72.9 | 71.6 | 94.2 | 68.0 | 80.4 | 80.0 | 45.0 | 72.7 | 72.4 | 62.7 | 92.4 | 83.5 | 76.7 | 89.3 | 60.5 | 76.0 |
| 6 | Wan 2.7 (Alibaba · Open Source) | 59.6 | 68.1 | 89.5 | 93.0 | 99.3 | 96.5 | 69.4 | 88.3 | 94.6 | 66.0 | 84.0 | 83.4 | 55.0 | 71.0 | 71.0 | 62.2 | 65.6 | 82.6 | 75.5 | 88.7 | 59.8 | 83.3 |
| 7 | Kairos 3.0 (SenseTime · Open Source) | 58.4 | 63.6 | 91.8 | 96.3 | 63.5 | 97.9 | 58.8 | 52.2 | 88.5 | 65.1 | 46.8 | 41.4 | 13.3 | 76.8 | 62.0 | 76.3 | 94.1 | 91.5 | 82.1 | 90.7 | 58.2 | 62.7 |
| 8 | LongCat-Video (Meituan · Open Source) | 64.7 | 69.8 | 94.7 | 94.9 | 59.7 | 97.7 | 76.3 | 53.1 | 91.5 | 63.1 | 50.4 | 48.4 | 18.3 | 83.3 | 66.2 | 81.5 | 98.6 | 94.7 | 81.5 | 92.4 | 60.8 | 76.0 |
| 9 | YUME 1.5 (Shanghai AI Lab · Open Source) | 59.3 | 65.7 | 92.0 | 94.8 | 86.1 | 97.7 | 62.0 | 53.1 | 91.7 | 72.0 | 57.8 | 47.0 | 16.7 | 71.5 | 71.4 | 48.0 | 99.3 | 91.1 | 84.1 | 89.4 | 58.1 | 72.7 |
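In the 9-model tables, the Interaction dimension score appears to be the unweighted mean of the four interaction metrics (Navi, EE, SA, PS). This is an inference from the displayed numbers and is not stated on the page. The sketch below checks it on five rows; `interaction_score` is an illustrative helper name, and the 0.05 tolerance allows for one-decimal rounding.

```python
# Each entry: (Navi, EE, SA, PS) from the 22-metric table, followed by the
# Interaction value reported in the 9-model dimension table.
ROWS = {
    "Kling 3.0": ((70.3, 81.4, 85.6, 55.0), 73.1),
    "Wan 2.7": ((66.0, 84.0, 83.4, 55.0), 72.1),
    "HY-Video 1.5": ((71.8, 63.8, 55.6, 27.6), 54.7),
    "Cosmos 2.5": ((64.1, 48.2, 41.6, 20.0), 43.5),
    "YUME 1.5": ((72.0, 57.8, 47.0, 16.7), 48.4),
}


def interaction_score(metric_scores):
    """Unweighted mean of the interaction metrics (assumed aggregation rule)."""
    return sum(metric_scores) / len(metric_scores)


# Absolute gap between the recomputed mean and the reported Interaction score.
gaps = {
    name: abs(interaction_score(metrics) - reported)
    for name, (metrics, reported) in ROWS.items()
}

for name, gap in gaps.items():
    status = "consistent" if gap <= 0.05 else "differs"
    print(f"{name}: gap {gap:.3f} ({status})")
```

Under this reading, a low perspective-switch score pulls down the Interaction score of an otherwise strong navigator such as YUME 1.5, which is consistent with the gap between its Interaction values in the 20-model table (72.0) and the 9-model table (48.4).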
Evaluation Metrics
22 metrics across 5 dimensions. All scores normalized to 0–100, higher is better.
Video Quality (7)
Aesthetic: VBench aesthetic scorer
Imaging: VBench technical quality
Flickering: Inter-frame brightness stability
Dynamic: RAFT optical flow magnitude
Smoothness: RAFT flow consistency
Background: CLIP background similarity
HPSv3: Human Preference Score v3
Setting Adherence (2)
Scene: VLM, scene elements vs. environment prompt
Subject: VLM, appearance/action vs. character prompt
Same case, different models — see how metrics capture quality differences.
Video Quality (6 sub-metrics)
Prompt: A modern city street in clear daylight. A Shiba Inu with tan-and-cream fur trots forward on a broad asphalt road lined with storefronts. Interactions: W → right → right → left
Prompt: A realistic basketball court, third-person view locked onto player #12 in red, tracking movement across polished hardwood floor with clean markings. Other players in red and blue uniforms move around. Hoop and backboard at far end under even indoor lighting. Interactions: W → left → S
MatrixGame 3.0: Scene 30.0, Subject 20.0
Kling 3.0: Scene 100.0, Subject 100.0
Navigation Trajectory (3 sub-metrics)
Prompt: A third-person realistic scene on a dirt path between grapevine rows in a Tuscan vineyard in the afternoon. A man in a white linen shirt walks forward. Interactions: W → D → S → A
Happy Oyster: NavScore 94.8, Accuracy 98.5, Consistency 91.2
Genie 3: NavScore 32.2, Accuracy 57.5, Consistency 6.9
Navigation Trajectory (3 sub-metrics)
Prompt: First-person view inside a neoclassical museum gallery, facing deeper into the hall. A wide marble staircase descends behind, a broad archway opens into an adjacent exhibition hall. Interactions: left → left → right → right
LongCat-Video: NavScore 56.2, Accuracy 19.9, Consistency 92.6
MatrixGame 3.0: NavScore 81.4, Accuracy 90.0, Consistency 72.7
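In the four navigation examples above, the reported NavScore is numerically consistent with the plain average of Accuracy and Consistency. This is an observation about the displayed values, not a formula given on the page, and the actual definition may differ. The sketch below makes the comparison explicit; `mean_nav_score` is an illustrative name.

```python
# (model, Accuracy, Consistency, reported NavScore) from the two navigation cases.
CASES = [
    ("Happy Oyster", 98.5, 91.2, 94.8),
    ("Genie 3", 57.5, 6.9, 32.2),
    ("LongCat-Video", 19.9, 92.6, 56.2),
    ("MatrixGame 3.0", 90.0, 72.7, 81.4),
]


def mean_nav_score(accuracy, consistency):
    """Average of the two sub-scores (assumed, inferred from the examples)."""
    return (accuracy + consistency) / 2


for model, accuracy, consistency, reported in CASES:
    computed = mean_nav_score(accuracy, consistency)
    print(f"{model}: computed {computed:.2f}, reported {reported}")

# Largest difference between the computed mean and the reported NavScore.
max_gap = max(
    abs(mean_nav_score(accuracy, consistency) - reported)
    for _, accuracy, consistency, reported in CASES
)
```

The LongCat-Video case shows why both sub-scores matter: a highly consistent trajectory (92.6) still yields a middling NavScore when it rarely follows the requested direction (19.9).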
Subject Action Adherence
Prompt: An outdoor concrete basketball court, first-person view. Action (Turn 1): Dribble the basketball with the right hand, bouncing it on the ground several times in place.
YUME 1.5: Score 35.0
Kling 3.0: Score 95.0
Event Edit Adherence
Prompt: A grand magical library interior in CG style. A young wizard in purple hooded robe, seen from behind. Event (Turn 1): The wizard picks up the crystal staff.
LongCat-Video: Score 8.0
Wan 2.7: Score 96.0
Perspective Switch Adherence
Prompt: Large wooden sailing ship at sea during a storm. A pirate captain in a long dark coat with a tricorn hat. Switch: TP → FP → TP
LongCat-Video: Score 0.0
Wan 2.7: Score 100.0
Spatial Consistency & Gated Spatial Consistency
Prompt: A retro 1980s arcade with rows of colorful game cabinets. Neon strip lights in pink and blue. Interactions: A → A → D → D (loop trajectory)
HY-GameCraft: Spatial 8.8, Gated 66.8, Dynamic 1.0
Fantasy-World: Spatial 93.6, Gated 7.0, Dynamic 7.0
HY-World 1.5: Spatial 87.4, Gated 87.4, Dynamic 1.0
Note: Spatial Consistency measures frame similarity after a full loop (returning to the start). A static video can score high on this metric without moving at all, so Gated Spatial Consistency weights the score by motion magnitude, penalizing low-dynamic videos.
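One simple way to realize such gating is to multiply the loop-closure similarity by a motion weight between 0 and 1. The sketch below is a hypothetical illustration only: the page does not give the WBench formula, and the function name, the multiplicative form, and the 0.07 weight are all assumptions. Of the rows above, only HY-World 1.5 (Spatial 87.4, Dynamic 1.0, Gated 87.4) is reproduced exactly by this form.

```python
def gated_spatial_consistency(spatial: float, motion_weight: float) -> float:
    """Scale a loop-closure similarity score (0-100) by a motion weight in [0, 1].

    Hypothetical gating rule: a weight of 1.0 leaves the score untouched,
    and a weight near 0 (an almost static video) drives it toward 0.
    """
    if not 0.0 <= motion_weight <= 1.0:
        raise ValueError("motion_weight must lie in [0, 1]")
    return spatial * motion_weight


# Full motion: the spatial score passes through unchanged.
moving = gated_spatial_consistency(87.4, 1.0)

# Nearly static video (assumed weight 0.07): a high raw similarity is
# heavily discounted because the camera barely moved.
static = gated_spatial_consistency(93.6, 0.07)

print(f"moving: {moving:.1f}, nearly static: {static:.1f}")
```

The design intent is that returning to the starting view only counts as evidence of spatial memory if the model actually left that view in between.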
Physical: Collision
Prompt: Outdoor basketball court on a sunny afternoon. A standard orange basketball resting on the concrete surface. Interactions: A → A → D → D
Kairos 3.0: Causal Fidelity 10.0, Visual Plausibility 60.0
Genie 3: Causal Fidelity 100.0, Visual Plausibility 80.0
Physical: Surface Interaction
Prompt: Antarctic ice sheet under bright overcast sky. An emperor penguin with black back and yellow-orange ear patches waddles side to side. Interactions: A → A → D → D