Unifying Embodied World Modeling Through Language-Conditioned Video Gen