GameGen-X: Interactive Open-world Game Video Generation

Haoxuan Che1*, Xuanhua He2*, Quande Liu3✉, Cheng Jin1, Hao Chen1✉
1Hong Kong University of Science and Technology;
2University of Science and Technology of China;
3The Chinese University of Hong Kong

* Equal Contribution       ✉ Co-corresponding Authors

[arXiv]      [GitHub]      [Gallery]      [Hugging Face Space]     

For any inquiries, please email: hche@ust.hk, qdliu0226@gmail.com.

Overview of GameGen-X Functionality

Part I showcases the basic functionality, and Part II highlights the key features of GameGen-X (0:47).

Demo Shout-out to "Journey to the West"

Abstract

We introduce GameGen-X, the first diffusion transformer model specifically designed for both generating and interactively controlling open-world game videos. The model facilitates high-quality, open-domain generation by simulating an extensive array of game engine features, such as innovative characters, dynamic environments, complex actions, and diverse events. It also provides interactive controllability, predicting and altering future content based on the current clip, thus enabling gameplay simulation. To realize this vision, we first collected and built the Open-World Video Game Dataset (OGameData) from scratch. It is the first and largest dataset for open-world game video generation and control, comprising over one million diverse gameplay video clips sampled from more than 150 games, with informative captions from GPT-4o. GameGen-X undergoes a two-stage training process consisting of foundation model pre-training and instruction tuning. First, the model is pre-trained via text-to-video generation and video continuation, endowing it with the capability for long-sequence, high-quality open-domain game video generation. Then, to achieve interactive controllability, we designed InstructNet to incorporate game-related multi-modal control signal experts. This allows the model to adjust latent representations based on user inputs, unifying character interaction and scene content control for the first time in video generation. During instruction tuning, only InstructNet is updated while the pre-trained foundation model remains frozen, enabling the integration of interactive controllability without compromising the diversity and quality of the generated video content. GameGen-X represents a significant leap forward in open-world video game design using generative models.
It demonstrates the potential of generative models to serve as auxiliary tools to traditional rendering techniques, effectively merging creative generation with interactive capabilities. The project will be available at https://github.com/GameGen-X/GameGen-X.
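The instruction-tuning setup described above can be sketched in a few lines: the pre-trained foundation model is frozen, and only InstructNet's parameters receive updates. This is a minimal illustrative sketch of that training policy only; all class and parameter names below are hypothetical and do not reflect the project's actual API or architecture.

```python
# Minimal sketch of stage-2 (instruction tuning) parameter selection:
# freeze the foundation model, leave only InstructNet trainable.
# Module names and parameter names are illustrative placeholders.

class Module:
    def __init__(self, param_names):
        # name -> [value, trainable flag]; values are dummies here
        self.params = {name: [0.0, True] for name in param_names}

    def freeze(self):
        # mark every parameter as non-trainable
        for p in self.params.values():
            p[1] = False

    def trainable(self):
        # names of parameters that would receive gradient updates
        return [n for n, (_, t) in self.params.items() if t]

foundation = Module(["dit_blocks", "video_vae", "text_encoder"])
instructnet = Module(["control_experts", "fusion_layers"])

# Stage 2: freeze the pre-trained foundation, train InstructNet only.
foundation.freeze()
updated = foundation.trainable() + instructnet.trainable()
print(updated)  # → ['control_experts', 'fusion_layers']
```

In a real framework this corresponds to disabling gradients on the backbone (e.g., setting `requires_grad` to `False` in PyTorch) and passing only the control network's parameters to the optimizer, which is what keeps the generation quality of the frozen foundation intact.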

High-quality Game Generation

Character Generation    

Environment Generation    

Action Generation    

Event Generation    

Open-domain Generation    

Multi-modality Interactive Control

Structural Instruction Prompts    

Operation Signals    

Video Prompts    

Qualitative Comparison

Generation Comparison    

Control Comparison    

OGameData Showcase


OGameData Summary: OGameData is a comprehensive multi-genre open-world video game dataset comprising generation and control subsets. It sources over 32,000 videos from local engines and the internet, each ranging from several minutes to several hours in length. The dataset features more than 150 next-generation games across various genres, including open-world RPGs, FPS, racing games, action-puzzle games, and more, and covers different perspectives (first-person, third-person) and styles (realistic, Eastern traditional, cyberpunk, post-apocalyptic, Western fantasy, etc.). After a rigorous six-month selection process involving multiple human experts and advanced model-based algorithms, we curated over 4,000 hours of high-quality video clips ranging from 720p to 4K resolution. These segments were meticulously annotated by GPT-4o, providing a rich source of labeled data for training and validation. OGameData is expected to become an invaluable resource for researchers and developers, enabling the exploration of applications such as generative AI for video games, interactive control, and immersive virtual environments. Its imminent open-source release will offer the scientific community unprecedented access to a broad spectrum of video game data, fostering innovation and collaboration across multiple disciplines.


OGameData for Generation Training    

OGameData for Instruction Tuning    

Acknowledgements: Our project page is borrowed from DreamBooth.