Genie Centurion

Overview

Genie Centurion (GCENT) is a system that enables one person to guide many robots through model-assisted teleoperation and to collect data in a closed loop.

While Vision-Language-Action (VLA) models show strong generalizability across various tasks, real-world deployment of robotic policies still requires large-scale, high-quality human expert demonstrations. However, passive data collection via human teleoperation is costly, hard to scale, and often biased toward passive demonstrations with limited diversity. To address this, we propose Genie Centurion (GCENT), a scalable and general data collection paradigm based on human rewind-and-refine guidance. When a robot execution failure occurs, GCENT lets the system revert to a previous state through a rewind mechanism, after which a teleoperator provides corrective demonstrations to refine the policy. This framework supports a one-human-to-many-robots supervision scheme through a Task Sentinel module, which autonomously predicts task success and solicits human intervention when necessary, enabling scalable supervision. Empirical results show that GCENT achieves up to 40% higher task success rates than state-of-the-art data collection methods, and reaches comparable performance using less than half the data. We also quantify the data yield-to-effort ratio under multi-robot scenarios, demonstrating GCENT's potential for scalable and cost-efficient robot policy training in real-world environments.
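The rewind-and-refine loop described above can be illustrated with a toy sketch. This is not the GCENT implementation: the number-line task, the rule-based `sentinel_flags_failure`, the dictionary policy, and the `refine` step are all hypothetical stand-ins chosen to keep the example runnable. The sketch also assumes the human keeps control for the rest of the episode after an intervention, which the text above does not specify.

```python
from typing import Callable, Dict, List, Tuple

GOAL = 5  # toy task (assumption): reach position 5 on a number line


def step(state: int, action: int) -> int:
    """Toy dynamics: the action is added to the current position."""
    return state + action


def sentinel_flags_failure(state: int) -> bool:
    """Hypothetical stand-in for the Task Sentinel: overshooting the goal is a failure."""
    return state > GOAL


def collect_episode(
    policy: Dict[int, int],
    teleop: Callable[[int], int],
    rewind_steps: int = 2,
    max_steps: int = 50,
) -> Tuple[bool, List[Tuple[int, int]]]:
    """Run one episode; on a flagged failure, rewind and record human corrections."""
    state = 0
    history: List[int] = [state]              # visited states, used for rewinding
    corrections: List[Tuple[int, int]] = []   # (state, human action) pairs
    human_in_control = False

    for _ in range(max_steps):
        if state == GOAL:
            return True, corrections
        if human_in_control:
            action = teleop(state)
            corrections.append((state, action))
        else:
            action = policy.get(state, 1)     # default action: move one step right
        state = step(state, action)
        history.append(state)
        if sentinel_flags_failure(state):
            # Rewind: discard the most recent states and restore an earlier one,
            # always keeping at least the initial state in the history.
            k = min(rewind_steps, len(history) - 1)
            history = history[: len(history) - k]
            state = history[-1]
            human_in_control = True
    return state == GOAL, corrections


def refine(policy: Dict[int, int], corrections: List[Tuple[int, int]]) -> Dict[int, int]:
    """Toy 'training': overwrite the policy's actions with the human corrections."""
    updated = dict(policy)
    updated.update(corrections)
    return updated


flawed_policy = {3: 5}  # overshoots the goal from state 3
success, corrections = collect_episode(flawed_policy, teleop=lambda s: 1)
refined_policy = refine(flawed_policy, corrections)
```

In this toy run the human only demonstrates from the rewound state onward, so the collected pairs concentrate on the region where the policy failed rather than on the whole trajectory.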

Data Collection

Direct Guidance

Rewind-and-Refine Guidance

Autonomous Execution

Long-horizon Task: Microwave

Long-horizon Task: Sandwich

High-precision Task: Insertion

High-precision Task: Typing

At 4x Speed

Instruction Following

AGIBOT, AI, AGO, BIG, BIT, GO

At 2x Speed

Results

Intervention rates decreased consistently across all tasks as iteration rounds progressed.

Comparison of data efficiency across methods. GCENT achieves a task score of 0.9 or higher with significantly fewer frames. At the same frame count, GCENT improves model performance by an average of 40%; at the same performance level, GCENT requires on average only 44.5% of the frames needed by passive data collection.
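To make the 44.5% figure concrete, the short sketch below converts it into a frame count and a reduction factor. The baseline of 100,000 frames is a hypothetical number chosen for illustration, not a value from the paper; only the 0.445 ratio comes from the caption above.

```python
GCENT_FRAME_RATIO = 0.445  # from the caption: fraction of passive-collection frames needed


def frames_needed(passive_frames: int, ratio: float = GCENT_FRAME_RATIO) -> float:
    """Frames GCENT would need to match a passive baseline of the given size."""
    return passive_frames * ratio


def reduction_factor(ratio: float = GCENT_FRAME_RATIO) -> float:
    """How many times fewer frames are needed at equal performance."""
    return 1.0 / ratio


hypothetical_baseline = 100_000  # illustrative only, not a reported figure
gcent_frames = frames_needed(hypothetical_baseline)
factor = reduction_factor()
```

Under this assumption the baseline shrinks to roughly 44,500 frames, a reduction of about 2.25x, consistent with the "less than half the data" statement in the overview.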

BibTeX


      @article{wang2025genie,
        title={Genie Centurion: Accelerating Scalable Real-World Robot Training with Human Rewind-and-Refine Guidance},
        author={Wang, Wenhao and Song, Jianheng and Liu, Chiming and Ma, Jiayao and Feng, Siyuan and Wang, Jingyuan and Jiang, Yuxin and Chen, Kylin and Zhan, Sikang and Wang, Yi and others},
        journal={arXiv preprint arXiv:2505.18793},
        year={2025}
      }
