Genie Centurion
Accelerating Scalable Real-World Robot Training with Human Rewind-and-Refine Guidance

Wenhao Wang*             Jianheng Song*             Chiming Liu*            
Jiayao Ma           Siyuan Feng           Jingyuan Wang           Yuxin Jiang           Kylin Chen
Sikang Zhan           Yi Wang           Tong Meng           Modi Shi           Xindong He
Guanghui Ren           Yang Yang           Maoqing Yao

AgiBot
* Equal contribution
Corresponding author

Overview

Genie Centurion (GCENT) is a system that enables one person to guide many robots through model-assisted teleoperation while collecting data in a closed loop.
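The page does not say how a single operator arbitrates among many robots, so the following is only a minimal sketch under stated assumptions. Each robot runs autonomously and pauses to request help the first time a per-robot success prediction (the role GCENT assigns to its Task Sentinel) falls below a threshold. A priority queue then serves the robot judged least likely to succeed first. The names `HelpRequest`, `OperatorQueue`, and `collect_requests`, the 0.5 threshold, and the lowest-probability-first rule are all hypothetical.

```python
from __future__ import annotations

import heapq
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass(order=True)
class HelpRequest:
    # Only `priority` takes part in ordering: it holds the predicted success
    # probability, so the robot judged most likely to fail is popped first
    # (a hypothetical scheduling rule, not GCENT's documented policy).
    priority: float
    robot_id: int = field(compare=False)
    step: int = field(compare=False)


class OperatorQueue:
    """Hypothetical queue of help requests shared by one human operator."""

    def __init__(self) -> None:
        self._heap: List[HelpRequest] = []

    def request_help(self, robot_id: int, step: int, p_success: float) -> None:
        heapq.heappush(self._heap, HelpRequest(p_success, robot_id, step))

    def next_request(self) -> Optional[HelpRequest]:
        # Most urgent pending request, or None when no robot is waiting.
        return heapq.heappop(self._heap) if self._heap else None

    def __len__(self) -> int:
        return len(self._heap)


def collect_requests(
    predictions: Dict[int, List[float]], threshold: float = 0.5
) -> OperatorQueue:
    """Enqueue one request per robot at its first step with predicted success < threshold."""
    queue = OperatorQueue()
    for robot_id, p_by_step in predictions.items():
        for step, p in enumerate(p_by_step):
            if p < threshold:
                queue.request_help(robot_id, step, p)
                break  # the robot pauses here until the operator responds
    return queue


if __name__ == "__main__":
    # Made-up per-step success predictions for four robots.
    preds = {0: [0.9, 0.8, 0.7], 1: [0.9, 0.4, 0.2], 2: [0.3, 0.1], 3: [0.95, 0.9]}
    queue = collect_requests(preds)
    while (req := queue.next_request()) is not None:
        print(f"operator helps robot {req.robot_id} at step {req.step}")
```

With these made-up predictions, robots 0 and 3 never drop below the threshold, so the operator first helps robot 2 (step 0) and then robot 1 (step 1). Serving the lowest predicted success first is only one plausible policy; a first-come, first-served queue would be an equally simple alternative.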

While Vision-Language-Action (VLA) models show strong generalizability across a wide range of tasks, deploying robotic policies in the real world still requires large-scale, high-quality human expert demonstrations. However, passive data collection via human teleoperation is costly, hard to scale, and tends to yield passive demonstrations of limited diversity. To address this, we propose Genie Centurion (GCENT), a scalable and general data collection paradigm based on human rewind-and-refine guidance. When the robot fails during execution, GCENT lets the system revert to a previous state through a rewind mechanism, after which a teleoperator provides corrective demonstrations to refine the policy. The framework supports a one-human-to-many-robots supervision scheme through a Task Sentinel module, which autonomously predicts task success and requests human intervention when necessary, enabling scalable supervision. Empirical results show that GCENT achieves up to 40% higher task success rates than state-of-the-art data collection methods and reaches comparable performance using less than half the data. We also quantify the data yield-to-effort ratio in multi-robot scenarios, demonstrating GCENT's potential for scalable and cost-efficient robot policy training in real-world environments.
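As a rough illustration of the rewind-and-refine loop described above, this sketch replays an autonomous rollout and keeps a bounded history of recent states. At the first step where a Task Sentinel-style success prediction falls below a threshold, it rewinds a fixed number of steps and passes the reverted state to a human correction routine. Everything here is an assumption for illustration. States are plain floats standing in for full robot states, and `RewindBuffer`, `rewind_and_refine`, the threshold, and the rewind depth are hypothetical, not GCENT's actual interface.

```python
from __future__ import annotations

from collections import deque
from typing import Callable, Deque, List, Sequence, Tuple


class RewindBuffer:
    """Bounded history of recent robot states (hypothetical helper)."""

    def __init__(self, capacity: int) -> None:
        # Oldest states are dropped automatically once capacity is exceeded.
        self._states: Deque[float] = deque(maxlen=capacity)

    def record(self, state: float) -> None:
        self._states.append(state)

    def rewind(self, k: int) -> float:
        """Discard up to k most recent states and return the state to revert to."""
        if not self._states:
            raise ValueError("no recorded state to rewind to")
        # Always keep at least one state so there is somewhere to revert to.
        for _ in range(min(k, len(self._states) - 1)):
            self._states.pop()
        return self._states[-1]

    def history(self) -> List[float]:
        return list(self._states)


def rewind_and_refine(
    states: Sequence[float],
    p_success: Sequence[float],
    correct: Callable[[float], List[float]],
    threshold: float = 0.5,
    rewind_steps: int = 2,
    capacity: int = 50,
) -> Tuple[List[float], List[float]]:
    """Return (autonomous states kept after rewinding, human corrective segment)."""
    buffer = RewindBuffer(capacity)
    for state, p in zip(states, p_success):
        buffer.record(state)
        if p < threshold:  # predicted failure: stop, rewind, ask a human
            start = buffer.rewind(rewind_steps)
            return buffer.history(), correct(start)
    return buffer.history(), []  # no failure predicted: no intervention needed


if __name__ == "__main__":
    # Made-up rollout whose predicted success collapses at step 3.
    kept, fix = rewind_and_refine(
        states=[0.0, 1.0, 2.0, 3.0, 4.0],
        p_success=[0.9, 0.9, 0.8, 0.3, 0.2],
        correct=lambda s: [s + 0.5, s + 1.0],  # stand-in for a teleoperator
    )
    print(kept, fix)
```

Here the failure is flagged at state 3.0. The buffer rewinds two steps to state 1.0, and the stand-in teleoperator's segment `[1.5, 2.0]` is what would be stored as corrective data. Rewinding before the flagged step, instead of correcting at it, follows the page's description of first reverting to a previous state. How far GCENT actually rewinds is not stated here.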

Data Collection

Direct Guidance

Rewind-and-Refine Guidance

Autonomous Execution

Long-horizon Task: Microwave

Long-horizon Task: Sandwich

High-precision Task: Insertion

High-precision Task: Typing

At 4x Speed

Instruction Following

AGIBOT, AI, AGO, BIG, BIT, GO

At 2x Speed

Results

Intervention rates decreased consistently across all tasks as iteration rounds progressed.

Comparison of data efficiency across methods. GCENT reaches a task score of 0.9 or higher with significantly fewer frames. At the same frame count, GCENT improves model performance by 40% on average; at the same performance level, it requires on average only 44.5% of the frames needed by passive data collection.
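The page does not spell out how the two efficiency figures are computed, so this sketch shows one plausible per-task reading. It assumes each method has a score-versus-frames curve that is linearly interpolated between measured points. "Frames at the same performance" is taken as the first frame count at which a curve reaches a target score, and "improvement at the same frame count" as the relative score gain. The page does not say whether the 40% figure is a relative gain or a difference in percentage points. The curves below are invented for illustration and are not the paper's data, and `frames_to_reach` and `score_at` are hypothetical helpers.

```python
from __future__ import annotations

from typing import List, Optional


def score_at(frame: float, frames: List[float], scores: List[float]) -> float:
    """Linearly interpolate the score at `frame` (frames must be strictly increasing)."""
    if frame <= frames[0]:
        return scores[0]
    for f0, s0, f1, s1 in zip(frames, scores, frames[1:], scores[1:]):
        if f0 <= frame <= f1:
            return s0 + (s1 - s0) * (frame - f0) / (f1 - f0)
    return scores[-1]  # beyond the last measurement: hold the final score


def frames_to_reach(target: float, frames: List[float], scores: List[float]) -> Optional[float]:
    """First (interpolated) frame count at which the score reaches `target`, or None."""
    if scores[0] >= target:
        return frames[0]
    for f0, s0, f1, s1 in zip(frames, scores, frames[1:], scores[1:]):
        if s0 < target <= s1:  # guarantees s1 > s0, so no division by zero
            return f0 + (f1 - f0) * (target - s0) / (s1 - s0)
    return None


if __name__ == "__main__":
    # Invented curves for one task (NOT the paper's measurements).
    frames = [1000, 2000, 4000, 8000]
    passive = [0.2, 0.4, 0.6, 0.8]
    gcent = [0.3, 0.6, 0.8, 0.95]

    # Frame ratio at equal performance (the page reports 44.5% on average).
    ratio = frames_to_reach(0.8, frames, gcent) / frames_to_reach(0.8, frames, passive)
    # Relative score gain at equal frame count (the page reports 40% on average).
    gain = score_at(2000, frames, gcent) / score_at(2000, frames, passive) - 1
    print(f"frame ratio: {ratio:.3f}, relative gain: {gain:.1%}")
```

With these invented curves, GCENT reaches 0.8 at 4,000 frames versus 8,000 for passive collection (a ratio of 0.5). At 2,000 frames it scores 0.6 versus 0.4, a 50% relative gain. If a curve never reaches the target, `frames_to_reach` returns None, so a real analysis would need to guard the division. For reference, needing 44.5% of the frames corresponds to roughly 1/0.445 ≈ 2.25 times fewer frames.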

BibTeX


  @misc{wang2025geniecenturionacceleratingscalable,
      title={Genie Centurion: Accelerating Scalable Real-World Robot Training with Human Rewind-and-Refine Guidance}, 
      author={Wenhao Wang and Jianheng Song and Chiming Liu and Jiayao Ma and Siyuan Feng and Jingyuan Wang and Yuxin Jiang and Kylin Chen and Sikang Zhan and Yi Wang and Tong Meng and Modi Shi and Xindong He and Guanghui Ren and Yang Yang and Maoqing Yao},
      year={2025},
      eprint={2505.18793},
      archivePrefix={arXiv},
      primaryClass={cs.RO},
      url={https://arxiv.org/abs/2505.18793}, 
  }