Scalable Behavior Cloning with Open Data, Training, and Evaluation

Abstract (EN)

We introduce ABC, a fully open-source stack for manipulation with behavior cloning. At its core is ABC-130K, the largest open-source teleoperation dataset to date, with 3,500 hours of data spanning more than 130K episodes across 195 diverse tasks. We also open-source our accessible hardware setup, training infrastructure, and simulation pipeline. In addition, we release 400 hours of sim-teleop data and provide a co-training recipe under which simulation and real-world evaluation results are correlated, so that simulation serves as a reliable proxy for ablating model-design and training decisions before costly real-world evaluation. We explore various training recipes and compare common architectural choices for Diffusion Transformers (DiT) and Vision-Language-Action (VLA) models, grounding our findings in real-world evaluations. The resulting policies successfully execute dexterous tasks such as folding boxes and extracting credit cards from wallets. By providing a reproducible toolkit, we aim to place researchers on an equal footing and establish the foundation for learning the ABCs of Behavior Cloning together as a community.
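
The claim that simulation can act as a proxy for real-world evaluation rests on the two sets of results being correlated across policies. The abstract does not state how that correlation is measured, so the following is only a minimal sketch of one plausible check, assuming each policy or checkpoint has a paired simulation and real-world success rate and that Pearson correlation is the statistic of interest. The function name `pearson_r` and its inputs are hypothetical and are not part of the ABC release.

```python
import math
from typing import Sequence


def pearson_r(sim: Sequence[float], real: Sequence[float]) -> float:
    """Pearson correlation between paired sim and real success rates.

    Assumption: sim[i] and real[i] are success rates of the same
    policy/checkpoint, evaluated in simulation and on the real robot.
    """
    if len(sim) != len(real):
        raise ValueError("sim and real must have the same length")
    if len(sim) < 2:
        raise ValueError("need at least two paired evaluations")

    n = len(sim)
    mean_sim = sum(sim) / n
    mean_real = sum(real) / n

    # Sum of products of deviations (unnormalized covariance).
    cov = sum((s - mean_sim) * (r - mean_real) for s, r in zip(sim, real))
    # Sums of squared deviations (unnormalized variances).
    var_sim = sum((s - mean_sim) ** 2 for s in sim)
    var_real = sum((r - mean_real) ** 2 for r in real)

    if var_sim == 0 or var_real == 0:
        raise ValueError("correlation is undefined when one series is constant")
    return cov / math.sqrt(var_sim * var_real)
```

A value near 1 would suggest that ranking design choices in simulation tends to preserve their ranking on the real robot. With only a handful of policies the estimate is noisy, so a rank-based statistic or a confidence interval may be a better fit in practice.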

