I describe my solution to the LeHome Challenge 2026, an ICRA 2026 competition on bimanual garment folding. The system placed 1st of 62 teams in the online (simulation) round and 2nd in the real-world final. It improves a vision-language-action (VLA) policy with a reinforcement-learning (RL) loop in which the policy serves as its own value function: the same network that predicts actions also predicts success, progress, and a few task-relevant future quantities, and those predictions drive advantage estimation, live failure detection, and candidate selection. The work mostly recombines existing RL ideas with engineering and optimization contributions that can be used together as one recipe or individually: a combination of advantage-weighted regression (AWR) and RECAP for flow-matching VLAs; an asynchronous distributed training and rollout pipeline that runs through the HuggingFace Hub; inference-time hyperparameter optimization via Thompson sampling; and a sim-to-real recipe with camera-alignment tooling, heavy augmentation, and DAgger-like human-in-the-loop (HIL) data collection.
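To make the Thompson-sampling component concrete, here is a minimal sketch of how inference-time hyperparameters could be tuned from binary episode outcomes. It is an illustration under stated assumptions, not the system's actual implementation: I assume a small discrete set of candidate configurations, a success/failure signal per rollout, and a Beta-Bernoulli model per candidate. The configuration keys (`denoise_steps`, `action_chunk`) and the simulated success rates are hypothetical placeholders.

```python
import random


class ThompsonSampler:
    """Beta-Bernoulli Thompson sampling over a discrete set of configurations."""

    def __init__(self, configs, seed=0):
        self.configs = list(configs)
        self.successes = [0] * len(self.configs)
        self.failures = [0] * len(self.configs)
        self.rng = random.Random(seed)

    def select(self):
        # Draw one plausible success rate per config from its Beta posterior
        # (uniform Beta(1, 1) prior) and pick the config with the largest draw.
        draws = [
            self.rng.betavariate(1 + s, 1 + f)
            for s, f in zip(self.successes, self.failures)
        ]
        return max(range(len(draws)), key=draws.__getitem__)

    def update(self, index, success):
        # Record the binary outcome of one rollout with the chosen config.
        if success:
            self.successes[index] += 1
        else:
            self.failures[index] += 1

    def pulls(self, index):
        return self.successes[index] + self.failures[index]

    def best(self):
        # Index of the config with the highest posterior mean success rate.
        means = [
            (1 + s) / (2 + s + f)
            for s, f in zip(self.successes, self.failures)
        ]
        return max(range(len(means)), key=means.__getitem__)


def simulate(configs, true_rates, episodes=2000, seed=0):
    """Stand-in for real rollouts: each config succeeds with a fixed probability."""
    sampler = ThompsonSampler(configs, seed=seed)
    env_rng = random.Random(seed + 1)
    for _ in range(episodes):
        i = sampler.select()
        sampler.update(i, env_rng.random() < true_rates[i])
    return sampler


# Hypothetical candidates; the keys and values are placeholders, not the real settings.
CONFIGS = [
    {"denoise_steps": 4, "action_chunk": 8},
    {"denoise_steps": 8, "action_chunk": 16},
    {"denoise_steps": 16, "action_chunk": 32},
]
TRUE_RATES = [0.3, 0.5, 0.8]  # made-up success rates, used only for the simulation

sampler = simulate(CONFIGS, TRUE_RATES)
print("selected config:", CONFIGS[sampler.best()])
print("rollouts per config:", [sampler.pulls(i) for i in range(len(CONFIGS))])
```

The appeal of this scheme for rollout-limited tuning is that it spends most episodes on configurations that already look good while still occasionally testing the others, so no separate exploration schedule has to be tuned.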