FlyMirage: A Fully Automated Generation Pipeline for Diverse and Scalable UAV Flight Data via Generative World Model

Abstract (EN)

In Vision-Language Navigation (VLN), aerial datasets still struggle to combine scale, diversity, and realism, often relying on either costly real-world scenes or visually limited simulations. To address these limitations, we introduce FlyMirage, a highly scalable and fully automated data generation pipeline for aerial VLN. Our approach uses a large language model (LLM) as an environment designer to promote scene diversity, paired with a generative world model that instantiates these designs as high-fidelity 3D Gaussian Splatting (3DGS) scenes. To substantially reduce human labor and ensure that the flight data is feasible, FlyMirage automates scene exploration and semantic information acquisition, and integrates a planner that generates dynamically feasible uncrewed aerial vehicle (UAV) trajectories. Using this toolchain, we generate a large-scale, diverse, and photorealistic aerial VLN dataset with dynamically feasible flight trajectories, designed to support the development of next-generation embodied navigation models.

