AerialVLN: Vision-and-Language Navigation for UAVs

📘 Reference Information

Title: AerialVLN: Vision-and-Language Navigation for UAVs
Authors: Liu, Shubo; Zhang, Hongsheng; Qi, Yuankai; Wang, Peng; Zhang, Yanning; Wu, Qi
Publication: 2023 IEEE/CVF International Conference on Computer Vision (ICCV) (2023)
Citekey: liuAerialVLNVisionandLanguageNavigation2023
DOI: 10.1109/ICCV51070.2023.01411
Links: Online | PDF

🧾 Metadata

Start date: 2025-10-01
End date:
Page range: 15384
Keywords: #obscite

🧠 Abstract / Summary

Briefly summarize the study's background, objectives, methods, results, and conclusions (3–5 sentences recommended).

🔍 Key Concepts

Key concept	Description
Problem
Method / Model
Result
Contribution

💬 Highlights & Annotations

%% begin annotations %%

Imported on 2025-11-01 10:41 PM

[!quote|#5fb236]+ 📗 Reference
(p. 15384)

💭 nothing

[!quote|#ff6666]+ ⚠️ Critique Navigating in the sky is more complicated than on the ground because agents need to consider the flying height and more complex spatial relationship reasoning. To fill this gap and facilitate research in this field, we propose a new task named AerialVLN, which is UAV-based and towards outdoor environments.
(p. 15384)

[!quote|#a28ae5]+ 🔧 Method scenarios
(p. 15384)

[!quote|#ffd400]+ 📌 Important
(p. 15384)

💭 nothing

[!quote|#ffd400]+ 📌 Important
(p. 15385)

[!quote|#ffd400]+ 📌 Important To release humans from manually operating UAVs and to fill the research gap in the field of navigation in the sky, we propose a city-level UAV-based vision-and-language navigation task, named AerialVLN, and a corresponding dataset.
(p. 15385)

[!quote|#5fb236]+ 📗 Reference On average, up to 83 words are in each instruction, involving a large vocabulary of 4,470 words. Finally, we evaluate five baselines, including two golden standard VLN models in VLN, Seq2Seq model and cross-modal matching (CMA) model, and our proposed model to serve as starting baselines on AerialVLN.
(p. 15385)
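The instruction statistics quoted above (average words per instruction, vocabulary size) are simple corpus measures. The sketch below shows one way to compute them for any list of instructions. It is a minimal sketch under stated assumptions: the tokenizer (lowercase, alphanumeric runs) and the helper names `tokenize` and `instruction_stats` are hypothetical, not the paper's own procedure, and the demo sentences are invented, not taken from the dataset.

```python
# Hypothetical sketch: compute mean words per instruction and vocabulary size.
# ASSUMPTION: tokenization = lowercase + runs of letters/digits/apostrophes;
# the AerialVLN paper's actual tokenizer is not stated in this note.
import re
from typing import Iterable


def tokenize(instruction: str) -> list[str]:
    # Assumed tokenizer; punctuation such as "," and "." is dropped.
    return re.findall(r"[a-z0-9']+", instruction.lower())


def instruction_stats(instructions: Iterable[str]) -> tuple[float, int]:
    """Return (mean words per instruction, number of distinct words)."""
    lengths: list[int] = []
    vocab: set[str] = set()
    for text in instructions:
        tokens = tokenize(text)
        lengths.append(len(tokens))
        vocab.update(tokens)
    mean_len = sum(lengths) / len(lengths) if lengths else 0.0
    return mean_len, len(vocab)


if __name__ == "__main__":
    # Toy instructions invented for illustration only.
    demo = [
        "Take off and fly up to the roof of the tall building.",
        "Turn left and fly toward the bridge, then descend.",
    ]
    mean_len, vocab_size = instruction_stats(demo)
    print(f"mean words/instruction: {mean_len:.1f}, vocabulary: {vocab_size}")
```

Because the vocabulary count depends entirely on the tokenizer (case folding, punctuation, number handling), figures from this sketch are only comparable to the paper's 83-word and 4,470-word numbers if the same tokenization is used.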

[!quote|#5fb236]+ 📗 Reference In this section, we review two types of closely related work: UAV navigation and Ground-based VLN.
(p. 15385)

[!quote|#ffd400]+ 📌 Important
(p. 15386)

%% end annotations %%

🧩 Reflections / Insights

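What is the core innovation of this paper?
Compared with existing work, what are its main improvements?
What are its possible limitations or future directions?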

🔗 Connections

Related Works:
Relevance to My Research:

🧾 Citation

[1]

S. Liu, H. Zhang, Y. Qi, P. Wang, Y. Zhang, and Q. Wu, "AerialVLN: Vision-and-Language Navigation for UAVs," in 2023 IEEE/CVF International Conference on Computer Vision (ICCV), Paris, France: IEEE, Oct. 2023, pp. 15338–15348. doi: 10.1109/ICCV51070.2023.01411.

%% Import Date: 2025-11-01T22:42:02.266+08:00 %%