© 2025 ResearchTrend.AI, All rights reserved.

arXiv:2311.12751

Towards Natural Language-Guided Drones: GeoText-1652 Benchmark with Spatial Relation Matching

21 November 2023
Meng Chu
Zhedong Zheng
Wei Ji
Tingyu Wang
Tat-Seng Chua

Papers cited by "Towards Natural Language-Guided Drones: GeoText-1652 Benchmark with Spatial Relation Matching"

33 papers
MLLM-as-a-Judge: Assessing Multimodal LLM-as-a-Judge with Vision-Language Benchmark
Dongping Chen
Ruoxi Chen
Shilin Zhang
Yinuo Liu
Yaochen Wang
Huichi Zhou
Qihui Zhang
Yao Wan
Pan Zhou
Lichao Sun
ELM
07 Feb 2024
StableLLaVA: Enhanced Visual Instruction Tuning with Synthesized Image-Dialogue Data
Yanda Li
Chi Zhang
Gang Yu
Zhibin Wang
Bin-Bin Fu
Guosheng Lin
Chunhua Shen
Ling Chen
Yunchao Wei
MLLM
20 Aug 2023
Sat2Cap: Mapping Fine-Grained Textual Descriptions from Satellite Images
Aayush Dhakal
Adeel Ahmad
Subash Khanal
Srikumar Sastry
Hannah Kerner
Nathan Jacobs
29 Jul 2023
Large Language Model as Attributed Training Data Generator: A Tale of Diversity and Bias
Yue Yu
Yuchen Zhuang
Jieyu Zhang
Yu Meng
Alexander Ratner
Ranjay Krishna
Jiaming Shen
Chao Zhang
ALM
28 Jun 2023
Quilt-1M: One Million Image-Text Pairs for Histopathology
Wisdom O. Ikezogwo
M. S. Seyfioglu
Fatemeh Ghezloo
Dylan Stefan Chan Geva
Fatwir Sheikh Mohammed
Pavan Kumar Anand
Ranjay Krishna
Linda G. Shapiro
CLIP, VLM
20 Jun 2023
Cross-view Geo-localization via Learning Disentangled Geometric Layout Correspondence
Xiaohan Zhang
Xingyu Li
Waqas Sultani
Yi Zhou
S. Wshah
08 Dec 2022
Language Models are Realistic Tabular Data Generators
V. Borisov
Kathrin Seßler
Tobias Leemann
Martin Pawelczyk
Gjergji Kasneci
LMTD
12 Oct 2022
Generate rather than Retrieve: Large Language Models are Strong Context Generators
Wenhao Yu
Dan Iter
Shuohang Wang
Yichong Xu
Mingxuan Ju
Soumya Sanyal
Chenguang Zhu
Michael Zeng
Meng Jiang
RALM, AIMat
21 Sep 2022
Beyond Cross-view Image Retrieval: Highly Accurate Vehicle Localization Using Satellite Image
Yujiao Shi
Hongdong Li
10 Apr 2022
TransGeo: Transformer Is All You Need for Cross-view Image Geo-localization
Sijie Zhu
M. Shah
Chen Chen
ViT
31 Mar 2022
Cross-modal Map Learning for Vision and Language Navigation
G. Georgakis
Karl Schmeckpeper
Karan Wanchoo
Soham Dan
E. Miltsakaki
Dan Roth
Kostas Daniilidis
10 Mar 2022
BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
Junnan Li
Dongxu Li
Caiming Xiong
Steven C. H. Hoi
MLLM, BDL, VLM, CLIP
28 Jan 2022
Multi-Grained Vision Language Pre-Training: Aligning Texts with Visual Concepts
Yan Zeng
Xinsong Zhang
Hang Li
VLM, CLIP
16 Nov 2021
Align before Fuse: Vision and Language Representation Learning with Momentum Distillation
Junnan Li
Ramprasaath R. Selvaraju
Akhilesh Deepak Gotmare
Shafiq Joty
Caiming Xiong
Steven C. H. Hoi
FaML
16 Jul 2021
Swin Transformer: Hierarchical Vision Transformer using Shifted Windows
Ze Liu
Yutong Lin
Yue Cao
Han Hu
Yixuan Wei
Zheng Zhang
Stephen Lin
B. Guo
ViT
25 Mar 2021
Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision
Chao Jia
Yinfei Yang
Ye Xia
Yi-Ting Chen
Zarana Parekh
Hieu H. Pham
Quoc V. Le
Yun-hsuan Sung
Zhen Li
Tom Duerig
VLM, CLIP
11 Feb 2021
A Recurrent Vision-and-Language BERT for Navigation
Yicong Hong
Qi Wu
Yuankai Qi
Cristian Rodriguez-Opazo
Stephen Gould
LM&Ro
26 Nov 2020
VIGOR: Cross-View Image Geo-localization beyond One-to-one Retrieval
Sijie Zhu
Taojiannan Yang
Chen Chen
24 Nov 2020
Each Part Matters: Local Patterns Facilitate Cross-view Geo-localization
Tingyu Wang
Zhedong Zheng
C. Yan
Jiyong Zhang
Yaoqi Sun
Bolun Zheng
Yi Yang
26 Aug 2020
Improving Vision-and-Language Navigation with Image-Text Pairs from the Web
Arjun Majumdar
Ayush Shrivastava
Stefan Lee
Peter Anderson
Devi Parikh
Dhruv Batra
LM&Ro
30 Apr 2020
Oscar: Object-Semantics Aligned Pre-training for Vision-Language Tasks
Xiujun Li
Xi Yin
Chunyuan Li
Pengchuan Zhang
Xiaowei Hu
...
Houdong Hu
Li Dong
Furu Wei
Yejin Choi
Jianfeng Gao
VLM
13 Apr 2020
University-1652: A Multi-view Multi-source Benchmark for Drone-based Geo-localization
Zhedong Zheng
Yunchao Wei
Yi Yang
27 Feb 2020
Towards Learning a Generic Agent for Vision-and-Language Navigation via Pre-training
Weituo Hao
Chunyuan Li
Xiujun Li
Lawrence Carin
Jianfeng Gao
LM&Ro
25 Feb 2020
Vision-Language Navigation with Self-Supervised Auxiliary Reasoning Tasks
Fengda Zhu
Yi Zhu
Xiaojun Chang
Xiaodan Liang
LRM
18 Nov 2019
Building Information Modeling and Classification by Visual Learning At A City Scale
Qian Yu
Chaofeng Wang
Barbaros Cetiner
Stella X. Yu
Frank McKenna
E. Taciroğlu
K. Law
AI4CE
14 Oct 2019
CAMP: Cross-Modal Adaptive Message Passing for Text-Image Retrieval
Zihao Wang
Xihui Liu
Hongsheng Li
Lu Sheng
Junjie Yan
Xiaogang Wang
Jing Shao
VLM
12 Sep 2019
Visual Semantic Reasoning for Image-Text Matching
Kunpeng Li
Yulun Zhang
Kai Li
Yuanyuan Li
Y. Fu
VLM
06 Sep 2019
Lending Orientation to Neural Networks for Cross-view Geo-localization
Liu Liu
Hongdong Li
29 Mar 2019
Generalized Intersection over Union: A Metric and A Loss for Bounding Box Regression
S. Hamid Rezatofighi
Nathan Tsoi
JunYoung Gwak
Amir Sadeghian
Ian Reid
Silvio Savarese
25 Feb 2019
Dual-Path Convolutional Image-Text Embeddings with Instance Loss
Zhedong Zheng
Liang Zheng
Michael Garrett
Yi Yang
Mingliang Xu
Yi-Dong Shen
15 Nov 2017
Wide-Area Image Geolocalization with Aerial Reference Imagery
Scott Workman
Richard Souvenir
Nathan Jacobs
13 Oct 2015
Unsupervised Visual Representation Learning by Context Prediction
Carl Doersch
Abhinav Gupta
Alexei A. Efros
DRL, SSL
19 May 2015
Microsoft COCO Captions: Data Collection and Evaluation Server
Xinlei Chen
Hao Fang
Tsung-Yi Lin
Ramakrishna Vedantam
Saurabh Gupta
Piotr Dollar
C. L. Zitnick
01 Apr 2015