2311.12751
v3 (latest)
Towards Natural Language-Guided Drones: GeoText-1652 Benchmark with Spatial Relation Matching
21 November 2023
Meng Chu
Zhedong Zheng
Wei Ji
Tingyu Wang
Tat-Seng Chua
Papers citing
"Towards Natural Language-Guided Drones: GeoText-1652 Benchmark with Spatial Relation Matching"
33 / 33 papers shown
Title
MLLM-as-a-Judge: Assessing Multimodal LLM-as-a-Judge with Vision-Language Benchmark
Dongping Chen
Ruoxi Chen
Shilin Zhang
Yinuo Liu
Yaochen Wang
Huichi Zhou
Qihui Zhang
Yao Wan
Pan Zhou
Lichao Sun
ELM
54
123
0
07 Feb 2024
StableLLaVA: Enhanced Visual Instruction Tuning with Synthesized Image-Dialogue Data
Yanda Li
Chi Zhang
Gang Yu
Zhibin Wang
Bin-Bin Fu
Guosheng Lin
Chunhua Shen
Ling Chen
Yunchao Wei
MLLM
62
31
0
20 Aug 2023
Sat2Cap: Mapping Fine-Grained Textual Descriptions from Satellite Images
Aayush Dhakal
Adeel Ahmad
Subash Khanal
Srikumar Sastry
Hannah Kerner
Nathan Jacobs
59
13
0
29 Jul 2023
Large Language Model as Attributed Training Data Generator: A Tale of Diversity and Bias
Yue Yu
Yuchen Zhuang
Jieyu Zhang
Yu Meng
Alexander Ratner
Ranjay Krishna
Jiaming Shen
Chao Zhang
ALM
100
234
0
28 Jun 2023
Quilt-1M: One Million Image-Text Pairs for Histopathology
Wisdom O. Ikezogwo
M. S. Seyfioglu
Fatemeh Ghezloo
Dylan Stefan Chan Geva
Fatwir Sheikh Mohammed
Pavan Kumar Anand
Ranjay Krishna
Linda G. Shapiro
CLIP
VLM
303
125
0
20 Jun 2023
Cross-view Geo-localization via Learning Disentangled Geometric Layout Correspondence
Xiaohan Zhang
Xingyu Li
Waqas Sultani
Yi Zhou
S. Wshah
76
59
0
08 Dec 2022
Language Models are Realistic Tabular Data Generators
V. Borisov
Kathrin Seßler
Tobias Leemann
Martin Pawelczyk
Gjergji Kasneci
LMTD
106
252
0
12 Oct 2022
Generate rather than Retrieve: Large Language Models are Strong Context Generators
Wenhao Yu
Dan Iter
Shuohang Wang
Yichong Xu
Mingxuan Ju
Soumya Sanyal
Chenguang Zhu
Michael Zeng
Meng Jiang
RALM
AIMat
342
336
0
21 Sep 2022
Beyond Cross-view Image Retrieval: Highly Accurate Vehicle Localization Using Satellite Image
Yujiao Shi
Hongdong Li
52
81
0
10 Apr 2022
TransGeo: Transformer Is All You Need for Cross-view Image Geo-localization
Sijie Zhu
M. Shah
Chen Chen
ViT
92
160
0
31 Mar 2022
Cross-modal Map Learning for Vision and Language Navigation
G. Georgakis
Karl Schmeckpeper
Karan Wanchoo
Soham Dan
E. Miltsakaki
Dan Roth
Kostas Daniilidis
87
66
0
10 Mar 2022
BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
Junnan Li
Dongxu Li
Caiming Xiong
Steven Hoi
MLLM
BDL
VLM
CLIP
557
4,421
0
28 Jan 2022
Multi-Grained Vision Language Pre-Training: Aligning Texts with Visual Concepts
Yan Zeng
Xinsong Zhang
Hang Li
VLM
CLIP
87
307
0
16 Nov 2021
Align before Fuse: Vision and Language Representation Learning with Momentum Distillation
Junnan Li
Ramprasaath R. Selvaraju
Akhilesh Deepak Gotmare
Shafiq Joty
Caiming Xiong
Steven Hoi
FaML
223
1,979
0
16 Jul 2021
Swin Transformer: Hierarchical Vision Transformer using Shifted Windows
Ze Liu
Yutong Lin
Yue Cao
Han Hu
Yixuan Wei
Zheng Zhang
Stephen Lin
B. Guo
ViT
470
21,603
0
25 Mar 2021
Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision
Chao Jia
Yinfei Yang
Ye Xia
Yi-Ting Chen
Zarana Parekh
Hieu H. Pham
Quoc V. Le
Yun-hsuan Sung
Zhen Li
Tom Duerig
VLM
CLIP
469
3,906
0
11 Feb 2021
A Recurrent Vision-and-Language BERT for Navigation
Yicong Hong
Qi Wu
Yuankai Qi
Cristian Rodriguez-Opazo
Stephen Gould
LM&Ro
104
303
0
26 Nov 2020
VIGOR: Cross-View Image Geo-localization beyond One-to-one Retrieval
Sijie Zhu
Taojiannan Yang
Chen Chen
71
175
0
24 Nov 2020
Each Part Matters: Local Patterns Facilitate Cross-view Geo-localization
Tingyu Wang
Zhedong Zheng
C. Yan
Jiyong Zhang
Yaoqi Sun
Bolun Zheng
Yi Yang
59
169
0
26 Aug 2020
Improving Vision-and-Language Navigation with Image-Text Pairs from the Web
Arjun Majumdar
Ayush Shrivastava
Stefan Lee
Peter Anderson
Devi Parikh
Dhruv Batra
LM&Ro
171
235
0
30 Apr 2020
Oscar: Object-Semantics Aligned Pre-training for Vision-Language Tasks
Xiujun Li
Xi Yin
Chunyuan Li
Pengchuan Zhang
Xiaowei Hu
...
Houdong Hu
Li Dong
Furu Wei
Yejin Choi
Jianfeng Gao
VLM
148
1,947
0
13 Apr 2020
University-1652: A Multi-view Multi-source Benchmark for Drone-based Geo-localization
Zhedong Zheng
Yunchao Wei
Yi Yang
55
243
0
27 Feb 2020
Towards Learning a Generic Agent for Vision-and-Language Navigation via Pre-training
Weituo Hao
Chunyuan Li
Xiujun Li
Lawrence Carin
Jianfeng Gao
LM&Ro
93
282
0
25 Feb 2020
Vision-Language Navigation with Self-Supervised Auxiliary Reasoning Tasks
Fengda Zhu
Yi Zhu
Xiaojun Chang
Xiaodan Liang
LRM
94
243
0
18 Nov 2019
Building Information Modeling and Classification by Visual Learning At A City Scale
Qian Yu
Chaofeng Wang
Barbaros Cetiner
Stella X. Yu
Frank Mckenna
E. Taciroğlu
K. Law
AI4CE
74
21
0
14 Oct 2019
CAMP: Cross-Modal Adaptive Message Passing for Text-Image Retrieval
Zihao Wang
Xihui Liu
Hongsheng Li
Lu Sheng
Junjie Yan
Xiaogang Wang
Jing Shao
VLM
83
306
0
12 Sep 2019
Visual Semantic Reasoning for Image-Text Matching
Kunpeng Li
Yulun Zhang
Keqin Li
Yuanyuan Li
Y. Fu
VLM
91
506
0
06 Sep 2019
Lending Orientation to Neural Networks for Cross-view Geo-localization
Liu Liu
Hongdong Li
57
249
0
29 Mar 2019
Generalized Intersection over Union: A Metric and A Loss for Bounding Box Regression
S. Hamid Rezatofighi
Nathan Tsoi
JunYoung Gwak
Amir Sadeghian
Ian Reid
Silvio Savarese
154
4,186
0
25 Feb 2019
Dual-Path Convolutional Image-Text Embeddings with Instance Loss
Zhedong Zheng
Liang Zheng
Michael Garrett
Yi Yang
Mingliang Xu
Yi-Dong Shen
140
478
0
15 Nov 2017
Wide-Area Image Geolocalization with Aerial Reference Imagery
Scott Workman
Richard Souvenir
Nathan Jacobs
71
329
0
13 Oct 2015
Unsupervised Visual Representation Learning by Context Prediction
Carl Doersch
Abhinav Gupta
Alexei A. Efros
DRL
SSL
171
2,792
0
19 May 2015
Microsoft COCO Captions: Data Collection and Evaluation Server
Xinlei Chen
Hao Fang
Tsung-Yi Lin
Ramakrishna Vedantam
Saurabh Gupta
Piotr Dollar
C. L. Zitnick
224
2,497
0
01 Apr 2015