Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2410.17283
Cited By
v1
v2
v3 (latest)
Advancements in Visual Language Models for Remote Sensing: Datasets, Capabilities, and Enhancement Techniques
3 January 2025
Lijie Tao
Han Zhang
Haizhao Jing
Yu Liu
Kelu Yao
Guoting Wei
Xizhe Xue
Re-assign community
ArXiv (abs)
PDF
HTML
Papers citing
"Advancements in Visual Language Models for Remote Sensing: Datasets, Capabilities, and Enhancement Techniques"
50 / 64 papers shown
Title
TG-LLaVA: Text Guided LLaVA via Learnable Latent Embeddings
Dawei Yan
Pengcheng Li
Yang Li
Hao Chen
Qingguo Chen
Weihua Luo
Wei Dong
Qingsen Yan
Haokui Zhang
Chunhua Shen
3DV
VLM
69
5
0
15 Sep 2024
ChangeChat: An Interactive Model for Remote Sensing Change Analysis via Multimodal Instruction Tuning
Pei Deng
Wenqian Zhou
Hanlin Wu
53
3
0
13 Sep 2024
VRSBench: A Versatile Vision-Language Benchmark Dataset for Remote Sensing Image Understanding
Xiang Li
Jian Ding
Mohamed Elhoseiny
CoGe
65
33
0
18 Jun 2024
SkySenseGPT: A Fine-Grained Instruction Tuning Dataset and Model for Remote Sensing Vision-Language Understanding
Junwei Luo
Zhen Pang
Yongjun Zhang
Tingzhu Wang
Linlin Wang
...
Jiangwei Lao
Jian Wang
Jingdong Chen
Yihua Tan
Yansheng Li
85
27
0
14 Jun 2024
ProGEO: Generating Prompts through Image-Text Contrastive Learning for Visual Geo-localization
Chen Mao
Jingqi Hu
89
5
0
04 Jun 2024
GeoReasoner: Geo-localization with Reasoning in Street Views using a Large Vision-Language Model
Ling Li
Yu Ye
Bingchuan Jiang
Wei Zeng
VLM
LRM
66
10
0
03 Jun 2024
Panoptic Perception: A Novel Task and Fine-grained Dataset for Universal Remote Sensing Image Interpretation
Danpei Zhao
Bo Yuan
Ziqiang Chen
Tian Li
Zhuoran Liu
Wentao Li
Yue Gao
122
10
0
06 Apr 2024
Large Language Models for Captioning and Retrieving Remote Sensing Images
João Daniel Silva
João Magalhães
D. Tuia
Bruno Martins
81
29
0
09 Feb 2024
LHRS-Bot: Empowering Remote Sensing with VGI-Enhanced Large Multimodal Language Model
Dilxat Muhtar
Zhenshi Li
Feng-Xue Gu
Xue-liang Zhang
Pengfeng Xiao
132
62
0
04 Feb 2024
EarthGPT: A Universal Multi-modal Large Language Model for Multi-sensor Image Comprehension in Remote Sensing Domain
Wei Zhang
Miaoxin Cai
Tong Zhang
Zhuang Yin
Xuerui Mao
98
100
0
30 Jan 2024
SkyEyeGPT: Unifying Remote Sensing Vision-Language Tasks via Instruction Tuning with Large Language Model
Yangfan Zhan
Zhitong Xiong
Yuan. Yuan
MLLM
117
47
0
18 Jan 2024
SkyScript: A Large and Semantically Diverse Vision-Language Dataset for Remote Sensing
Zhecheng Wang
R. Prabha
Tianyuan Huang
Jiajun Wu
Ram Rajagopal
67
65
0
20 Dec 2023
MetaSegNet: Metadata-collaborative Vision-Language Representation Learning for Semantic Segmentation of Remote Sensing Images
Libo Wang
Sijun Dong
Ying Chen
Xiaoliang Meng
Shenghui Fang
Ayman Habib
Songlin Fei
33
5
0
20 Dec 2023
EarthVQA: Towards Queryable Earth via Relational Reasoning-Based Remote Sensing Visual Question Answering
Junjue Wang
Zhuo Zheng
Zihang Chen
A. Ma
Yanfei Zhong
39
24
0
19 Dec 2023
Rotated Multi-Scale Interaction Network for Referring Remote Sensing Image Segmentation
Sihan Liu
Yiwei Ma
Xiaoqing Zhang
Haowei Wang
Jiayi Ji
Xiaoshuai Sun
Rongrong Ji
89
46
0
19 Dec 2023
SkySense: A Multi-Modal Remote Sensing Foundation Model Towards Universal Interpretation for Earth Observation Imagery
Xin Guo
Jiangwei Lao
Bo Dang
Yingying Zhang
Lei Yu
...
Jian Wang
Jingdong Chen
Ming Yang
Yongjun Zhang
Yansheng Li
104
129
0
15 Dec 2023
Remote Sensing Vision-Language Foundation Models without Annotations via Ground Remote Alignment
Utkarsh Mall
Cheng Perng Phoo
Meilin Kelsey Liu
Carl Vondrick
B. Hariharan
Kavita Bala
VLM
64
42
0
12 Dec 2023
GeoChat: Grounded Large Vision-Language Model for Remote Sensing
Kartik Kuckreja
M. S. Danish
Muzammal Naseer
Abhijit Das
Salman Khan
Fahad Shahbaz Khan
86
154
0
24 Nov 2023
SpectralGPT: Spectral Remote Sensing Foundation Model
Danfeng Hong
Bing Zhang
Xuyang Li
Yuxuan Li
Chenyu Li
...
Xiuping Jia
Antonio J. Plaza
Paolo Gamba
J. Benediktsson
J. Chanussot
94
419
0
13 Nov 2023
CPSeg: Finer-grained Image Semantic Segmentation via Chain-of-Thought Language Prompting
Lei Li
96
24
0
24 Oct 2023
MiniGPT-v2: large language model as a unified interface for vision-language multi-task learning
Jun Chen
Deyao Zhu
Xiaoqian Shen
Xiang Li
Zechun Liu
Pengchuan Zhang
Raghuraman Krishnamoorthi
Vikas Chandra
Yunyang Xiong
Mohamed Elhoseiny
MLLM
241
471
0
14 Oct 2023
GeoCLIP: Clip-Inspired Alignment between Locations and Images for Effective Worldwide Geo-localization
V. Cepeda
Gaurav Kumar Nayak
Mubarak Shah
51
102
0
27 Sep 2023
RSGPT: A Remote Sensing Vision Language Model and Benchmark
Yuan Hu
Jianlong Yuan
Congcong Wen
Xiaonan Lu
Xiang Li
VLM
76
113
0
28 Jul 2023
RemoteCLIP: A Vision Language Foundation Model for Remote Sensing
Fan Liu
Delong Chen
Zhan-Rong Guan
Xiaocong Zhou
Jiale Zhu
Qiaolin Ye
Liyong Fu
Jun Zhou
VLM
106
222
0
19 Jun 2023
RRSIS: Referring Remote Sensing Image Segmentation
Zhenghang Yuan
Lichao Mou
Yuansheng Hua
Xiao Xiang Zhu
84
37
0
14 Jun 2023
Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena
Lianmin Zheng
Wei-Lin Chiang
Ying Sheng
Siyuan Zhuang
Zhanghao Wu
...
Dacheng Li
Eric Xing
Haotong Zhang
Joseph E. Gonzalez
Ion Stoica
ALM
OSLM
ELM
426
4,422
0
09 Jun 2023
S-CLIP: Semi-supervised Vision-Language Learning using Few Specialist Captions
Sangwoo Mo
Minkyu Kim
Kyungmin Lee
Jinwoo Shin
VLM
CLIP
112
25
0
23 May 2023
InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning
Wenliang Dai
Junnan Li
Dongxu Li
A. M. H. Tiong
Junqi Zhao
Weisheng Wang
Boyang Albert Li
Pascale Fung
Steven C. H. Hoi
MLLM
VLM
139
2,095
0
11 May 2023
SATIN: A Multi-Task Metadataset for Classifying Satellite Imagery using Vision-Language Models
Jonathan Roberts
Kai Han
Samuel Albanie
VLM
81
14
0
23 Apr 2023
Visual Instruction Tuning
Haotian Liu
Chunyuan Li
Qingyang Wu
Yong Jae Lee
SyDa
VLM
MLLM
569
4,910
0
17 Apr 2023
APPLeNet: Visual Attention Parameterized Prompt Learning for Few-Shot Remote Sensing Image Generalization using CLIP
Mainak Singha
Ankit Jha
Bhupendra S. Solanki
Shirsha Bose
Biplab Banerjee
VLM
107
27
0
12 Apr 2023
EVA-CLIP: Improved Training Techniques for CLIP at Scale
Quan-Sen Sun
Yuxin Fang
Ledell Yu Wu
Xinlong Wang
Yue Cao
CLIP
VLM
149
512
0
27 Mar 2023
Towards Geospatial Foundation Models via Continual Pretraining
Matías Mendieta
Boran Han
Xingjian Shi
Yi Zhu
Chen Chen
VLM
AI4CE
97
73
0
09 Feb 2023
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models
Junnan Li
Dongxu Li
Silvio Savarese
Steven C. H. Hoi
VLM
MLLM
429
4,642
0
30 Jan 2023
MGeo: Multi-Modal Geographic Pre-Training Method
Ruixue Ding
Boli Chen
Pengjun Xie
Fei Huang
Xin Li
Qiang-Wei Zhang
Yao Xu
60
18
0
11 Jan 2023
SatlasPretrain: A Large-Scale Dataset for Remote Sensing Image Understanding
Favyen Bastani
Piper Wolters
Ritwik Gupta
Joe Ferdinando
Aniruddha Kembhavi
70
109
0
28 Nov 2022
RSVG: Exploring Data and Models for Visual Grounding on Remote Sensing Data
Yangfan Zhan
Zhitong Xiong
Yuan. Yuan
127
117
0
23 Oct 2022
Txt2Img-MHN: Remote Sensing Image Generation from Text Using Modern Hopfield Networks
Yonghao Xu
Weikang Yu
Pedram Ghamisi
Michael K Kopp
Sepp Hochreiter
58
33
0
08 Aug 2022
Masked Autoencoders Are Scalable Vision Learners
Kaiming He
Xinlei Chen
Saining Xie
Yanghao Li
Piotr Dollár
Ross B. Girshick
ViT
TPM
477
7,819
0
11 Nov 2021
LoRA: Low-Rank Adaptation of Large Language Models
J. E. Hu
Yelong Shen
Phillip Wallis
Zeyuan Allen-Zhu
Yuanzhi Li
Shean Wang
Lu Wang
Weizhu Chen
OffRL
AI4TS
AI4CE
ALM
AIMat
490
10,496
0
17 Jun 2021
RoFormer: Enhanced Transformer with Rotary Position Embedding
Jianlin Su
Yu Lu
Shengfeng Pan
Ahmed Murtadha
Bo Wen
Yunfeng Liu
291
2,521
0
20 Apr 2021
Swin Transformer: Hierarchical Vision Transformer using Shifted Windows
Ze Liu
Yutong Lin
Yue Cao
Han Hu
Yixuan Wei
Zheng Zhang
Stephen Lin
B. Guo
ViT
465
21,566
0
25 Mar 2021
FAIR1M: A Benchmark Dataset for Fine-grained Object Recognition in High-Resolution Remote Sensing Imagery
Xian Sun
Peijin Wang
Zhiyuan Yan
F. Xu
Ruiping Wang
...
Tao Xu
M. Weinmann
Stefan Hinz
Cheng Wang
Kun Fu
ObjD
AI4TS
66
368
0
09 Mar 2021
Learning Transferable Visual Models From Natural Language Supervision
Alec Radford
Jong Wook Kim
Chris Hallacy
Aditya A. Ramesh
Gabriel Goh
...
Amanda Askell
Pamela Mishkin
Jack Clark
Gretchen Krueger
Ilya Sutskever
CLIP
VLM
972
29,810
0
26 Feb 2021
Training data-efficient image transformers & distillation through attention
Hugo Touvron
Matthieu Cord
Matthijs Douze
Francisco Massa
Alexandre Sablayrolles
Hervé Jégou
ViT
389
6,802
0
23 Dec 2020
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
Alexey Dosovitskiy
Lucas Beyer
Alexander Kolesnikov
Dirk Weissenborn
Xiaohua Zhai
...
Matthias Minderer
G. Heigold
Sylvain Gelly
Jakob Uszkoreit
N. Houlsby
ViT
673
41,430
0
22 Oct 2020
Language Models are Few-Shot Learners
Tom B. Brown
Benjamin Mann
Nick Ryder
Melanie Subbiah
Jared Kaplan
...
Christopher Berner
Sam McCandlish
Alec Radford
Ilya Sutskever
Dario Amodei
BDL
880
42,379
0
28 May 2020
RSVQA: Visual Question Answering for Remote Sensing Data
Sylvain Lobry
Diego Marcos
J. Murray
D. Tuia
109
221
0
16 Mar 2020
Object Detection in Optical Remote Sensing Images: A Survey and A New Benchmark
Ke Li
G. Wan
Gong Cheng
L. Meng
Junwei Han
64
1,459
0
31 Aug 2019
iSAID: A Large-scale Dataset for Instance Segmentation in Aerial Images
Syed Waqas Zamir
Aditya Arora
Akshita Gupta
Salman Khan
Guolei Sun
Fahad Shahbaz Khan
Fan Zhu
Ling Shao
Guisong Xia
X. Bai
SSeg
VLM
76
348
0
30 May 2019
1
2
Next