
© 2025 ResearchTrend.AI, All rights reserved.

arXiv:2410.17283
Advancements in Visual Language Models for Remote Sensing: Datasets, Capabilities, and Enhancement Techniques

3 January 2025
Lijie Tao
Han Zhang
Haizhao Jing
Yu Liu
Kelu Yao
Guoting Wei
Xizhe Xue

Papers citing "Advancements in Visual Language Models for Remote Sensing: Datasets, Capabilities, and Enhancement Techniques"

50 / 64 papers shown
TG-LLaVA: Text Guided LLaVA via Learnable Latent Embeddings
Dawei Yan
Pengcheng Li
Yang Li
Hao Chen
Qingguo Chen
Weihua Luo
Wei Dong
Qingsen Yan
Haokui Zhang
Chunhua Shen
3DV, VLM
69
5
0
15 Sep 2024
ChangeChat: An Interactive Model for Remote Sensing Change Analysis via Multimodal Instruction Tuning
Pei Deng
Wenqian Zhou
Hanlin Wu
53
3
0
13 Sep 2024
VRSBench: A Versatile Vision-Language Benchmark Dataset for Remote Sensing Image Understanding
Xiang Li
Jian Ding
Mohamed Elhoseiny
CoGe
65
33
0
18 Jun 2024
SkySenseGPT: A Fine-Grained Instruction Tuning Dataset and Model for Remote Sensing Vision-Language Understanding
Junwei Luo
Zhen Pang
Yongjun Zhang
Tingzhu Wang
Linlin Wang
...
Jiangwei Lao
Jian Wang
Jingdong Chen
Yihua Tan
Yansheng Li
85
27
0
14 Jun 2024
ProGEO: Generating Prompts through Image-Text Contrastive Learning for Visual Geo-localization
Chen Mao
Jingqi Hu
89
5
0
04 Jun 2024
GeoReasoner: Geo-localization with Reasoning in Street Views using a Large Vision-Language Model
Ling Li
Yu Ye
Bingchuan Jiang
Wei Zeng
VLM, LRM
66
10
0
03 Jun 2024
Panoptic Perception: A Novel Task and Fine-grained Dataset for Universal Remote Sensing Image Interpretation
Danpei Zhao
Bo Yuan
Ziqiang Chen
Tian Li
Zhuoran Liu
Wentao Li
Yue Gao
122
10
0
06 Apr 2024
Large Language Models for Captioning and Retrieving Remote Sensing Images
João Daniel Silva
João Magalhães
D. Tuia
Bruno Martins
81
29
0
09 Feb 2024
LHRS-Bot: Empowering Remote Sensing with VGI-Enhanced Large Multimodal Language Model
Dilxat Muhtar
Zhenshi Li
Feng-Xue Gu
Xue-liang Zhang
Pengfeng Xiao
132
62
0
04 Feb 2024
EarthGPT: A Universal Multi-modal Large Language Model for Multi-sensor Image Comprehension in Remote Sensing Domain
Wei Zhang
Miaoxin Cai
Tong Zhang
Zhuang Yin
Xuerui Mao
98
100
0
30 Jan 2024
SkyEyeGPT: Unifying Remote Sensing Vision-Language Tasks via Instruction Tuning with Large Language Model
Yangfan Zhan
Zhitong Xiong
Yuan Yuan
MLLM
117
47
0
18 Jan 2024
SkyScript: A Large and Semantically Diverse Vision-Language Dataset for Remote Sensing
Zhecheng Wang
R. Prabha
Tianyuan Huang
Jiajun Wu
Ram Rajagopal
67
65
0
20 Dec 2023
MetaSegNet: Metadata-collaborative Vision-Language Representation Learning for Semantic Segmentation of Remote Sensing Images
Libo Wang
Sijun Dong
Ying Chen
Xiaoliang Meng
Shenghui Fang
Ayman Habib
Songlin Fei
33
5
0
20 Dec 2023
EarthVQA: Towards Queryable Earth via Relational Reasoning-Based Remote Sensing Visual Question Answering
Junjue Wang
Zhuo Zheng
Zihang Chen
A. Ma
Yanfei Zhong
39
24
0
19 Dec 2023
Rotated Multi-Scale Interaction Network for Referring Remote Sensing Image Segmentation
Sihan Liu
Yiwei Ma
Xiaoqing Zhang
Haowei Wang
Jiayi Ji
Xiaoshuai Sun
Rongrong Ji
89
46
0
19 Dec 2023
SkySense: A Multi-Modal Remote Sensing Foundation Model Towards Universal Interpretation for Earth Observation Imagery
Xin Guo
Jiangwei Lao
Bo Dang
Yingying Zhang
Lei Yu
...
Jian Wang
Jingdong Chen
Ming Yang
Yongjun Zhang
Yansheng Li
104
129
0
15 Dec 2023
Remote Sensing Vision-Language Foundation Models without Annotations via Ground Remote Alignment
Utkarsh Mall
Cheng Perng Phoo
Meilin Kelsey Liu
Carl Vondrick
B. Hariharan
Kavita Bala
VLM
64
42
0
12 Dec 2023
GeoChat: Grounded Large Vision-Language Model for Remote Sensing
Kartik Kuckreja
M. S. Danish
Muzammal Naseer
Abhijit Das
Salman Khan
Fahad Shahbaz Khan
86
154
0
24 Nov 2023
SpectralGPT: Spectral Remote Sensing Foundation Model
Danfeng Hong
Bing Zhang
Xuyang Li
Yuxuan Li
Chenyu Li
...
Xiuping Jia
Antonio J. Plaza
Paolo Gamba
J. Benediktsson
J. Chanussot
94
419
0
13 Nov 2023
CPSeg: Finer-grained Image Semantic Segmentation via Chain-of-Thought Language Prompting
Lei Li
96
24
0
24 Oct 2023
MiniGPT-v2: large language model as a unified interface for vision-language multi-task learning
Jun Chen
Deyao Zhu
Xiaoqian Shen
Xiang Li
Zechun Liu
Pengchuan Zhang
Raghuraman Krishnamoorthi
Vikas Chandra
Yunyang Xiong
Mohamed Elhoseiny
MLLM
241
471
0
14 Oct 2023
GeoCLIP: Clip-Inspired Alignment between Locations and Images for Effective Worldwide Geo-localization
V. Cepeda
Gaurav Kumar Nayak
Mubarak Shah
51
102
0
27 Sep 2023
RSGPT: A Remote Sensing Vision Language Model and Benchmark
Yuan Hu
Jianlong Yuan
Congcong Wen
Xiaonan Lu
Xiang Li
VLM
76
113
0
28 Jul 2023
RemoteCLIP: A Vision Language Foundation Model for Remote Sensing
Fan Liu
Delong Chen
Zhan-Rong Guan
Xiaocong Zhou
Jiale Zhu
Qiaolin Ye
Liyong Fu
Jun Zhou
VLM
106
222
0
19 Jun 2023
RRSIS: Referring Remote Sensing Image Segmentation
Zhenghang Yuan
Lichao Mou
Yuansheng Hua
Xiao Xiang Zhu
84
37
0
14 Jun 2023
Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena
Lianmin Zheng
Wei-Lin Chiang
Ying Sheng
Siyuan Zhuang
Zhanghao Wu
...
Dacheng Li
Eric Xing
Haotong Zhang
Joseph E. Gonzalez
Ion Stoica
ALM, OSLM, ELM
426
4,422
0
09 Jun 2023
S-CLIP: Semi-supervised Vision-Language Learning using Few Specialist Captions
Sangwoo Mo
Minkyu Kim
Kyungmin Lee
Jinwoo Shin
VLM, CLIP
112
25
0
23 May 2023
InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning
Wenliang Dai
Junnan Li
Dongxu Li
A. M. H. Tiong
Junqi Zhao
Weisheng Wang
Boyang Albert Li
Pascale Fung
Steven C. H. Hoi
MLLM, VLM
139
2,095
0
11 May 2023
SATIN: A Multi-Task Metadataset for Classifying Satellite Imagery using Vision-Language Models
Jonathan Roberts
Kai Han
Samuel Albanie
VLM
81
14
0
23 Apr 2023
Visual Instruction Tuning
Haotian Liu
Chunyuan Li
Qingyang Wu
Yong Jae Lee
SyDa, VLM, MLLM
569
4,910
0
17 Apr 2023
APPLeNet: Visual Attention Parameterized Prompt Learning for Few-Shot Remote Sensing Image Generalization using CLIP
Mainak Singha
Ankit Jha
Bhupendra S. Solanki
Shirsha Bose
Biplab Banerjee
VLM
107
27
0
12 Apr 2023
EVA-CLIP: Improved Training Techniques for CLIP at Scale
Quan-Sen Sun
Yuxin Fang
Ledell Yu Wu
Xinlong Wang
Yue Cao
CLIP, VLM
149
512
0
27 Mar 2023
Towards Geospatial Foundation Models via Continual Pretraining
Matías Mendieta
Boran Han
Xingjian Shi
Yi Zhu
Chen Chen
VLM, AI4CE
97
73
0
09 Feb 2023
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models
Junnan Li
Dongxu Li
Silvio Savarese
Steven C. H. Hoi
VLM, MLLM
429
4,642
0
30 Jan 2023
MGeo: Multi-Modal Geographic Pre-Training Method
Ruixue Ding
Boli Chen
Pengjun Xie
Fei Huang
Xin Li
Qiang-Wei Zhang
Yao Xu
60
18
0
11 Jan 2023
SatlasPretrain: A Large-Scale Dataset for Remote Sensing Image Understanding
Favyen Bastani
Piper Wolters
Ritwik Gupta
Joe Ferdinando
Aniruddha Kembhavi
70
109
0
28 Nov 2022
RSVG: Exploring Data and Models for Visual Grounding on Remote Sensing Data
Yangfan Zhan
Zhitong Xiong
Yuan Yuan
127
117
0
23 Oct 2022
Txt2Img-MHN: Remote Sensing Image Generation from Text Using Modern Hopfield Networks
Yonghao Xu
Weikang Yu
Pedram Ghamisi
Michael K Kopp
Sepp Hochreiter
58
33
0
08 Aug 2022
Masked Autoencoders Are Scalable Vision Learners
Kaiming He
Xinlei Chen
Saining Xie
Yanghao Li
Piotr Dollár
Ross B. Girshick
ViT, TPM
477
7,819
0
11 Nov 2021
LoRA: Low-Rank Adaptation of Large Language Models
J. E. Hu
Yelong Shen
Phillip Wallis
Zeyuan Allen-Zhu
Yuanzhi Li
Shean Wang
Lu Wang
Weizhu Chen
OffRL, AI4TS, AI4CE, ALM, AIMat
490
10,496
0
17 Jun 2021
RoFormer: Enhanced Transformer with Rotary Position Embedding
Jianlin Su
Yu Lu
Shengfeng Pan
Ahmed Murtadha
Bo Wen
Yunfeng Liu
291
2,521
0
20 Apr 2021
Swin Transformer: Hierarchical Vision Transformer using Shifted Windows
Ze Liu
Yutong Lin
Yue Cao
Han Hu
Yixuan Wei
Zheng Zhang
Stephen Lin
B. Guo
ViT
465
21,566
0
25 Mar 2021
FAIR1M: A Benchmark Dataset for Fine-grained Object Recognition in High-Resolution Remote Sensing Imagery
Xian Sun
Peijin Wang
Zhiyuan Yan
F. Xu
Ruiping Wang
...
Tao Xu
M. Weinmann
Stefan Hinz
Cheng Wang
Kun Fu
ObjD, AI4TS
66
368
0
09 Mar 2021
Learning Transferable Visual Models From Natural Language Supervision
Alec Radford
Jong Wook Kim
Chris Hallacy
Aditya A. Ramesh
Gabriel Goh
...
Amanda Askell
Pamela Mishkin
Jack Clark
Gretchen Krueger
Ilya Sutskever
CLIP, VLM
972
29,810
0
26 Feb 2021
Training data-efficient image transformers & distillation through attention
Hugo Touvron
Matthieu Cord
Matthijs Douze
Francisco Massa
Alexandre Sablayrolles
Hervé Jégou
ViT
389
6,802
0
23 Dec 2020
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
Alexey Dosovitskiy
Lucas Beyer
Alexander Kolesnikov
Dirk Weissenborn
Xiaohua Zhai
...
Matthias Minderer
G. Heigold
Sylvain Gelly
Jakob Uszkoreit
N. Houlsby
ViT
673
41,430
0
22 Oct 2020
Language Models are Few-Shot Learners
Tom B. Brown
Benjamin Mann
Nick Ryder
Melanie Subbiah
Jared Kaplan
...
Christopher Berner
Sam McCandlish
Alec Radford
Ilya Sutskever
Dario Amodei
BDL
880
42,379
0
28 May 2020
RSVQA: Visual Question Answering for Remote Sensing Data
Sylvain Lobry
Diego Marcos
J. Murray
D. Tuia
109
221
0
16 Mar 2020
Object Detection in Optical Remote Sensing Images: A Survey and A New Benchmark
Ke Li
G. Wan
Gong Cheng
L. Meng
Junwei Han
64
1,459
0
31 Aug 2019
iSAID: A Large-scale Dataset for Instance Segmentation in Aerial Images
Syed Waqas Zamir
Aditya Arora
Akshita Gupta
Salman Khan
Guolei Sun
Fahad Shahbaz Khan
Fan Zhu
Ling Shao
Guisong Xia
X. Bai
SSeg, VLM
76
348
0
30 May 2019