ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2505.18022
  4. Cited By
RemoteSAM: Towards Segment Anything for Earth Observation

RemoteSAM: Towards Segment Anything for Earth Observation

23 May 2025
Liang Yao
Fan Liu
Delong Chen
Chuanyi Zhang
Yijun Wang
Ziyun Chen
Wei Xu
Shimin Di
Yuhui Zheng
ArXivPDFHTML

Papers citing "RemoteSAM: Towards Segment Anything for Earth Observation"

50 / 56 papers shown
Title
Falcon: A Remote Sensing Vision-Language Foundation Model
Kelu Yao
Nuo Xu
Rong Yang
Y. Xu
Zhuoyan Gao
...
Yi Ren
Pu Zhang
Jun Wang
Ning Wei
Chao Li
ObjD
54
2
0
14 Mar 2025
SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features
Michael Tschannen
A. Gritsenko
Xiao Wang
Muhammad Ferjad Naeem
Ibrahim Alabdulmohsin
...
Basil Mustafa
Olivier J. Hénaff
Jeremiah Harmsen
Andreas Steiner
Xiaohua Zhai
VLM
103
54
0
21 Feb 2025
GeoGround: A Unified Large Vision-Language Model for Remote Sensing Visual Grounding
GeoGround: A Unified Large Vision-Language Model for Remote Sensing Visual Grounding
Yimiao Zhou
Mengcheng Lan
Xiang Li
Yiping Ke
Yiping Ke
Xue Jiang
Qingyun Li
Xue Yang
Wayne Zhang
ObjD
VLM
170
6
0
16 Nov 2024
Cross-Modal Bidirectional Interaction Model for Referring Remote Sensing Image Segmentation
Cross-Modal Bidirectional Interaction Model for Referring Remote Sensing Image Segmentation
Zhe Dong
Yuzhe Sun
Tianzhu Liu
Wangmeng Zuo
Yanfeng Gu
47
5
0
11 Oct 2024
Prompting DirectSAM for Semantic Contour Extraction in Remote Sensing
  Images
Prompting DirectSAM for Semantic Contour Extraction in Remote Sensing Images
Shiyu Miao
Delong Chen
Fan Liu
Chuanyi Zhang
Yanhui Gu
Shengjie Guo
Jun Zhou
44
2
0
08 Oct 2024
Exploring Fine-Grained Image-Text Alignment for Referring Remote Sensing
  Image Segmentation
Exploring Fine-Grained Image-Text Alignment for Referring Remote Sensing Image Segmentation
Sen Lei
Xinyu Xiao
Heng-Chao Li
Z. Shi
Qing Zhu
73
13
0
20 Sep 2024
Domain-invariant Progressive Knowledge Distillation for UAV-based Object
  Detection
Domain-invariant Progressive Knowledge Distillation for UAV-based Object Detection
Liang Yao
Fan Liu
Chuanyi Zhang
Zhiquan Ou
Ting Wu
VLM
87
5
0
21 Aug 2024
Cross-aware Early Fusion with Stage-divided Vision and Language
  Transformer Encoders for Referring Image Segmentation
Cross-aware Early Fusion with Stage-divided Vision and Language Transformer Encoders for Referring Image Segmentation
Yubin Cho
Hyunwoo Yu
Suk-Ju Kang
76
21
0
14 Aug 2024
Masked Angle-Aware Autoencoder for Remote Sensing Images
Masked Angle-Aware Autoencoder for Remote Sensing Images
Zhihao Li
B. Hou
Siteng Ma
Zitong Wu
Xianpeng Guo
Bo Ren
Licheng Jiao
71
12
0
04 Aug 2024
ClearCLIP: Decomposing CLIP Representations for Dense Vision-Language
  Inference
ClearCLIP: Decomposing CLIP Representations for Dense Vision-Language Inference
Mengcheng Lan
Chaofeng Chen
Yiping Ke
Xinjiang Wang
Xue Jiang
Wayne Zhang
VLM
95
28
0
17 Jul 2024
FMARS: Annotating Remote Sensing Images for Disaster Management using
  Foundation Models
FMARS: Annotating Remote Sensing Images for Disaster Management using Foundation Models
Edoardo Arnaudo
Jacopo Lungo Vaschetti
Lorenzo Innocenti
Luca Barco
Davide Lisi
V. Fissore
Claudio Rossi
43
2
0
30 May 2024
MiniCPM: Unveiling the Potential of Small Language Models with Scalable
  Training Strategies
MiniCPM: Unveiling the Potential of Small Language Models with Scalable Training Strategies
Shengding Hu
Yuge Tu
Xu Han
Chaoqun He
Ganqu Cui
...
Chaochao Jia
Guoyang Zeng
Dahai Li
Zhiyuan Liu
Maosong Sun
MoE
82
322
0
09 Apr 2024
Rethinking Transformers Pre-training for Multi-Spectral Satellite
  Imagery
Rethinking Transformers Pre-training for Multi-Spectral Satellite Imagery
Mubashir Noman
Muzammal Naseer
Hisham Cholakkal
Rao Muhammad Anwar
Salman Khan
Fahad Shahbaz Khan
ViT
58
42
0
08 Mar 2024
Subobject-level Image Tokenization
Subobject-level Image Tokenization
Delong Chen
Samuel Cahyawijaya
Jianfeng Liu
Baoyuan Wang
Pascale Fung
VLM
OCL
148
9
0
22 Feb 2024
LHRS-Bot: Empowering Remote Sensing with VGI-Enhanced Large Multimodal
  Language Model
LHRS-Bot: Empowering Remote Sensing with VGI-Enhanced Large Multimodal Language Model
Dilxat Muhtar
Zhenshi Li
Feng-Xue Gu
Xue-liang Zhang
Pengfeng Xiao
111
56
0
04 Feb 2024
EarthGPT: A Universal Multi-modal Large Language Model for Multi-sensor
  Image Comprehension in Remote Sensing Domain
EarthGPT: A Universal Multi-modal Large Language Model for Multi-sensor Image Comprehension in Remote Sensing Domain
Wei Zhang
Miaoxin Cai
Tong Zhang
Zhuang Yin
Xuerui Mao
62
96
0
30 Jan 2024
Few-shot Adaptation of Multi-modal Foundation Models: A Survey
Few-shot Adaptation of Multi-modal Foundation Models: A Survey
Fan Liu
Tianshu Zhang
Wenwen Dai
Wenwen Cai
Wenwen Cai Xiaocong Zhou
Delong Chen
VLM
OffRL
45
27
0
03 Jan 2024
Mask Grounding for Referring Image Segmentation
Mask Grounding for Referring Image Segmentation
Yong Xien Chng
Henry Zheng
Yizeng Han
Xuchong Qiu
Gao Huang
ISeg
ObjD
75
18
0
19 Dec 2023
Rotated Multi-Scale Interaction Network for Referring Remote Sensing
  Image Segmentation
Rotated Multi-Scale Interaction Network for Referring Remote Sensing Image Segmentation
Sihan Liu
Yiwei Ma
Xiaoqing Zhang
Haowei Wang
Jiayi Ji
Xiaoshuai Sun
Rongrong Ji
65
42
0
19 Dec 2023
SCLIP: Rethinking Self-Attention for Dense Vision-Language Inference
SCLIP: Rethinking Self-Attention for Dense Vision-Language Inference
Feng Wang
Jieru Mei
Alan Yuille
VLM
59
62
0
04 Dec 2023
Grounding Everything: Emerging Localization Properties in
  Vision-Language Transformers
Grounding Everything: Emerging Localization Properties in Vision-Language Transformers
Walid Bousselham
Felix Petersen
Vittorio Ferrari
Hilde Kuehne
ObjD
VLM
56
45
0
01 Dec 2023
GeoChat: Grounded Large Vision-Language Model for Remote Sensing
GeoChat: Grounded Large Vision-Language Model for Remote Sensing
Kartik Kuckreja
M. S. Danish
Muzammal Naseer
Abhijit Das
Salman Khan
Fahad Shahbaz Khan
61
145
0
24 Nov 2023
Toward Open Vocabulary Aerial Object Detection with CLIP-Activated
  Student-Teacher Learning
Toward Open Vocabulary Aerial Object Detection with CLIP-Activated Student-Teacher Learning
Yan Li
Weiwei Guo
Xue Yang
Ning Liao
Dunyun He
Jiaqi Zhou
Wenxian Yu
ObjD
VLM
50
8
0
20 Nov 2023
SPHINX: The Joint Mixing of Weights, Tasks, and Visual Embeddings for
  Multi-modal Large Language Models
SPHINX: The Joint Mixing of Weights, Tasks, and Visual Embeddings for Multi-modal Large Language Models
Ziyi Lin
Chris Liu
Renrui Zhang
Peng Gao
Longtian Qiu
...
Siyuan Huang
Yichi Zhang
Xuming He
Hongsheng Li
Yu Qiao
MLLM
VLM
50
219
0
13 Nov 2023
Beyond One-to-One: Rethinking the Referring Image Segmentation
Beyond One-to-One: Rethinking the Referring Image Segmentation
Yutao Hu
Qixiong Wang
Wenqi Shao
Enze Xie
Zhenguo Li
Jungong Han
Ping Luo
3DV
103
40
0
26 Aug 2023
RS5M and GeoRSCLIP: A Large Scale Vision-Language Dataset and A Large
  Vision-Language Model for Remote Sensing
RS5M and GeoRSCLIP: A Large Scale Vision-Language Dataset and A Large Vision-Language Model for Remote Sensing
Zilun Zhang
Tiancheng Zhao
Yulong Guo
Yuxiang Cai
DiffM
VLM
41
61
0
20 Jun 2023
RemoteCLIP: A Vision Language Foundation Model for Remote Sensing
RemoteCLIP: A Vision Language Foundation Model for Remote Sensing
Fan Liu
Delong Chen
Zhan-Rong Guan
Xiaocong Zhou
Jiale Zhu
Qiaolin Ye
Liyong Fu
Jun Zhou
VLM
88
213
0
19 Jun 2023
RRSIS: Referring Remote Sensing Image Segmentation
RRSIS: Referring Remote Sensing Image Segmentation
Zhenghang Yuan
Lichao Mou
Yuansheng Hua
Xiao Xiang Zhu
67
37
0
14 Jun 2023
GRES: Generalized Referring Expression Segmentation
GRES: Generalized Referring Expression Segmentation
Chang Liu
Henghui Ding
Xudong Jiang
69
154
0
01 Jun 2023
SATIN: A Multi-Task Metadataset for Classifying Satellite Imagery using
  Vision-Language Models
SATIN: A Multi-Task Metadataset for Classifying Satellite Imagery using Vision-Language Models
Jonathan Roberts
Kai Han
Samuel Albanie
VLM
57
14
0
23 Apr 2023
MiniGPT-4: Enhancing Vision-Language Understanding with Advanced Large
  Language Models
MiniGPT-4: Enhancing Vision-Language Understanding with Advanced Large Language Models
Deyao Zhu
Jun Chen
Xiaoqian Shen
Xiang Li
Mohamed Elhoseiny
VLM
MLLM
96
1,978
0
20 Apr 2023
A Billion-scale Foundation Model for Remote Sensing Images
A Billion-scale Foundation Model for Remote Sensing Images
Keumgang Cha
Junghoon Seo
Taekyung Lee
68
69
0
11 Apr 2023
Vision-Language Models for Vision Tasks: A Survey
Vision-Language Models for Vision Tasks: A Survey
Jingyi Zhang
Jiaxing Huang
Sheng Jin
Shijian Lu
VLM
109
516
0
03 Apr 2023
Scale-MAE: A Scale-Aware Masked Autoencoder for Multiscale Geospatial
  Representation Learning
Scale-MAE: A Scale-Aware Masked Autoencoder for Multiscale Geospatial Representation Learning
Colorado Reed
Ritwik Gupta
Shufan Li
S. Brockman
Christopher Funk
Brian Clipp
Kurt Keutzer
Salvatore Candido
M. Uyttendaele
Trevor Darrell
148
179
0
30 Dec 2022
RSVG: Exploring Data and Models for Visual Grounding on Remote Sensing
  Data
RSVG: Exploring Data and Models for Visual Grounding on Remote Sensing Data
Yangfan Zhan
Zhitong Xiong
Yuan. Yuan
102
114
0
23 Oct 2022
Towards Robust Referring Image Segmentation
Towards Robust Referring Image Segmentation
Jianzong Wu
Xiangtai Li
Xia Li
Henghui Ding
Yu Tong
Dacheng Tao
3DV
69
44
0
20 Sep 2022
Advancing Plain Vision Transformer Towards Remote Sensing Foundation
  Model
Advancing Plain Vision Transformer Towards Remote Sensing Foundation Model
Di Wang
Qiming Zhang
Yufei Xu
Jing Zhang
Bo Du
Dacheng Tao
Lefei Zhang
57
252
0
08 Aug 2022
SatMAE: Pre-training Transformers for Temporal and Multi-Spectral
  Satellite Imagery
SatMAE: Pre-training Transformers for Temporal and Multi-Spectral Satellite Imagery
Yezhen Cong
Samarth Khanna
Chenlin Meng
Patrick Liu
Erik Rozi
Yutong He
Marshall Burke
David B. Lobell
Stefano Ermon
ViT
53
264
0
17 Jul 2022
A Unified Sequence Interface for Vision Tasks
A Unified Sequence Interface for Vision Tasks
Ting-Li Chen
Saurabh Saxena
Lala Li
Nayeon Lee
David J. Fleet
Geoffrey E. Hinton
VLM
MLLM
41
150
0
15 Jun 2022
Shifting More Attention to Visual Backbone: Query-modulated Refinement
  Networks for End-to-End Visual Grounding
Shifting More Attention to Visual Backbone: Query-modulated Refinement Networks for End-to-End Visual Grounding
Jiabo Ye
Junfeng Tian
Ming Yan
Xiaoshan Yang
Xuwu Wang
Ji Zhang
Liang He
Xin Lin
ObjD
34
64
0
29 Mar 2022
OFA: Unifying Architectures, Tasks, and Modalities Through a Simple
  Sequence-to-Sequence Learning Framework
OFA: Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework
Peng Wang
An Yang
Rui Men
Junyang Lin
Shuai Bai
Zhikang Li
Jianxin Ma
Chang Zhou
Jingren Zhou
Hongxia Yang
MLLM
ObjD
120
865
0
07 Feb 2022
LAVT: Language-Aware Vision Transformer for Referring Image Segmentation
LAVT: Language-Aware Vision Transformer for Referring Image Segmentation
Zhao Yang
Jiaqi Wang
Yansong Tang
Kai-xiang Chen
Hengshuang Zhao
Philip Torr
188
317
0
04 Dec 2021
Extract Free Dense Labels from CLIP
Extract Free Dense Labels from CLIP
Chong Zhou
Chen Change Loy
Bo Dai
VLM
CLIP
108
467
0
02 Dec 2021
LoveDA: A Remote Sensing Land-Cover Dataset for Domain Adaptive Semantic
  Segmentation
LoveDA: A Remote Sensing Land-Cover Dataset for Domain Adaptive Semantic Segmentation
Junjue Wang
Zhuo Zheng
A. Ma
Xiaoyan Lu
Yanfei Zhong
61
333
0
17 Oct 2021
Referring Transformer: A One-step Approach to Multi-task Visual
  Grounding
Referring Transformer: A One-step Approach to Multi-task Visual Grounding
Muchen Li
Leonid Sigal
ObjD
41
191
0
06 Jun 2021
Cross-Modal Progressive Comprehension for Referring Segmentation
Cross-Modal Progressive Comprehension for Referring Segmentation
Si Liu
Tianrui Hui
Shaofei Huang
Yunchao Wei
Yue Liu
Guanbin Li
EgoV
VOS
47
127
0
15 May 2021
Look Before You Leap: Learning Landmark Features for One-Stage Visual
  Grounding
Look Before You Leap: Learning Landmark Features for One-Stage Visual Grounding
Binbin Huang
Dongze Lian
Weixin Luo
Shenghua Gao
ObjD
42
94
0
09 Apr 2021
Swin Transformer: Hierarchical Vision Transformer using Shifted Windows
Swin Transformer: Hierarchical Vision Transformer using Shifted Windows
Ze Liu
Yutong Lin
Yue Cao
Han Hu
Yixuan Wei
Zheng Zhang
Stephen Lin
B. Guo
ViT
319
21,175
0
25 Mar 2021
Linguistic Structure Guided Context Modeling for Referring Image
  Segmentation
Linguistic Structure Guided Context Modeling for Referring Image Segmentation
Tianrui Hui
Si Liu
Shaofei Huang
Guanbin Li
Sansi Yu
Faxi Zhang
Jizhong Han
58
150
0
01 Oct 2020
Referring Image Segmentation via Cross-Modal Progressive Comprehension
Referring Image Segmentation via Cross-Modal Progressive Comprehension
Shaofei Huang
Tianrui Hui
Si Liu
Guanbin Li
Yunchao Wei
Jizhong Han
Luoqi Liu
Yue Liu
EgoV
54
178
0
01 Oct 2020
12
Next