Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2306.08498
Cited By
Extending CLIP's Image-Text Alignment to Referring Image Segmentation
14 June 2023
Seoyeon Kim
Minguk Kang
Dongwon Kim
Jaesik Park
Suha Kwak
VLM
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Extending CLIP's Image-Text Alignment to Referring Image Segmentation"
14 / 14 papers shown
Title
LRSCLIP: A Vision-Language Foundation Model for Aligning Remote Sensing Image with Longer Text
Weizhi Chen
Jingbo Chen
Yupeng Deng
Jiansheng Chen
Yuman Feng
Zhihao Xi
Diyou Liu
Kai Li
Yu Meng
VLM
51
0
0
25 Mar 2025
Your Large Vision-Language Model Only Needs A Few Attention Heads For Visual Grounding
Seil Kang
Jinyeong Kim
Junhyeok Kim
Seong Jae Hwang
VLM
85
2
0
08 Mar 2025
Towards Visual Grounding: A Survey
Linhui Xiao
Xiaoshan Yang
X. Lan
Yaowei Wang
Changsheng Xu
ObjD
50
3
0
31 Dec 2024
OneRef: Unified One-tower Expression Grounding and Segmentation with Mask Referring Modeling
Linhui Xiao
Xiaoshan Yang
Fang Peng
Yaowei Wang
Changsheng Xu
ObjD
24
5
0
10 Oct 2024
Depth-Weighted Detection of Behaviours of Risk in People with Dementia using Cameras
Pratik K. Mishra
Irene Ballester
Andrea Iaboni
B. Ye
Kristine Newman
Alex Mihailidis
Shehroz S. Khan
39
0
0
28 Aug 2024
Image Segmentation in Foundation Model Era: A Survey
Tianfei Zhou
Fei Zhang
Boyu Chang
Wenguan Wang
Ye Yuan
E. Konukoglu
Daniel Cremers
VLM
42
4
0
23 Aug 2024
Pseudo-RIS: Distinctive Pseudo-supervision Generation for Referring Image Segmentation
Seonghoon Yu
Paul Hongsuck Seo
Jeany Son
DiffM
50
4
0
10 Jul 2024
HiVG: Hierarchical Multimodal Fine-grained Modulation for Visual Grounding
Linhui Xiao
Xiaoshan Yang
Fang Peng
Yaowei Wang
Changsheng Xu
ObjD
29
8
0
20 Apr 2024
AdaptFormer: Adapting Vision Transformers for Scalable Visual Recognition
Shoufa Chen
Chongjian Ge
Zhan Tong
Jiangliu Wang
Yibing Song
Jue Wang
Ping Luo
143
637
0
26 May 2022
LAVT: Language-Aware Vision Transformer for Referring Image Segmentation
Zhao Yang
Jiaqi Wang
Yansong Tang
Kai-xiang Chen
Hengshuang Zhao
Philip H. S. Torr
138
306
0
04 Dec 2021
Putting 3D Spatially Sparse Networks on a Diet
Junha Lee
Chris Choy
Jaesik Park
3DV
28
3
0
02 Dec 2021
ImageNet-21K Pretraining for the Masses
T. Ridnik
Emanuel Ben-Baruch
Asaf Noy
Lihi Zelnik-Manor
SSeg
VLM
CLIP
173
686
0
22 Apr 2021
Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision
Chao Jia
Yinfei Yang
Ye Xia
Yi-Ting Chen
Zarana Parekh
Hieu H. Pham
Quoc V. Le
Yun-hsuan Sung
Zhen Li
Tom Duerig
VLM
CLIP
295
3,693
0
11 Feb 2021
Multi-task Collaborative Network for Joint Referring Expression Comprehension and Segmentation
Gen Luo
Yiyi Zhou
Xiaoshuai Sun
Liujuan Cao
Chenglin Wu
Cheng Deng
Rongrong Ji
ObjD
161
286
0
19 Mar 2020
1