Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2503.14953
Cited By
Aligning Information Capacity Between Vision and Language via Dense-to-Sparse Feature Distillation for Image-Text Matching
19 March 2025
Yang Liu
Wentao Feng
Zhuoyao Liu
Shudong Huang
Jiancheng Lv
DiffM
VLM
Re-assign community
ArXiv (abs)
PDF
HTML
Papers citing
"Aligning Information Capacity Between Vision and Language via Dense-to-Sparse Feature Distillation for Image-Text Matching"
21 / 21 papers shown
Title
Asymmetric Visual Semantic Embedding Framework for Efficient Vision-Language Alignment
Yang Liu
M. Liu
Shudong Huang
Jiancheng Lv
139
1
0
10 Mar 2025
Cross-modal Active Complementary Learning with Self-refining Correspondence
Yang Qin
Yuan Sun
Dezhong Peng
Qiufeng Wang
Xiaocui Peng
Peng Hu
85
21
0
26 Oct 2023
Improved Probabilistic Image-Text Representations
Sanghyuk Chun
VLM
82
31
0
29 May 2023
Visual Instruction Tuning
Haotian Liu
Chunyuan Li
Qingyang Wu
Yong Jae Lee
SyDa
VLM
MLLM
571
4,925
0
17 Apr 2023
Improving Cross-Modal Retrieval with Set of Diverse Embeddings
Dongwon Kim
Nam-Won Kim
Suha Kwak
101
40
0
30 Nov 2022
Masked Autoencoders Are Scalable Vision Learners
Kaiming He
Xinlei Chen
Saining Xie
Yanghao Li
Piotr Dollár
Ross B. Girshick
ViT
TPM
477
7,827
0
11 Nov 2021
Swin Transformer: Hierarchical Vision Transformer using Shifted Windows
Ze Liu
Yutong Lin
Yue Cao
Han Hu
Yixuan Wei
Zheng Zhang
Stephen Lin
B. Guo
ViT
467
21,603
0
25 Mar 2021
Learning Transferable Visual Models From Natural Language Supervision
Alec Radford
Jong Wook Kim
Chris Hallacy
Aditya A. Ramesh
Gabriel Goh
...
Amanda Askell
Pamela Mishkin
Jack Clark
Gretchen Krueger
Ilya Sutskever
CLIP
VLM
1.0K
29,926
0
26 Feb 2021
Probabilistic Embeddings for Cross-Modal Retrieval
Sanghyuk Chun
Seong Joon Oh
Rafael Sampaio de Rezende
Yannis Kalantidis
Diane Larlus
UQCV
485
210
0
13 Jan 2021
Similarity Reasoning and Filtration for Image-Text Matching
Haiwen Diao
Ying Zhang
Lingyun Ma
Huchuan Lu
301
340
0
05 Jan 2021
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
Alexey Dosovitskiy
Lucas Beyer
Alexander Kolesnikov
Dirk Weissenborn
Xiaohua Zhai
...
Matthias Minderer
G. Heigold
Sylvain Gelly
Jakob Uszkoreit
N. Houlsby
ViT
682
41,483
0
22 Oct 2020
IMRAM: Iterative Matching with Recurrent Attention Memory for Cross-Modal Image-Text Retrieval
Hui Chen
Guiguang Ding
Xudong Liu
Zijia Lin
Ji Liu
Jungong Han
78
321
0
08 Mar 2020
CAMP: Cross-Modal Adaptive Message Passing for Text-Image Retrieval
Zihao Wang
Xihui Liu
Hongsheng Li
Lu Sheng
Junjie Yan
Xiaogang Wang
Jing Shao
VLM
83
306
0
12 Sep 2019
Visual Semantic Reasoning for Image-Text Matching
Kunpeng Li
Yulun Zhang
Keqin Li
Yuanyuan Li
Y. Fu
VLM
91
505
0
06 Sep 2019
Polysemous Visual-Semantic Embedding for Cross-Modal Retrieval
Yale Song
M. Soleymani
60
246
0
11 Jun 2019
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Jacob Devlin
Ming-Wei Chang
Kenton Lee
Kristina Toutanova
VLM
SSL
SSeg
1.8K
95,229
0
11 Oct 2018
Stacked Cross Attention for Image-Text Matching
Kuang-Huei Lee
Xi Chen
G. Hua
Houdong Hu
Xiaodong He
109
1,158
0
21 Mar 2018
Look, Imagine and Match: Improving Textual-Visual Cross-Modal Retrieval with Generative Models
Jiuxiang Gu
Jianfei Cai
Shafiq Joty
Li Niu
G. Wang
VLM
88
361
0
17 Nov 2017
Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering
Peter Anderson
Xiaodong He
Chris Buehler
Damien Teney
Mark Johnson
Stephen Gould
Lei Zhang
AIMat
123
4,223
0
25 Jul 2017
Flickr30k Entities: Collecting Region-to-Phrase Correspondences for Richer Image-to-Sentence Models
Bryan A. Plummer
Liwei Wang
Christopher M. Cervantes
Juan C. Caicedo
Julia Hockenmaier
Svetlana Lazebnik
208
2,074
0
19 May 2015
Microsoft COCO: Common Objects in Context
Nayeon Lee
Michael Maire
Serge J. Belongie
Lubomir Bourdev
Ross B. Girshick
James Hays
Pietro Perona
Deva Ramanan
C. L. Zitnick
Piotr Dollár
ObjD
434
43,875
0
01 May 2014
1