2111.07180
Cited By
Explainable Semantic Space by Grounding Language to Vision with Cross-Modal Contrastive Learning
13 November 2021
Yizhen Zhang
Minkyu Choi
Kuan Han
Zhongming Liu
VLM
Papers citing
"Explainable Semantic Space by Grounding Language to Vision with Cross-Modal Contrastive Learning"
12 / 12 papers shown
Title
Not Only Text: Exploring Compositionality of Visual Representations in Vision-Language Models
Davide Berasi
Matteo Farina
Massimiliano Mancini
Elisa Ricci
Nicola Strisciuglio
CoGe
68
0
0
21 Mar 2025
All in One Framework for Multimodal Re-identification in the Wild
He Li
Mang Ye
Ming Zhang
Bo Du
35
9
0
08 May 2024
Towards Weakly Supervised Text-to-Audio Grounding
Xuenan Xu
Ziyang Ma
Mengyue Wu
Kai Yu
AI4TS
33
9
0
05 Jan 2024
Leveraging Multilingual Self-Supervised Pretrained Models for Sequence-to-Sequence End-to-End Spoken Language Understanding
Pavel Denisov
Ngoc Thang Vu
29
1
0
09 Oct 2023
Core-Periphery Principle Guided Redesign of Self-Attention in Transformers
Xiao-Xing Yu
Lu Zhang
Haixing Dai
Yanjun Lyu
Lin Zhao
Zihao Wu
David Liu
Tianming Liu
Dajiang Zhu
GNN
36
9
0
27 Mar 2023
MedFuse: Multi-modal fusion with clinical time-series data and chest X-ray images
Nasir Hayat
Krzysztof J. Geras
Farah E. Shamout
MedIm
24
40
0
14 Jul 2022
Regression Metric Loss: Learning a Semantic Representation Space for Medical Images
Hanqing Chao
Jiajin Zhang
Pingkun Yan
23
2
0
12 Jul 2022
Multimodal Learning with Transformers: A Survey
P. Xu
Xiatian Zhu
David A. Clifton
ViT
60
527
0
13 Jun 2022
Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision
Chao Jia
Yinfei Yang
Ye Xia
Yi-Ting Chen
Zarana Parekh
Hieu H. Pham
Quoc V. Le
Yun-hsuan Sung
Zhen Li
Tom Duerig
VLM
CLIP
304
3,708
0
11 Feb 2021
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
Alex Wang
Amanpreet Singh
Julian Michael
Felix Hill
Omer Levy
Samuel R. Bowman
ELM
297
6,959
0
20 Apr 2018
Iterative Visual Reasoning Beyond Convolutions
Xinlei Chen
Li-Jia Li
Li Fei-Fei
Abhinav Gupta
LRM
GNN
37
213
0
29 Mar 2018
Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation
Yonghui Wu
M. Schuster
Z. Chen
Quoc V. Le
Mohammad Norouzi
...
Alex Rudnick
Oriol Vinyals
G. Corrado
Macduff Hughes
J. Dean
AIMat
716
6,746
0
26 Sep 2016