ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2111.07180
  4. Cited By
Explainable Semantic Space by Grounding Language to Vision with
  Cross-Modal Contrastive Learning

Explainable Semantic Space by Grounding Language to Vision with Cross-Modal Contrastive Learning

13 November 2021
Yizhen Zhang
Minkyu Choi
Kuan Han
Zhongming Liu
    VLM
ArXivPDFHTML

Papers citing "Explainable Semantic Space by Grounding Language to Vision with Cross-Modal Contrastive Learning"

12 / 12 papers shown
Title
Not Only Text: Exploring Compositionality of Visual Representations in Vision-Language Models
Not Only Text: Exploring Compositionality of Visual Representations in Vision-Language Models
Davide Berasi
Matteo Farina
Massimiliano Mancini
Elisa Ricci
Nicola Strisciuglio
CoGe
68
0
0
21 Mar 2025
All in One Framework for Multimodal Re-identification in the Wild
All in One Framework for Multimodal Re-identification in the Wild
He Li
Mang Ye
Ming Zhang
Bo Du
35
9
0
08 May 2024
Towards Weakly Supervised Text-to-Audio Grounding
Towards Weakly Supervised Text-to-Audio Grounding
Xuenan Xu
Ziyang Ma
Mengyue Wu
Kai Yu
AI4TS
33
9
0
05 Jan 2024
Leveraging Multilingual Self-Supervised Pretrained Models for
  Sequence-to-Sequence End-to-End Spoken Language Understanding
Leveraging Multilingual Self-Supervised Pretrained Models for Sequence-to-Sequence End-to-End Spoken Language Understanding
Pavel Denisov
Ngoc Thang Vu
29
1
0
09 Oct 2023
Core-Periphery Principle Guided Redesign of Self-Attention in
  Transformers
Core-Periphery Principle Guided Redesign of Self-Attention in Transformers
Xiao-Xing Yu
Lu Zhang
Haixing Dai
Yanjun Lyu
Lin Zhao
Zihao Wu
David Liu
Tianming Liu
Dajiang Zhu
GNN
36
9
0
27 Mar 2023
MedFuse: Multi-modal fusion with clinical time-series data and chest
  X-ray images
MedFuse: Multi-modal fusion with clinical time-series data and chest X-ray images
Nasir Hayat
Krzysztof J. Geras
Farah E. Shamout
MedIm
24
40
0
14 Jul 2022
Regression Metric Loss: Learning a Semantic Representation Space for
  Medical Images
Regression Metric Loss: Learning a Semantic Representation Space for Medical Images
Hanqing Chao
Jiajin Zhang
Pingkun Yan
23
2
0
12 Jul 2022
Multimodal Learning with Transformers: A Survey
Multimodal Learning with Transformers: A Survey
P. Xu
Xiatian Zhu
David A. Clifton
ViT
60
527
0
13 Jun 2022
Scaling Up Visual and Vision-Language Representation Learning With Noisy
  Text Supervision
Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision
Chao Jia
Yinfei Yang
Ye Xia
Yi-Ting Chen
Zarana Parekh
Hieu H. Pham
Quoc V. Le
Yun-hsuan Sung
Zhen Li
Tom Duerig
VLM
CLIP
304
3,708
0
11 Feb 2021
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language
  Understanding
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
Alex Jinpeng Wang
Amanpreet Singh
Julian Michael
Felix Hill
Omer Levy
Samuel R. Bowman
ELM
297
6,959
0
20 Apr 2018
Iterative Visual Reasoning Beyond Convolutions
Iterative Visual Reasoning Beyond Convolutions
Xinlei Chen
Li-Jia Li
Li Fei-Fei
Abhinav Gupta
LRM
GNN
37
213
0
29 Mar 2018
Google's Neural Machine Translation System: Bridging the Gap between
  Human and Machine Translation
Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation
Yonghui Wu
M. Schuster
Z. Chen
Quoc V. Le
Mohammad Norouzi
...
Alex Rudnick
Oriol Vinyals
G. Corrado
Macduff Hughes
J. Dean
AIMat
716
6,746
0
26 Sep 2016
1