arXiv:2401.05224
Do Vision and Language Encoders Represent the World Similarly?

10 January 2024
Mayug Maniparambil
Raiymbek Akshulakov
Y. A. D. Djilali
Sanath Narayan
M. Seddik
K. Mangalam
Noel E. O'Connor
    VLM

Papers citing "Do Vision and Language Encoders Represent the World Similarly?"

17 / 17 papers shown
LDIR: Low-Dimensional Dense and Interpretable Text Embeddings with Relative Representations
Yile Wang
Zhanyu Shen
Hui Huang
26
0
0
15 May 2025
Spingarn's Method and Progressive Decoupling Beyond Elicitable Monotonicity
B. Evens
P. Latafat
Panagiotis Patrinos
48
0
0
01 Apr 2025
Beyond Semantics: Rediscovering Spatial Awareness in Vision-Language Models
Jianing Qi
Jiawei Liu
Hao Tang
Zhigang Zhu
104
1
0
21 Mar 2025
On the Internal Representations of Graph Metanetworks
Taesun Yeom
Jaeho Lee
GNN
59
0
0
12 Mar 2025
Escaping Plato's Cave: Towards the Alignment of 3D and Text Latent Spaces
Souhail Hadgi
Luca Moschella
Andrea Santilli
Diego Gomez
Qixing Huang
Emanuele Rodolà
Simone Melzi
M. Ovsjanikov
40
0
0
07 Mar 2025
The Semantic Hub Hypothesis: Language Models Share Semantic Representations Across Languages and Modalities
Zhaofeng Wu
Xinyan Velocity Yu
Dani Yogatama
Jiasen Lu
Yoon Kim
AIFin
54
10
0
07 Nov 2024
Are Music Foundation Models Better at Singing Voice Deepfake Detection? Far-Better Fuse them with Speech Foundation Models
Orchid Chetia Phukan
Sarthak Jain
Swarup Ranjan Behera
Arun Balaji Buduru
Rajesh Sharma
S. R Mahadeva Prasanna
28
0
0
21 Sep 2024
The Platonic Representation Hypothesis
Minyoung Huh
Brian Cheung
Tongzhou Wang
Phillip Isola
77
111
0
13 May 2024
Similarity of Neural Network Models: A Survey of Functional and Representational Measures
Max Klabunde
Tobias Schumacher
M. Strohmaier
Florian Lemmerich
52
64
0
10 May 2023
ASIF: Coupled Data Turns Unimodal Models to Multimodal Without Training
Antonio Norelli
Marco Fumero
Valentino Maiorca
Luca Moschella
Emanuele Rodolà
Francesco Locatello
VLM
81
33
0
04 Oct 2022
Linearly Mapping from Image to Text Space
Jack Merullo
Louis Castricato
Carsten Eickhoff
Ellie Pavlick
VLM
164
104
0
30 Sep 2022
Emerging Properties in Self-Supervised Vision Transformers
Mathilde Caron
Hugo Touvron
Ishan Misra
Hervé Jégou
Julien Mairal
Piotr Bojanowski
Armand Joulin
317
5,785
0
29 Apr 2021
The Power of Scale for Parameter-Efficient Prompt Tuning
Brian Lester
Rami Al-Rfou
Noah Constant
VPVLM
280
3,848
0
18 Apr 2021
Conceptual 12M: Pushing Web-Scale Image-Text Pre-Training To Recognize Long-Tail Visual Concepts
Soravit Changpinyo
P. Sharma
Nan Ding
Radu Soricut
VLM
278
1,082
0
17 Feb 2021
Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision
Chao Jia
Yinfei Yang
Ye Xia
Yi-Ting Chen
Zarana Parekh
Hieu H. Pham
Quoc V. Le
Yun-hsuan Sung
Zhen Li
Tom Duerig
VLM
CLIP
298
3,700
0
11 Feb 2021
Similarity Analysis of Contextual Word Representation Models
John M. Wu
Yonatan Belinkov
Hassan Sajjad
Nadir Durrani
Fahim Dalvi
James R. Glass
51
73
0
03 May 2020
Word Translation Without Parallel Data
Alexis Conneau
Guillaume Lample
Marc'Aurelio Ranzato
Ludovic Denoyer
Hervé Jégou
174
1,635
0
11 Oct 2017