2209.15162
Linearly Mapping from Image to Text Space
30 September 2022
Jack Merullo
Louis Castricato
Carsten Eickhoff
Ellie Pavlick
VLM
Papers citing
"Linearly Mapping from Image to Text Space"
29 / 29 papers shown
Shared Global and Local Geometry of Language Model Embeddings
Andrew Lee
Melanie Weber
F. Viégas
Martin Wattenberg
FedML
76
1
0
27 Mar 2025
Shades of Zero: Distinguishing Impossibility from Inconceivability
Jennifer Hu
Felix Sosa
T. Ullman
43
0
0
27 Feb 2025
The Semantic Hub Hypothesis: Language Models Share Semantic Representations Across Languages and Modalities
Zhaofeng Wu
Xinyan Velocity Yu
Dani Yogatama
Jiasen Lu
Yoon Kim
AIFin
54
10
0
07 Nov 2024
Phase Diagram of Vision Large Language Models Inference: A Perspective from Interaction across Image and Instruction
Houjing Wei
Hakaze Cho
Yuting Shi
MLLM
38
0
0
01 Nov 2024
Towards Interpreting Visual Information Processing in Vision-Language Models
Clement Neo
Luke Ong
Philip H. S. Torr
Mor Geva
David M. Krueger
Fazl Barez
89
6
0
09 Oct 2024
Intrinsic Dimension Correlation: uncovering nonlinear connections in multimodal representations
Lorenzo Basile
Santiago Acevedo
Luca Bortolussi
Fabio Anselmi
Alex Rodriguez
42
4
0
22 Jun 2024
A Concept-Based Explainability Framework for Large Multimodal Models
Jayneel Parekh
Pegah Khayatan
Mustafa Shukor
A. Newson
Matthieu Cord
40
16
0
12 Jun 2024
Multimodal Cross-Document Event Coreference Resolution Using Linear Semantic Transfer and Mixed-Modality Ensembles
Abhijnan Nath
Huma Jamil
Shafiuddin Rehan Ahmed
George Baker
Rahul Ghosh
James H. Martin
Nathaniel Blanchard
Nikhil Krishnaswamy
32
2
0
13 Apr 2024
Learning to Project for Cross-Task Knowledge Distillation
Dylan Auty
Roy Miles
Benedikt Kolbeinsson
K. Mikolajczyk
40
0
0
21 Mar 2024
GET: Unlocking the Multi-modal Potential of CLIP for Generalized Category Discovery
Enguang Wang
Zhimao Peng
Zhengyuan Xie
Fei Yang
Xialei Liu
Ming-Ming Cheng
62
3
0
15 Mar 2024
Interpreting CLIP with Sparse Linear Concept Embeddings (SpLiCE)
Usha Bhalla
Alexander X. Oesterling
Suraj Srinivas
Flavio du Pin Calmon
Himabindu Lakkaraju
41
35
0
16 Feb 2024
Patchscopes: A Unifying Framework for Inspecting Hidden Representations of Language Models
Asma Ghandeharioun
Avi Caciularu
Adam Pearce
Lucas Dixon
Mor Geva
34
87
0
11 Jan 2024
Do Vision and Language Encoders Represent the World Similarly?
Mayug Maniparambil
Raiymbek Akshulakov
Y. A. D. Djilali
Sanath Narayan
M. Seddik
K. Mangalam
Noel E. O'Connor
VLM
26
11
0
10 Jan 2024
Object Recognition as Next Token Prediction
Kaiyu Yue
Borchun Chen
Jonas Geiping
Hengduo Li
Tom Goldstein
Ser-Nam Lim
40
9
0
04 Dec 2023
Modality-invariant and Specific Prompting for Multimodal Human Perception Understanding
Hao Sun
Ziwei Niu
Xinyao Yu
Jiaqing Liu
Yen-Wei Chen
Lanfen Lin
29
0
0
17 Nov 2023
Bridging the Gap: Exploring the Capabilities of Bridge-Architectures for Complex Visual Reasoning Tasks
Kousik Rajesh
Mrigank Raman
M. A. Karim
Pranit Chawla
VLM
25
2
0
31 Jul 2023
AntGPT: Can Large Language Models Help Long-term Action Anticipation from Videos?
Qi Zhao
Shijie Wang
Ce Zhang
Changcheng Fu
Minh Quan Do
Nakul Agarwal
Kwonjoon Lee
Chen Sun
LM&Ro
51
49
0
31 Jul 2023
Linear Alignment of Vision-language Models for Image Captioning
Fabian Paischer
M. Hofmarcher
Sepp Hochreiter
Thomas Adler
CLIP
VLM
47
0
0
10 Jul 2023
Images in Language Space: Exploring the Suitability of Large Language Models for Vision & Language Tasks
Sherzod Hakimov
David Schlangen
VLM
36
5
0
23 May 2023
The Vector Grounding Problem
Dimitri Coelho Mollo
Raphael Milliere
38
26
0
04 Apr 2023
A Comprehensive Survey of AI-Generated Content (AIGC): A History of Generative AI from GAN to ChatGPT
Yihan Cao
Siyu Li
Yixin Liu
Zhiling Yan
Yutong Dai
Philip S. Yu
Lichao Sun
29
507
0
07 Mar 2023
Multipath agents for modular multitask ML systems
Andrea Gesmundo
28
1
0
06 Feb 2023
Muse: Text-To-Image Generation via Masked Generative Transformers
Huiwen Chang
Han Zhang
Jarred Barber
AJ Maschinot
José Lezama
...
Kevin Patrick Murphy
William T. Freeman
Michael Rubinstein
Yuanzhen Li
Dilip Krishnan
DiffM
197
519
0
02 Jan 2023
Language Models with Image Descriptors are Strong Few-Shot Video-Language Learners
Zhenhailong Wang
Manling Li
Ruochen Xu
Luowei Zhou
Jie Lei
...
Chenguang Zhu
Derek Hoiem
Shih-Fu Chang
Joey Tianyi Zhou
Heng Ji
MLLM
VLM
170
137
0
22 May 2022
Training Vision-Language Transformers from Captions
Liangke Gui
Yingshan Chang
Qiuyuan Huang
Subhojit Som
Alexander G. Hauptmann
Jianfeng Gao
Yonatan Bisk
VLM
ViT
174
11
0
19 May 2022
How Much Can CLIP Benefit Vision-and-Language Tasks?
Sheng Shen
Liunian Harold Li
Hao Tan
Joey Tianyi Zhou
Anna Rohrbach
Kai-Wei Chang
Z. Yao
Kurt Keutzer
CLIP
VLM
MLLM
196
405
0
13 Jul 2021
The Power of Scale for Parameter-Efficient Prompt Tuning
Brian Lester
Rami Al-Rfou
Noah Constant
VPVLM
280
3,848
0
18 Apr 2021
High-Performance Large-Scale Image Recognition Without Normalization
Andrew Brock
Soham De
Samuel L. Smith
Karen Simonyan
VLM
223
512
0
11 Feb 2021
ImageNet Large Scale Visual Recognition Challenge
Olga Russakovsky
Jia Deng
Hao Su
J. Krause
S. Satheesh
...
A. Karpathy
A. Khosla
Michael S. Bernstein
Alexander C. Berg
Li Fei-Fei
VLM
ObjD
296
39,198
0
01 Sep 2014