Linearly Mapping from Image to Text Space

30 September 2022

Papers citing "Linearly Mapping from Image to Text Space"

29 / 29 papers shown

Title
Shared Global and Local Geometry of Language Model Embeddings Andrew Lee Melanie Weber F. Viégas Martin Wattenberg FedML 76 1 0 27 Mar 2025
Shades of Zero: Distinguishing Impossibility from Inconceivability Jennifer Hu Felix Sosa T. Ullman 43 0 0 27 Feb 2025
The Semantic Hub Hypothesis: Language Models Share Semantic Representations Across Languages and Modalities Zhaofeng Wu Xinyan Velocity Yu Dani Yogatama Jiasen Lu Yoon Kim AIFin 54 10 0 07 Nov 2024
Phase Diagram of Vision Large Language Models Inference: A Perspective from Interaction across Image and Instruction Houjing Wei Hakaze Cho Yuting Shi MLLM 38 0 0 01 Nov 2024
Towards Interpreting Visual Information Processing in Vision-Language Models Clement Neo Luke Ong Philip H. S. Torr Mor Geva David M. Krueger Fazl Barez 89 6 0 09 Oct 2024
Intrinsic Dimension Correlation: uncovering nonlinear connections in multimodal representations Lorenzo Basile Santiago Acevedo Luca Bortolussi Fabio Anselmi Alex Rodriguez 42 4 0 22 Jun 2024
A Concept-Based Explainability Framework for Large Multimodal Models Jayneel Parekh Pegah Khayatan Mustafa Shukor A. Newson Matthieu Cord 40 16 0 12 Jun 2024
Multimodal Cross-Document Event Coreference Resolution Using Linear Semantic Transfer and Mixed-Modality Ensembles Abhijnan Nath Huma Jamil Shafiuddin Rehan Ahmed George Baker Rahul Ghosh James H. Martin Nathaniel Blanchard Nikhil Krishnaswamy 32 2 0 13 Apr 2024
Learning to Project for Cross-Task Knowledge Distillation Dylan Auty Roy Miles Benedikt Kolbeinsson K. Mikolajczyk 40 0 0 21 Mar 2024
GET: Unlocking the Multi-modal Potential of CLIP for Generalized Category Discovery Enguang Wang Zhimao Peng Zhengyuan Xie Fei Yang Xialei Liu Ming-Ming Cheng 62 3 0 15 Mar 2024
Interpreting CLIP with Sparse Linear Concept Embeddings (SpLiCE) Usha Bhalla Alexander X. Oesterling Suraj Srinivas Flavio du Pin Calmon Himabindu Lakkaraju 41 35 0 16 Feb 2024
Patchscopes: A Unifying Framework for Inspecting Hidden Representations of Language Models Asma Ghandeharioun Avi Caciularu Adam Pearce Lucas Dixon Mor Geva 34 87 0 11 Jan 2024
Do Vision and Language Encoders Represent the World Similarly? Mayug Maniparambil Raiymbek Akshulakov Y. A. D. Djilali Sanath Narayan M. Seddik K. Mangalam Noel E. O'Connor VLM 26 11 0 10 Jan 2024
Object Recognition as Next Token Prediction Kaiyu Yue Borchun Chen Jonas Geiping Hengduo Li Tom Goldstein Ser-Nam Lim 40 9 0 04 Dec 2023
Modality-invariant and Specific Prompting for Multimodal Human Perception Understanding Hao Sun Ziwei Niu Xinyao Yu Jiaqing Liu Yen-Wei Chen Lanfen Lin 29 0 0 17 Nov 2023
Bridging the Gap: Exploring the Capabilities of Bridge-Architectures for Complex Visual Reasoning Tasks Kousik Rajesh Mrigank Raman M. A. Karim Pranit Chawla VLM 25 2 0 31 Jul 2023
AntGPT: Can Large Language Models Help Long-term Action Anticipation from Videos? Qi Zhao Shijie Wang Ce Zhang Changcheng Fu Minh Quan Do Nakul Agarwal Kwonjoon Lee Chen Sun LM&Ro 51 49 0 31 Jul 2023
Linear Alignment of Vision-language Models for Image Captioning Fabian Paischer M. Hofmarcher Sepp Hochreiter Thomas Adler CLIP VLM 47 0 0 10 Jul 2023
Images in Language Space: Exploring the Suitability of Large Language Models for Vision & Language Tasks Sherzod Hakimov David Schlangen VLM 36 5 0 23 May 2023
The Vector Grounding Problem Dimitri Coelho Mollo Raphael Milliere 38 26 0 04 Apr 2023
A Comprehensive Survey of AI-Generated Content (AIGC): A History of Generative AI from GAN to ChatGPT Yihan Cao Siyu Li Yixin Liu Zhiling Yan Yutong Dai Philip S. Yu Lichao Sun 29 507 0 07 Mar 2023
Multipath agents for modular multitask ML systems Andrea Gesmundo 28 1 0 06 Feb 2023
Muse: Text-To-Image Generation via Masked Generative Transformers Huiwen Chang Han Zhang Jarred Barber AJ Maschinot José Lezama ... Kevin Patrick Murphy William T. Freeman Michael Rubinstein Yuanzhen Li Dilip Krishnan DiffM 197 519 0 02 Jan 2023
Language Models with Image Descriptors are Strong Few-Shot Video-Language Learners Zhenhailong Wang Manling Li Ruochen Xu Luowei Zhou Jie Lei ... Chenguang Zhu Derek Hoiem Shih-Fu Chang Joey Tianyi Zhou Heng Ji MLLM VLM 170 137 0 22 May 2022
Training Vision-Language Transformers from Captions Liangke Gui Yingshan Chang Qiuyuan Huang Subhojit Som Alexander G. Hauptmann Jianfeng Gao Yonatan Bisk VLM ViT 174 11 0 19 May 2022
How Much Can CLIP Benefit Vision-and-Language Tasks? Sheng Shen Liunian Harold Li Hao Tan Joey Tianyi Zhou Anna Rohrbach Kai-Wei Chang Z. Yao Kurt Keutzer CLIP VLM MLLM 196 405 0 13 Jul 2021
The Power of Scale for Parameter-Efficient Prompt Tuning Brian Lester Rami Al-Rfou Noah Constant VPVLM 280 3,848 0 18 Apr 2021
High-Performance Large-Scale Image Recognition Without Normalization Andrew Brock Soham De Samuel L. Smith Karen Simonyan VLM 223 512 0 11 Feb 2021
ImageNet Large Scale Visual Recognition Challenge Olga Russakovsky Jia Deng Hao Su J. Krause S. Satheesh ... A. Karpathy A. Khosla Michael S. Bernstein Alexander C. Berg Li Fei-Fei VLM ObjD 296 39,198 0 01 Sep 2014