arXiv:2412.00142
Enhancing Few-Shot Vision-Language Classification with Large Multimodal Model Features
28 November 2024
Chancharik Mitra
Brandon Huang
Tianning Chai
Zhiqiu Lin
Assaf Arbelle
Rogerio Feris
Leonid Karlinsky
Trevor Darrell
Deva Ramanan
Roei Herzig
VLM
Papers citing "Enhancing Few-Shot Vision-Language Classification with Large Multimodal Model Features" (50 of 74 shown)
On the Suitability of Reinforcement Fine-Tuning to Visual Tasks
X. Chen
Wei Li
Chunxu Liu
Chi Xie
Xiaoyan Hu
Chengqian Ma
Feng Zhu
Rui Zhao
ReLM
LRM
263
2
0
08 Apr 2025
Lifting the Veil on Visual Information Flow in MLLMs: Unlocking Pathways to Faster Inference
Computer Vision and Pattern Recognition (CVPR), 2025
Hao Yin
Guangzong Si
Zilei Wang
185
5
0
17 Mar 2025
Granite Vision: a lightweight, open-source multimodal model for enterprise Intelligence
Granite Vision Team
Leonid Karlinsky
Assaf Arbelle
Abraham Daniels
A. Nassar
...
Sriram Raghavan
Tanveer Syeda-Mahmood
Peter W. J. Staar
Tal Drory
Rogerio Feris
VLM
AI4TS
378
10
0
14 Feb 2025
NaturalBench: Evaluating Vision-Language Models on Natural Adversarial Samples
Neural Information Processing Systems (NeurIPS), 2024
Baiqi Li
Zhiqiu Lin
Wenxuan Peng
Jean de Dieu Nyandwi
Daniel Jiang
Zixian Ma
Simran Khanuja
Ranjay Krishna
Graham Neubig
Deva Ramanan
AAML
CoGe
VLM
532
57
0
18 Oct 2024
LongMemEval: Benchmarking Chat Assistants on Long-Term Interactive Memory
International Conference on Learning Representations (ICLR), 2025
Di Wu
Hongwei Wang
Wenhao Yu
Yuwei Zhang
Kai-Wei Chang
Dong Yu
RALM
KELM
324
24
0
14 Oct 2024
Conan-embedding: General Text Embedding with More and Better Negative Samples
Shiyu Li
Yang Tang
Shizhe Chen
Xi Chen
163
15
0
28 Aug 2024
LLaVA-OneVision: Easy Visual Task Transfer
Bo Li
Yuanhan Zhang
Dong Guo
Renrui Zhang
Feng Li
Hao Zhang
Kaichen Zhang
Yanwei Li
Ziwei Liu
Chunyuan Li
MLLM
SyDa
VLM
353
1,604
0
06 Aug 2024
E5-V: Universal Embeddings with Multimodal Large Language Models
Ting Jiang
Minghui Song
Zihan Zhang
Haizhen Huang
Weiwei Deng
Feng Sun
Qi Zhang
Deqing Wang
Fuzhen Zhuang
VLM
266
65
0
17 Jul 2024
MERLIN: Multimodal Embedding Refinement via LLM-based Iterative Navigation for Text-Video Retrieval-Rerank Pipeline
D. Han
Eunhwan Park
Gisang Lee
Adam Lee
Nojun Kwak
228
6
0
17 Jul 2024
Multimodal Task Vectors Enable Many-Shot Multimodal In-Context Learning
Brandon Huang
Chancharik Mitra
Assaf Arbelle
Leonid Karlinsky
Trevor Darrell
Roei Herzig
188
35
0
21 Jun 2024
Fine-Tuned 'Small' LLMs (Still) Significantly Outperform Zero-Shot Generative AI Models in Text Classification
Martin Juan José Bucher
Marco Martini
ALM
AI4MH
273
69
0
12 Jun 2024
Why are Visually-Grounded Language Models Bad at Image Classification?
Yuhui Zhang
Alyssa Unell
Xiaohan Wang
Dhruba Ghosh
Yuchang Su
Ludwig Schmidt
Serena Yeung-Levy
VLM
198
58
0
28 May 2024
Scaling Laws for Discriminative Classification in Large Language Models
Dean Wyatte
Fatemeh Tahmasbi
Ming Li
Thomas Markovich
167
2
0
24 May 2024
BLINK: Multimodal Large Language Models Can See but Not Perceive
Xingyu Fu
Yushi Hu
Bangzheng Li
Yu Feng
Haoyu Wang
Xudong Lin
Dan Roth
Noah A. Smith
Wei-Chiu Ma
Ranjay Krishna
VLM
LRM
MLLM
471
273
0
18 Apr 2024
Finding Visual Task Vectors
Alberto Hojel
Yutong Bai
Trevor Darrell
Amir Globerson
Amir Bar
188
14
0
08 Apr 2024
Evaluating Text-to-Visual Generation with Image-to-Text Generation
Zhiqiu Lin
Deepak Pathak
Baiqi Li
Jiayao Li
Xide Xia
Graham Neubig
Pengchuan Zhang
Deva Ramanan
EGVM
329
316
0
01 Apr 2024
InternVideo2: Scaling Video Foundation Models for Multimodal Video Understanding
European Conference on Computer Vision (ECCV), 2024
Yi Wang
Kunchang Li
Xinhao Li
Jiashuo Yu
Yinan He
...
Hongjie Zhang
Yifei Huang
Yu Qiao
Yali Wang
Limin Wang
170
104
0
22 Mar 2024
Unified Hallucination Detection for Multimodal Large Language Models
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Xiang Chen
Chenxi Wang
Yida Xue
Ningyu Zhang
Xiaoyan Yang
Qian Li
Yue Shen
Lei Liang
Jinjie Gu
Huajun Chen
HILM
281
63
0
05 Feb 2024
Safety Fine-Tuning at (Almost) No Cost: A Baseline for Vision Large Language Models
Yongshuo Zong
Ondrej Bohdal
Tingyang Yu
Yongxin Yang
Timothy M. Hospedales
VLM
MLLM
227
106
0
03 Feb 2024
KTO: Model Alignment as Prospect Theoretic Optimization
Kawin Ethayarajh
Winnie Xu
Niklas Muennighoff
Dan Jurafsky
Douwe Kiela
650
780
0
02 Feb 2024
InferAligner: Inference-Time Alignment for Harmlessness through Cross-Model Guidance
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Pengyu Wang
Dong Zhang
Linyang Li
Chenkun Tan
Xinghao Wang
Ke Ren
Botian Jiang
Xipeng Qiu
LLMSV
206
67
0
20 Jan 2024
Large Language Models Are Zero-Shot Text Classifiers
Zhiqiang Wang
Yiran Pang
Yanbin Lin
235
49
0
02 Dec 2023
Compositional Chain-of-Thought Prompting for Large Multimodal Models
Computer Vision and Pattern Recognition (CVPR), 2024
Chancharik Mitra
Brandon Huang
Trevor Darrell
Roei Herzig
MLLM
LRM
264
154
0
27 Nov 2023
In-Context Learning Creates Task Vectors
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Roee Hendel
Mor Geva
Amir Globerson
246
227
0
24 Oct 2023
Function Vectors in Large Language Models
International Conference on Learning Representations (ICLR), 2024
Eric Todd
Millicent Li
Arnab Sen Sharma
Aaron Mueller
Byron C. Wallace
David Bau
210
172
0
23 Oct 2023
Meaning Representations from Trajectories in Autoregressive Models
International Conference on Learning Representations (ICLR), 2024
Tian Yu Liu
Matthew Trager
Alessandro Achille
Pramuditha Perera
Luca Zancato
Stefano Soatto
206
22
0
23 Oct 2023
Improved Baselines with Visual Instruction Tuning
Computer Vision and Pattern Recognition (CVPR), 2024
Haotian Liu
Chunyuan Li
Yuheng Li
Yong Jae Lee
VLM
MLLM
500
3,935
0
05 Oct 2023
MMICL: Empowering Vision-language Model with Multi-Modal In-Context Learning
International Conference on Learning Representations (ICLR), 2024
Haozhe Zhao
Zefan Cai
Shuzheng Si
Xiaojian Ma
Kaikai An
Liang Chen
Zixuan Liu
Sheng Wang
Wenjuan Han
Baobao Chang
MLLM
VLM
274
177
0
14 Sep 2023
Scaling Sentence Embeddings with Large Language Models
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Ting Jiang
Shaohan Huang
Zhongzhi Luan
Deqing Wang
Fuzhen Zhuang
LRM
185
71
0
31 Jul 2023
SugarCrepe: Fixing Hackable Benchmarks for Vision-Language Compositionality
Neural Information Processing Systems (NeurIPS), 2023
Cheng-Yu Hsieh
Jieyu Zhang
Zixian Ma
Aniruddha Kembhavi
Ranjay Krishna
CoGe
237
182
0
26 Jun 2023
Revisiting the Role of Language Priors in Vision-Language Models
International Conference on Machine Learning (ICML), 2024
Zhiqiu Lin
Xinyue Chen
Deepak Pathak
Pengchuan Zhang
Deva Ramanan
VLM
351
36
0
02 Jun 2023
Direct Preference Optimization: Your Language Model is Secretly a Reward Model
Neural Information Processing Systems (NeurIPS), 2023
Rafael Rafailov
Archit Sharma
E. Mitchell
Stefano Ermon
Christopher D. Manning
Chelsea Finn
ALM
703
6,253
0
29 May 2023
Text Classification via Large Language Models
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Xiaofei Sun
Xiaoya Li
Jiwei Li
Leilei Gan
Shangwei Guo
Tianwei Zhang
Guoyin Wang
RALM
LRM
185
207
0
15 May 2023
Incorporating Structured Representations into Pretrained Vision & Language Models Using Scene Graphs
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Roei Herzig
Alon Mendelson
Leonid Karlinsky
Assaf Arbelle
Rogerio Feris
Trevor Darrell
Amir Globerson
VLM
239
38
0
10 May 2023
mPLUG-Owl: Modularization Empowers Large Language Models with Multimodality
Qinghao Ye
Haiyang Xu
Guohai Xu
Jiabo Ye
Ming Yan
...
Junfeng Tian
Qiang Qi
Ji Zhang
Feiyan Huang
Jingren Zhou
VLM
MLLM
651
1,130
0
27 Apr 2023
Visual Instruction Tuning
Neural Information Processing Systems (NeurIPS), 2023
Haotian Liu
Chunyuan Li
Qingyang Wu
Yong Jae Lee
SyDa
VLM
MLLM
834
6,977
0
17 Apr 2023
Sigmoid Loss for Language Image Pre-Training
IEEE International Conference on Computer Vision (ICCV), 2023
Xiaohua Zhai
Basil Mustafa
Alexander Kolesnikov
Lucas Beyer
CLIP
VLM
1.1K
2,026
0
27 Mar 2023
GPT-4 Technical Report
OpenAI
Josh Achiam
Steven Adler
Sandhini Agarwal
Lama Ahmad
...
Shengjia Zhao
Tianhao Zheng
Juntang Zhuang
William Zhuk
Barret Zoph
LLMAG
MLLM
3.2K
19,835
0
15 Mar 2023
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models
International Conference on Machine Learning (ICML), 2023
Junnan Li
Dongxu Li
Silvio Savarese
Steven C. H. Hoi
VLM
MLLM
1.0K
6,312
0
30 Jan 2023
Scaling Language-Image Pre-training via Masking
Computer Vision and Pattern Recognition (CVPR), 2023
Yanghao Li
Haoqi Fan
Ronghang Hu
Christoph Feichtenhofer
Kaiming He
CLIP
VLM
286
378
0
01 Dec 2022
MTEB: Massive Text Embedding Benchmark
Conference of the European Chapter of the Association for Computational Linguistics (EACL), 2023
Niklas Muennighoff
Nouamane Tazi
L. Magne
Nils Reimers
814
635
0
13 Oct 2022
In-context Learning and Induction Heads
Catherine Olsson
Nelson Elhage
Neel Nanda
Nicholas Joseph
Nova Dassarma
...
Tom B. Brown
Jack Clark
Jared Kaplan
Sam McCandlish
C. Olah
510
673
0
24 Sep 2022
Hierarchical Text-Conditional Image Generation with CLIP Latents
Aditya A. Ramesh
Prafulla Dhariwal
Alex Nichol
Casey Chu
Mark Chen
VLM
DiffM
739
8,091
0
13 Apr 2022
Winoground: Probing Vision and Language Models for Visio-Linguistic Compositionality
Computer Vision and Pattern Recognition (CVPR), 2022
Tristan Thrush
Ryan Jiang
Max Bartolo
Amanpreet Singh
Adina Williams
Douwe Kiela
Candace Ross
CoGe
294
501
0
07 Apr 2022
Training language models to follow instructions with human feedback
Neural Information Processing Systems (NeurIPS), 2022
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
1.8K
16,759
0
04 Mar 2022
Autoregressive Image Generation using Residual Quantization
Computer Vision and Pattern Recognition (CVPR), 2022
Doyup Lee
Chiheon Kim
Saehoon Kim
Minsu Cho
Wook-Shin Han
VGen
650
547
0
03 Mar 2022
BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
International Conference on Machine Learning (ICML), 2022
Junnan Li
Dongxu Li
Caiming Xiong
Guosheng Lin
MLLM
BDL
VLM
CLIP
1.2K
5,514
0
28 Jan 2022
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
Neural Information Processing Systems (NeurIPS), 2022
Jason W. Wei
Xuezhi Wang
Dale Schuurmans
Maarten Bosma
Brian Ichter
F. Xia
Ed H. Chi
Quoc Le
Denny Zhou
LM&Ro
LRM
AI4CE
ReLM
2.0K
13,778
0
28 Jan 2022
PromptBERT: Improving BERT Sentence Embeddings with Prompts
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Ting Jiang
Jian Jiao
Shaohan Huang
Zi-qiang Zhang
Deqing Wang
Fuzhen Zhuang
Furu Wei
Haizhen Huang
Liangjie Zhang
Qi Zhang
189
146
0
12 Jan 2022
Large Dual Encoders Are Generalizable Retrievers
Jianmo Ni
Chen Qu
Jing Lu
Zhuyun Dai
Gustavo Hernández Ábrego
...
Vincent Zhao
Yi Luan
Keith B. Hall
Ming-Wei Chang
Yinfei Yang
DML
434
543
0
15 Dec 2021