Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2301.06267
Cited By
v1
v2
v3
v4 (latest)
Multimodality Helps Unimodality: Cross-Modal Few-Shot Learning with Multimodal Models
16 January 2023
Zhiqiu Lin
Samuel Yu
Zhiyi Kuang
Deepak Pathak
Deva Ramana
VLM
Re-assign community
ArXiv (abs)
PDF
HTML
Papers citing
"Multimodality Helps Unimodality: Cross-Modal Few-Shot Learning with Multimodal Models"
50 / 78 papers shown
Title
Hierarchical and Multimodal Data for Daily Activity Understanding
Ghazal Kaviani
Yavuz Yarici
Seulgi Kim
Mohit Prabhushankar
Ghassan AlRegib
Mashhour Solh
Ameya Patil
101
1
0
24 Apr 2025
ProKeR: A Kernel Perspective on Few-Shot Adaptation of Large Vision-Language Models
Yassir Bendou
Amine Ouasfi
Vincent Gripon
A. Boukhayma
VLM
135
0
0
19 Jan 2025
Transforming Game Play: A Comparative Study of DCQN and DTQN Architectures in Reinforcement Learning
William A. Stigall
106
0
0
14 Oct 2024
Large Visual-Language Models Are Also Good Classifiers: A Study of In-Context Multimodal Fake News Detection
Ye Jiang
Yimin Wang
MLLM
104
1
0
16 Jul 2024
Instruct-ReID: A Multi-purpose Person Re-identification Task with Instructions
Weizhen He
Yihe Deng
Shixiang Tang
Qihao Chen
Qingsong Xie
...
Feng Zhu
Rui Zhao
Wanli Ouyang
Donglian Qi
Yunfeng Yan
109
23
0
13 Jun 2023
Patching open-vocabulary models by interpolating weights
Gabriel Ilharco
Mitchell Wortsman
S. Gadre
Shuran Song
Hannaneh Hajishirzi
Simon Kornblith
Ali Farhadi
Ludwig Schmidt
VLM
KELM
96
175
0
10 Aug 2022
GLIPv2: Unifying Localization and Vision-Language Understanding
Haotian Zhang
Pengchuan Zhang
Xiaowei Hu
Yen-Chun Chen
Liunian Harold Li
Xiyang Dai
Lijuan Wang
Lu Yuan
Lei Li
Jianfeng Gao
ObjD
VLM
90
298
0
12 Jun 2022
Prompt-aligned Gradient for Prompt Tuning
Beier Zhu
Yulei Niu
Yucheng Han
Yuehua Wu
Hanwang Zhang
VLM
279
290
0
30 May 2022
RLPrompt: Optimizing Discrete Text Prompts with Reinforcement Learning
Mingkai Deng
Jianyu Wang
Cheng-Ping Hsieh
Yihan Wang
Han Guo
Tianmin Shu
Meng Song
Eric Xing
Zhiting Hu
89
340
0
25 May 2022
CoCa: Contrastive Captioners are Image-Text Foundation Models
Jiahui Yu
Zirui Wang
Vijay Vasudevan
Legg Yeung
Mojtaba Seyedhosseini
Yonghui Wu
VLM
CLIP
OffRL
157
1,301
0
04 May 2022
Flamingo: a Visual Language Model for Few-Shot Learning
Jean-Baptiste Alayrac
Jeff Donahue
Pauline Luc
Antoine Miech
Iain Barr
...
Mikolaj Binkowski
Ricardo Barreira
Oriol Vinyals
Andrew Zisserman
Karen Simonyan
MLLM
VLM
382
3,542
0
29 Apr 2022
Unsupervised Prompt Learning for Vision-Language Models
Hao Huang
Jack Chu
Fangyun Wei
VPVLM
MLLM
VLM
89
133
0
07 Apr 2022
Visual Prompt Tuning
Menglin Jia
Luming Tang
Bor-Chun Chen
Claire Cardie
Serge Belongie
Bharath Hariharan
Ser-Nam Lim
VLM
VPVLM
153
1,627
0
23 Mar 2022
GrIPS: Gradient-free, Edit-based Instruction Search for Prompting Large Language Models
Archiki Prasad
Peter Hase
Xiang Zhou
Joey Tianyi Zhou
95
123
0
14 Mar 2022
CLIP Models are Few-shot Learners: Empirical Studies on VQA and Visual Entailment
Haoyu Song
Li Dong
Weinan Zhang
Ting Liu
Furu Wei
VLM
CLIP
79
138
0
14 Mar 2022
Conditional Prompt Learning for Vision-Language Models
Kaiyang Zhou
Jingkang Yang
Chen Change Loy
Ziwei Liu
VLM
CLIP
VPVLM
136
1,349
0
10 Mar 2022
Fine-Tuning can Distort Pretrained Features and Underperform Out-of-Distribution
Ananya Kumar
Aditi Raghunathan
Robbie Jones
Tengyu Ma
Percy Liang
OODD
120
674
0
21 Feb 2022
BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
Junnan Li
Dongxu Li
Caiming Xiong
Guosheng Lin
MLLM
BDL
VLM
CLIP
535
4,360
0
28 Jan 2022
Debiased Learning from Naturally Imbalanced Pseudo-Labels
Xudong Wang
Zhi-Li Wu
Long Lian
Stella X. Yu
100
99
0
05 Jan 2022
SLIP: Self-supervision meets Language-Image Pre-training
Norman Mu
Alexander Kirillov
David Wagner
Saining Xie
VLM
CLIP
141
488
0
23 Dec 2021
Grounded Language-Image Pre-training
Liunian Harold Li
Pengchuan Zhang
Haotian Zhang
Jianwei Yang
Chunyuan Li
...
Lu Yuan
Lei Zhang
Lei Li
Kai-Wei Chang
Jianfeng Gao
ObjD
VLM
126
1,062
0
07 Dec 2021
LiT: Zero-Shot Transfer with Locked-image text Tuning
Xiaohua Zhai
Tianlin Li
Basil Mustafa
Andreas Steiner
Daniel Keysers
Alexander Kolesnikov
Lucas Beyer
VLM
98
556
0
15 Nov 2021
Tip-Adapter: Training-free CLIP-Adapter for Better Vision-Language Modeling
Renrui Zhang
Rongyao Fang
Wei Zhang
Peng Gao
Kunchang Li
Jifeng Dai
Yu Qiao
Hongsheng Li
VLM
265
400
0
06 Nov 2021
Supervision Exists Everywhere: A Data Efficient Contrastive Language-Image Pre-training Paradigm
Yangguang Li
Feng Liang
Lichen Zhao
Yufeng Cui
Wanli Ouyang
Jing Shao
F. Yu
Junjie Yan
VLM
CLIP
145
453
0
11 Oct 2021
CLIP-Adapter: Better Vision-Language Models with Feature Adapters
Peng Gao
Shijie Geng
Renrui Zhang
Teli Ma
Rongyao Fang
Yongfeng Zhang
Hongsheng Li
Yu Qiao
VLM
CLIP
293
1,042
0
09 Oct 2021
Robust fine-tuning of zero-shot models
Mitchell Wortsman
Gabriel Ilharco
Jong Wook Kim
Mike Li
Simon Kornblith
...
Raphael Gontijo-Lopes
Hannaneh Hajishirzi
Ali Farhadi
Hongseok Namkoong
Ludwig Schmidt
VLM
132
725
0
04 Sep 2021
Learning to Prompt for Vision-Language Models
Kaiyang Zhou
Jingkang Yang
Chen Change Loy
Ziwei Liu
VPVLM
CLIP
VLM
495
2,399
0
02 Sep 2021
Pre-train, Prompt, and Predict: A Systematic Survey of Prompting Methods in Natural Language Processing
Pengfei Liu
Weizhe Yuan
Jinlan Fu
Zhengbao Jiang
Hiroaki Hayashi
Graham Neubig
VLM
SyDa
213
3,977
0
28 Jul 2021
Multimodal Few-Shot Learning with Frozen Language Models
Maria Tsimpoukelli
Jacob Menick
Serkan Cabi
S. M. Ali Eslami
Oriol Vinyals
Felix Hill
MLLM
169
786
0
25 Jun 2021
AudioCLIP: Extending CLIP to Image, Text and Audio
A. Guzhov
Federico Raue
Jörn Hees
Andreas Dengel
CLIP
VLM
122
366
0
24 Jun 2021
Emerging Properties in Self-Supervised Vision Transformers
Mathilde Caron
Hugo Touvron
Ishan Misra
Hervé Jégou
Julien Mairal
Piotr Bojanowski
Armand Joulin
688
6,079
0
29 Apr 2021
Rich Semantics Improve Few-shot Learning
Mohamed Afham
Salman Khan
Muhammad Haris Khan
Muzammal Naseer
Fahad Shahbaz Khan
VLM
86
24
0
26 Apr 2021
ESResNe(X)t-fbsp: Learning Robust Time-Frequency Transformation of Audio
A. Guzhov
Federico Raue
Jörn Hees
Andreas Dengel
60
38
0
23 Apr 2021
GPT Understands, Too
Xiao Liu
Yanan Zheng
Zhengxiao Du
Ming Ding
Yujie Qian
Zhilin Yang
Jie Tang
VLM
168
1,175
0
18 Mar 2021
BERTese: Learning to Speak to BERT
Adi Haviv
Jonathan Berant
Amir Globerson
69
123
0
09 Mar 2021
Self-supervised Pretraining of Visual Features in the Wild
Priya Goyal
Mathilde Caron
Benjamin Lefaudeux
Min Xu
Pengchao Wang
...
Mannat Singh
Vitaliy Liptchinsky
Ishan Misra
Armand Joulin
Piotr Bojanowski
VLM
SSL
76
274
0
02 Mar 2021
Learning Transferable Visual Models From Natural Language Supervision
Alec Radford
Jong Wook Kim
Chris Hallacy
Aditya A. Ramesh
Gabriel Goh
...
Amanda Askell
Pamela Mishkin
Jack Clark
Gretchen Krueger
Ilya Sutskever
CLIP
VLM
929
29,436
0
26 Feb 2021
Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision
Chao Jia
Yinfei Yang
Ye Xia
Yi-Ting Chen
Zarana Parekh
Hieu H. Pham
Quoc V. Le
Yun-hsuan Sung
Zhen Li
Tom Duerig
VLM
CLIP
443
3,856
0
11 Feb 2021
Cross-Modal Contrastive Learning for Text-to-Image Generation
Han Zhang
Jing Yu Koh
Jason Baldridge
Honglak Lee
Yinfei Yang
GAN
130
364
0
12 Jan 2021
Making Pre-trained Language Models Better Few-shot Learners
Tianyu Gao
Adam Fisch
Danqi Chen
391
1,967
0
31 Dec 2020
UNIMO: Towards Unified-Modal Understanding and Generation via Cross-Modal Contrastive Learning
Wei Li
Can Gao
Guocheng Niu
Xinyan Xiao
Hao Liu
Jiachen Liu
Hua Wu
Haifeng Wang
94
379
0
31 Dec 2020
Multimodal Prototypical Networks for Few-shot Learning
Frederik Pahde
M. Puscas
T. Klein
Moin Nabi
57
90
0
17 Nov 2020
It's Not Just Size That Matters: Small Language Models Are Also Few-Shot Learners
Timo Schick
Hinrich Schütze
124
973
0
15 Sep 2020
The Many Faces of Robustness: A Critical Analysis of Out-of-Distribution Generalization
Dan Hendrycks
Steven Basart
Norman Mu
Saurav Kadavath
Frank Wang
...
Samyak Parajuli
Mike Guo
Basel Alomair
Jacob Steinhardt
Justin Gilmer
OOD
337
1,740
0
29 Jun 2020
Language Models are Few-Shot Learners
Tom B. Brown
Benjamin Mann
Nick Ryder
Melanie Subbiah
Jared Kaplan
...
Christopher Berner
Sam McCandlish
Alec Radford
Ilya Sutskever
Dario Amodei
BDL
795
42,055
0
28 May 2020
A Simple Framework for Contrastive Learning of Visual Representations
Ting-Li Chen
Simon Kornblith
Mohammad Norouzi
Geoffrey E. Hinton
SSL
369
18,778
0
13 Feb 2020
Exploiting Cloze Questions for Few Shot Text Classification and Natural Language Inference
Timo Schick
Hinrich Schütze
348
1,615
0
21 Jan 2020
Improved Few-Shot Visual Classification
Peyman Bateni
Raghav Goyal
Vaden Masrani
Frank Wood
Leonid Sigal
VLM
69
231
0
07 Dec 2019
Self-Supervised Learning by Cross-Modal Audio-Video Clustering
Humam Alwassel
D. Mahajan
Bruno Korbar
Lorenzo Torresani
Guohao Li
Du Tran
SSL
92
431
0
28 Nov 2019
How Can We Know What Language Models Know?
Zhengbao Jiang
Frank F. Xu
Jun Araki
Graham Neubig
KELM
132
1,405
0
28 Nov 2019
1
2
Next