ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2106.13884
  4. Cited By
Multimodal Few-Shot Learning with Frozen Language Models

Multimodal Few-Shot Learning with Frozen Language Models

25 June 2021
Maria Tsimpoukelli
Jacob Menick
Serkan Cabi
S. M. Ali Eslami
Oriol Vinyals
Felix Hill
    MLLM
ArXivPDFHTML

Papers citing "Multimodal Few-Shot Learning with Frozen Language Models"

50 / 532 papers shown
Title
FVP: Fourier Visual Prompting for Source-Free Unsupervised Domain
  Adaptation of Medical Image Segmentation
FVP: Fourier Visual Prompting for Source-Free Unsupervised Domain Adaptation of Medical Image Segmentation
Yan Wang
Jian Cheng
Yixin Chen
Shuai Shao
Lanyun Zhu
Zhenzhou Wu
Tianming Liu
Haogang Zhu
OOD
MedIm
60
25
0
26 Apr 2023
MiniGPT-4: Enhancing Vision-Language Understanding with Advanced Large
  Language Models
MiniGPT-4: Enhancing Vision-Language Understanding with Advanced Large Language Models
Deyao Zhu
Jun Chen
Xiaoqian Shen
Xiang Li
Mohamed Elhoseiny
VLM
MLLM
75
1,922
0
20 Apr 2023
MixPro: Simple yet Effective Data Augmentation for Prompt-based Learning
MixPro: Simple yet Effective Data Augmentation for Prompt-based Learning
Bohan Li
Longxu Dou
Yutai Hou
Yunlong Feng
Honglin Mu
Qingfu Zhu
Qinghua Sun
Wanxiang Che
VLM
37
3
0
19 Apr 2023
Pretrained Language Models as Visual Planners for Human Assistance
Pretrained Language Models as Visual Planners for Human Assistance
Dhruvesh Patel
H. Eghbalzadeh
Nitin Kamra
Michael L. Iuzzolino
Unnat Jain
Ruta Desai
LM&Ro
21
24
0
17 Apr 2023
Towards Robust Prompts on Vision-Language Models
Towards Robust Prompts on Vision-Language Models
Jindong Gu
Ahmad Beirami
Xuezhi Wang
Alex Beutel
Philip Torr
Yao Qin
VLM
VPVLM
38
8
0
17 Apr 2023
An Iterative Optimizing Framework for Radiology Report Summarization
  with ChatGPT
An Iterative Optimizing Framework for Radiology Report Summarization with ChatGPT
Chong Ma
Zihao Wu
Jiaqi Wang
Shaochen Xu
Yaonai Wei
...
Tuo Zhang
Dajiang Zhu
Dinggang Shen
Tianming Liu
Xiang Li
MedIm
LM&MA
47
97
0
17 Apr 2023
Instance-aware Dynamic Prompt Tuning for Pre-trained Point Cloud Models
Instance-aware Dynamic Prompt Tuning for Pre-trained Point Cloud Models
Yaohua Zha
Jinpeng Wang
Tao Dai
Bin Chen
Zhi Wang
Shutao Xia
VLM
53
45
0
14 Apr 2023
Verbs in Action: Improving verb understanding in video-language models
Verbs in Action: Improving verb understanding in video-language models
Liliane Momeni
Mathilde Caron
Arsha Nagrani
Andrew Zisserman
Cordelia Schmid
37
70
0
13 Apr 2023
Efficient Multimodal Fusion via Interactive Prompting
Efficient Multimodal Fusion via Interactive Prompting
Yaowei Li
Ruijie Quan
Linchao Zhu
Yezhou Yang
35
44
0
13 Apr 2023
FashionSAP: Symbols and Attributes Prompt for Fine-grained Fashion
  Vision-Language Pre-training
FashionSAP: Symbols and Attributes Prompt for Fine-grained Fashion Vision-Language Pre-training
Yunpeng Han
Lisai Zhang
Qingcai Chen
Zhijian Chen
Zhonghua Li
Jianxin Yang
Bo Zhao
AI4TS
VLM
31
11
0
11 Apr 2023
Video ChatCaptioner: Towards Enriched Spatiotemporal Descriptions
Video ChatCaptioner: Towards Enriched Spatiotemporal Descriptions
Jun Chen
Deyao Zhu
Kilichbek Haydarov
Xiang Li
Mohamed Elhoseiny
36
37
0
09 Apr 2023
How to Design Translation Prompts for ChatGPT: An Empirical Study
How to Design Translation Prompts for ChatGPT: An Empirical Study
Yuan Gao
Ruili Wang
Feng Hou
26
41
0
05 Apr 2023
Self-Supervised Multimodal Learning: A Survey
Self-Supervised Multimodal Learning: A Survey
Yongshuo Zong
Oisin Mac Aodha
Timothy M. Hospedales
SSL
24
44
0
31 Mar 2023
AutoAD: Movie Description in Context
AutoAD: Movie Description in Context
Tengda Han
Max Bain
Arsha Nagrani
Gül Varol
Weidi Xie
Andrew Zisserman
VGen
29
34
0
29 Mar 2023
LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init
  Attention
LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attention
Renrui Zhang
Jiaming Han
Chris Liu
Peng Gao
Aojun Zhou
Xiangfei Hu
Shilin Yan
Pan Lu
Hongsheng Li
Yu Qiao
MLLM
74
747
0
28 Mar 2023
IFSeg: Image-free Semantic Segmentation via Vision-Language Model
IFSeg: Image-free Semantic Segmentation via Vision-Language Model
Sukmin Yun
S. Park
Paul Hongsuck Seo
Jinwoo Shin
VLM
MLLM
57
14
0
25 Mar 2023
Semantic Prompt for Few-Shot Image Recognition
Semantic Prompt for Few-Shot Image Recognition
Wentao Chen
Chenyang Si
Zhang Zhang
Liangdao Wang
Zilei Wang
Tien-Ping Tan
VLM
27
39
0
24 Mar 2023
Visual-Language Prompt Tuning with Knowledge-guided Context Optimization
Visual-Language Prompt Tuning with Knowledge-guided Context Optimization
Hantao Yao
Rui Zhang
Changsheng Xu
VLM
VPVLM
130
204
0
23 Mar 2023
Visually-Prompted Language Model for Fine-Grained Scene Graph Generation
  in an Open World
Visually-Prompted Language Model for Fine-Grained Scene Graph Generation in an Open World
Qifan Yu
Juncheng Li
Yuehua Wu
Siliang Tang
Wei Ji
Yueting Zhuang
35
34
0
23 Mar 2023
Frozen Language Model Helps ECG Zero-Shot Learning
Frozen Language Model Helps ECG Zero-Shot Learning
Jun Yu Li
Che Liu
Sibo Cheng
Rossella Arcucci
linda Qiao
23
59
0
22 Mar 2023
Contrastive Alignment of Vision to Language Through Parameter-Efficient
  Transfer Learning
Contrastive Alignment of Vision to Language Through Parameter-Efficient Transfer Learning
Zaid Khan
Yun Fu
VLM
41
12
0
21 Mar 2023
eP-ALM: Efficient Perceptual Augmentation of Language Models
eP-ALM: Efficient Perceptual Augmentation of Language Models
Mustafa Shukor
Corentin Dancette
Matthieu Cord
MLLM
VLM
32
29
0
20 Mar 2023
MM-REACT: Prompting ChatGPT for Multimodal Reasoning and Action
MM-REACT: Prompting ChatGPT for Multimodal Reasoning and Action
Zhengyuan Yang
Linjie Li
Jianfeng Wang
Kevin Qinghong Lin
E. Azarnasab
Faisal Ahmed
Zicheng Liu
Ce Liu
Michael Zeng
Lijuan Wang
ReLM
KELM
LRM
26
368
0
20 Mar 2023
Decomposed Prototype Learning for Few-Shot Scene Graph Generation
Decomposed Prototype Learning for Few-Shot Scene Graph Generation
Xingchen Li
Long Chen
Guikun Chen
Yinfu Feng
Yi Yang
Jun Xiao
32
6
0
20 Mar 2023
A Picture is Worth a Thousand Words: Language Models Plan from Pixels
A Picture is Worth a Thousand Words: Language Models Plan from Pixels
Anthony Z. Liu
Lajanugen Logeswaran
Sungryull Sohn
Honglak Lee
LM&Ro
21
6
0
16 Mar 2023
Visual Prompt Based Personalized Federated Learning
Visual Prompt Based Personalized Federated Learning
Guang-Ming Li
Wansen Wu
Yan Sun
Li Shen
Baoyuan Wu
Dacheng Tao
FedML
VLM
23
18
0
15 Mar 2023
ChatGPT Asks, BLIP-2 Answers: Automatic Questioning Towards Enriched
  Visual Descriptions
ChatGPT Asks, BLIP-2 Answers: Automatic Questioning Towards Enriched Visual Descriptions
Deyao Zhu
Jun Chen
Kilichbek Haydarov
Xiaoqian Shen
Wenxuan Zhang
Mohamed Elhoseiny
MLLM
45
97
0
12 Mar 2023
ZeroNLG: Aligning and Autoencoding Domains for Zero-Shot Multimodal and
  Multilingual Natural Language Generation
ZeroNLG: Aligning and Autoencoding Domains for Zero-Shot Multimodal and Multilingual Natural Language Generation
Bang-ju Yang
Fenglin Liu
Yuexian Zou
Xian Wu
Yaowei Wang
David Clifton
36
9
0
11 Mar 2023
Open-Ended Medical Visual Question Answering Through Prefix Tuning of
  Language Models
Open-Ended Medical Visual Question Answering Through Prefix Tuning of Language Models
Tom van Sonsbeek
Mohammad Mahdi Derakhshani
Ivona Najdenkoska
Cees G. M. Snoek
M. Worring
LM&MA
16
51
0
10 Mar 2023
From Visual Prompt Learning to Zero-Shot Transfer: Mapping Is All You
  Need
From Visual Prompt Learning to Zero-Shot Transfer: Mapping Is All You Need
Ziqing Yang
Zeyang Sha
Michael Backes
Yang Zhang
VPVLM
VLM
43
3
0
09 Mar 2023
ICL-D3IE: In-Context Learning with Diverse Demonstrations Updating for
  Document Information Extraction
ICL-D3IE: In-Context Learning with Diverse Demonstrations Updating for Document Information Extraction
Jiabang He
Lei Wang
Yingpeng Hu
Ning Liu
Hui-juan Liu
Xingdong Xu
Hengtao Shen
MLLM
6
46
0
09 Mar 2023
Visual ChatGPT: Talking, Drawing and Editing with Visual Foundation
  Models
Visual ChatGPT: Talking, Drawing and Editing with Visual Foundation Models
Chenfei Wu
Sheng-Kai Yin
Weizhen Qi
Xiaodong Wang
Zecheng Tang
Nan Duan
MLLM
LRM
53
615
0
08 Mar 2023
Sample Efficient Multimodal Semantic Augmentation for Incremental
  Summarization
Sample Efficient Multimodal Semantic Augmentation for Incremental Summarization
Sumanta Bhattacharyya
R. Manuvinakurike
Sahisnu Mazumder
Saurav Sahay
VLM
21
0
0
08 Mar 2023
A Comprehensive Survey of AI-Generated Content (AIGC): A History of
  Generative AI from GAN to ChatGPT
A Comprehensive Survey of AI-Generated Content (AIGC): A History of Generative AI from GAN to ChatGPT
Yihan Cao
Siyu Li
Yixin Liu
Zhiling Yan
Yutong Dai
Philip S. Yu
Lichao Sun
38
509
0
07 Mar 2023
PaLM-E: An Embodied Multimodal Language Model
PaLM-E: An Embodied Multimodal Language Model
Danny Driess
F. Xia
Mehdi S. M. Sajjadi
Corey Lynch
Aakanksha Chowdhery
...
Marc Toussaint
Klaus Greff
Andy Zeng
Igor Mordatch
Peter R. Florence
LM&Ro
22
1,579
0
06 Mar 2023
Multimodal Prompting with Missing Modalities for Visual Recognition
Multimodal Prompting with Missing Modalities for Visual Recognition
Yi-Lun Lee
Yi-Hsuan Tsai
Wei-Chen Chiu
Chen-Yu Lee
VPVLM
33
94
0
06 Mar 2023
Prismer: A Vision-Language Model with Multi-Task Experts
Prismer: A Vision-Language Model with Multi-Task Experts
Shikun Liu
Linxi Fan
Edward Johns
Zhiding Yu
Chaowei Xiao
Anima Anandkumar
VLM
MLLM
49
21
0
04 Mar 2023
MixPHM: Redundancy-Aware Parameter-Efficient Tuning for Low-Resource
  Visual Question Answering
MixPHM: Redundancy-Aware Parameter-Efficient Tuning for Low-Resource Visual Question Answering
Jingjing Jiang
Nanning Zheng
MoE
40
6
0
02 Mar 2023
Meta Learning to Bridge Vision and Language Models for Multimodal
  Few-Shot Learning
Meta Learning to Bridge Vision and Language Models for Multimodal Few-Shot Learning
Ivona Najdenkoska
Xiantong Zhen
M. Worring
VLM
26
18
0
28 Feb 2023
Vid2Seq: Large-Scale Pretraining of a Visual Language Model for Dense
  Video Captioning
Vid2Seq: Large-Scale Pretraining of a Visual Language Model for Dense Video Captioning
Antoine Yang
Arsha Nagrani
Paul Hongsuck Seo
Antoine Miech
Jordi Pont-Tuset
Ivan Laptev
Josef Sivic
Cordelia Schmid
AI4TS
VLM
39
221
0
27 Feb 2023
Language Is Not All You Need: Aligning Perception with Language Models
Language Is Not All You Need: Aligning Perception with Language Models
Shaohan Huang
Li Dong
Wenhui Wang
Y. Hao
Saksham Singhal
...
Johan Bjorck
Vishrav Chaudhary
Subhojit Som
Xia Song
Furu Wei
VLM
LRM
MLLM
32
536
0
27 Feb 2023
Few-shot Multimodal Multitask Multilingual Learning
Few-shot Multimodal Multitask Multilingual Learning
Aman Chadha
Vinija Jain
53
0
0
19 Feb 2023
ChatCAD: Interactive Computer-Aided Diagnosis on Medical Image using
  Large Language Models
ChatCAD: Interactive Computer-Aided Diagnosis on Medical Image using Large Language Models
Sheng Wang
Zihao Zhao
Xi Ouyang
Qian Wang
Dinggang Shen
LM&MA
MedIm
40
140
0
14 Feb 2023
Re-ViLM: Retrieval-Augmented Visual Language Model for Zero and Few-Shot
  Image Captioning
Re-ViLM: Retrieval-Augmented Visual Language Model for Zero and Few-Shot Image Captioning
Zhuolin Yang
Ming-Yu Liu
Zihan Liu
V. Korthikanti
Weili Nie
...
Yuke Zhu
M. Shoeybi
Bryan Catanzaro
Chaowei Xiao
Anima Anandkumar
VLM
RALM
34
39
0
09 Feb 2023
Language Quantized AutoEncoders: Towards Unsupervised Text-Image
  Alignment
Language Quantized AutoEncoders: Towards Unsupervised Text-Image Alignment
Hao Liu
Wilson Yan
Pieter Abbeel
34
25
0
02 Feb 2023
Multimodality Representation Learning: A Survey on Evolution,
  Pretraining and Its Applications
Multimodality Representation Learning: A Survey on Evolution, Pretraining and Its Applications
Muhammad Arslan Manzoor
S. Albarri
Ziting Xian
Zaiqiao Meng
Preslav Nakov
Shangsong Liang
AI4TS
42
26
0
01 Feb 2023
Grounding Language Models to Images for Multimodal Inputs and Outputs
Grounding Language Models to Images for Multimodal Inputs and Outputs
Jing Yu Koh
Ruslan Salakhutdinov
Daniel Fried
MLLM
31
117
0
31 Jan 2023
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image
  Encoders and Large Language Models
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models
Junnan Li
Dongxu Li
Silvio Savarese
Steven C. H. Hoi
VLM
MLLM
320
4,279
0
30 Jan 2023
Debiased Fine-Tuning for Vision-language Models by Prompt Regularization
Debiased Fine-Tuning for Vision-language Models by Prompt Regularization
Beier Zhu
Yulei Niu
Saeil Lee
Minhoe Hur
Hanwang Zhang
VLM
VPVLM
32
22
0
29 Jan 2023
Towards Models that Can See and Read
Towards Models that Can See and Read
Roy Ganz
Oren Nuriel
Aviad Aberdam
Yair Kittenplon
Shai Mazor
Ron Litman
24
13
0
18 Jan 2023
Previous
123...1011789
Next