ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2301.12597
  4. Cited By
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image
  Encoders and Large Language Models
v1v2v3 (latest)

BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models

30 January 2023
Junnan Li
Dongxu Li
Silvio Savarese
Steven C. H. Hoi
    VLMMLLM
ArXiv (abs)PDFHTML

Papers citing "BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models"

50 / 2,338 papers shown
Title
Large Language Models and Foundation Models in Smart Agriculture:
  Basics, Opportunities, and Challenges
Large Language Models and Foundation Models in Smart Agriculture: Basics, Opportunities, and Challenges
Jiajia Li
Mingle Xu
Lirong Xiang
Dong Chen
Weichao Zhuang
Xunyuan Yin
Zhao Li
119
3
0
13 Aug 2023
COMICS: End-to-end Bi-grained Contrastive Learning for Multi-face
  Forgery Detection
COMICS: End-to-end Bi-grained Contrastive Learning for Multi-face Forgery Detection
Cong Zhang
H. Qi
Shuhui Wang
Yuezun Li
Siwei Lyu
CVBM
81
7
0
03 Aug 2023
Learning to Model the World with Language
Learning to Model the World with Language
Jessy Lin
Yuqing Du
Olivia Watkins
Danijar Hafner
Pieter Abbeel
Dan Klein
Anca Dragan
LM&RoSyDa
121
55
0
31 Jul 2023
Sample Less, Learn More: Efficient Action Recognition via Frame Feature
  Restoration
Sample Less, Learn More: Efficient Action Recognition via Frame Feature Restoration
Harry Cheng
Yangyang Guo
Liqiang Nie
Zhiyong Cheng
Mohan S. Kankanhalli
92
7
0
27 Jul 2023
Seeing through the Brain: Image Reconstruction of Visual Perception from
  Human Brain Signals
Seeing through the Brain: Image Reconstruction of Visual Perception from Human Brain Signals
Yu-Ting Lan
Kan Ren
Yansen Wang
Wei-Long Zheng
Dongsheng Li
Bao-Liang Lu
Lili Qiu
DiffM
118
22
0
27 Jul 2023
Foundational Models Defining a New Era in Vision: A Survey and Outlook
Foundational Models Defining a New Era in Vision: A Survey and Outlook
Muhammad Awais
Muzammal Naseer
Salman Khan
Rao Muhammad Anwer
Hisham Cholakkal
M. Shah
Ming-Hsuan Yang
Fahad Shahbaz Khan
VLM
146
127
0
25 Jul 2023
Towards a Visual-Language Foundation Model for Computational Pathology
Towards a Visual-Language Foundation Model for Computational Pathology
Ming Y. Lu
Bowen Chen
Drew F. K. Williamson
Richard J. Chen
Ivy Liang
...
Andrew Zhang
L. Le
Georg Gerber
Anil V. Parwani
Faisal Mahmood
VLMMedIm
110
46
0
24 Jul 2023
Identifying Interpretable Subspaces in Image Representations
Identifying Interpretable Subspaces in Image Representations
Neha Kalibhat
S. Bhardwaj
Bayan Bruss
Hamed Firooz
Maziar Sanjabi
Soheil Feizi
FAtt
99
28
0
20 Jul 2023
Improving Multimodal Datasets with Image Captioning
Improving Multimodal Datasets with Image Captioning
Thao Nguyen
S. Gadre
Gabriel Ilharco
Sewoong Oh
Ludwig Schmidt
VLM
99
77
0
19 Jul 2023
ChatSpot: Bootstrapping Multimodal LLMs via Precise Referring
  Instruction Tuning
ChatSpot: Bootstrapping Multimodal LLMs via Precise Referring Instruction Tuning
Liang Zhao
En Yu
Zheng Ge
Jinrong Yang
Hao-Ran Wei
...
Jian‐Yuan Sun
Yuang Peng
Runpei Dong
Chunrui Han
Xiangyu Zhang
MLLMLRM
79
54
0
18 Jul 2023
Image Captions are Natural Prompts for Text-to-Image Models
Image Captions are Natural Prompts for Text-to-Image Models
Shiye Lei
Hao Chen
Senyang Zhang
Bo Zhao
Dacheng Tao
VLM
111
23
0
17 Jul 2023
SINC: Self-Supervised In-Context Learning for Vision-Language Tasks
SINC: Self-Supervised In-Context Learning for Vision-Language Tasks
Yi-Syuan Chen
Yun-Zhu Song
Cheng Yu Yeo
Bei Liu
Jianlong Fu
Hong-Han Shuai
VLMLRM
92
4
0
15 Jul 2023
Can Vision-Language Models be a Good Guesser? Exploring VLMs for Times
  and Location Reasoning
Can Vision-Language Models be a Good Guesser? Exploring VLMs for Times and Location Reasoning
Gengyuan Zhang
Yurui Zhang
Kerui Zhang
Volker Tresp
LRM
69
13
0
12 Jul 2023
VoxPoser: Composable 3D Value Maps for Robotic Manipulation with
  Language Models
VoxPoser: Composable 3D Value Maps for Robotic Manipulation with Language Models
Wenlong Huang
Chen Wang
Ruohan Zhang
Yunzhu Li
Jiajun Wu
Li Fei-Fei
LM&Ro
132
519
0
12 Jul 2023
Emu: Generative Pretraining in Multimodality
Emu: Generative Pretraining in Multimodality
Quan-Sen Sun
Qiying Yu
Yufeng Cui
Fan Zhang
Xiaosong Zhang
Yueze Wang
Hongcheng Gao
Jingjing Liu
Tiejun Huang
Xinlong Wang
MLLM
119
138
0
11 Jul 2023
Embodied Task Planning with Large Language Models
Embodied Task Planning with Large Language Models
Zhenyu Wu
Ziwei Wang
Xiuwei Xu
Jiwen Lu
Haibin Yan
LM&RoLLMAG
81
76
0
04 Jul 2023
Image Background Serves as Good Proxy for Out-of-distribution Data
Image Background Serves as Good Proxy for Out-of-distribution Data
Sen Pei
83
2
0
02 Jul 2023
ProbVLM: Probabilistic Adapter for Frozen Vision-Language Models
ProbVLM: Probabilistic Adapter for Frozen Vision-Language Models
Uddeshya Upadhyay
Shyamgopal Karthik
Massimiliano Mancini
Zeynep Akata
MLLMVLM
84
4
0
01 Jul 2023
CLIPAG: Towards Generator-Free Text-to-Image Generation
CLIPAG: Towards Generator-Free Text-to-Image Generation
Roy Ganz
Michael Elad
VLM
82
8
0
29 Jun 2023
Towards Language Models That Can See: Computer Vision Through the LENS
  of Natural Language
Towards Language Models That Can See: Computer Vision Through the LENS of Natural Language
William Berrios
Gautam Mittal
Tristan Thrush
Douwe Kiela
Amanpreet Singh
MLLMVLM
72
61
0
28 Jun 2023
Generative Multimodal Entity Linking
Generative Multimodal Entity Linking
Senbao Shi
Zhenran Xu
Baotian Hu
Hao Fei
MLLMVLM
62
6
0
22 Jun 2023
MotionGPT: Finetuned LLMs Are General-Purpose Motion Generators
MotionGPT: Finetuned LLMs Are General-Purpose Motion Generators
Yaqi Zhang
Di Huang
B. Liu
Shixiang Tang
Yan Lu
Lu Chen
Lei Bai
Qi Chu
Nenghai Yu
Wanli Ouyang
168
104
0
19 Jun 2023
Investigating Prompting Techniques for Zero- and Few-Shot Visual Question Answering
Investigating Prompting Techniques for Zero- and Few-Shot Visual Question Answering
Rabiul Awal
Le Zhang
Aishwarya Agrawal
LRM
141
13
0
16 Jun 2023
Tell Me Where to Go: A Composable Framework for Context-Aware Embodied
  Robot Navigation
Tell Me Where to Go: A Composable Framework for Context-Aware Embodied Robot Navigation
Harel Biggie
Ajay Narasimha Mopidevi
Dusty Woods
Christoffer Heckman
LM&Ro
67
11
0
15 Jun 2023
Macaw-LLM: Multi-Modal Language Modeling with Image, Audio, Video, and
  Text Integration
Macaw-LLM: Multi-Modal Language Modeling with Image, Audio, Video, and Text Integration
Chenyang Lyu
Minghao Wu
Longyue Wang
Xinting Huang
Bingshuai Liu
Zefeng Du
Shuming Shi
Zhaopeng Tu
MLLMAuLLM
83
173
0
15 Jun 2023
AssistGPT: A General Multi-modal Assistant that can Plan, Execute,
  Inspect, and Learn
AssistGPT: A General Multi-modal Assistant that can Plan, Execute, Inspect, and Learn
Difei Gao
Lei Ji
Luowei Zhou
Kevin Lin
Joya Chen
Zihan Fan
Mike Zheng Shou
MLLM
96
76
0
14 Jun 2023
I See Dead People: Gray-Box Adversarial Attack on Image-To-Text Models
I See Dead People: Gray-Box Adversarial Attack on Image-To-Text Models
Raz Lapid
Moshe Sipper
AAML
110
17
0
13 Jun 2023
Instruct-ReID: A Multi-purpose Person Re-identification Task with Instructions
Instruct-ReID: A Multi-purpose Person Re-identification Task with Instructions
Weizhen He
Yihe Deng
Shixiang Tang
Qihao Chen
Qingsong Xie
...
Feng Zhu
Rui Zhao
Wanli Ouyang
Donglian Qi
Yunfeng Yan
121
24
0
13 Jun 2023
Valley: Video Assistant with Large Language model Enhanced abilitY
Valley: Video Assistant with Large Language model Enhanced abilitY
Ruipu Luo
Ziwang Zhao
Min Yang
Junwei Dong
Da Li
Pengcheng Lu
Tao Wang
Linmei Hu
Ming-Hui Qiu
MLLM
135
209
0
12 Jun 2023
Youku-mPLUG: A 10 Million Large-scale Chinese Video-Language Dataset for
  Pre-training and Benchmarks
Youku-mPLUG: A 10 Million Large-scale Chinese Video-Language Dataset for Pre-training and Benchmarks
Haiyang Xu
Qinghao Ye
Xuan-Wei Wu
Mingshi Yan
Yuan Miao
...
Qingfang Qian
Maofei Que
Ji Zhang
Xiaoyan Zeng
Feiyan Huang
VLMMLLM
101
25
0
07 Jun 2023
LRVS-Fashion: Extending Visual Search with Referring Instructions
LRVS-Fashion: Extending Visual Search with Referring Instructions
Simon Lepage
Jérémie Mary
David Picard
87
1
0
05 Jun 2023
Learning Probabilistic Symmetrization for Architecture Agnostic
  Equivariance
Learning Probabilistic Symmetrization for Architecture Agnostic Equivariance
Jinwoo Kim
Tien Dat Nguyen
Ayhan Suleymanzade
Hyeokjun An
Seunghoon Hong
100
24
0
05 Jun 2023
MoviePuzzle: Visual Narrative Reasoning through Multimodal Order Learning
MoviePuzzle: Visual Narrative Reasoning through Multimodal Order Learning
Jianghui Wang
Yuxuan Wang
Dongyan Zhao
Zilong Zheng
87
1
0
04 Jun 2023
Benchmarking Robustness of Adaptation Methods on Pre-trained
  Vision-Language Models
Benchmarking Robustness of Adaptation Methods on Pre-trained Vision-Language Models
Shuo Chen
Jindong Gu
Zhen Han
Yunpu Ma
Philip Torr
Volker Tresp
VPVLMVLM
127
21
0
03 Jun 2023
"Let's not Quote out of Context": Unified Vision-Language Pretraining
  for Context Assisted Image Captioning
"Let's not Quote out of Context": Unified Vision-Language Pretraining for Context Assisted Image Captioning
Abisek Rajakumar Kalarani
P. Bhattacharyya
Niyati Chhaya
Sumit Shekhar
CoGeVLM
111
9
0
01 Jun 2023
Vocabulary-free Image Classification
Vocabulary-free Image Classification
Alessandro Conti
Enrico Fini
Massimiliano Mancini
Paolo Rota
Yiming Wang
Elisa Ricci
VLM
129
27
0
01 Jun 2023
PanoGen: Text-Conditioned Panoramic Environment Generation for
  Vision-and-Language Navigation
PanoGen: Text-Conditioned Panoramic Environment Generation for Vision-and-Language Navigation
Jialu Li
Joey Tianyi Zhou
DiffM
101
55
0
30 May 2023
LANCE: Stress-testing Visual Models by Generating Language-guided
  Counterfactual Images
LANCE: Stress-testing Visual Models by Generating Language-guided Counterfactual Images
Viraj Prabhu
Sriram Yenamandra
Prithvijit Chattopadhyay
Judy Hoffman
104
42
0
30 May 2023
Learning without Forgetting for Vision-Language Models
Learning without Forgetting for Vision-Language Models
Da-Wei Zhou
Yuanhan Zhang
Jingyi Ning
Jingyi Ning
De-Chuan Zhan
De-Chuan Zhan
Ziwei Liu
VLMCLL
140
44
0
30 May 2023
GlyphControl: Glyph Conditional Control for Visual Text Generation
GlyphControl: Glyph Conditional Control for Visual Text Generation
Yukang Yang
Dongnan Gui
Yuhui Yuan
Weicong Liang
Haisong Ding
Hang-Rui Hu
Kai Chen
DiffM
90
85
0
29 May 2023
VAST: A Vision-Audio-Subtitle-Text Omni-Modality Foundation Model and
  Dataset
VAST: A Vision-Audio-Subtitle-Text Omni-Modality Foundation Model and Dataset
Sihan Chen
Handong Li
Qunbo Wang
Zijia Zhao
Ming-Ting Sun
Xinxin Zhu
Qingbin Liu
195
112
0
29 May 2023
Distill Gold from Massive Ores: Efficient Dataset Distillation via
  Critical Samples Selection
Distill Gold from Massive Ores: Efficient Dataset Distillation via Critical Samples Selection
Yue Xu
Yong-Lu Li
Kaitong Cui
Ziyu Wang
Cewu Lu
Yu-Wing Tai
Chi-Keung Tang
DD
122
8
0
28 May 2023
On Evaluating Adversarial Robustness of Large Vision-Language Models
On Evaluating Adversarial Robustness of Large Vision-Language Models
Yunqing Zhao
Tianyu Pang
Chao Du
Xiao Yang
Chongxuan Li
Ngai-Man Cheung
Min Lin
VLMAAMLMLLM
149
184
0
26 May 2023
ChatCAD+: Towards a Universal and Reliable Interactive CAD using LLMs
ChatCAD+: Towards a Universal and Reliable Interactive CAD using LLMs
Zihao Zhao
Sheng Wang
Jinchen Gu
Yitao Zhu
Lanzhuju Mei
Zixu Zhuang
Zhiming Cui
Qian Wang
Dinggang Shen
LM&MA
116
43
0
25 May 2023
Visual Programming for Text-to-Image Generation and Evaluation
Visual Programming for Text-to-Image Generation and Evaluation
Jaemin Cho
Abhaysinh Zala
Joey Tianyi Zhou
MLLM
119
51
0
24 May 2023
Cheap and Quick: Efficient Vision-Language Instruction Tuning for Large
  Language Models
Cheap and Quick: Efficient Vision-Language Instruction Tuning for Large Language Models
Gen Luo
Yiyi Zhou
Tianhe Ren
Shen Chen
Xiaoshuai Sun
Rongrong Ji
VLMMLLM
117
98
0
24 May 2023
IdealGPT: Iteratively Decomposing Vision and Language Reasoning via Large Language Models
IdealGPT: Iteratively Decomposing Vision and Language Reasoning via Large Language Models
Haoxuan You
Rui Sun
Zhecan Wang
Long Chen
Gengyu Wang
Hammad A. Ayyubi
Kai-Wei Chang
Shih-Fu Chang
VLMMLLMLRM
150
44
0
24 May 2023
Weakly-Supervised Learning of Visual Relations in Multimodal Pretraining
Weakly-Supervised Learning of Visual Relations in Multimodal Pretraining
Emanuele Bugliarello
Aida Nematzadeh
Lisa Anne Hendricks
SSL
103
5
0
23 May 2023
MemeCap: A Dataset for Captioning and Interpreting Memes
MemeCap: A Dataset for Captioning and Interpreting Memes
EunJeong Hwang
Vered Shwartz
VLM
76
38
0
23 May 2023
UVOSAM: A Mask-free Paradigm for Unsupervised Video Object Segmentation via Segment Anything Model
UVOSAM: A Mask-free Paradigm for Unsupervised Video Object Segmentation via Segment Anything Model
Zhenghao Zhang
Shengfan Zhang
Zhichao Wei
Zuozhuo Dai
Siyu Zhu
VOSVLM
87
18
0
22 May 2023
Previous
123...454647
Next