ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2301.12597
  4. Cited By
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image
  Encoders and Large Language Models

BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models

30 January 2023
Junnan Li
Dongxu Li
Silvio Savarese
Steven C. H. Hoi
    VLM
    MLLM
ArXivPDFHTML

Papers citing "BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models"

50 / 822 papers shown
Title
Vision-Integrated LLMs for Autonomous Driving Assistance : Human Performance Comparison and Trust Evaluation
Vision-Integrated LLMs for Autonomous Driving Assistance : Human Performance Comparison and Trust Evaluation
Namhee Kim
Woojin Park
46
0
0
06 Feb 2025
LV-XAttn: Distributed Cross-Attention for Long Visual Inputs in Multimodal Large Language Models
LV-XAttn: Distributed Cross-Attention for Long Visual Inputs in Multimodal Large Language Models
Tzu-Tao Chang
Shivaram Venkataraman
VLM
199
0
0
04 Feb 2025
UVGS: Reimagining Unstructured 3D Gaussian Splatting using UV Mapping
UVGS: Reimagining Unstructured 3D Gaussian Splatting using UV Mapping
Aashish Rai
Dilin Wang
Mihir Jain
N. Sarafianos
Arthur Chen
Srinath Sridhar
Aayush Prakash
3DGS
74
1
0
03 Feb 2025
Multimodal Inverse Attention Network with Intrinsic Discriminant Feature Exploitation for Fake News Detection
Multimodal Inverse Attention Network with Intrinsic Discriminant Feature Exploitation for Fake News Detection
Tianlin Zhang
En Yu
Yi Shao
Shuai Li
Sujuan Hou
Jiande Sun
60
0
0
03 Feb 2025
BEEM: Boosting Performance of Early Exit DNNs using Multi-Exit Classifiers as Experts
BEEM: Boosting Performance of Early Exit DNNs using Multi-Exit Classifiers as Experts
Divya J. Bajpai
M. Hanawal
76
0
0
02 Feb 2025
Leveraging Stable Diffusion for Monocular Depth Estimation via Image Semantic Encoding
Leveraging Stable Diffusion for Monocular Depth Estimation via Image Semantic Encoding
Jingming Xia
Guanqun Cao
Guang Ma
Yiben Luo
Qinzhao Li
John Oyekan
MDE
59
0
0
01 Feb 2025
Laser: Efficient Language-Guided Segmentation in Neural Radiance Fields
Laser: Efficient Language-Guided Segmentation in Neural Radiance Fields
Xingyu Miao
Haoran Duan
Yang Bai
Tejal Shah
Jun Song
Yang Long
R. Ranjan
Ling Shao
76
4
0
31 Jan 2025
Calling a Spade a Heart: Gaslighting Multimodal Large Language Models via Negation
Calling a Spade a Heart: Gaslighting Multimodal Large Language Models via Negation
Bin Zhu
Hui yan Qi
Yinxuan Gui
Jingjing Chen
Chong-Wah Ngo
Ee-Peng Lim
141
1
0
31 Jan 2025
Mobile Manipulation Instruction Generation from Multiple Images with Automatic Metric Enhancement
Mobile Manipulation Instruction Generation from Multiple Images with Automatic Metric Enhancement
Kei Katsumata
Motonari Kambara
Daichi Yashima
Ryosuke Korekata
Komei Sugiura
65
0
0
28 Jan 2025
Boosting Multimodal Large Language Models with Visual Tokens Withdrawal for Rapid Inference
Boosting Multimodal Large Language Models with Visual Tokens Withdrawal for Rapid Inference
Zhihang Lin
Mingbao Lin
Luxi Lin
Rongrong Ji
58
16
0
28 Jan 2025
DeSTA2: Developing Instruction-Following Speech Language Model Without Speech Instruction-Tuning Data
DeSTA2: Developing Instruction-Following Speech Language Model Without Speech Instruction-Tuning Data
Ke-Han Lu
Zhehuai Chen
Szu-Wei Fu
Chao-Han Huck Yang
Jagadeesh Balam
Boris Ginsburg
Yu-Te Wang
Hung-yi Lee
AuLLM
SyDa
118
5
0
28 Jan 2025
Addressing Out-of-Label Hazard Detection in Dashcam Videos: Insights from the COOOL Challenge
Anh-Kiet Duong
Petra Gomez-Krämer
43
2
0
27 Jan 2025
PreciseCam: Precise Camera Control for Text-to-Image Generation
PreciseCam: Precise Camera Control for Text-to-Image Generation
Edurne Bernal-Berdun
Ana Serrano
B. Masiá
Matheus Gadelha
Yannick Hold-Geoffroy
Xin Sun
Diego F. F. Gutierrez
DiffM
VGen
47
0
0
22 Jan 2025
Patent Figure Classification using Large Vision-language Models
Patent Figure Classification using Large Vision-language Models
Sushil Awale
Eric Müller-Budack
Ralph Ewerth
38
0
0
22 Jan 2025
Triplet Synthesis For Enhancing Composed Image Retrieval via Counterfactual Image Generation
Kenta Uesugi
Naoki Saito
Keisuke Maeda
Takahiro Ogawa
Miki Haseyama
44
0
0
22 Jan 2025
Parallel Sequence Modeling via Generalized Spatial Propagation Network
Parallel Sequence Modeling via Generalized Spatial Propagation Network
Hongjun Wang
Wonmin Byeon
Jiarui Xu
Liang Feng
Ka Chun Cheung
Xiaolong Wang
Kai Han
Jan Kautz
Sifei Liu
167
0
0
21 Jan 2025
VARGPT: Unified Understanding and Generation in a Visual Autoregressive Multimodal Large Language Model
VARGPT: Unified Understanding and Generation in a Visual Autoregressive Multimodal Large Language Model
Xianwei Zhuang
Yuxin Xie
Yufan Deng
Liming Liang
Jinghan Ru
Yuguo Yin
Yuexian Zou
MLLM
VLM
LRM
109
6
0
21 Jan 2025
Myriad: Large Multimodal Model by Applying Vision Experts for Industrial Anomaly Detection
Myriad: Large Multimodal Model by Applying Vision Experts for Industrial Anomaly Detection
Yuanze Li
Haolin Wang
Shihao Yuan
Ming-Yu Liu
Debin Zhao
Yiwen Guo
Chen Xu
Guangming Shi
Wangmeng Zuo
89
29
0
20 Jan 2025
Text-guided Synthetic Geometric Augmentation for Zero-shot 3D Understanding
Text-guided Synthetic Geometric Augmentation for Zero-shot 3D Understanding
Kohei Torimi
Ryosuke Yamada
Daichi Otsuka
Kensho Hara
Yuki M. Asano
Hirokatsu Kataoka
Y. Aoki
3DV
38
0
0
20 Jan 2025
Dynamic Scene Understanding from Vision-Language Representations
Dynamic Scene Understanding from Vision-Language Representations
Shahaf Pruss
Morris Alper
Hadar Averbuch-Elor
OCL
197
0
0
20 Jan 2025
A Comprehensive Survey of Foundation Models in Medicine
A Comprehensive Survey of Foundation Models in Medicine
Wasif Khan
Seowung Leem
Kyle B. See
Joshua K. Wong
Shaoting Zhang
R. Fang
AI4CE
LM&MA
VLM
105
18
0
17 Jan 2025
PsyDI: Towards a Personalized and Progressively In-depth Chatbot for Psychological Measurements
PsyDI: Towards a Personalized and Progressively In-depth Chatbot for Psychological Measurements
Xueyan Li
Xinyan Chen
Yazhe Niu
Shuai Hu
Yu Liu
OffRL
65
3
0
17 Jan 2025
Playing Devil's Advocate: Unmasking Toxicity and Vulnerabilities in Large Vision-Language Models
Playing Devil's Advocate: Unmasking Toxicity and Vulnerabilities in Large Vision-Language Models
Abdulkadir Erol
Trilok Padhi
Agnik Saha
Ugur Kursuncu
Mehmet Emin Aktas
47
1
0
17 Jan 2025
DriveLM: Driving with Graph Visual Question Answering
DriveLM: Driving with Graph Visual Question Answering
Chonghao Sima
Katrin Renz
Kashyap Chitta
L. Chen
Hanxue Zhang
Chengen Xie
Jens Beißwenger
Ping Luo
Andreas Geiger
Hongyang Li
108
164
0
17 Jan 2025
IP-FaceDiff: Identity-Preserving Facial Video Editing with Diffusion
IP-FaceDiff: Identity-Preserving Facial Video Editing with Diffusion
Tharun Anand
Aryan Garg
Kaushik Mitra
VGen
DiffM
52
0
0
13 Jan 2025
RadAlign: Advancing Radiology Report Generation with Vision-Language Concept Alignment
RadAlign: Advancing Radiology Report Generation with Vision-Language Concept Alignment
Difei Gu
Yunhe Gao
Yang Zhou
Mu Zhou
Dimitris N. Metaxas
LM&MA
53
2
0
13 Jan 2025
Initial Findings on Sensor based Open Vocabulary Activity Recognition via Text Embedding Inversion
Initial Findings on Sensor based Open Vocabulary Activity Recognition via Text Embedding Inversion
L. Ray
Bo Zhou
Sungho Suh
P. Lukowicz
VLM
35
0
0
13 Jan 2025
TimeLogic: A Temporal Logic Benchmark for Video QA
TimeLogic: A Temporal Logic Benchmark for Video QA
S. Swetha
Hilde Kuehne
Mubarak Shah
52
1
0
13 Jan 2025
LEO: Boosting Mixture of Vision Encoders for Multimodal Large Language Models
LEO: Boosting Mixture of Vision Encoders for Multimodal Large Language Models
Mozhgan Nasr Azadani
James Riddell
Sean Sedwards
Krzysztof Czarnecki
MLLM
VLM
47
2
0
13 Jan 2025
VideoAuteur: Towards Long Narrative Video Generation
VideoAuteur: Towards Long Narrative Video Generation
Junfei Xiao
Feng Cheng
Lu Qi
Liangke Gui
Jiepeng Cen
Zhibei Ma
Alan L. Yuille
Lu Jiang
VGen
58
2
0
10 Jan 2025
OneLLM: One Framework to Align All Modalities with Language
OneLLM: One Framework to Align All Modalities with Language
Jiaming Han
Kaixiong Gong
Yiyuan Zhang
Jiaqi Wang
Kaipeng Zhang
Dahua Lin
Yu Qiao
Peng Gao
Xiangyu Yue
MLLM
106
109
0
10 Jan 2025
ConceptMaster: Multi-Concept Video Customization on Diffusion Transformer Models Without Test-Time Tuning
ConceptMaster: Multi-Concept Video Customization on Diffusion Transformer Models Without Test-Time Tuning
Yuzhou Huang
Ziyang Yuan
Quande Liu
Qiulin Wang
Xintao Wang
Ruimao Zhang
Pengfei Wan
Di Zhang
Kun Gai
VGen
DiffM
45
10
0
08 Jan 2025
VideoAnydoor: High-fidelity Video Object Insertion with Precise Motion Control
VideoAnydoor: High-fidelity Video Object Insertion with Precise Motion Control
Yuanpeng Tu
Hao Luo
Xi Chen
S. Ji
Xiang Bai
Hengshuang Zhao
VGen
DiffM
42
3
0
08 Jan 2025
LLaVA-Mini: Efficient Image and Video Large Multimodal Models with One Vision Token
LLaVA-Mini: Efficient Image and Video Large Multimodal Models with One Vision Token
Shaolei Zhang
Qingkai Fang
Zhe Yang
Yang Feng
MLLM
VLM
71
28
0
07 Jan 2025
Sa2VA: Marrying SAM2 with LLaVA for Dense Grounded Understanding of Images and Videos
Sa2VA: Marrying SAM2 with LLaVA for Dense Grounded Understanding of Images and Videos
Haobo Yuan
Xianrui Li
Tao Zhang
Zilong Huang
Shilin Xu
S. Ji
Yunhai Tong
Lu Qi
Jiashi Feng
Ming Yang
VLM
96
12
0
07 Jan 2025
FOLDER: Accelerating Multi-modal Large Language Models with Enhanced Performance
FOLDER: Accelerating Multi-modal Large Language Models with Enhanced Performance
Haicheng Wang
Zhemeng Yu
Gabriele Spadaro
Chen Ju
Victor Quétu
Enzo Tartaglione
Enzo Tartaglione
VLM
132
3
0
05 Jan 2025
Advancements in Visual Language Models for Remote Sensing: Datasets, Capabilities, and Enhancement Techniques
Advancements in Visual Language Models for Remote Sensing: Datasets, Capabilities, and Enhancement Techniques
Lijie Tao
H. Zhang
Haizhao Jing
Yu Liu
Kelu Yao
Guoting Wei
Xizhe Xue
37
0
0
03 Jan 2025
Unleashing Text-to-Image Diffusion Prior for Zero-Shot Image Captioning
Jianjie Luo
Jingwen Chen
Yehao Li
Yingwei Pan
Jianlin Feng
Hongyang Chao
Ting Yao
DiffM
VLM
53
0
0
03 Jan 2025
VLM2Vec: Training Vision-Language Models for Massive Multimodal Embedding Tasks
VLM2Vec: Training Vision-Language Models for Massive Multimodal Embedding Tasks
Ziyan Jiang
Rui Meng
Xinyi Yang
Semih Yavuz
Yingbo Zhou
Wenhu Chen
MLLM
VLM
51
20
0
03 Jan 2025
VisionLLM v2: An End-to-End Generalist Multimodal Large Language Model for Hundreds of Vision-Language Tasks
VisionLLM v2: An End-to-End Generalist Multimodal Large Language Model for Hundreds of Vision-Language Tasks
Jiannan Wu
Muyan Zhong
Sen Xing
Zeqiang Lai
Zhaoyang Liu
...
Lewei Lu
Tong Lu
Ping Luo
Yu Qiao
Jifeng Dai
MLLM
VLM
LRM
102
48
0
03 Jan 2025
Image-based Multimodal Models as Intruders: Transferable Multimodal Attacks on Video-based MLLMs
Image-based Multimodal Models as Intruders: Transferable Multimodal Attacks on Video-based MLLMs
Linhao Huang
Xue Jiang
Zhiqiang Wang
Wentao Mo
Xi Xiao
Bo Han
Yongjie Yin
Feng Zheng
AAML
53
2
0
02 Jan 2025
GPT4Scene: Understand 3D Scenes from Videos with Vision-Language Models
GPT4Scene: Understand 3D Scenes from Videos with Vision-Language Models
Zhangyang Qi
Zhixiong Zhang
Ye Fang
Jiaqi Wang
Hengshuang Zhao
86
7
0
02 Jan 2025
2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining
2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining
Wenqi Zhang
Hang Zhang
Xin Li
Jiashuo Sun
Yongliang Shen
Weiming Lu
Deli Zhao
Yueting Zhuang
Lidong Bing
VLM
43
2
0
01 Jan 2025
A Simple Recipe for Contrastively Pre-training Video-First Encoders Beyond 16 Frames
A Simple Recipe for Contrastively Pre-training Video-First Encoders Beyond 16 Frames
Pinelopi Papalampidi
Skanda Koppula
Shreya Pathak
Justin T Chiu
Joseph Heyward
Viorica Patraucean
Jiajun Shen
Antoine Miech
Andrew Zisserman
Aida Nematzdeh
VLM
69
24
0
31 Dec 2024
VidTwin: Video VAE with Decoupled Structure and Dynamics
VidTwin: Video VAE with Decoupled Structure and Dynamics
Yuchi Wang
Junliang Guo
Xinyi Xie
Tianyu He
Xu Sun
Jiang Bian
DRL
VGen
77
3
0
23 Dec 2024
Neural-MCRL: Neural Multimodal Contrastive Representation Learning for EEG-based Visual Decoding
Neural-MCRL: Neural Multimodal Contrastive Representation Learning for EEG-based Visual Decoding
Yueyang Li
Zijian Kang
Shengyu Gong
Wenhao Dong
Weiming Zeng
Hongjie Yan
W. Siok
Nizhuan Wang
54
2
0
23 Dec 2024
Nullu: Mitigating Object Hallucinations in Large Vision-Language Models via HalluSpace Projection
Nullu: Mitigating Object Hallucinations in Large Vision-Language Models via HalluSpace Projection
Le Yang
Ziwei Zheng
Boxu Chen
Zhengyu Zhao
Chenhao Lin
Chao Shen
VLM
140
3
0
18 Dec 2024
GeoX: Geometric Problem Solving Through Unified Formalized Vision-Language Pre-training
GeoX: Geometric Problem Solving Through Unified Formalized Vision-Language Pre-training
Renqiu Xia
M. Li
Hancheng Ye
Wenjie Wu
Hongbin Zhou
...
Zeang Sheng
Botian Shi
Tao Chen
Junchi Yan
Bo Zhang
91
7
0
16 Dec 2024
MOVIS: Enhancing Multi-Object Novel View Synthesis for Indoor Scenes
MOVIS: Enhancing Multi-Object Novel View Synthesis for Indoor Scenes
Ruijie Lu
Yixin Chen
Junfeng Ni
Baoxiong Jia
Yu Liu
Diwen Wan
Gang Zeng
Siyuan Huang
DiffM
127
4
0
16 Dec 2024
CLIP-SR: Collaborative Linguistic and Image Processing for Super-Resolution
CLIP-SR: Collaborative Linguistic and Image Processing for Super-Resolution
Bingwen Hu
Heng Liu
Zhedong Zheng
Ping Liu
SupR
86
0
0
16 Dec 2024
Previous
123456...151617
Next