ResearchTrend.AI
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models

30 January 2023
Junnan Li
Dongxu Li
Silvio Savarese
Steven C. H. Hoi
    VLM, MLLM

Papers citing "BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models"

50 / 2,338 papers shown
CountLLM: Towards Generalizable Repetitive Action Counting via Large Language Model
Ziyu Yao
Xuxin Cheng
Zhiqi Huang
Lei Li
159
2
0
01 Jul 2025
A Narrative Review on Large AI Models in Lung Cancer Screening, Diagnosis, and Treatment Planning
Jiachen Zhong
Yiting Wang
Di Zhu
Ziwei Wang
LM&MA, AI4CE
48
1
0
01 Jul 2025
ThinkVideo: High-Quality Reasoning Video Segmentation with Chain of Thoughts
Shiu-hong Kao
Yu-Wing Tai
Chi-Keung Tang
VOS, MLLM, VGen, LRM
105
0
0
01 Jul 2025
Think Before You Segment: High-Quality Reasoning Segmentation with GPT Chain of Thoughts
Shiu-hong Kao
Yu-Wing Tai
Chi-Keung Tang
MLLM, LRM
283
1
0
01 Jul 2025
DreamCube: 3D Panorama Generation via Multi-plane Synchronization
Yukun Huang
Yanning Zhou
Jianan Wang
Kaiyi Huang
Xihui Liu
18
0
0
20 Jun 2025
Cross-Modal Obfuscation for Jailbreak Attacks on Large Vision-Language Models
Lei Jiang
Zixun Zhang
Zizhou Wang
Xiaobing Sun
Zhen Li
Liangli Zhen
Xiaohua Xu
AAML
14
0
0
20 Jun 2025
How to Train your Text-to-Image Model: Evaluating Design Choices for Synthetic Training Captions
Manuel Brack
Sudeep Katakol
Felix Friedrich
P. Schramowski
Hareesh Ravi
Kristian Kersting
Ajinkya Kale
20
0
0
20 Jun 2025
LaVi: Efficient Large Vision-Language Models via Internal Feature Modulation
Tongtian Yue
Longteng Guo
Yepeng Tang
Zijia Zhao
Xinxin Zhu
Hua Huang
Jing Liu
MLLM, VLM
16
0
0
20 Jun 2025
FOCUS: Unified Vision-Language Modeling for Interactive Editing Driven by Referential Segmentation
Fan Yang
Yousong Zhu
Xin Li
Yufei Zhan
Hongyin Zhao
Shurong Zheng
Yaowei Wang
Ming Tang
Jinqiao Wang
MLLM, VLM
40
0
0
20 Jun 2025
Enhancing Step-by-Step and Verifiable Medical Reasoning in MLLMs
Haoran Sun
Yankai Jiang
Wenjie Lou
Yujie Zhang
Wenjie Li
Lilong Wang
Mianxin Liu
Lei Liu
Xiaosong Wang
LRM
15
0
0
20 Jun 2025
AutoV: Learning to Retrieve Visual Prompt for Large Vision-Language Models
Yuan Zhang
Chun-Kai Fan
Tao Huang
Ming Lu
Sicheng Yu
Junwen Pan
Kuan Cheng
Qi She
Shanghang Zhang
VLM, LRM
19
0
0
19 Jun 2025
MBA: Multimodal Bidirectional Attack for Referring Expression Segmentation Models
Xingbai Chen
Tingchao Fu
Renyang Liu
Wei Zhou
Chao Yi
AAML
26
0
0
19 Jun 2025
video-SALMONN 2: Captioning-Enhanced Audio-Visual Large Language Models
Changli Tang
Yixuan Li
Yudong Yang
Jimin Zhuang
Guangzhi Sun
Wei Li
Zejun Ma
Chao Zhang
23
0
0
18 Jun 2025
Sysformer: Safeguarding Frozen Large Language Models with Adaptive System Prompts
Kartik Sharma
Yiqiao Jin
Vineeth Rakesh
Yingtong Dou
Menghai Pan
Mahashweta Das
Srijan Kumar
AAML
15
0
0
18 Jun 2025
GenRecal: Generation after Recalibration from Large to Small Vision-Language Models
Byung-Kwan Lee
Ryo Hachiuma
Yong Man Ro
Yu-Chun Wang
Yueh-Hua Wu
VLM
38
0
0
18 Jun 2025
Demystifying the Visual Quality Paradox in Multimodal Large Language Models
Shuo Xing
Lanqing Guo
Hongyuan Hua
Seoyoung Lee
Peiran Li
Yufei Wang
Zhangyang Wang
Zhengzhong Tu
VLM
41
0
0
18 Jun 2025
Weakly-supervised VLM-guided Partial Contrastive Learning for Visual Language Navigation
Ruoyu Wang
Tong Yu
Junda Wu
Yao Liu
Julian McAuley
Lina Yao
15
0
0
18 Jun 2025
Privacy-Shielded Image Compression: Defending Against Exploitation from Vision-Language Pretrained Models
Xuelin Shen
Jiayin Xu
Kangsheng Yin
Wenhan Yang
AAML
19
0
0
18 Jun 2025
Dual-Stage Value-Guided Inference with Margin-Based Reward Adjustment for Fast and Faithful VLM Captioning
Ankan Deria
Adinath Madhavrao Dukre
Feilong Tang
Sara Atito
Sudipta Roy
Muhammad Awais
Muhammad Haris Khan
Imran Razzak
VLM
42
0
0
18 Jun 2025
Thinking in Directivity: Speech Large Language Model for Multi-Talker Directional Speech Recognition
Jiamin Xie
Ju Lin
Yiteng Huang
Tyler Vuong
Zhaojiang Lin
...
Peng Su
Prashant Rawat
Sangeeta Srivastava
Ming Sun
Florian Metze
17
0
0
17 Jun 2025
NetRoller: Interfacing General and Specialized Models for End-to-End Autonomous Driving
Ren Xin
Hongji Liu
Xiaodong Mei
Wenru Liu
Maosheng Ye
Zhili Chen
Jun Ma
27
0
0
17 Jun 2025
Qwen vs. Gemma Integration with Whisper: A Comparative Study in Multilingual SpeechLLM Systems
Tuan Nguyen
Long-Vu Hoang
Huy-Dat Tran
12
0
0
16 Jun 2025
Uncertainty-Informed Active Perception for Open Vocabulary Object Goal Navigation
Utkarsh Bajpai
Julius Ruckin
Cyrill Stachniss
Marija Popović
15
0
0
16 Jun 2025
Stream-Omni: Simultaneous Multimodal Interactions with Large Language-Vision-Speech Model
Shaolei Zhang
Shoutao Guo
Qingkai Fang
Yan Zhou
Yang Feng
MLLM, AuLLM, VLM
51
0
0
16 Jun 2025
Screen Hijack: Visual Poisoning of VLM Agents in Mobile Environments
Xuan Wang
Siyuan Liang
Zhe Liu
Yi Yu
Yuliang Lu
Xiaochun Cao
Ee-Chien Chang
X. Gao
AAML
70
0
0
16 Jun 2025
Anomaly Object Segmentation with Vision-Language Models for Steel Scrap Recycling
Daichi Tanaka
Takumi Karasawa
Shu Takenouchi
Rei Kawakami
18
0
0
16 Jun 2025
Rethinking Test-Time Scaling for Medical AI: Model and Task-Aware Strategies for LLMs and VLMs
Gyutaek Oh
Seoyeon Kim
Sangjoon Park
Byung-Hoon Kim
LM&MA, LRM
31
0
0
16 Jun 2025
SmartHome-Bench: A Comprehensive Benchmark for Video Anomaly Detection in Smart Homes Using Multi-Modal Large Language Models
Xinyi Zhao
Congjing Zhang
Pei Guo
Wei Li
Lin Chen
Chaoyue Zhao
Shuai Huang
20
0
0
15 Jun 2025
Dynamic Modality Scheduling for Multimodal Large Models via Confidence, Uncertainty, and Semantic Consistency
Hiroshi Tanaka
Anika Rao
Hana Satou
Michael Johnson
Sofia García
18
0
0
15 Jun 2025
The Safety Reminder: A Soft Prompt to Reactivate Delayed Safety Awareness in Vision-Language Models
Peiyuan Tang
Haojie Xin
Xiaodong Zhang
Jun Sun
Qin Xia
Zijiang Yang
VLM
19
0
0
15 Jun 2025
Manager: Aggregating Insights from Unimodal Experts in Two-Tower VLMs and MLLMs
Xiao Xu
L. Qin
Wanxiang Che
Min-Yen Kan
MoE, VLM
30
0
0
13 Jun 2025
DaMO: A Data-Efficient Multimodal Orchestrator for Temporal Reasoning with Video LLMs
Bo-Cheng Chiu
Jen-Jee Chen
Yu-Chee Tseng
Feng-Chi Chen
14
0
0
13 Jun 2025
Dynamic Double Space Tower
Weikai Sun
Shijie Song
Han Wang
15
0
0
13 Jun 2025
Foundation Models in Autonomous Driving: A Survey on Scenario Generation and Scenario Analysis
Yuan Gao
Mattia Piccinini
Yuchen Zhang
Dingrui Wang
Korbinian Moller
...
Steven Peters
Andrea Stocco
Bassam Alrifaee
Marco Pavone
Johannes Betz
23
0
0
13 Jun 2025
Breaking Bad Molecules: Are MLLMs Ready for Structure-Level Molecular Detoxification?
Fei Lin
Ziyang Gong
Cong Wang
Yonglin Tian
Tengchao Zhang
Xue Yang
Gen Luo
Fei Wang
124
0
0
12 Jun 2025
Can Sound Replace Vision in LLaVA With Token Substitution?
Ali Vosoughi
Jing Bi
Pinxin Liu
Yunlong Tang
Chenliang Xu
CLIP, VLM
131
0
0
12 Jun 2025
Improving Medical Visual Representation Learning with Pathological-level Cross-Modal Alignment and Correlation Exploration
Jun Wang
Lixing Zhu
Xiaohan Yu
A. Bhalerao
Yulan He
122
0
0
12 Jun 2025
LLMs Are Not Yet Ready for Deepfake Image Detection
Shahroz Tariq
David D. Nguyen
M.A.P. Chamikara
Tingmin Wu
A. Abuadbba
Kristen Moore
VLM
102
0
0
12 Jun 2025
Uncertainty-Aware Deep Learning for Automated Skin Cancer Classification: A Comprehensive Evaluation
Hamzeh Asgharnezhad
Pegah Tabarisaadi
Abbas Khosravi
R. Alizadehsani
Usha R. Acharya
122
0
0
12 Jun 2025
MSTAR: Box-free Multi-query Scene Text Retrieval with Attention Recycling
Liang Yin
Xudong Xie
Zhang Li
Xiang Bai
Yuliang Liu
LRM
117
0
0
12 Jun 2025
Pisces: An Auto-regressive Foundation Model for Image Understanding and Generation
Zhiyang Xu
Jiuhai Chen
Zhaojiang Lin
Xichen Pan
Lifu Huang
...
Di Jin
Michihiro Yasunaga
Lili Yu
Xi Lin
Shaoliang Nie
121
1
0
12 Jun 2025
LLM-to-Phy3D: Physically Conform Online 3D Object Generation with LLMs
Melvin Wong
Yueming Lyu
Thiago Rios
Stefan Menzel
Yew-Soon Ong
PINN, AI4CE
38
0
0
11 Jun 2025
Provoking Multi-modal Few-Shot LVLM via Exploration-Exploitation In-Context Learning
C. L. Philip Chen
Yunpeng Zhai
Yifan Zhao
Jinyang Gao
Bolin Ding
Jia Li
41
0
0
11 Jun 2025
A High-Quality Dataset and Reliable Evaluation for Interleaved Image-Text Generation
Yukang Feng
Jianwen Sun
Chuanhao Li
Zizhen Li
Jiaxin Ai
...
Yifan Chang
Sizhuo Zhou
Shenglin Zhang
Yu Dai
Kaipeng Zhang
MLLM, EGVM
90
0
0
11 Jun 2025
Revisit What You See: Disclose Language Prior in Vision Tokens for Efficient Guided Decoding of LVLMs
Beomsik Cho
Jaehyung Kim
64
0
0
11 Jun 2025
Vision Generalist Model: A Survey
Ziyi Wang
Yongming Rao
Shuofeng Sun
Xinrun Liu
Yi Wei
...
Zuyan Liu
Yanbo Wang
Hongmin Liu
Jie Zhou
Jiwen Lu
65
0
0
11 Jun 2025
Vision Matters: Simple Visual Perturbations Can Boost Multimodal Math Reasoning
Yuting Li
Lai Wei
Kaipeng Zheng
Jingyuan Huang
Linghe Kong
Lichao Sun
Weiran Huang
AAML, LRM, VLM
80
0
0
11 Jun 2025
HSENet: Hybrid Spatial Encoding Network for 3D Medical Vision-Language Understanding
Yanzhao Shi
Xiaodan Zhang
Junzhong Ji
Haoning Jiang
Chengxin Zheng
Y. Wang
Liangqiong Qu
89
0
0
11 Jun 2025
Multimodal Representation Alignment for Cross-modal Information Retrieval
Fan Xu
Luis A. Leiva
19
0
0
10 Jun 2025
Bias Analysis in Unconditional Image Generative Models
Xiaofeng Zhang
Michelle Lin
Simon Lacoste-Julien
Aaron Courville
Yash Goyal
23
0
0
10 Jun 2025