BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models

30 January 2023
Junnan Li
Dongxu Li
Silvio Savarese
Steven C. H. Hoi
    VLM, MLLM

Papers citing "BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models"

50 / 2,352 papers shown
Title
TrojVLM: Backdoor Attack Against Vision Language Models
Weimin Lyu
Lu Pang
Tengfei Ma
Haibin Ling
Chao Chen
MLLM
97
11
0
28 Sep 2024
Conditional Image Synthesis with Diffusion Models: A Survey
Zheyuan Zhan
Defang Chen
Jian-Ping Mei
Zhenghe Zhao
Jiawei Chen
Chun-Yen Chen
Siwei Lyu
Can Wang
VLM
109
10
0
28 Sep 2024
Emu3: Next-Token Prediction is All You Need
Xinlong Wang
Xiaosong Zhang
Zhengxiong Luo
Quan-Sen Sun
Yufeng Cui
...
Xi Yang
Jingjing Liu
Yonghua Lin
Tiejun Huang
Zhongyuan Wang
MLLM
116
233
0
27 Sep 2024
Trustworthy Text-to-Image Diffusion Models: A Timely and Focused Survey
Yi Zhang
Zhen Chen
Chih-Hong Cheng
Wenjie Ruan
Xiaowei Huang
Dezong Zhao
David Flynn
Siddartha Khastgir
Xingyu Zhao
MedIm
97
4
0
26 Sep 2024
E.T. Bench: Towards Open-Ended Event-Level Video-Language Understanding
Ye Liu
Zongyang Ma
Zhongang Qi
Yang Wu
Ying Shan
Chang Wen Chen
112
23
0
26 Sep 2024
SKT: Integrating State-Aware Keypoint Trajectories with Vision-Language Models for Robotic Garment Manipulation
Xin Li
Siyuan Huang
Qiaojun Yu
Zhengkai Jiang
Ce Hao
Yimeng Zhu
Hongsheng Li
Peng Gao
Cewu Lu
77
0
0
26 Sep 2024
Resolving Multi-Condition Confusion for Finetuning-Free Personalized Image Generation
Qihan Huang
Siming Fu
Jinlong Liu
Hao Jiang
Yipeng Yu
Jie Song
74
9
0
26 Sep 2024
Robotic-CLIP: Fine-tuning CLIP on Action Data for Robotic Applications
Nghia Nguyen
Minh Nhat Vu
Tung D. Ta
Baoru Huang
T. Vo
Ngan Le
Anh Nguyen
VLM, CLIP
79
6
0
26 Sep 2024
MIO: A Foundation Model on Multimodal Tokens
Zekun Wang
King Zhu
Chunpu Xu
Wangchunshu Zhou
Jiaheng Liu
...
Yuanxing Zhang
Ge Zhang
Ke Xu
Jie Fu
Wenhao Huang
MLLM, AuLLM
175
12
0
26 Sep 2024
Neural Contrast: Leveraging Generative Editing for Graphic Design Recommendations
Marian Lupascu
Ionut Mironica
Mihai-Sorin Stupariu
DiffM
66
0
0
26 Sep 2024
ZALM3: Zero-Shot Enhancement of Vision-Language Alignment via In-Context Information in Multi-Turn Multimodal Medical Dialogue
Zhangpu Li
Changhong Zou
Suxue Ma
Zhicheng Yang
Chen Du
...
Xingzhi Sun
Jing Xiao
Kai Zhang
Mei Han
LM&MA
98
1
0
26 Sep 2024
Robotic Environmental State Recognition with Pre-Trained Vision-Language Models and Black-Box Optimization
Kento Kawaharazuka
Yoshiki Obinata
Naoaki Kanazawa
Kei Okada
Masayuki Inaba
LM&Ro
61
0
0
26 Sep 2024
Uni-Med: A Unified Medical Generalist Foundation Model For Multi-Task Learning Via Connector-MoE
Xun Zhu
Ying Hu
Fanbin Mo
Miao Li
Ji Wu
125
9
0
26 Sep 2024
SECURE: Semantics-aware Embodied Conversation under Unawareness for Lifelong Robot Learning
Rimvydas Rubavicius
Peter David Fagan
A. Lascarides
Subramanian Ramamoorthy
LM&Ro
456
0
0
26 Sep 2024
Multi-View and Multi-Scale Alignment for Contrastive Language-Image Pre-training in Mammography
Yuexi Du
John Onofrey
Nicha Dvornek
VLM
110
2
0
26 Sep 2024
LLaVA-3D: A Simple yet Effective Pathway to Empowering LMMs with 3D-awareness
Chenming Zhu
Tai Wang
Wenwei Zhang
Jiangmiao Pang
Xihui Liu
248
52
0
26 Sep 2024
ChatCam: Empowering Camera Control through Conversational AI
Xinhang Liu
Yu-Wing Tai
Chi-Keung Tang
VGen
81
3
0
25 Sep 2024
Blox-Net: Generative Design-for-Robot-Assembly Using VLM Supervision, Physics Simulation, and a Robot with Reset
Andrew Goldberg
Kavish Kondap
Tianshuang Qiu
Zehan Ma
Letian Fu
Justin Kerr
Huang Huang
Kaiyuan Chen
Kuan Fang
Ken Goldberg
79
4
0
25 Sep 2024
DALDA: Data Augmentation Leveraging Diffusion Model and LLM with Adaptive Guidance Scaling
Kyuheon Jung
Yongdeuk Seo
Seongwoo Cho
Jaeyoung Kim
Hyun-seok Min
Sungchul Choi
33
1
0
25 Sep 2024
The Role of Language Models in Modern Healthcare: A Comprehensive Review
Amna Khalid
Ayma Khalid
Umar Khalid
LM&MA
68
0
0
25 Sep 2024
How to Connect Speech Foundation Models and Large Language Models? What Matters and What Does Not
Francesco Verdini
Pierfrancesco Melucci
Stefano Perna
Francesco Cariaggi
Marco Gaido
...
Marek Kasztelnik
L. Bentivogli
Sébastien Bratières
P. Merialdo
Simone Scardapane
AuLLM
69
1
0
25 Sep 2024
Enabling Auditory Large Language Models for Automatic Speech Quality Evaluation
Siyin Wang
Wenyi Yu
Yudong Yang
Changli Tang
Yixuan Li
...
Jun Zhang
Guangzhi Sun
Lu Lu
Yuxuan Wang
Chao Zhang
AuLLM, LM&MA
136
8
0
25 Sep 2024
EventHallusion: Diagnosing Event Hallucinations in Video LLMs
Jiacheng Zhang
Yang Jiao
Shaoxiang Chen
Jingjing Chen
Zhiyu Tan
Hao Li
Jingjing Chen
MLLM
151
23
0
25 Sep 2024
GeoBiked: A Dataset with Geometric Features and Automated Labeling Techniques to Enable Deep Generative Models in Engineering Design
Phillip Mueller
Sebastian Mueller
Lars Mikelsons
125
2
0
25 Sep 2024
A Unified Hallucination Mitigation Framework for Large Vision-Language Models
Yue Chang
Liqiang Jing
Xiaopeng Zhang
Yue Zhang
VLM, MLLM
112
4
0
24 Sep 2024
Expert-level vision-language foundation model for real-world radiology and comprehensive evaluation
Xiaohong Liu
Guoxing Yang
Yulin Luo
Jiaji Mao
Xiang Zhang
Ming Gao
Shanghang Zhang
Jun Shen
Guangyu Wang
VLM, LM&MA, MedIm
68
2
0
24 Sep 2024
Bridging Speech and Text: Enhancing ASR with Pinyin-to-Character Pre-training in LLMs
Yang Yuhang
Peng Yizhou
Eng Siong Chng
Xionghu Zhong
AuLLM, AI4CE
53
0
0
24 Sep 2024
Learning Multiple Probabilistic Decisions from Latent World Model in Autonomous Driving
Lingyu Xiao
Jiang-Jiang Liu
Sen Yang
Xiaofan Li
Xiaoqing Ye
Wankou Yang
Jingdong Wang
132
0
0
24 Sep 2024
SYNERGAI: Perception Alignment for Human-Robot Collaboration
Yixin Chen
Guoxi Zhang
Yaowei Zhang
Hongming Xu
Peiyuan Zhi
Qing Li
Siyuan Huang
75
0
0
24 Sep 2024
Critic Loss for Image Classification
B. Rappazzo
Aaron Ferber
Carla P. Gomes
VLM
63
0
0
23 Sep 2024
Exploring Fine-grained Retail Product Discrimination with Zero-shot Object Classification Using Vision-Language Models
Anil Osman Tur
Alessandro Conti
Cigdem Beyan
Davide Boscaini
Roberto Larcher
S. Messelodi
Fabio Poiesi
Elisa Ricci
VLM
108
0
0
23 Sep 2024
Multi-modal Generative AI: Multi-modal LLMs, Diffusions and the Unification
X. Wang
Yuwei Zhou
Bin Huang
Hong Chen
Wenwu Zhu
DiffM
154
9
0
23 Sep 2024
ReVLA: Reverting Visual Domain Limitation of Robotic Foundation Models
Sombit Dey
Jan-Nico Zaech
Nikolay Nikolov
Luc Van Gool
Danda Pani Paudel
MoMe, VLM
151
5
0
23 Sep 2024
MediConfusion: Can you trust your AI radiologist? Probing the reliability of multimodal medical foundation models
Mohammad Shahab Sepehri
Zalan Fabian
Maryam Soltanolkotabi
Mahdi Soltanolkotabi
MedIm
142
6
0
23 Sep 2024
OmniBench: Towards The Future of Universal Omni-Language Models
Yizhi Li
Ge Zhang
Yinghao Ma
Ruibin Yuan
Kang Zhu
...
Zhaoxiang Zhang
Zachary Liu
Emmanouil Benetos
Wenhao Huang
Chenghua Lin
LRM
184
19
0
23 Sep 2024
SOS: Segment Object System for Open-World Instance Segmentation With Object Priors
Christian Wilms
Tim Rolff
Maris Hillemann
Robert Johanson
Simone Frintrop
VLM
85
1
0
22 Sep 2024
What Are They Doing? Joint Audio-Speech Co-Reasoning
Yingzhi Wang
Pooneh Mousavi
Artem Ploujnikov
Mirco Ravanelli
AuLLM
99
2
0
22 Sep 2024
Video-XL: Extra-Long Vision Language Model for Hour-Scale Video Understanding
Yan Shu
Peitian Zhang
Zheng Liu
Minghao Qin
Yueze Wang
Tiejun Huang
Bo Zhao
VLM
141
59
0
22 Sep 2024
LLMs are One-Shot URL Classifiers and Explainers
Fariza Rashid
Nishavi Ranaweera
Ben Doyle
Suranga Seneviratne
LRM
87
3
0
22 Sep 2024
Dormant: Defending against Pose-driven Human Image Animation
Jiachen Zhou
Mingsi Wang
Tianlin Li
Guozhu Meng
Kai Chen
160
5
0
22 Sep 2024
BrainDreamer: Reasoning-Coherent and Controllable Image Generation from EEG Brain Signals via Language Guidance
Ling Wang
Chen Wu
Lin Wang
DiffM
66
0
0
21 Sep 2024
AVG-LLaVA: A Large Multimodal Model with Adaptive Visual Granularity
Zhibin Lan
Liqiang Niu
Fandong Meng
Wenbo Li
Jie Zhou
Jinsong Su
VLM
60
3
0
20 Sep 2024
FIHA: Autonomous Hallucination Evaluation in Vision-Language Models with Davidson Scene Graphs
Bowen Yan
Zhengsong Zhang
Liqiang Jing
Eftekhar Hossain
Xinya Du
118
3
0
20 Sep 2024
JourneyBench: A Challenging One-Stop Vision-Language Understanding Benchmark of Generated Images
Zhecan Wang
Junzhang Liu
Chia-Wei Tang
Hani Alomari
Anushka Sivakumar
...
Haoxuan You
A. Ishmam
Kai-Wei Chang
Shih-Fu Chang
Chris Thomas
CoGe, VLM
175
2
0
19 Sep 2024
LARE: Latent Augmentation using Regional Embedding with Vision-Language Model
Kosuke Sakurai
Tatsuya Ishii
Ryotaro Shimizu
Linxin Song
Masayuki Goto
VLM
76
0
0
19 Sep 2024
StoryMaker: Towards Holistic Consistent Characters in Text-to-image Generation
Zhengguang Zhou
Jing Li
Huaxia Li
Nemo Chen
Xu Tang
DiffM, VGen
82
11
0
19 Sep 2024
InfiMM-WebMath-40B: Advancing Multimodal Pre-Training for Enhanced Mathematical Reasoning
Xiaotian Han
Yiren Jian
Xuefeng Hu
Haogeng Liu
Yiqi Wang
...
Yuang Ai
Huaibo Huang
Ran He
Zhenheng Yang
Quanzeng You
LRM, AI4CE
59
22
0
19 Sep 2024
From Linguistic Giants to Sensory Maestros: A Survey on Cross-Modal Reasoning with Large Language Models
Shengsheng Qian
Zuyi Zhou
Dizhan Xue
Bing Wang
Changsheng Xu
LRM
152
2
0
19 Sep 2024
TinyVLA: Towards Fast, Data-Efficient Vision-Language-Action Models for Robotic Manipulation
Junjie Wen
Yinlin Zhu
Jinming Li
Minjie Zhu
Kun Wu
...
Ran Cheng
Yaxin Peng
Chaomin Shen
Feifei Feng
Jian Tang
LM&Ro
182
70
0
19 Sep 2024
End-to-end Open-vocabulary Video Visual Relationship Detection using Multi-modal Prompting
Yongqi Wang
Xinxiao Wu
Shuo Yang
Jiebo Luo
458
1
0
19 Sep 2024