Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2301.12597
Cited By
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models
30 January 2023
Junnan Li
Dongxu Li
Silvio Savarese
Steven C. H. Hoi
VLM
MLLM
Re-assign community
ArXiv
PDF
HTML
Papers citing
"BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models"
50 / 833 papers shown
Title
Connecting Dreams with Visual Brainstorming Instruction
Yasheng Sun
Bohan Li
Mingchen Zhuge
Deng-Ping Fan
Salman Khan
Fahad Shahbaz Khan
Hideki Koike
DiffM
42
0
0
14 Aug 2024
CROME: Cross-Modal Adapters for Efficient Multimodal LLM
Sayna Ebrahimi
Sercan Ö. Arik
Tejas Nama
Tomas Pfister
44
1
0
13 Aug 2024
D2Styler: Advancing Arbitrary Style Transfer with Discrete Diffusion Methods
Onkar Susladkar
Gayatri S Deshmukh
Sparsh Mittal
Parth Shastri
DiffM
44
3
0
07 Aug 2024
AgentsCoMerge: Large Language Model Empowered Collaborative Decision Making for Ramp Merging
Senkang Hu
Zhengru Fang
Zihan Fang
Yiqin Deng
Xianhao Chen
Yuguang Fang
Sam Kwong
54
14
0
07 Aug 2024
Attacks and Defenses for Generative Diffusion Models: A Comprehensive Survey
V. T. Truong
Luan Ba Dang
Long Bao Le
DiffM
MedIm
56
16
0
06 Aug 2024
AdvQDet: Detecting Query-Based Adversarial Attacks with Adversarial Contrastive Prompt Tuning
Xin Wang
Kai-xiang Chen
Xingjun Ma
Zhineng Chen
Jingjing Chen
Yu-Gang Jiang
AAML
46
4
0
04 Aug 2024
Self-Introspective Decoding: Alleviating Hallucinations for Large Vision-Language Models
Fushuo Huo
Wenchao Xu
Zhong Zhang
Yining Qi
Zhicheng Chen
Peilin Zhao
VLM
MLLM
66
19
0
04 Aug 2024
WAS: Dataset and Methods for Artistic Text Segmentation
Xudong Xie
Yuzhe Li
Yang Liu
Zhifei Zhang
Zhaowen Wang
Wei Xiong
Xiang Bai
DiffM
52
2
0
31 Jul 2024
Learning Video Context as Interleaved Multimodal Sequences
S. Shao
Pengchuan Zhang
Y. Li
Xide Xia
A. Meso
Ziteng Gao
Jinheng Xie
N. Holliman
Mike Zheng Shou
49
5
0
31 Jul 2024
ControlMLLM: Training-Free Visual Prompt Learning for Multimodal Large Language Models
Ming-Kuan Wu
Xinyue Cai
Jiayi Ji
Jiale Li
Oucheng Huang
Gen Luo
Hao Fei
Xiaoshuai Sun
Rongrong Ji
MLLM
54
7
0
31 Jul 2024
Prompting Medical Large Vision-Language Models to Diagnose Pathologies by Visual Question Answering
Danfeng Guo
Sumitaka Honji
LRM
82
0
0
31 Jul 2024
OmniBal: Towards Fast Instruct-tuning for Vision-Language Models via Omniverse Computation Balance
Yongqiang Yao
Jingru Tan
Jiahao Hu
Feizhao Zhang
Xin Jin
...
Ruihao Gong
Pengfei Liu
Pengfei Liu
Dahua Lin
Ningyi Xu
VLM
52
1
0
30 Jul 2024
Specify and Edit: Overcoming Ambiguity in Text-Based Image Editing
Ekaterina Iakovleva
Fabio Pizzati
Philip Torr
Stéphane Lathuiliere
DiffM
34
0
0
29 Jul 2024
SAM-CP: Marrying SAM with Composable Prompts for Versatile Segmentation
Pengfei Chen
Lingxi Xie
Xinyue Huo
Xuehui Yu
Xiaopeng Zhang
Yingfei Sun
Zhenjun Han
Qi Tian
VLM
68
1
0
23 Jul 2024
SlowFast-LLaVA: A Strong Training-Free Baseline for Video Large Language Models
Mingze Xu
Mingfei Gao
Zhe Gan
Hong-You Chen
Zhengfeng Lai
Haiming Gang
Kai Kang
Afshin Dehghan
62
49
0
22 Jul 2024
HaloQuest: A Visual Hallucination Dataset for Advancing Multimodal Reasoning
Zhecan Wang
Garrett Bingham
Adams Wei Yu
Quoc V. Le
Thang Luong
Golnaz Ghiasi
MLLM
LRM
45
9
0
22 Jul 2024
Learning Visual Grounding from Generative Vision and Language Model
Shijie Wang
Dahun Kim
A. Taalimi
Chen Sun
Weicheng Kuo
ObjD
36
5
0
18 Jul 2024
ViLLa: Video Reasoning Segmentation with Large Language Model
Rongkun Zheng
Lu Qi
Xi Chen
Yi Wang
Kun Wang
Yu Qiao
Hengshuang Zhao
VOS
LRM
77
2
0
18 Jul 2024
DreamStory: Open-Domain Story Visualization by LLM-Guided Multi-Subject Consistent Diffusion
Huiguo He
Huan Yang
Zixi Tuo
Yuan Zhou
Qiuyue Wang
Yuhang Zhang
Zeyu Liu
Wenhao Huang
Hongyang Chao
Jian Yin
DiffM
VGen
62
12
0
17 Jul 2024
VisFocus: Prompt-Guided Vision Encoders for OCR-Free Dense Document Understanding
Ofir Abramovich
Niv Nayman
Sharon Fogel
I. Lavi
Ron Litman
Shahar Tsiper
Royee Tichauer
Srikar Appalaraju
Shai Mazor
R. Manmatha
VLM
33
3
0
17 Jul 2024
Turbo: Informativity-Driven Acceleration Plug-In for Vision-Language Large Models
Chen Ju
Haicheng Wang
Haozhe Cheng
Xu Chen
Zhonghua Zhai
Weilin Huang
Jinsong Lan
Shuai Xiao
Bo Zheng
VLM
49
5
0
16 Jul 2024
Controllable Contextualized Image Captioning: Directing the Visual Narrative through User-Defined Highlights
Shunqi Mao
Chaoyi Zhang
Hang Su
Hwanjun Song
Igor Shalyminov
Weidong Cai
39
1
0
16 Jul 2024
Reflective Instruction Tuning: Mitigating Hallucinations in Large Vision-Language Models
Jinrui Zhang
Teng Wang
Haigang Zhang
Ping Lu
Feng Zheng
MLLM
LRM
VLM
39
3
0
16 Jul 2024
FabGPT: An Efficient Large Multimodal Model for Complex Wafer Defect Knowledge Queries
Yuqi Jiang
Xudong Lu
Qian Jin
Qi Sun
Hanming Wu
Cheng Zhuo
48
5
0
15 Jul 2024
GOFA: A Generative One-For-All Model for Joint Graph Language Modeling
Lecheng Kong
Jiarui Feng
Hao Liu
Chengsong Huang
Jiaxin Huang
Yixin Chen
Muhan Zhang
AI4CE
77
6
0
12 Jul 2024
Bootstrapping Vision-language Models for Self-supervised Remote Physiological Measurement
Zijie Yue
Miaojing Shi
Hanli Wang
Shuai Ding
Qijun Chen
Shanlin Yang
45
0
0
11 Jul 2024
Robotic Control via Embodied Chain-of-Thought Reasoning
Michał Zawalski
William Chen
Karl Pertsch
Oier Mees
Chelsea Finn
Sergey Levine
LRM
LM&Ro
39
56
0
11 Jul 2024
Graph-Based Captioning: Enhancing Visual Descriptions by Interconnecting Region Captions
Yu-Guan Hsieh
Cheng-Yu Hsieh
Shih-Ying Yeh
Louis Béthune
Hadi Pour Ansari
Pavan Kumar Anasosalu Vasu
Chun-Liang Li
Ranjay Krishna
Oncel Tuzel
Marco Cuturi
66
4
0
09 Jul 2024
Vision-Language Models under Cultural and Inclusive Considerations
Antonia Karamolegkou
Phillip Rust
Yong Cao
Ruixiang Cui
Anders Søgaard
Daniel Hershcovich
VLM
53
7
0
08 Jul 2024
Tarsier: Recipes for Training and Evaluating Large Video Description Models
Jiawei Wang
Liping Yuan
Yuchen Zhang
47
52
0
30 Jun 2024
GenderBias-\emph{VL}: Benchmarking Gender Bias in Vision Language Models via Counterfactual Probing
Yisong Xiao
Aishan Liu
QianJia Cheng
Zhenfei Yin
Siyuan Liang
Jiapeng Li
Jing Shao
Xianglong Liu
Dacheng Tao
51
4
0
30 Jun 2024
Urban Visual Appeal According to ChatGPT: Contrasting AI and Human Insights
M. Malekzadeh
Elias S Willberg
Jussi Torkko
T. Toivonen
43
1
0
29 Jun 2024
GM-DF: Generalized Multi-Scenario Deepfake Detection
Yingxin Lai
Zitong Yu
Jing Yang
Bin Li
Xiangui Kang
Linlin Shen
40
8
0
28 Jun 2024
Multimodal Learning and Cognitive Processes in Radiology: MedGaze for Chest X-ray Scanpath Prediction
Akash Awasthi
Ngan Le
Zhigang Deng
Rishi Agrawal
Carol C. Wu
Hien Van Nguyen
MedIm
23
0
0
28 Jun 2024
PathAlign: A vision-language model for whole slide images in histopathology
Faruk Ahmed
Andrew Sellergren
Lin Yang
Shawn Xu
Boris Babenko
...
S. Shetty
Daniel Golden
Yun-Hui Liu
David F. Steiner
Ellery Wulczyn
LM&MA
VLM
36
15
0
27 Jun 2024
Curriculum Learning with Quality-Driven Data Selection
Biao Wu
Fang Meng
Ling-Hao Chen
36
2
0
27 Jun 2024
Foundational Models for Pathology and Endoscopy Images: Application for Gastric Inflammation
H. Kerdegari
Kyle Higgins
Dennis Veselkov
I. Laponogov
I. Poļaka
...
Junior Andrea Pescino
M. Leja
M. Dinis-Ribeiro
T. F. Kanonnikoff
Kirill Veselkov
35
3
0
26 Jun 2024
GUIDE: A Guideline-Guided Dataset for Instructional Video Comprehension
Jiafeng Liang
Shixin Jiang
Zekun Wang
Haojie Pan
Zerui Chen
Zheng Chu
Ming Liu
Ruiji Fu
Zhongyuan Wang
Bing Qin
29
2
0
26 Jun 2024
MG-LLaVA: Towards Multi-Granularity Visual Instruction Tuning
Xiangyu Zhao
Xiangtai Li
Haodong Duan
Haian Huang
Yining Li
Kai Chen
Hua Yang
VLM
MLLM
45
10
0
25 Jun 2024
DocParseNet: Advanced Semantic Segmentation and OCR Embeddings for Efficient Scanned Document Annotation
Ahmad Mohammadshirazi
Ali Nosrati Firoozsalari
Mengxi Zhou
Dheeraj Kulshrestha
R. Ramnath
38
0
0
25 Jun 2024
DaLPSR: Leverage Degradation-Aligned Language Prompt for Real-World Image Super-Resolution
Aiwen Jiang
Zhi Wei
Long Peng
Feiqiang Liu
Wenbo Li
Mingwen Wang
DiffM
54
2
0
24 Jun 2024
DreamBench++: A Human-Aligned Benchmark for Personalized Image Generation
Yuang Peng
Yuxin Cui
Haomiao Tang
Zekun Qi
Runpei Dong
Jing Bai
Chunrui Han
Zheng Ge
Xiangyu Zhang
Shu-Tao Xia
EGVM
77
31
0
24 Jun 2024
LLM2FEA: Discover Novel Designs with Generative Evolutionary Multitasking
Melvin Wong
Jiao Liu
Thiago Rios
Stefan Menzel
Yew-Soon Ong
57
2
0
21 Jun 2024
Advancing Fine-Grained Classification by Structure and Subject Preserving Augmentation
Eyal Michaeli
Ohad Fried
62
1
0
20 Jun 2024
VoCo-LLaMA: Towards Vision Compression with Large Language Models
Xubing Ye
Yukang Gan
Xiaoke Huang
Yixiao Ge
Yansong Tang
MLLM
VLM
43
23
0
18 Jun 2024
VideoVista: A Versatile Benchmark for Video Understanding and Reasoning
Yunxin Li
Xinyu Chen
Baotian Hu
Longyue Wang
Haoyuan Shi
Min-Ling Zhang
MLLM
LRM
56
26
0
17 Jun 2024
Duoduo CLIP: Efficient 3D Understanding with Multi-View Images
Han-Hung Lee
Yiming Zhang
Angel X. Chang
3DPC
48
3
0
17 Jun 2024
Adversarial Perturbations Cannot Reliably Protect Artists From Generative AI
Robert Honig
Javier Rando
Nicholas Carlini
Florian Tramèr
WIGM
AAML
55
16
0
17 Jun 2024
What Does Softmax Probability Tell Us about Classifiers Ranking Across Diverse Test Conditions?
Weijie Tu
Weijian Deng
Liang Zheng
Tom Gedeon
40
0
0
14 Jun 2024
VideoGPT+: Integrating Image and Video Encoders for Enhanced Video Understanding
Muhammad Maaz
H. Rasheed
Salman Khan
Fahad A Khan
VLM
MLLM
40
51
0
13 Jun 2024
Previous
1
2
3
...
8
9
10
...
15
16
17
Next