arXiv:2505.16990
Dimple: Discrete Diffusion Multimodal Large Language Model with Parallel Decoding
22 May 2025
Runpeng Yu
Xinyin Ma
Xinchao Wang
MLLM
Papers cited by "Dimple: Discrete Diffusion Multimodal Large Language Model with Parallel Decoding"
20 papers
Joint Vision-Language Social Bias Removal for CLIP
Haoyu Zhang
Yangyang Guo
Mohan S. Kankanhalli
VLM
122
1
0
19 Nov 2024
Scaling Diffusion Language Models via Adaptation from Autoregressive Models
Shansan Gong
Shivam Agarwal
Yizhe Zhang
Jiacheng Ye
Lin Zheng
...
Peilin Zhao
W. Bi
Jiawei Han
Hao Peng
Dianbo Sui
AI4CE
102
24
0
23 Oct 2024
Beyond Autoregression: Discrete Diffusion for Complex Reasoning and Planning
Jiacheng Ye
Jiahui Gao
Shansan Gong
Lin Zheng
Xin Jiang
Zhiyu Li
Dianbo Sui
DiffM
LRM
117
20
0
18 Oct 2024
How Far Are We to GPT-4V? Closing the Gap to Commercial Multimodal Models with Open-Source Suites
Zhe Chen
Weiyun Wang
Hao Tian
Shenglong Ye
Zhangwei Gao
...
Tong Lu
Dahua Lin
Yu Qiao
Jifeng Dai
Wenhai Wang
MLLM
VLM
91
578
0
25 Apr 2024
MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI
Xiang Yue
Yuansheng Ni
Kai Zhang
Tianyu Zheng
Ruoqi Liu
...
Yibo Liu
Wenhao Huang
Huan Sun
Yu-Chuan Su
Wenhu Chen
OSLM
ELM
VLM
153
833
0
27 Nov 2023
Discrete Diffusion Modeling by Estimating the Ratios of the Data Distribution
Aaron Lou
Chenlin Meng
Stefano Ermon
DiffM
67
91
0
25 Oct 2023
Improved Baselines with Visual Instruction Tuning
Haotian Liu
Chunyuan Li
Yuheng Li
Yong Jae Lee
VLM
MLLM
96
2,593
0
05 Oct 2023
MathVista: Evaluating Mathematical Reasoning of Foundation Models in Visual Contexts
Pan Lu
Hritik Bansal
Tony Xia
Jiacheng Liu
Chun-yue Li
Hannaneh Hajishirzi
Hao Cheng
Kai-Wei Chang
Michel Galley
Jianfeng Gao
LRM
MLLM
79
541
0
03 Oct 2023
Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond
Jinze Bai
Shuai Bai
Shusheng Yang
Shijie Wang
Sinan Tan
Peng Wang
Junyang Lin
Chang Zhou
Jingren Zhou
MLLM
VLM
ObjD
81
871
0
24 Aug 2023
MMBench: Is Your Multi-modal Model an All-around Player?
Yuanzhan Liu
Haodong Duan
Yuanhan Zhang
Yue Liu
Songyang Zhang
...
Jiaqi Wang
Conghui He
Ziwei Liu
Kai-xiang Chen
Dahua Lin
74
992
0
12 Jul 2023
MME: A Comprehensive Evaluation Benchmark for Multimodal Large Language Models
Chaoyou Fu
Peixian Chen
Yunhang Shen
Yulei Qin
Mengdan Zhang
...
Xiawu Zheng
Ke Li
Xing Sun
Zhenyu Qiu
Rongrong Ji
ELM
MLLM
69
806
0
23 Jun 2023
Evaluating Object Hallucination in Large Vision-Language Models
Yifan Li
Yifan Du
Kun Zhou
Jinpeng Wang
Wayne Xin Zhao
Ji-Rong Wen
MLLM
LRM
230
748
0
17 May 2023
Visual Instruction Tuning
Haotian Liu
Chunyuan Li
Qingyang Wu
Yong Jae Lee
SyDa
VLM
MLLM
370
4,607
0
17 Apr 2023
A Reparameterized Discrete Diffusion Model for Text Generation
Lin Zheng
Jianbo Yuan
Lei Yu
Lingpeng Kong
DiffM
76
65
0
11 Feb 2023
SeqDiffuSeq: Text Diffusion with Encoder-Decoder Transformers
Hongyi Yuan
Zheng Yuan
Chuanqi Tan
Fei Huang
Songfang Huang
DiffM
69
68
0
20 Dec 2022
CLIP-Diffusion-LM: Apply Diffusion Model on Image Captioning
Shi-You Xu
VLM
DiffM
63
12
0
10 Oct 2022
Learn to Explain: Multimodal Reasoning via Thought Chains for Science Question Answering
Pan Lu
Swaroop Mishra
Tony Xia
Liang Qiu
Kai-Wei Chang
Song-Chun Zhu
Oyvind Tafjord
Peter Clark
Ashwin Kalyan
ELM
ReLM
LRM
232
1,188
0
20 Sep 2022
Argmax Flows and Multinomial Diffusion: Learning Categorical Distributions
Emiel Hoogeboom
Didrik Nielsen
P. Jaini
Patrick Forré
Max Welling
DiffM
273
414
0
10 Feb 2021
Towards VQA Models That Can Read
Amanpreet Singh
Vivek Natarajan
Meet Shah
Yu Jiang
Xinlei Chen
Dhruv Batra
Devi Parikh
Marcus Rohrbach
EgoV
57
1,174
0
18 Apr 2019
A Diagram Is Worth A Dozen Images
Aniruddha Kembhavi
M. Salvato
Eric Kolve
Minjoon Seo
Hannaneh Hajishirzi
Ali Farhadi
3DV
47
472
0
24 Mar 2016