Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2501.17811
Cited By
Janus-Pro: Unified Multimodal Understanding and Generation with Data and Model Scaling
29 January 2025
Xiaokang Chen
Zhiyu Wu
Xingchao Liu
Zizheng Pan
Wen Liu
Zhenda Xie
X. Yu
Chong Ruan
AI4TS
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Janus-Pro: Unified Multimodal Understanding and Generation with Data and Model Scaling"
50 / 88 papers shown
Title
A Comprehensive Evaluation of Multi-Modal Large Language Models for Endoscopy Analysis
Shengyuan Liu
Boyun Zheng
Wenting Chen
Zhihao Peng
Zhenfei Yin
Jing Shao
Jiancong Hu
Yixuan Yuan
ELM
7
0
0
29 May 2025
R2I-Bench: Benchmarking Reasoning-Driven Text-to-Image Generation
Kaijie Chen
Zihao Lin
Zhiyang Xu
Ying Shen
Yuguang Yao
Joy Rimchala
Jiaxin Zhang
Lifu Huang
EGVM
LRM
29
0
0
29 May 2025
Muddit: Liberating Generation Beyond Text-to-Image with a Unified Discrete Diffusion Model
Qingyu Shi
Jinbin Bai
Zhuoran Zhao
Wenhao Chai
Kaidong Yu
...
Shuangyong Song
Yunhai Tong
Xiangtai Li
X. Li
Shuicheng Yan
28
0
0
29 May 2025
Are Unified Vision-Language Models Necessary: Generalization Across Understanding and Generation
Jihai Zhang
Tianle Li
Linjie Li
Zhengyuan Yang
Yu Cheng
16
1
0
29 May 2025
SridBench: Benchmark of Scientific Research Illustration Drawing of Image Generation Model
Yifan Chang
Yukang Feng
Jianwen Sun
Jiaxin Ai
Chuanhao Li
Sizhuo Zhou
Kaipeng Zhang
EGVM
9
0
0
28 May 2025
AutoJudger: An Agent-Driven Framework for Efficient Benchmarking of MLLMs
Xuanwen Ding
Chengjun Pan
Zejun Li
Jiwen Zhang
Siyuan Wang
Zhongyu Wei
7
0
0
27 May 2025
MMIG-Bench: Towards Comprehensive and Explainable Evaluation of Multi-Modal Image Generation Models
Hang Hua
Ziyun Zeng
Yizhi Song
Yunlong Tang
Liu He
Daniel G. Aliaga
Wei Xiong
Jiebo Luo
EGVM
13
0
0
26 May 2025
LlamaSeg: Image Segmentation via Autoregressive Mask Generation
Jiru Deng
Tengjin Weng
Tianyu Yang
Wenhan Luo
Zhiheng Li
Wenhao Jiang
VLM
83
0
0
26 May 2025
StyleAR: Customizing Multimodal Autoregressive Model for Style-Aligned Text-to-Image Generation
Yi Wu
Lingting Zhu
Shengju Qian
Lei Liu
Wandi Qiao
Lequan Yu
Bin Li
12
0
0
26 May 2025
FUDOKI: Discrete Flow-based Unified Understanding and Generation via Kinetic-Optimal Velocities
Jin Wang
Yao Lai
Aoxue Li
Shifeng Zhang
Jiacheng Sun
Ning Kang
Chengyue Wu
Zhenguo Li
Ping Luo
18
0
0
26 May 2025
Jodi: Unification of Visual Generation and Understanding via Joint Modeling
Yifeng Xu
Zhenliang He
Meina Kan
Shiguang Shan
Xilin Chen
VLM
29
0
0
25 May 2025
InfoChartQA: A Benchmark for Multimodal Question Answering on Infographic Charts
Minzhi Lin
Tianchi Xie
Mengchen Liu
Yilin Ye
C. L. Philip Chen
Shixia Liu
20
0
0
25 May 2025
OmniGenBench: A Benchmark for Omnipotent Multimodal Generation across 50+ Tasks
Jiayu Wang
Yang Jiao
Yue Yu
Tianwen Qian
Shaoxiang Chen
Jingjing Chen
Yu Jiang
MLLM
LM&MA
ELM
32
0
0
24 May 2025
ComfyMind: Toward General-Purpose Generation via Tree-Based Planning and Reactive Feedback
Litao Guo
Xinli Xu
Luozhou Wang
Jiantao Lin
Jinsong Zhou
Zixin Zhang
Bolan Su
Ying-Cong Chen
LLMAG
LRM
39
0
0
23 May 2025
Co-Reinforcement Learning for Unified Multimodal Understanding and Generation
Jingjing Jiang
Chongjie Si
Jun Luo
Hanwang Zhang
Chao Ma
101
0
0
23 May 2025
Rethinking Agent Design: From Top-Down Workflows to Bottom-Up Skill Evolution
Jiawei Du
Jinlong Wu
Yuzheng Chen
Yucheng Hu
Bing Li
Joey Tianyi Zhou
93
0
0
23 May 2025
Conditional Panoramic Image Generation via Masked Autoregressive Modeling
Chaoyang Wang
Xiangtai Li
Lu Qi
X. Lin
Jinbin Bai
Qianyu Zhou
Yunhai Tong
DiffM
45
1
0
22 May 2025
FPQVAR: Floating Point Quantization for Visual Autoregressive Model with FPGA Hardware Co-design
Renjie Wei
Songqiang Xu
Qingyu Guo
Meng Li
MQ
28
0
0
22 May 2025
Delving into RL for Image Generation with CoT: A Study on DPO vs. GRPO
Chengzhuo Tong
Ziyu Guo
Renrui Zhang
Wenyu Shan
Xinyu Wei
Zhenghao Xing
Hongsheng Li
Pheng-Ann Heng
EGVM
OffRL
LRM
50
0
0
22 May 2025
LLaDA-V: Large Language Diffusion Models with Visual Instruction Tuning
Zebin You
Shen Nie
Xiaolu Zhang
Jun Hu
Jun Zhou
Zhiwu Lu
J. Wen
Chongxuan Li
MLLM
VLM
51
0
0
22 May 2025
Pixels Versus Priors: Controlling Knowledge Priors in Vision-Language Models through Visual Counterfacts
Michal Golovanevsky
William Rudman
Michael Lepori
Amir Bar
Ritambhara Singh
Carsten Eickhoff
40
0
0
21 May 2025
MMaDA: Multimodal Large Diffusion Language Models
Ling Yang
Ye Tian
Bowen Li
Xinchen Zhang
Ke Shen
Yunhai Tong
Mengdi Wang
VLM
LRM
86
2
0
21 May 2025
Textual Steering Vectors Can Improve Visual Understanding in Multimodal Large Language Models
Woody Haosheng Gan
Deqing Fu
Julian Asilis
Ollie Liu
Dani Yogatama
Vatsal Sharan
Robin Jia
Willie Neiswanger
LLMSV
46
0
0
20 May 2025
VTBench: Evaluating Visual Tokenizers for Autoregressive Image Generation
Huawei Lin
Tong Geng
Zhaozhuo Xu
Weijie Zhao
VLM
93
1
0
19 May 2025
MindOmni: Unleashing Reasoning Generation in Vision Language Models with RGPO
Yicheng Xiao
Lin Song
Yukang Chen
Yingmin Luo
Yuxin Chen
Yukang Gan
Wei Huang
Xiu Li
Xiaojuan Qi
Ying Shan
LRM
30
2
0
19 May 2025
Are Multimodal Large Language Models Ready for Omnidirectional Spatial Reasoning?
Zihao Dongfang
Xu Zheng
Ziqiao Weng
Yuanhuiyi Lyu
Danda Pani Paudel
Luc Van Gool
Kailun Yang
Xuming Hu
LRM
43
0
0
17 May 2025
WildDoc: How Far Are We from Achieving Comprehensive and Robust Document Understanding in the Wild?
An-Lan Wang
Jingqun Tang
Liao Lei
Hao Feng
Qi Liu
...
Wen Liu
Hao Liu
Yang Liu
Xiang Bai
Can Huang
83
1
0
16 May 2025
Judging the Judges: Can Large Vision-Language Models Fairly Evaluate Chart Comprehension and Reasoning?
Md Tahmid Rahman Laskar
Mohammed Saidul Islam
Ridwan Mahbub
Ahmed Masry
Mizanur Rahman
Amran Bhuiyan
Mir Tafseer Nayeem
Shafiq Joty
Enamul Hoque
Jimmy Xiangji Huang
ELM
44
0
0
13 May 2025
Flow-GRPO: Training Flow Matching Models via Online RL
Jie Liu
Gongye Liu
Jiajun Liang
Yongqian Li
Jiaheng Liu
Xinyu Wang
Pengfei Wan
Di Zhang
Wanli Ouyang
AI4CE
110
2
0
08 May 2025
TokLIP: Marry Visual Tokens to CLIP for Multimodal Comprehension and Generation
Haokun Lin
Teng Wang
Yixiao Ge
Yuying Ge
Zhichao Lu
Ying Wei
Qingfu Zhang
Zhenan Sun
Ying Shan
MLLM
VLM
94
1
0
08 May 2025
Unified Multimodal Understanding and Generation Models: Advances, Challenges, and Opportunities
Wei Wei
Jintao Guo
Shanshan Zhao
Minghao Fu
Lunhao Duan
Guo-Hua Wang
Qing-Guo Chen
Zhao Xu
Weihua Luo
Kaifu Zhang
DiffM
161
0
0
05 May 2025
Ming-Lite-Uni: Advancements in Unified Architecture for Natural Multimodal Interaction
Biao Gong
Cheng Zou
Dandan Zheng
Hu Yu
Jingdong Chen
...
Qingpei Guo
Rui Liu
Weilong Chai
Xinyu Xiao
Ziyuan Huang
MLLM
111
1
0
05 May 2025
Improving Physical Object State Representation in Text-to-Image Generative Systems
Tianle Chen
Chaitanya Chakka
Deepti Ghadiyaram
52
0
0
04 May 2025
WorldGenBench: A World-Knowledge-Integrated Benchmark for Reasoning-Driven Text-to-Image Generation
D. Zhang
Che Jiang
Ruoshi Xu
Biaoxiang Chen
Zijian Jin
Yutian Lu
Jianguo Zhang
Liang Yong
Jiebo Luo
Shengda Luo
VLM
62
0
0
02 May 2025
T2I-R1: Reinforcing Image Generation with Collaborative Semantic-level and Token-level CoT
D. Jiang
Ziyu Guo
Renrui Zhang
Zhuofan Zong
Hao Li
Le Zhuo
Shilin Yan
Pheng-Ann Heng
Haoyang Li
LRM
103
14
0
01 May 2025
Nexus-Gen: A Unified Model for Image Understanding, Generation, and Editing
Hong Zhang
Zhongjie Duan
Xingjun Wang
Yuze Zhao
Weiyi Lu
Zhipeng Di
Yongjun Xu
Yingda Chen
Yu Zhang
MLLM
124
3
0
30 Apr 2025
X-Fusion: Introducing New Modality to Frozen Large Language Models
Sicheng Mo
Thao Nguyen
Xun Huang
Siddharth Srinivasan Iyer
Yijun Li
...
Eli Shechtman
Krishna Kumar Singh
Yong Jae Lee
Bolei Zhou
Yuheng Li
102
0
0
29 Apr 2025
HRScene: How Far Are VLMs from Effective High-Resolution Image Understanding?
Yusen Zhang
Wenliang Zheng
Aashrith Madasu
Peng Shi
Ryo Kamoi
...
Ranran Haoran Zhang
Avitej Iyer
Renze Lou
Wenpeng Yin
Rui Zhang
169
0
0
25 Apr 2025
Unveiling the Lack of LVLM Robustness to Fundamental Visual Variations: Why and Path Forward
Zhiyuan Fan
Yumeng Wang
Sandeep Polisetty
Yi R. Fung
70
0
0
23 Apr 2025
AGI Is Coming... Right After AI Learns to Play Wordle
Sarath Shekkizhar
Romain Cosentino
LLMAG
63
0
0
21 Apr 2025
Generative Multimodal Pretraining with Discrete Diffusion Timestep Tokens
Kaihang Pan
Wang Lin
Zhongqi Yue
Tenglong Ao
Liyu Jia
Wei Zhao
Juncheng Billy Li
Siliang Tang
Hanwang Zhang
67
4
0
20 Apr 2025
NoisyRollout: Reinforcing Visual Reasoning with Data Augmentation
Xiangyan Liu
Jinjie Ni
Zijian Wu
Chao Du
Longxu Dou
Haoran Wang
Tianyu Pang
Michael Shieh
OffRL
LRM
338
7
0
17 Apr 2025
SimpleAR: Pushing the Frontier of Autoregressive Visual Generation through Pretraining, SFT, and RL
Junke Wang
Zhi Tian
Xinyu Wang
Xinyu Zhang
Weilin Huang
Zuxuan Wu
Yu Jiang
VGen
116
13
0
15 Apr 2025
GeoUni: A Unified Model for Generating Geometry Diagrams, Problems and Problem Solutions
Jo-Ku Cheng
Zeren Zhang
Ran Chen
Jingyang Deng
Ziran Qin
Jinwen Ma
49
0
0
14 Apr 2025
Anchor Token Matching: Implicit Structure Locking for Training-free AR Image Editing
Taihang Hu
Linxuan Li
Kai Wang
Yaxing Wang
Jian Yang
Ming-Ming Cheng
DiffM
VGen
35
0
0
14 Apr 2025
Don't Deceive Me: Mitigating Gaslighting through Attention Reallocation in LMMs
Pengkun Jiao
Bin Zhu
Jingjing Chen
Chong-Wah Ngo
Yu Jiang
47
0
0
13 Apr 2025
GigaTok: Scaling Visual Tokenizers to 3 Billion Parameters for Autoregressive Image Generation
Tianwei Xiong
Jun Hao Liew
Zilong Huang
Jiashi Feng
Xihui Liu
50
0
0
11 Apr 2025
A Unified Agentic Framework for Evaluating Conditional Image Generation
Jifang Wang
Xue Yang
Longyue Wang
Zhenran Xu
Yansen Wang
Yaowei Wang
Weihua Luo
Kaifu Zhang
Baotian Hu
Min Zhang
EGVM
DiffM
82
1
0
09 Apr 2025
SIGMAN:Scaling 3D Human Gaussian Generation with Millions of Assets
Yuhang Yang
Fengqi Liu
Yixing Lu
Qin Zhao
Pingyu Wu
...
Ran Yi
Yang Cao
Lizhuang Ma
Zheng-jun Zha
Junting Dong
3DGS
56
0
0
09 Apr 2025
Are Vision-Language Models Ready for Dietary Assessment? Exploring the Next Frontier in AI-Powered Food Image Recognition
Sergio Romero-Tapiador
Ruben Tolosana
Blanca Lacruz-Pleguezuelos
L. Marcos-Zambrano
Guadalupe X.Bazán
Isabel Espinosa-Salinas
Julian Fierrez
Javier-Ortega Garcia
Enrique Carrillo-de Santa Pau
Aythami Morales
CoGe
42
0
0
09 Apr 2025
1
2
Next