2405.09818
Cited By
Chameleon: Mixed-Modal Early-Fusion Foundation Models
16 May 2024
Chameleon Team
MLLM
Papers citing "Chameleon: Mixed-Modal Early-Fusion Foundation Models"
50 / 262 papers shown
Title
Muddit: Liberating Generation Beyond Text-to-Image with a Unified Discrete Diffusion Model
Qingyu Shi
Jinbin Bai
Zhuoran Zhao
Wenhao Chai
Kaidong Yu
...
Shuangyong Song
Yunhai Tong
Xiangtai Li
X. Li
Shuicheng Yan
45
0
0
29 May 2025
Knowledge Insulating Vision-Language-Action Models: Train Fast, Run Fast, Generalize Better
Danny Driess
Jost Tobias Springenberg
Brian Ichter
Lili Yu
Adrian Li-Bell
...
Allen Z. Ren
Homer Walke
Quan Vuong
Lucy Xiaoyang Shi
Sergey Levine
27
0
0
29 May 2025
Are Unified Vision-Language Models Necessary: Generalization Across Understanding and Generation
Jihai Zhang
Tianle Li
Linjie Li
Zhengyuan Yang
Yu Cheng
29
1
0
29 May 2025
Argus: Vision-Centric Reasoning with Grounded Chain-of-Thought
Yunze Man
De-An Huang
Guilin Liu
Shiwei Sheng
Shilong Liu
Liang-Yan Gui
Jan Kautz
Yu Wang
Zhiding Yu
MLLM
LRM
31
0
0
29 May 2025
Thinking with Generated Images
Ethan Chern
Zhulin Hu
Steffi Chern
Siqi Kou
Jiadi Su
Yan Ma
Zhijie Deng
Pengfei Liu
LRM
27
0
0
28 May 2025
FUDOKI: Discrete Flow-based Unified Understanding and Generation via Kinetic-Optimal Velocities
Jin Wang
Yao Lai
Aoxue Li
Shifeng Zhang
Jiacheng Sun
Ning Kang
Chengyue Wu
Zhenguo Li
Ping Luo
34
0
0
26 May 2025
Hierarchical Masked Autoregressive Models with Low-Resolution Token Pivots
Guangting Zheng
Yehao Li
Yingwei Pan
Jiajun Deng
Ting Yao
Yanyong Zhang
Tao Mei
DiffM
17
0
0
26 May 2025
StyleAR: Customizing Multimodal Autoregressive Model for Style-Aligned Text-to-Image Generation
Yi Wu
Lingting Zhu
Shengju Qian
Lei Liu
Wandi Qiao
Lequan Yu
Bin Li
36
0
0
26 May 2025
Jodi: Unification of Visual Generation and Understanding via Joint Modeling
Yifeng Xu
Zhenliang He
Meina Kan
Shiguang Shan
Xilin Chen
VLM
46
0
0
25 May 2025
MIND-Edit: MLLM Insight-Driven Editing via Language-Vision Projection
Shuyu Wang
Weiqi Li
Qian Wang
Shijie Zhao
Jian Zhang
DiffM
31
0
0
25 May 2025
STRICT: Stress Test of Rendering Images Containing Text
Tianyu Zhang
Xinyu Wang
Zhenghan Tai
Lu Li
Jijun Chi
Jingrui Tian
Hailin He
Suyuchen Wang
28
0
0
25 May 2025
Don't Look Only Once: Towards Multimodal Interactive Reasoning with Selective Visual Revisitation
Jiwan Chung
Junhyeok Kim
Siyeol Kim
Jaeyoung Lee
Min Soo Kim
Youngjae Yu
LRM
41
0
0
24 May 2025
FutureSightDrive: Thinking Visually with Spatio-Temporal CoT for Autonomous Driving
Shuang Zeng
Xinyuan Chang
Mengwei Xie
Xinran Liu
Yifan Bai
Zheng Pan
Mu Xu
Xing Wei
LRM
80
0
0
23 May 2025
ComfyMind: Toward General-Purpose Generation via Tree-Based Planning and Reactive Feedback
Litao Guo
Xinli Xu
Luozhou Wang
Jiantao Lin
Jinsong Zhou
Zixin Zhang
Bolan Su
Ying-Cong Chen
LLMAG
LRM
53
0
0
23 May 2025
ChemMLLM: Chemical Multimodal Large Language Model
Qian Tan
Dongzhan Zhou
Peng Xia
Wanhao Liu
Wanli Ouyang
Lei Bai
Yuqiang Li
Tianfan Fu
MLLM
31
0
0
22 May 2025
LLaDA-V: Large Language Diffusion Models with Visual Instruction Tuning
Zebin You
Shen Nie
Xiaolu Zhang
Jun Hu
Jun Zhou
Zhiwu Lu
J. Wen
Chongxuan Li
MLLM
VLM
64
0
0
22 May 2025
T1: A Tool-Oriented Conversational Dataset for Multi-Turn Agentic Planning
Amartya Chakraborty
Paresh Dashore
Nadia Bathaee
Anmol Jain
Anirban Das
Shi-Xiong Zhang
Sambit Sahu
Milind Naphade
Genta Indra Winata
LLMAG
34
0
0
22 May 2025
MMaDA: Multimodal Large Diffusion Language Models
Ling Yang
Ye Tian
Bowen Li
Xinchen Zhang
Ke Shen
Yunhai Tong
Mengdi Wang
VLM
LRM
91
2
0
21 May 2025
Textual Steering Vectors Can Improve Visual Understanding in Multimodal Large Language Models
Woody Haosheng Gan
Deqing Fu
Julian Asilis
Ollie Liu
Dani Yogatama
Vatsal Sharan
Robin Jia
Willie Neiswanger
LLMSV
54
0
0
20 May 2025
ViC-Bench: Benchmarking Visual-Interleaved Chain-of-Thought Capability in MLLMs with Free-Style Intermediate State Representations
Xuecheng Wu
Jiaxing Liu
Danlei Huang
Xiaoyu Li
Yifan Wang
Chen Chen
Liya Ma
Xuezhi Cao
Junxiao Xue
LRM
59
0
0
20 May 2025
VTBench: Evaluating Visual Tokenizers for Autoregressive Image Generation
Huawei Lin
Tong Geng
Zhaozhuo Xu
Weijie Zhao
VLM
113
1
0
19 May 2025
MindOmni: Unleashing Reasoning Generation in Vision Language Models with RGPO
Yicheng Xiao
Lin Song
Yukang Chen
Yingmin Luo
Yuxin Chen
Yukang Gan
Wei Huang
Xiu Li
Xiaojuan Qi
Ying Shan
LRM
45
2
0
19 May 2025
Video-GPT via Next Clip Diffusion
Shaobin Zhuang
Zhipeng Huang
Ying Zhang
Fangyikang Wang
Canmiao Fu
Binxin Yang
Chong Sun
Chen Li
Yali Wang
DiffM
VGen
146
0
0
18 May 2025
Context-Aware Autoregressive Models for Multi-Conditional Image Generation
Yixiao Chen
Zhiyuan Ma
Guoli Jia
Che Jiang
Jianjun Li
Bowen Zhou
DiffM
48
0
0
18 May 2025
Visual Instruction Tuning with Chain of Region-of-Interest
Yixin Chen
Shuai Zhang
Boran Han
Bernie Wang
45
0
0
11 May 2025
The ML.ENERGY Benchmark: Toward Automated Inference Energy Measurement and Optimization
Jae-Won Chung
Jiachen Liu
Jeff J. Ma
Ruofan Wu
Oh Jun Kweon
Yuxuan Xia
Zhiyu Wu
Mosharaf Chowdhury
53
0
0
09 May 2025
TokLIP: Marry Visual Tokens to CLIP for Multimodal Comprehension and Generation
Haokun Lin
Teng Wang
Yixiao Ge
Yuying Ge
Zhichao Lu
Ying Wei
Qingfu Zhang
Zhenan Sun
Ying Shan
MLLM
VLM
99
1
0
08 May 2025
R1-Reward: Training Multimodal Reward Model Through Stable Reinforcement Learning
Yi-Fan Zhang
Xingyu Lu
X. Hu
Chaoyou Fu
Bin Wen
...
Jianfei Chen
Fan Yang
Zheng Zhang
Yan Li
Liang Wang
OffRL
LRM
66
4
0
05 May 2025
Ming-Lite-Uni: Advancements in Unified Architecture for Natural Multimodal Interaction
Biao Gong
Cheng Zou
Dandan Zheng
Hu Yu
Jingdong Chen
...
Qingpei Guo
Rui Liu
Weilong Chai
Xinyu Xiao
Ziyuan Huang
MLLM
118
1
0
05 May 2025
Unified Multimodal Understanding and Generation Models: Advances, Challenges, and Opportunities
Wei Wei
Jintao Guo
Shanshan Zhao
Minghao Fu
Lunhao Duan
Guo-Hua Wang
Qing-Guo Chen
Zhao Xu
Weihua Luo
Kaifu Zhang
DiffM
187
0
0
05 May 2025
Interleave-VLA: Enhancing Robot Manipulation with Interleaved Image-Text Instructions
Cunxin Fan
Xiaosong Jia
Yihang Sun
Yixiao Wang
Jianglan Wei
...
Xiangyu Zhao
Masayoshi Tomizuka
Xue Yang
Junchi Yan
Mingyu Ding
LM&Ro
VLM
76
4
0
04 May 2025
A Survey of Interactive Generative Video
Jiwen Yu
Yiran Qin
Haoxuan Che
Quande Liu
Xinyu Wang
Pengfei Wan
Di Zhang
Kun Gai
Hao Chen
Xihui Liu
VGen
90
0
0
30 Apr 2025
YoChameleon: Personalized Vision and Language Generation
Thao Nguyen
Krishna Kumar Singh
Jing Shi
Trung H. Bui
Yong Jae Lee
Yuheng Li
MLLM
124
1
0
29 Apr 2025
X-Fusion: Introducing New Modality to Frozen Large Language Models
Sicheng Mo
Thao Nguyen
Xun Huang
Siddharth Srinivasan Iyer
Yijun Li
...
Eli Shechtman
Krishna Kumar Singh
Yong Jae Lee
Bolei Zhou
Yuheng Li
110
0
0
29 Apr 2025
Learning Streaming Video Representation via Multitask Training
Yibin Yan
Jilan Xu
Shangzhe Di
Yikun Liu
Yudi Shi
Qirui Chen
Zeqian Li
Yifei Huang
Weidi Xie
CLL
102
1
0
28 Apr 2025
Token-Shuffle: Towards High-Resolution Image Generation with Autoregressive Models
Xu Ma
Peize Sun
Haoyu Ma
Hao Tang
Chih-Yao Ma
...
Matt Feiszli
Peizhao Zhang
Peter Vajda
Sam S. Tsai
Y. Fu
104
2
0
24 Apr 2025
Generative Multimodal Pretraining with Discrete Diffusion Timestep Tokens
Kaihang Pan
Wang Lin
Zhongqi Yue
Tenglong Ao
Liyu Jia
Wei Zhao
Juncheng Billy Li
Siliang Tang
Hanwang Zhang
72
4
0
20 Apr 2025
Personalized Text-to-Image Generation with Auto-Regressive Models
Kaiyue Sun
Xian Liu
Yao Teng
Xihui Liu
67
1
0
17 Apr 2025
NoisyRollout: Reinforcing Visual Reasoning with Data Augmentation
Xiangyan Liu
Jinjie Ni
Zijian Wu
Chao Du
Longxu Dou
Haoran Wang
Tianyu Pang
Michael Shieh
OffRL
LRM
349
7
0
17 Apr 2025
SimpleAR: Pushing the Frontier of Autoregressive Visual Generation through Pretraining, SFT, and RL
Junke Wang
Zhi Tian
Xinyu Wang
Xinyu Zhang
Weilin Huang
Zuxuan Wu
Yu Jiang
VGen
120
13
0
15 Apr 2025
GeoUni: A Unified Model for Generating Geometry Diagrams, Problems and Problem Solutions
Jo-Ku Cheng
Zeren Zhang
Ran Chen
Jingyang Deng
Ziran Qin
Jinwen Ma
54
0
0
14 Apr 2025
The Scalability of Simplicity: Empirical Analysis of Vision-Language Learning with a Single Transformer
Weixian Lei
Jiacong Wang
Haochen Wang
Xuelong Li
Jun Hao Liew
Jiashi Feng
Zilong Huang
52
3
0
14 Apr 2025
FocalLens: Instruction Tuning Enables Zero-Shot Conditional Image Representations
Cheng-Yu Hsieh
Pavan Kumar Anasosalu Vasu
Fartash Faghri
Raviteja Vemulapalli
Chun-Liang Li
Ranjay Krishna
Oncel Tuzel
Hadi Pouransari
VLM
388
0
0
11 Apr 2025
OmniCaptioner: One Captioner to Rule Them All
Yiting Lu
Jiakang Yuan
Zhen Li
Jike Zhong
Qi Qin
...
Lei Bai
Zhibo Chen
Peng Gao
Bo Zhang
Peng Gao
MLLM
97
1
0
09 Apr 2025
Decoupling Contrastive Decoding: Robust Hallucination Mitigation in Multimodal Large Language Models
Wei Chen
Xin Yan
Bin Wen
Fan Yang
Yan Li
Di Zhang
Long Chen
MLLM
132
0
0
09 Apr 2025
Have we unified image generation and understanding yet? An empirical study of GPT-4o's image generation ability
Ning Li
Jingran Zhang
Justin Cui
MLLM
126
2
0
09 Apr 2025
Transfer between Modalities with MetaQueries
Xichen Pan
Satya Narayan Shukla
Aashu Singh
Zhuokai Zhao
Shlok Kumar Mishra
...
Jiuhai Chen
Kunpeng Li
F. Xu
Ji Hou
Saining Xie
DiffM
67
12
0
08 Apr 2025
MME-Unify: A Comprehensive Benchmark for Unified Multimodal Understanding and Generation Models
Wulin Xie
Yize Zhang
Chaoyou Fu
Yang Shi
Bingyan Nie
Hongkai Chen
Zheng Zhang
Liang Wang
Tieniu Tan
65
2
0
04 Apr 2025
Envisioning Beyond the Pixels: Benchmarking Reasoning-Informed Visual Editing
Xiangyu Zhao
Peiyuan Zhang
Kexian Tang
Hao Li
Zicheng Zhang
...
Guangtao Zhai
Junchi Yan
Hua Yang
Xue Yang
Haodong Duan
VLM
LRM
105
5
0
03 Apr 2025
GPT-ImgEval: A Comprehensive Benchmark for Diagnosing GPT4o in Image Generation
Zhiyuan Yan
Junyan Ye
Weijia Li
Zilong Huang
Shenghai Yuan
Xiangyang He
Kaiqing Lin
Jun-Jian He
Conghui He
Li Yuan
MLLM
EGVM
117
16
0
03 Apr 2025