Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2408.12528
Cited By
Show-o: One Single Transformer to Unify Multimodal Understanding and Generation
22 August 2024
Jinheng Xie
Weijia Mao
Zechen Bai
David Junhao Zhang
Weihao Wang
Kevin Qinghong Lin
Yuchao Gu
Zhijie Chen
Zhenheng Yang
Mike Zheng Shou
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Show-o: One Single Transformer to Unify Multimodal Understanding and Generation"
28 / 128 papers shown
Title
Towards Unifying Understanding and Generation in the Era of Vision Foundation Models: A Survey from the Autoregression Perspective
Shenghao Xie
Wenqiang Zu
Mingyang Zhao
Duo Su
Shilong Liu
Ruohua Shi
Guoqi Li
Shanghang Zhang
Lei Ma
LRM
44
3
0
29 Oct 2024
Infinity-MM: Scaling Multimodal Performance with Large-Scale and High-Quality Instruction Data
Shuhao Gu
Jialing Zhang
Siyuan Zhou
Kevin Yu
Zhaohu Xing
...
Yufeng Cui
Xinlong Wang
Yaoqi Liu
Fangxiang Feng
Guang Liu
SyDa
VLM
MLLM
30
17
0
24 Oct 2024
Fluid: Scaling Autoregressive Text-to-image Generative Models with Continuous Tokens
Lijie Fan
Tianhong Li
Siyang Qin
Yuanzhen Li
Chen Sun
Michael Rubinstein
Deqing Sun
Kaiming He
Yonglong Tian
VLM
DiffM
48
41
0
17 Oct 2024
PUMA: Empowering Unified MLLM with Multi-granular Visual Generation
Rongyao Fang
Chengqi Duan
Kun Wang
Hao Li
H. Tian
Xingyu Zeng
Rui Zhao
Jifeng Dai
Hongsheng Li
Xihui Liu
MLLM
36
11
0
17 Oct 2024
γ
−
γ-
γ
−
MoD: Exploring Mixture-of-Depth Adaptation for Multimodal Large Language Models
Yaxin Luo
Gen Luo
Jiayi Ji
Yiyi Zhou
Xiaoshuai Sun
Zhiqiang Shen
Rongrong Ji
VLM
MoE
37
1
0
17 Oct 2024
Janus: Decoupling Visual Encoding for Unified Multimodal Understanding and Generation
Chengyue Wu
Xiaokang Chen
Z. F. Wu
Yiyang Ma
Xingchao Liu
...
Wen Liu
Zhenda Xie
Xingkai Yu
Chong Ruan
Ping Luo
AI4TS
57
74
0
17 Oct 2024
MEV Capture Through Time-Advantaged Arbitrage
Robin Fritsch
Maria Ines Silva
A. Mamageishvili
Benjamin Livshits
E. Felten
31
1
0
14 Oct 2024
MMIE: Massive Multimodal Interleaved Comprehension Benchmark for Large Vision-Language Models
Peng Xia
Siwei Han
Shi Qiu
Yiyang Zhou
Zhaoyang Wang
...
Chenhang Cui
Mingyu Ding
Linjie Li
Lijuan Wang
Huaxiu Yao
54
10
0
14 Oct 2024
Toward Guidance-Free AR Visual Generation via Condition Contrastive Alignment
Huayu Chen
Hang Su
Peize Sun
Jun Zhu
VLM
48
3
0
12 Oct 2024
DART: Denoising Autoregressive Transformer for Scalable Text-to-Image Generation
Jiatao Gu
Yuyang Wang
Yizhe Zhang
Qihang Zhang
Dinghuai Zhang
Navdeep Jaitly
Josh Susskind
Shuangfei Zhai
DiffM
31
12
0
10 Oct 2024
ElasticTok: Adaptive Tokenization for Image and Video
Wilson Yan
Matei A. Zaharia
Volodymyr Mnih
Pieter Abbeel
Aleksandra Faust
Hao Liu
VGen
43
6
0
10 Oct 2024
Mono-InternVL: Pushing the Boundaries of Monolithic Multimodal Large Language Models with Endogenous Visual Pre-training
Gen Luo
Xue Yang
Wenhan Dou
Zhaokai Wang
Jifeng Dai
Jifeng Dai
Yu Qiao
Xizhou Zhu
VLM
MLLM
65
25
0
10 Oct 2024
Intriguing Properties of Large Language and Vision Models
Young-Jun Lee
ByungSoo Ko
Han-Gyu Kim
Yechan Hwang
Ho-Jin Choi
LRM
VLM
43
0
0
07 Oct 2024
ACDC: Autoregressive Coherent Multimodal Generation using Diffusion Correction
Hyungjin Chung
Dohun Lee
Jong Chul Ye
VGen
DiffM
24
2
0
07 Oct 2024
CAR: Controllable Autoregressive Modeling for Visual Generation
Ziyu Yao
Jialin Li
Yifeng Zhou
Yong Liu
Xi Jiang
Chengjie Wang
Feng Zheng
Yuexian Zou
Lei Li
DiffM
37
13
0
07 Oct 2024
ControlAR: Controllable Image Generation with Autoregressive Models
Zongming Li
Tianheng Cheng
Shoufa Chen
Peize Sun
Haocheng Shen
Longjin Ran
Xiaoxin Chen
Wenyu Liu
Xinggang Wang
DiffM
134
14
0
03 Oct 2024
ImageFolder: Autoregressive Image Generation with Folded Tokens
Xiang Li
Kai Qiu
Hao Chen
Jason Kuen
Jiuxiang Gu
Bhiksha Raj
Zhe-nan Lin
VLM
36
18
0
02 Oct 2024
Emu3: Next-Token Prediction is All You Need
Xinlong Wang
Xiaosong Zhang
Zhengxiong Luo
Quan-Sen Sun
Yufeng Cui
...
Xi Yang
Jingjing Liu
Yonghua Lin
Tiejun Huang
Zhongyuan Wang
MLLM
34
153
0
27 Sep 2024
MIO: A Foundation Model on Multimodal Tokens
Zekun Wang
King Zhu
Chunpu Xu
Wangchunshu Zhou
Jiaheng Liu
...
Yuanxing Zhang
Ge Zhang
Ke Xu
Jie Fu
Wenhao Huang
MLLM
AuLLM
60
11
0
26 Sep 2024
Multi-Modal Generative AI: Multi-modal LLM, Diffusion and Beyond
Hong Chen
Xin Wang
Yuwei Zhou
Bin Huang
Yipeng Zhang
Wei Feng
Houlun Chen
Zeyang Zhang
Siao Tang
Wenwu Zhu
DiffM
55
7
0
23 Sep 2024
OmniGen: Unified Image Generation
Shitao Xiao
Yueze Wang
Junjie Zhou
Huaying Yuan
Xingrun Xing
Ruiran Yan
Shuting Wang
Tiejun Huang
Zheng Liu
DiffM
VLM
SyDa
57
63
0
17 Sep 2024
Open-MAGVIT2: An Open-Source Project Toward Democratizing Auto-regressive Visual Generation
Zhuoyan Luo
Fengyuan Shi
Yixiao Ge
Yujiu Yang
Limin Wang
Ying Shan
VLM
50
51
0
06 Sep 2024
Generative Visual Instruction Tuning
Jefferson Hernandez
Ruben Villegas
Vicente Ordonez
VLM
38
3
0
17 Jun 2024
SEED-X: Multimodal Models with Unified Multi-granularity Comprehension and Generation
Yuying Ge
Sijie Zhao
Jinguo Zhu
Yixiao Ge
Kun Yi
Lin Song
Chen Li
Xiaohan Ding
Ying Shan
VLM
60
107
0
22 Apr 2024
mPLUG-Owl2: Revolutionizing Multi-modal Large Language Model with Modality Collaboration
Qinghao Ye
Haiyang Xu
Jiabo Ye
Mingshi Yan
Anwen Hu
Haowei Liu
Qi Qian
Ji Zhang
Fei Huang
Jingren Zhou
MLLM
VLM
126
375
0
07 Nov 2023
Muse: Text-To-Image Generation via Masked Generative Transformers
Huiwen Chang
Han Zhang
Jarred Barber
AJ Maschinot
José Lezama
...
Kevin Patrick Murphy
William T. Freeman
Michael Rubinstein
Yuanzhen Li
Dilip Krishnan
DiffM
197
519
0
02 Jan 2023
Zero-Shot Text-to-Image Generation
Aditya A. Ramesh
Mikhail Pavlov
Gabriel Goh
Scott Gray
Chelsea Voss
Alec Radford
Mark Chen
Ilya Sutskever
VLM
255
4,781
0
24 Feb 2021
Conceptual 12M: Pushing Web-Scale Image-Text Pre-Training To Recognize Long-Tail Visual Concepts
Soravit Changpinyo
P. Sharma
Nan Ding
Radu Soricut
VLM
278
1,082
0
17 Feb 2021
Previous
1
2
3