Title |
---|
A Spark of Vision-Language Intelligence: 2-Dimensional Autoregressive Transformer for Efficient Finegrained Image Generation. Liang Chen, Sinan Tan, Zefan Cai, Weichu Xie, Haozhe Zhao, Yichi Zhang, Junyang Lin, Jinze Bai, Tianyu Liu, Baobao Chang |
PixWizard: Versatile Image-to-Image Visual Assistant with Open-Language Instructions. Weifeng Lin, Xinyu Wei, Renrui Zhang, Le Zhuo, Shitian Zhao, ..., Junlin Xie, Yu Qiao, Peng Gao, Hongsheng Li |
Investigating Neural Audio Codecs for Speech Language Model-Based Speech Generation. Jiaqi Li, Dongmei Wang, Xiaofei Wang, Yao Qian, Long Zhou, ..., Junkun Chen, Sheng Zhao, Jinyu Li, Zhizheng Wu, Michael Zeng |
Show-o: One Single Transformer to Unify Multimodal Understanding and Generation. Jinheng Xie, Weijia Mao, Zechen Bai, David Junhao Zhang, Weihao Wang, Kevin Qinghong Lin, Yuchao Gu, Zhijie Chen, Zhenheng Yang, Mike Zheng Shou |