Title | Authors |
---|---|
A Spark of Vision-Language Intelligence: 2-Dimensional Autoregressive Transformer for Efficient Finegrained Image Generation | Liang Chen, Sinan Tan, Zefan Cai, Weichu Xie, Haozhe Zhao, Yichi Zhang, Junyang Lin, Jinze Bai, Tianyu Liu, Baobao Chang |
Emu3: Next-Token Prediction is All You Need | Xinlong Wang, Xiaosong Zhang, Zhengxiong Luo, Quan-Sen Sun, Yufeng Cui, ..., Xi Yang, Jingjing Liu, Yonghua Lin, Tiejun Huang, Zhongyuan Wang |
MIO: A Foundation Model on Multimodal Tokens | Zekun Wang, King Zhu, Chunpu Xu, Wangchunshu Zhou, Jiaheng Liu, ..., Yuanxing Zhang, Ge Zhang, Ke Xu, Jie Fu, Wenhao Huang |
GP-GPT: Large Language Model for Gene-Phenotype Mapping | Yanjun Lyu, Zihao Wu, Lu Zhang, Jing Zhang, Yiwei Li, ..., Rongjie Liu, Chao Huang, Wentao Li, Tianming Liu, Dajiang Zhu |
Show-o: One Single Transformer to Unify Multimodal Understanding and Generation | Jinheng Xie, Weijia Mao, Zechen Bai, David Junhao Zhang, Weihao Wang, Kevin Qinghong Lin, Yuchao Gu, Zhijie Chen, Zhenheng Yang, Mike Zheng Shou |