Title |
---|
What If We Recaption Billions of Web Images with LLaMA-3? Xianhang Li Haoqin Tu Mude Hui Zeyu Wang Bingchen Zhao ...Jieru Mei Qing Liu Huangjie Zheng Yuyin Zhou Cihang Xie |
VALL-E R: Robust and Efficient Zero-Shot Text-to-Speech Synthesis via Monotonic Alignment Bing Han Long Zhou Shujie Liu Sanyuan Chen Lingwei Meng Yanming Qian Yanqing Liu Sheng Zhao Jinyu Li Furu Wei |
DEEM: Diffusion Models Serve as the Eyes of Large Language Models for Image Perception Run Luo Yunshui Li Longze Chen Wanwei He Ting-En Lin ...Zikai Song Xiaobo Xia Tongliang Liu Min Yang Binyuan Hui |
AIGIQA-20K: A Large Database for AI-Generated Image Quality Assessment Chunyi Li Tengchuan Kou Yixuan Gao Yuhang Cao Wei Sun ...Weixia Zhang Haoning Wu Xiaohong Liu Xiongkuo Min Guangtao Zhai |