Title | Authors |
---|---|
VidCapBench: A Comprehensive Benchmark of Video Captioning for Controllable Text-to-Video Generation | Xinlong Chen, Yuanxing Zhang, Chongling Rao, Yushuo Guan, Jiaheng Liu, Fuzheng Zhang, Chengru Song, Qiang Liu, Di Zhang, Tieniu Tan |
Vision Language Models See What You Want but not What You See | Qingying Gao, Yijiang Li, Haiyun Lyu, Haoran Sun, Dezhi Luo, Hokin Deng |
Probing Mechanical Reasoning in Large Vision Language Models | Haoran Sun, Qingying Gao, Haiyun Lyu, Dezhi Luo, Yijiang Li, Hokin Deng |
CogVideoX: Text-to-Video Diffusion Models with An Expert Transformer | Zhuoyi Yang, Jiayan Teng, Wendi Zheng, Ming Ding, Shiyu Huang, ..., Weihan Wang, Yean Cheng, Xiaotao Gu, Yuxiao Dong, Jie Tang |
LVBench: An Extreme Long Video Understanding Benchmark | Weihan Wang, Zehai He, Wenyi Hong, Yean Cheng, Xiaohan Zhang, ..., Shiyu Huang, Bin Xu, Yuxiao Dong, Ming Ding, Jie Tang |