arXiv:2306.04362
Youku-mPLUG: A 10 Million Large-scale Chinese Video-Language Dataset for Pre-training and Benchmarks
7 June 2023
Haiyang Xu
Qinghao Ye
Xuan-Wei Wu
Mingshi Yan
Yuan Miao
Jiabo Ye
Guohai Xu
Anwen Hu
Yaya Shi
Guangwei Xu
Chenliang Li
Qingfang Qian
Maofei Que
Ji Zhang
Xiaoyan Zeng
Feiyan Huang
VLM
MLLM
Papers citing
"Youku-mPLUG: A 10 Million Large-scale Chinese Video-Language Dataset for Pre-training and Benchmarks"
24 / 24 papers shown
ReSpec: Relevance and Specificity Grounded Online Filtering for Learning on Video-Text Data Streams
C. Kim
Jihwan Moon
Sangwoo Moon
Heeseung Yun
Sihaeng Lee
Aniruddha Kembhavi
Soonyoung Lee
Gunhee Kim
Sangho Lee
Christopher Clark
36
0
0
21 Apr 2025
SF2T: Self-supervised Fragment Finetuning of Video-LLMs for Fine-Grained Understanding
Yangliu Hu
Zikai Song
Na Feng
Yawei Luo
Junqing Yu
Yi-Ping Phoebe Chen
Wei Yang
33
0
0
10 Apr 2025
Cockatiel: Ensembling Synthetic and Human Preferenced Training for Detailed Video Caption
Luozheng Qin
Zhiyu Tan
Mengping Yang
Xiaomeng Yang
Hao Li
90
0
0
12 Mar 2025
M2-omni: Advancing Omni-MLLM for Comprehensive Modality Support with Competitive Performance
Qingpei Guo
Kaiyou Song
Zipeng Feng
Ziping Ma
Qinglong Zhang
...
Yunxiao Sun
Tai-Wei Chang
Jingdong Chen
Ming Yang
Jun Zhou
MLLM
VLM
92
3
0
26 Feb 2025
Ask in Any Modality: A Comprehensive Survey on Multimodal Retrieval-Augmented Generation
Mohammad Mahdi Abootorabi
Amirhosein Zobeiri
Mahdi Dehghani
Mohammadali Mohammadkhani
Bardia Mohammadi
Omid Ghahroodi
M. Baghshah
Ehsaneddin Asgari
RALM
105
5
0
12 Feb 2025
Survey on AI-Generated Media Detection: From Non-MLLM to MLLM
Yueying Zou
Peipei Li
Zekun Li
Huaibo Huang
Xing Cui
Xuannan Liu
Chenghanyu Zhang
Ran He
DeLMO
132
3
0
07 Feb 2025
Prompting Video-Language Foundation Models with Domain-specific Fine-grained Heuristics for Video Question Answering
Ting Yu
Kunhao Fu
Shuhui Wang
Qingming Huang
Jun Yu
52
0
0
12 Oct 2024
ASR-enhanced Multimodal Representation Learning for Cross-Domain Product Retrieval
Ruixiang Zhao
Jian Jia
Yan Li
Xuehan Bai
Quan Chen
Han Li
Peng Jiang
Xirong Li
46
0
0
06 Aug 2024
The Synergy between Data and Multi-Modal Large Language Models: A Survey from Co-Development Perspective
Zhen Qin
Daoyuan Chen
Wenhao Zhang
Liuyi Yao
Yilun Huang
Bolin Ding
Yaliang Li
Shuiguang Deng
65
4
0
11 Jul 2024
DeMamba: AI-Generated Video Detection on Million-Scale GenVideo Benchmark
Haoxing Chen
Yan Hong
Zizheng Huang
Zhuoer Xu
Zhangxuan Gu
...
Jun Lan
Huijia Zhu
Jianfu Zhang
Weiqiang Wang
Huaxiong Li
Mamba
86
16
0
30 May 2024
A Survey of Multimodal Large Language Model from A Data-centric Perspective
Tianyi Bai
Hao Liang
Binwang Wan
Yanran Xu
Xi Li
...
Ping Huang
Jiulong Shan
Conghui He
Binhang Yuan
Wentao Zhang
60
37
0
26 May 2024
From Sora What We Can See: A Survey of Text-to-Video Generation
Rui Sun
Yumin Zhang
Tejal Shah
Jiahao Sun
Shuoying Zhang
Wenqi Li
Haoran Duan
Bo Wei
R. Ranjan
EGVM
79
20
0
17 May 2024
Sora as an AGI World Model? A Complete Survey on Text-to-Video Generation
Joseph Cho
Fachrina Dewi Puspitasari
Sheng Zheng
Jingyao Zheng
Lik-Hang Lee
Tae-Ho Kim
Choong Seon Hong
Chaoning Zhang
EGVM
VGen
44
41
0
08 Mar 2024
M2-RAAP: A Multi-Modal Recipe for Advancing Adaptation-based Pre-training towards Effective and Efficient Zero-shot Video-text Retrieval
Xingning Dong
Zipeng Feng
Chunluan Zhou
Xuzheng Yu
Ming Yang
Qingpei Guo
VLM
41
2
0
31 Jan 2024
Detecting Multimedia Generated by Large AI Models: A Survey
Li Lin
Neeraj Gupta
Yue Zhang
Hainan Ren
Chun-Hao Liu
Feng Ding
Xin Eric Wang
Xin Li
Luisa Verdoliva
Shu Hu
88
58
0
22 Jan 2024
CBVS: A Large-Scale Chinese Image-Text Benchmark for Real-World Short Video Search Scenarios
Xiangshuo Qiao
Xianxin Li
Xiaozhe Qu
Jie M. Zhang
Yang Liu
Yu Luo
Cihang Jin
Jin Ma
VLM
35
0
0
19 Jan 2024
Video Understanding with Large Language Models: A Survey
Yunlong Tang
Jing Bi
Siting Xu
Luchuan Song
Susan Liang
...
Feng Zheng
Jianguo Zhang
Ping Luo
Jiebo Luo
Chenliang Xu
VLM
70
84
0
29 Dec 2023
Large Models for Time Series and Spatio-Temporal Data: A Survey and Outlook
Ming Jin
Qingsong Wen
Keli Zhang
Chaoli Zhang
Siqiao Xue
...
Shirui Pan
Vincent S. Tseng
Yu Zheng
Lei Chen
Hui Xiong
AI4TS
SyDa
45
118
0
16 Oct 2023
mPLUG-Owl: Modularization Empowers Large Language Models with Multimodality
Qinghao Ye
Haiyang Xu
Guohai Xu
Jiabo Ye
Ming Yan
...
Junfeng Tian
Qiang Qi
Ji Zhang
Feiyan Huang
Jingren Zhou
VLM
MLLM
212
910
0
27 Apr 2023
Learning Trajectory-Word Alignments for Video-Language Tasks
Xu Yang
Zhang Li
Haiyang Xu
Hanwang Zhang
Qinghao Ye
Chenliang Li
Ming Yan
Yu Zhang
Fei Huang
Songfang Huang
41
7
0
05 Jan 2023
HiTeA: Hierarchical Temporal-Aware Video-Language Pre-training
Qinghao Ye
Guohai Xu
Ming Yan
Haiyang Xu
Qi Qian
Ji Zhang
Fei Huang
VLM
AI4TS
188
70
0
30 Dec 2022
ImageNet-21K Pretraining for the Masses
T. Ridnik
Emanuel Ben-Baruch
Asaf Noy
Lihi Zelnik-Manor
SSeg
VLM
CLIP
187
690
0
22 Apr 2021
Is Space-Time Attention All You Need for Video Understanding?
Gedas Bertasius
Heng Wang
Lorenzo Torresani
ViT
283
1,992
0
09 Feb 2021
Coherent Hierarchical Multi-Label Classification Networks
Eleonora Giunchiglia
Thomas Lukasiewicz
AILaw
45
96
0
20 Oct 2020