Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2505.19535
Cited By
TDVE-Assessor: Benchmarking and Evaluating the Quality of Text-Driven Video Editing with LMMs
26 May 2025
Juntong Wang
Jiarui Wang
Huiyu Duan
Guangtao Zhai
Xiongkuo Min
Re-assign community
ArXiv (abs)
PDF
HTML
Papers citing
"TDVE-Assessor: Benchmarking and Evaluating the Quality of Text-Driven Video Editing with LMMs"
47 / 47 papers shown
Title
LMM4LMM: Benchmarking and Evaluating Large-multimodal Image Generation with LMMs
Jiarui Wang
Huiyu Duan
Yu Zhao
Juntong Wang
Guangtao Zhai
Xiongkuo Min
MLLM
EGVM
LM&MA
91
3
0
11 Apr 2025
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
DeepSeek-AI
Daya Guo
Dejian Yang
Haowei Zhang
Junxiao Song
...
Shiyu Wang
S. Yu
Shunfeng Zhou
Shuting Pan
S.S. Li
ReLM
VLM
OffRL
AI4TS
LRM
380
1,970
0
22 Jan 2025
TokenFlow: Unified Image Tokenizer for Multimodal Understanding and Generation
Liao Qu
Huichao Zhang
Yiheng Liu
Xinyu Wang
Yi Jiang
Yiming Gao
Hu Ye
Daniel K. Du
Zehuan Yuan
Xinglong Wu
136
39
0
04 Dec 2024
AIGV-Assessor: Benchmarking and Evaluating the Perceptual Quality of Text-to-Video Generation with LMM
Jiarui Wang
Huiyu Duan
Guangtao Zhai
Juntong Wang
Xiongkuo Min
EGVM
90
8
0
26 Nov 2024
mPLUG-Owl3: Towards Long Image-Sequence Understanding in Multi-Modal Large Language Models
Jiabo Ye
Haiyang Xu
Haowei Liu
Anwen Hu
Ming Yan
Qi Qian
Ji Zhang
Fei Huang
Jingren Zhou
MLLM
VLM
77
138
0
09 Aug 2024
LLaVA-NeXT-Interleave: Tackling Multi-image, Video, and 3D in Large Multimodal Models
Feng Li
Renrui Zhang
Hao Zhang
Yuanhan Zhang
Bo Li
Wei Li
Zejun Ma
Chunyuan Li
MLLM
VLM
105
233
0
10 Jul 2024
Slicedit: Zero-Shot Video Editing With Text-to-Image Diffusion Models Using Spatio-Temporal Slices
Nathaniel Cohen
Vladimir Kulikov
Matan Kleiner
Inbar Huberman-Spiegelglas
T. Michaeli
VGen
DiffM
42
17
0
20 May 2024
Evaluating Text-to-Visual Generation with Image-to-Text Generation
Zhiqiu Lin
Deepak Pathak
Baiqi Li
Jiayao Li
Xide Xia
Graham Neubig
Pengchuan Zhang
Deva Ramanan
EGVM
113
171
0
01 Apr 2024
FRESCO: Spatial-Temporal Correspondence for Zero-Shot Video Translation
Shuai Yang
Yifan Zhou
Ziwei Liu
Chen Change Loy
VGen
DiffM
116
32
0
19 Mar 2024
InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks
Zhe Chen
Jiannan Wu
Wenhai Wang
Weijie Su
Guo Chen
...
Bin Li
Ping Luo
Tong Lu
Yu Qiao
Jifeng Dai
VLM
MLLM
254
1,210
0
21 Dec 2023
RAVE: Randomized Noise Shuffling for Fast and Consistent Video Editing with Diffusion Models
Ozgur Kara
Barışcan Kurtkaya
Hidir Yesiltepe
James M. Rehg
Pinar Yanardag
VGen
DiffM
68
54
0
07 Dec 2023
CVPR 2023 Text Guided Video Editing Competition
Jay Zhangjie Wu
Xiuyu Li
Difei Gao
Zhen Dong
Jinbin Bai
...
Xu Cheng
Jie Tang
Mike Zheng Shou
Kurt Keutzer
Forrest N. Iandola
64
35
0
24 Oct 2023
FLATTEN: optical FLow-guided ATTENtion for consistent text-to-video editing
Yuren Cong
Mengmeng Xu
Christian Simon
Shoufa Chen
Jiawei Ren
Yanping Xie
Juan-Manuel Perez-Rua
Bodo Rosenhahn
Tao Xiang
Sen He
DiffM
VGen
89
86
0
09 Oct 2023
Improved Baselines with Visual Instruction Tuning
Haotian Liu
Chunyuan Li
Yuheng Li
Yong Jae Lee
VLM
MLLM
135
2,817
0
05 Oct 2023
CCEdit: Creative and Controllable Video Editing via Diffusion Models
Danfeng Hong
Wenming Weng
Hao Li
Yuhui Yuan
Jing Yao
Chong Luo
Zhibo Chen
Baining Guo
DiffM
VGen
61
49
0
28 Sep 2023
Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond
Jinze Bai
Shuai Bai
Shusheng Yang
Shijie Wang
Sinan Tan
Peng Wang
Junyang Lin
Chang Zhou
Jingren Zhou
MLLM
VLM
ObjD
133
932
0
24 Aug 2023
StableVQA: A Deep No-Reference Quality Assessment Model for Video Stability
Tengchuan Kou
Xiaohong Liu
Wei Sun
Jun Jia
Xiongkuo Min
Guangtao Zhai
Ning Liu
53
21
0
09 Aug 2023
Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding
Hang Zhang
Xin Li
Lidong Bing
MLLM
164
1,059
0
05 Jun 2023
ControlVideo: Training-free Controllable Text-to-Video Generation
Yabo Zhang
Yuxiang Wei
Dongsheng Jiang
Xiaopeng Zhang
W. Zuo
Qi Tian
VGen
DiffM
111
252
0
22 May 2023
Pick-a-Pic: An Open Dataset of User Preferences for Text-to-Image Generation
Yuval Kirstain
Adam Polyak
Uriel Singer
Shahbuland Matiana
Joe Penna
Omer Levy
EGVM
217
416
0
02 May 2023
Harnessing the Power of LLMs in Practice: A Survey on ChatGPT and Beyond
Jingfeng Yang
Hongye Jin
Ruixiang Tang
Xiaotian Han
Qizhang Feng
Haoming Jiang
Bing Yin
Xia Hu
LM&MA
203
675
0
26 Apr 2023
Visual Instruction Tuning
Haotian Liu
Chunyuan Li
Qingyang Wu
Yong Jae Lee
SyDa
VLM
MLLM
569
4,910
0
17 Apr 2023
ImageReward: Learning and Evaluating Human Preferences for Text-to-Image Generation
Jiazheng Xu
Xiao Liu
Yuchen Wu
Yuxuan Tong
Qinkai Li
Ming Ding
Jie Tang
Yuxiao Dong
133
408
0
12 Apr 2023
Zero-Shot Video Editing Using Off-The-Shelf Image Diffusion Models
Wen Wang
Yan Jiang
K. Xie
Zide Liu
Hao Chen
Yue Cao
Xinlong Wang
Chunhua Shen
DiffM
VGen
87
116
0
30 Mar 2023
Unmasked Teacher: Towards Training-Efficient Video Foundation Models
Kunchang Li
Yali Wang
Yizhuo Li
Yi Wang
Yinan He
Limin Wang
Yu Qiao
VGen
102
168
0
28 Mar 2023
Text2Video-Zero: Text-to-Image Diffusion Models are Zero-Shot Video Generators
Levon Khachatryan
A. Movsisyan
Vahram Tadevosyan
Roberto Henschel
Zhangyang Wang
Shant Navasardyan
Humphrey Shi
VGen
74
574
0
23 Mar 2023
Pix2Video: Video Editing using Image Diffusion
Duygu Ceylan
C. Huang
Niloy J. Mitra
DiffM
VGen
91
260
0
22 Mar 2023
FateZero: Fusing Attentions for Zero-shot Text-based Video Editing
Chenyang Qi
Xiaodong Cun
Yong Zhang
Chenyang Lei
Xintao Wang
Ying Shan
Qifeng Chen
VGen
87
353
0
16 Mar 2023
Tune-A-Video: One-Shot Tuning of Image Diffusion Models for Text-to-Video Generation
Jay Zhangjie Wu
Yixiao Ge
Xintao Wang
Weixian Lei
Yuchao Gu
Yufei Shi
Wynne Hsu
Ying Shan
Xiaohu Qie
Mike Zheng Shou
VGen
119
743
0
22 Dec 2022
InternVideo: General Video Foundation Models via Generative and Discriminative Learning
Yi Wang
Kunchang Li
Yizhuo Li
Yinan He
Bingkun Huang
...
Junting Pan
Jiashuo Yu
Yali Wang
Limin Wang
Yu Qiao
VLM
VGen
131
331
0
06 Dec 2022
Exploring Video Quality Assessment on User Generated Contents from Aesthetic and Technical Perspectives
Haoning Wu
Erli Zhang
Liang Liao
Chaofeng Chen
Jingwen Hou
Annan Wang
Wenxiu Sun
Qiong Yan
Weisi Lin
80
168
0
09 Nov 2022
FAST-VQA: Efficient End-to-end Video Quality Assessment with Fragment Sampling
Haoning Wu
Chaofeng Chen
Jingwen Hou
Liang Liao
Annan Wang
Wenxiu Sun
Qiong Yan
Weisi Lin
101
177
0
06 Jul 2022
A Deep Learning based No-reference Quality Assessment Model for UGC Videos
Wei Sun
Xiongkuo Min
Wei Lu
Guangtao Zhai
83
166
0
29 Apr 2022
Surrogate Gap Minimization Improves Sharpness-Aware Training
Juntang Zhuang
Boqing Gong
Liangzhe Yuan
Huayu Chen
Hartwig Adam
Nicha Dvornek
S. Tatikonda
James Duncan
Ting Liu
76
157
0
15 Mar 2022
BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
Junnan Li
Dongxu Li
Caiming Xiong
Guosheng Lin
MLLM
BDL
VLM
CLIP
555
4,409
0
28 Jan 2022
LiT: Zero-Shot Transfer with Locked-image text Tuning
Xiaohua Zhai
Tianlin Li
Basil Mustafa
Andreas Steiner
Daniel Keysers
Alexander Kolesnikov
Lucas Beyer
VLM
108
560
0
15 Nov 2021
How to train your ViT? Data, Augmentation, and Regularization in Vision Transformers
Andreas Steiner
Alexander Kolesnikov
Xiaohua Zhai
Ross Wightman
Jakob Uszkoreit
Lucas Beyer
ViT
116
635
0
18 Jun 2021
LoRA: Low-Rank Adaptation of Large Language Models
J. E. Hu
Yelong Shen
Phillip Wallis
Zeyuan Allen-Zhu
Yuanzhi Li
Shean Wang
Lu Wang
Weizhu Chen
OffRL
AI4TS
AI4CE
ALM
AIMat
490
10,496
0
17 Jun 2021
When Vision Transformers Outperform ResNets without Pre-training or Strong Data Augmentations
Xiangning Chen
Cho-Jui Hsieh
Boqing Gong
ViT
87
328
0
03 Jun 2021
MLP-Mixer: An all-MLP Architecture for Vision
Ilya O. Tolstikhin
N. Houlsby
Alexander Kolesnikov
Lucas Beyer
Xiaohua Zhai
...
Andreas Steiner
Daniel Keysers
Jakob Uszkoreit
Mario Lucic
Alexey Dosovitskiy
427
2,685
0
04 May 2021
CLIPScore: A Reference-free Evaluation Metric for Image Captioning
Jack Hessel
Ari Holtzman
Maxwell Forbes
Ronan Le Bras
Yejin Choi
CLIP
150
1,584
0
18 Apr 2021
Learning Transferable Visual Models From Natural Language Supervision
Alec Radford
Jong Wook Kim
Chris Hallacy
Aditya A. Ramesh
Gabriel Goh
...
Amanda Askell
Pamela Mishkin
Jack Clark
Gretchen Krueger
Ilya Sutskever
CLIP
VLM
967
29,810
0
26 Feb 2021
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
Alexey Dosovitskiy
Lucas Beyer
Alexander Kolesnikov
Dirk Weissenborn
Xiaohua Zhai
...
Matthias Minderer
G. Heigold
Sylvain Gelly
Jakob Uszkoreit
N. Houlsby
ViT
670
41,430
0
22 Oct 2020
UGC-VQA: Benchmarking Blind Video Quality Assessment for User Generated Content
Zhengzhong Tu
Yilin Wang
Neil Birkbeck
Balu Adsumilli
A. Bovik
64
261
0
29 May 2020
Quality Assessment of In-the-Wild Videos
Dingquan Li
Tingting Jiang
Ming Jiang
59
299
0
01 Aug 2019
The Kinetics Human Action Video Dataset
W. Kay
João Carreira
Karen Simonyan
Brian Zhang
Chloe Hillier
...
Tim Green
T. Back
Apostol Natsev
Mustafa Suleyman
Andrew Zisserman
254
3,815
0
19 May 2017
The 2017 DAVIS Challenge on Video Object Segmentation
Jordi Pont-Tuset
Federico Perazzi
Sergi Caelles
Pablo Arbeláez
A. Sorkine-Hornung
Luc Van Gool
VGen
VOS
87
1,217
0
03 Apr 2017
1