TDVE-Assessor: Benchmarking and Evaluating the Quality of Text-Driven Video Editing with LMMs

26 May 2025
Juntong Wang
Jiarui Wang
Huiyu Duan
Guangtao Zhai
Xiongkuo Min

Papers citing "TDVE-Assessor: Benchmarking and Evaluating the Quality of Text-Driven Video Editing with LMMs"

47 / 47 papers shown
LMM4LMM: Benchmarking and Evaluating Large-multimodal Image Generation with LMMs
Jiarui Wang
Huiyu Duan
Yu Zhao
Juntong Wang
Guangtao Zhai
Xiongkuo Min
MLLM, EGVM, LM&MA
91
3
0
11 Apr 2025
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
DeepSeek-AI
Daya Guo
Dejian Yang
Haowei Zhang
Junxiao Song
...
Shiyu Wang
S. Yu
Shunfeng Zhou
Shuting Pan
S.S. Li
ReLM, VLM, OffRL, AI4TS, LRM
380
1,970
0
22 Jan 2025
TokenFlow: Unified Image Tokenizer for Multimodal Understanding and Generation
Liao Qu
Huichao Zhang
Yiheng Liu
Xinyu Wang
Yi Jiang
Yiming Gao
Hu Ye
Daniel K. Du
Zehuan Yuan
Xinglong Wu
136
39
0
04 Dec 2024
AIGV-Assessor: Benchmarking and Evaluating the Perceptual Quality of Text-to-Video Generation with LMM
Jiarui Wang
Huiyu Duan
Guangtao Zhai
Juntong Wang
Xiongkuo Min
EGVM
90
8
0
26 Nov 2024
mPLUG-Owl3: Towards Long Image-Sequence Understanding in Multi-Modal Large Language Models
Jiabo Ye
Haiyang Xu
Haowei Liu
Anwen Hu
Ming Yan
Qi Qian
Ji Zhang
Fei Huang
Jingren Zhou
MLLM, VLM
77
138
0
09 Aug 2024
LLaVA-NeXT-Interleave: Tackling Multi-image, Video, and 3D in Large Multimodal Models
Feng Li
Renrui Zhang
Hao Zhang
Yuanhan Zhang
Bo Li
Wei Li
Zejun Ma
Chunyuan Li
MLLM, VLM
105
233
0
10 Jul 2024
Slicedit: Zero-Shot Video Editing With Text-to-Image Diffusion Models Using Spatio-Temporal Slices
Nathaniel Cohen
Vladimir Kulikov
Matan Kleiner
Inbar Huberman-Spiegelglas
T. Michaeli
VGen, DiffM
42
17
0
20 May 2024
Evaluating Text-to-Visual Generation with Image-to-Text Generation
Zhiqiu Lin
Deepak Pathak
Baiqi Li
Jiayao Li
Xide Xia
Graham Neubig
Pengchuan Zhang
Deva Ramanan
EGVM
113
171
0
01 Apr 2024
FRESCO: Spatial-Temporal Correspondence for Zero-Shot Video Translation
Shuai Yang
Yifan Zhou
Ziwei Liu
Chen Change Loy
VGen, DiffM
116
32
0
19 Mar 2024
InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks
Zhe Chen
Jiannan Wu
Wenhai Wang
Weijie Su
Guo Chen
...
Bin Li
Ping Luo
Tong Lu
Yu Qiao
Jifeng Dai
VLM, MLLM
254
1,210
0
21 Dec 2023
RAVE: Randomized Noise Shuffling for Fast and Consistent Video Editing with Diffusion Models
Ozgur Kara
Barışcan Kurtkaya
Hidir Yesiltepe
James M. Rehg
Pinar Yanardag
VGen, DiffM
68
54
0
07 Dec 2023
CVPR 2023 Text Guided Video Editing Competition
Jay Zhangjie Wu
Xiuyu Li
Difei Gao
Zhen Dong
Jinbin Bai
...
Xu Cheng
Jie Tang
Mike Zheng Shou
Kurt Keutzer
Forrest N. Iandola
64
35
0
24 Oct 2023
FLATTEN: optical FLow-guided ATTENtion for consistent text-to-video editing
Yuren Cong
Mengmeng Xu
Christian Simon
Shoufa Chen
Jiawei Ren
Yanping Xie
Juan-Manuel Perez-Rua
Bodo Rosenhahn
Tao Xiang
Sen He
DiffM, VGen
89
86
0
09 Oct 2023
Improved Baselines with Visual Instruction Tuning
Haotian Liu
Chunyuan Li
Yuheng Li
Yong Jae Lee
VLM, MLLM
135
2,817
0
05 Oct 2023
CCEdit: Creative and Controllable Video Editing via Diffusion Models
Danfeng Hong
Wenming Weng
Hao Li
Yuhui Yuan
Jing Yao
Chong Luo
Zhibo Chen
Baining Guo
DiffM, VGen
61
49
0
28 Sep 2023
Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond
Jinze Bai
Shuai Bai
Shusheng Yang
Shijie Wang
Sinan Tan
Peng Wang
Junyang Lin
Chang Zhou
Jingren Zhou
MLLM, VLM, ObjD
133
932
0
24 Aug 2023
StableVQA: A Deep No-Reference Quality Assessment Model for Video Stability
Tengchuan Kou
Xiaohong Liu
Wei Sun
Jun Jia
Xiongkuo Min
Guangtao Zhai
Ning Liu
53
21
0
09 Aug 2023
Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding
Hang Zhang
Xin Li
Lidong Bing
MLLM
164
1,059
0
05 Jun 2023
ControlVideo: Training-free Controllable Text-to-Video Generation
Yabo Zhang
Yuxiang Wei
Dongsheng Jiang
Xiaopeng Zhang
W. Zuo
Qi Tian
VGen, DiffM
111
252
0
22 May 2023
Pick-a-Pic: An Open Dataset of User Preferences for Text-to-Image Generation
Yuval Kirstain
Adam Polyak
Uriel Singer
Shahbuland Matiana
Joe Penna
Omer Levy
EGVM
217
416
0
02 May 2023
Harnessing the Power of LLMs in Practice: A Survey on ChatGPT and Beyond
Jingfeng Yang
Hongye Jin
Ruixiang Tang
Xiaotian Han
Qizhang Feng
Haoming Jiang
Bing Yin
Xia Hu
LM&MA
203
675
0
26 Apr 2023
Visual Instruction Tuning
Haotian Liu
Chunyuan Li
Qingyang Wu
Yong Jae Lee
SyDa, VLM, MLLM
569
4,910
0
17 Apr 2023
ImageReward: Learning and Evaluating Human Preferences for Text-to-Image Generation
Jiazheng Xu
Xiao Liu
Yuchen Wu
Yuxuan Tong
Qinkai Li
Ming Ding
Jie Tang
Yuxiao Dong
133
408
0
12 Apr 2023
Zero-Shot Video Editing Using Off-The-Shelf Image Diffusion Models
Wen Wang
Yan Jiang
K. Xie
Zide Liu
Hao Chen
Yue Cao
Xinlong Wang
Chunhua Shen
DiffM, VGen
87
116
0
30 Mar 2023
Unmasked Teacher: Towards Training-Efficient Video Foundation Models
Kunchang Li
Yali Wang
Yizhuo Li
Yi Wang
Yinan He
Limin Wang
Yu Qiao
VGen
102
168
0
28 Mar 2023
Text2Video-Zero: Text-to-Image Diffusion Models are Zero-Shot Video Generators
Levon Khachatryan
A. Movsisyan
Vahram Tadevosyan
Roberto Henschel
Zhangyang Wang
Shant Navasardyan
Humphrey Shi
VGen
74
574
0
23 Mar 2023
Pix2Video: Video Editing using Image Diffusion
Duygu Ceylan
C. Huang
Niloy J. Mitra
DiffM, VGen
91
260
0
22 Mar 2023
FateZero: Fusing Attentions for Zero-shot Text-based Video Editing
Chenyang Qi
Xiaodong Cun
Yong Zhang
Chenyang Lei
Xintao Wang
Ying Shan
Qifeng Chen
VGen
87
353
0
16 Mar 2023
Tune-A-Video: One-Shot Tuning of Image Diffusion Models for Text-to-Video Generation
Jay Zhangjie Wu
Yixiao Ge
Xintao Wang
Weixian Lei
Yuchao Gu
Yufei Shi
Wynne Hsu
Ying Shan
Xiaohu Qie
Mike Zheng Shou
VGen
119
743
0
22 Dec 2022
InternVideo: General Video Foundation Models via Generative and Discriminative Learning
Yi Wang
Kunchang Li
Yizhuo Li
Yinan He
Bingkun Huang
...
Junting Pan
Jiashuo Yu
Yali Wang
Limin Wang
Yu Qiao
VLM, VGen
131
331
0
06 Dec 2022
Exploring Video Quality Assessment on User Generated Contents from Aesthetic and Technical Perspectives
Haoning Wu
Erli Zhang
Liang Liao
Chaofeng Chen
Jingwen Hou
Annan Wang
Wenxiu Sun
Qiong Yan
Weisi Lin
80
168
0
09 Nov 2022
FAST-VQA: Efficient End-to-end Video Quality Assessment with Fragment Sampling
Haoning Wu
Chaofeng Chen
Jingwen Hou
Liang Liao
Annan Wang
Wenxiu Sun
Qiong Yan
Weisi Lin
101
177
0
06 Jul 2022
A Deep Learning based No-reference Quality Assessment Model for UGC Videos
Wei Sun
Xiongkuo Min
Wei Lu
Guangtao Zhai
83
166
0
29 Apr 2022
Surrogate Gap Minimization Improves Sharpness-Aware Training
Juntang Zhuang
Boqing Gong
Liangzhe Yuan
Huayu Chen
Hartwig Adam
Nicha Dvornek
S. Tatikonda
James Duncan
Ting Liu
76
157
0
15 Mar 2022
BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
Junnan Li
Dongxu Li
Caiming Xiong
Guosheng Lin
MLLM, BDL, VLM, CLIP
555
4,409
0
28 Jan 2022
LiT: Zero-Shot Transfer with Locked-image text Tuning
Xiaohua Zhai
Tianlin Li
Basil Mustafa
Andreas Steiner
Daniel Keysers
Alexander Kolesnikov
Lucas Beyer
VLM
108
560
0
15 Nov 2021
How to train your ViT? Data, Augmentation, and Regularization in Vision Transformers
Andreas Steiner
Alexander Kolesnikov
Xiaohua Zhai
Ross Wightman
Jakob Uszkoreit
Lucas Beyer
ViT
116
635
0
18 Jun 2021
LoRA: Low-Rank Adaptation of Large Language Models
J. E. Hu
Yelong Shen
Phillip Wallis
Zeyuan Allen-Zhu
Yuanzhi Li
Shean Wang
Lu Wang
Weizhu Chen
OffRL, AI4TS, AI4CE, ALM, AIMat
490
10,496
0
17 Jun 2021
When Vision Transformers Outperform ResNets without Pre-training or Strong Data Augmentations
Xiangning Chen
Cho-Jui Hsieh
Boqing Gong
ViT
87
328
0
03 Jun 2021
MLP-Mixer: An all-MLP Architecture for Vision
Ilya O. Tolstikhin
N. Houlsby
Alexander Kolesnikov
Lucas Beyer
Xiaohua Zhai
...
Andreas Steiner
Daniel Keysers
Jakob Uszkoreit
Mario Lucic
Alexey Dosovitskiy
427
2,685
0
04 May 2021
CLIPScore: A Reference-free Evaluation Metric for Image Captioning
Jack Hessel
Ari Holtzman
Maxwell Forbes
Ronan Le Bras
Yejin Choi
CLIP
150
1,584
0
18 Apr 2021
Learning Transferable Visual Models From Natural Language Supervision
Alec Radford
Jong Wook Kim
Chris Hallacy
Aditya A. Ramesh
Gabriel Goh
...
Amanda Askell
Pamela Mishkin
Jack Clark
Gretchen Krueger
Ilya Sutskever
CLIP, VLM
967
29,810
0
26 Feb 2021
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
Alexey Dosovitskiy
Lucas Beyer
Alexander Kolesnikov
Dirk Weissenborn
Xiaohua Zhai
...
Matthias Minderer
G. Heigold
Sylvain Gelly
Jakob Uszkoreit
N. Houlsby
ViT
670
41,430
0
22 Oct 2020
UGC-VQA: Benchmarking Blind Video Quality Assessment for User Generated Content
Zhengzhong Tu
Yilin Wang
Neil Birkbeck
Balu Adsumilli
A. Bovik
64
261
0
29 May 2020
Quality Assessment of In-the-Wild Videos
Dingquan Li
Tingting Jiang
Ming Jiang
59
299
0
01 Aug 2019
The Kinetics Human Action Video Dataset
W. Kay
João Carreira
Karen Simonyan
Brian Zhang
Chloe Hillier
...
Tim Green
T. Back
Apostol Natsev
Mustafa Suleyman
Andrew Zisserman
254
3,815
0
19 May 2017
The 2017 DAVIS Challenge on Video Object Segmentation
Jordi Pont-Tuset
Federico Perazzi
Sergi Caelles
Pablo Arbeláez
A. Sorkine-Hornung
Luc Van Gool
VGen, VOS
87
1,217
0
03 Apr 2017