ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2106.11310
  4. Cited By
Towards Long-Form Video Understanding

Towards Long-Form Video Understanding

21 June 2021
Chaoxia Wu
Philipp Krahenbuhl
    VLM
    ViT
ArXivPDFHTML

Papers citing "Towards Long-Form Video Understanding"

21 / 121 papers shown
Title
The Anatomy of Video Editing: A Dataset and Benchmark Suite for
  AI-Assisted Video Editing
The Anatomy of Video Editing: A Dataset and Benchmark Suite for AI-Assisted Video Editing
Dawit Mureja Argaw
Fabian Caba Heilbron
Joon-Young Lee
Markus Woodson
In So Kweon
VGen
50
22
0
20 Jul 2022
It's Time for Artistic Correspondence in Music and Video
It's Time for Artistic Correspondence in Music and Video
Dídac Surís
Carl Vondrick
Bryan C. Russell
Justin Salamon
16
37
0
14 Jun 2022
Bringing Image Scene Structure to Video via Frame-Clip Consistency of
  Object Tokens
Bringing Image Scene Structure to Video via Frame-Clip Consistency of Object Tokens
Elad Ben-Avraham
Roei Herzig
K. Mangalam
Amir Bar
Anna Rohrbach
Leonid Karlinsky
Trevor Darrell
Amir Globerson
19
0
0
13 Jun 2022
A CLIP-Hitchhiker's Guide to Long Video Retrieval
A CLIP-Hitchhiker's Guide to Long Video Retrieval
Max Bain
Arsha Nagrani
Gül Varol
Andrew Zisserman
CLIP
126
62
0
17 May 2022
Hierarchical Self-supervised Representation Learning for Movie
  Understanding
Hierarchical Self-supervised Representation Learning for Movie Understanding
Fanyi Xiao
Kaustav Kundu
Joseph Tighe
Davide Modolo
SSL
44
24
0
06 Apr 2022
Depth-Guided Sparse Structure-from-Motion for Movies and TV Shows
Depth-Guided Sparse Structure-from-Motion for Movies and TV Shows
Shengqing Liu
Xiaohan Nie
Raffay Hamid
12
16
0
05 Apr 2022
Joint Hand Motion and Interaction Hotspots Prediction from Egocentric
  Videos
Joint Hand Motion and Interaction Hotspots Prediction from Egocentric Videos
Shao-Wei Liu
Subarna Tripathi
Somdeb Majumdar
Xiaolong Wang
EgoV
26
93
0
04 Apr 2022
Long Movie Clip Classification with State-Space Video Models
Long Movie Clip Classification with State-Space Video Models
Md. Mohaiminul Islam
Gedas Bertasius
VLM
43
102
0
04 Apr 2022
Synopses of Movie Narratives: a Video-Language Dataset for Story
  Understanding
Synopses of Movie Narratives: a Video-Language Dataset for Story Understanding
Yidan Sun
Qin Chao
Yangfeng Ji
Boyang Albert Li
VGen
33
10
0
11 Mar 2022
Movies2Scenes: Using Movie Metadata to Learn Scene Representation
Movies2Scenes: Using Movie Metadata to Learn Scene Representation
Shixing Chen
Chundi Liu
Xiang Hao
Xiaohan Nie
Maxim Arap
Raffay Hamid
31
17
0
22 Feb 2022
MeMViT: Memory-Augmented Multiscale Vision Transformer for Efficient
  Long-Term Video Recognition
MeMViT: Memory-Augmented Multiscale Vision Transformer for Efficient Long-Term Video Recognition
Chao-Yuan Wu
Yanghao Li
K. Mangalam
Haoqi Fan
Bo Xiong
Jitendra Malik
Christoph Feichtenhofer
ViT
42
198
0
20 Jan 2022
Video Transformers: A Survey
Video Transformers: A Survey
Javier Selva
A. S. Johansen
Sergio Escalera
Kamal Nasrollahi
T. Moeslund
Albert Clapés
ViT
22
103
0
16 Jan 2022
Masked Feature Prediction for Self-Supervised Visual Pre-Training
Masked Feature Prediction for Self-Supervised Visual Pre-Training
Chen Wei
Haoqi Fan
Saining Xie
Chaoxia Wu
Alan Yuille
Christoph Feichtenhofer
ViT
88
655
0
16 Dec 2021
MViTv2: Improved Multiscale Vision Transformers for Classification and
  Detection
MViTv2: Improved Multiscale Vision Transformers for Classification and Detection
Yanghao Li
Chaoxia Wu
Haoqi Fan
K. Mangalam
Bo Xiong
Jitendra Malik
Christoph Feichtenhofer
ViT
51
677
0
02 Dec 2021
Exploring Segment-level Semantics for Online Phase Recognition from
  Surgical Videos
Exploring Segment-level Semantics for Online Phase Recognition from Surgical Videos
Xinpeng Ding
Xiaomeng Li
22
33
0
22 Nov 2021
Object-Region Video Transformers
Object-Region Video Transformers
Roei Herzig
Elad Ben-Avraham
K. Mangalam
Amir Bar
Gal Chechik
Anna Rohrbach
Trevor Darrell
Amir Globerson
ViT
30
82
0
13 Oct 2021
Unsupervised View-Invariant Human Posture Representation
Unsupervised View-Invariant Human Posture Representation
Faegheh Sardari
Bjorn Ommer
Majid Mirmehdi
3DH
25
3
0
17 Sep 2021
Long Short-Term Transformer for Online Action Detection
Long Short-Term Transformer for Online Action Detection
Mingze Xu
Yuanjun Xiong
Hao Chen
Xinyu Li
Wei Xia
Z. Tu
Stefano Soatto
ViT
32
130
0
07 Jul 2021
TokenLearner: What Can 8 Learned Tokens Do for Images and Videos?
TokenLearner: What Can 8 Learned Tokens Do for Images and Videos?
Michael S. Ryoo
A. Piergiovanni
Anurag Arnab
Mostafa Dehghani
A. Angelova
ViT
37
127
0
21 Jun 2021
Audiovisual SlowFast Networks for Video Recognition
Audiovisual SlowFast Networks for Video Recognition
Fanyi Xiao
Yong Jae Lee
Kristen Grauman
Jitendra Malik
Christoph Feichtenhofer
197
206
0
23 Jan 2020
ECO: Efficient Convolutional Network for Online Video Understanding
ECO: Efficient Convolutional Network for Online Video Understanding
Mohammadreza Zolfaghari
Kamaljeet Singh
Thomas Brox
130
496
0
24 Apr 2018
Previous
123