Long Movie Clip Classification with State-Space Video Models

4 April 2022
Md. Mohaiminul Islam
Gedas Bertasius
    VLM

Papers citing "Long Movie Clip Classification with State-Space Video Models"

27 / 77 papers shown
Mamba-ND: Selective State Space Modeling for Multi-Dimensional Data
Shufan Li
Harkanwar Singh
Aditya Grover
Mamba
173
64
0
08 Feb 2024
U-shaped Vision Mamba for Single Image Dehazing
Zhuoran Zheng
Chen Henry Wu
128
38
0
06 Feb 2024
Multi-modal News Understanding with Professionally Labelled Videos (ReutersViLNews)
Shih-Han Chou
Matthew Kowal
Yasmin Niknam
Diana Moyano
Shayaan Mehdi
...
Cheng Zhang
Ian Knopke
S. Kocak
Leonid Sigal
Yalda Mohsenzadeh
137
1
0
23 Jan 2024
Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model
Lianghui Zhu
Bencheng Liao
Qian Zhang
Xinlong Wang
Wenyu Liu
Xinggang Wang
Mamba
121
809
0
17 Jan 2024
U-Mamba: Enhancing Long-range Dependency for Biomedical Image Segmentation
Jun Ma
Feifei Li
Bo Wang
Mamba
143
370
0
09 Jan 2024
A Simple LLM Framework for Long-Range Video Question-Answering
Ce Zhang
Taixi Lu
Md. Mohaiminul Islam
Ziyang Wang
Shoubin Yu
Mohit Bansal
Gedas Bertasius
192
92
0
28 Dec 2023
Spacewalk-18: A Benchmark for Multimodal and Long-form Procedural Video Understanding in Novel Domains
Rohan Myer Krishnan
Zitian Tang
Zhiqiu Yu
Chen Sun
150
2
0
30 Nov 2023
Query-aware Long Video Localization and Relation Discrimination for Deep Video Understanding
Yuanxing Xu
Yuting Wei
Bin Wu
52
0
0
19 Oct 2023
Incorporating Domain Knowledge Graph into Multimodal Movie Genre Classification with Self-Supervised Attention and Contrastive Learning
Jiaqi Li
Guilin Qi
Chuanyi Zhang
Yongrui Chen
Yiming Tan
Chenlong Xia
Ye Tian
81
3
0
12 Oct 2023
Revisiting Kernel Temporal Segmentation as an Adaptive Tokenizer for Long-form Video Understanding
Mohamed Afham
Satya Narayan Shukla
Omid Poursaeed
Pengchuan Zhang
Ashish Shah
Sernam Lim
VLM
57
2
0
20 Sep 2023
Are current long-term video understanding datasets long-term?
Ombretta Strafforello
Klamer Schutte
Jan van Gemert
59
8
0
22 Aug 2023
Long-range Multimodal Pretraining for Movie Understanding
Dawit Mureja Argaw
Joon-Young Lee
Markus Woodson
In So Kweon
Fabian Caba Heilbron
VLM
77
9
0
18 Aug 2023
Facing Off World Model Backbones: RNNs, Transformers, and S4
Fei Deng
Junyeong Park
Sungjin Ahn
90
32
0
05 Jul 2023
Decision S4: Efficient Sequence-Based RL via State Spaces Layers
Shmuel Bar-David
Itamar Zimerman
Eliya Nachmani
Lior Wolf
OffRL
109
28
0
08 Jun 2023
Looking Similar, Sounding Different: Leveraging Counterfactual Cross-Modal Pairs for Audiovisual Representation Learning
Nikhil Singh
Chih-Wei Wu
Iroro Orife
Mahdi M. Kalayeh
110
2
0
12 Apr 2023
ENTL: Embodied Navigation Trajectory Learner
Klemen Kotar
Aaron Walsman
Roozbeh Mottaghi
107
7
0
05 Apr 2023
Selective Structured State-Spaces for Long-Form Video Understanding
Jue Wang
Wenjie Zhu
Pichao Wang
Xiang Yu
Linda Liu
Mohamed Omar
Raffay Hamid
92
100
0
25 Mar 2023
Resurrecting Recurrent Neural Networks for Long Sequences
Antonio Orvieto
Samuel L. Smith
Albert Gu
Anushan Fernando
Çağlar Gülçehre
Razvan Pascanu
Soham De
341
299
0
11 Mar 2023
Simple Hardware-Efficient Long Convolutions for Sequence Modeling
Daniel Y. Fu
Elliot L. Epstein
Eric N. D. Nguyen
A. Thomas
Michael Zhang
Tri Dao
Atri Rudra
Christopher Ré
64
55
0
13 Feb 2023
HierVL: Learning Hierarchical Video-Language Embeddings
Kumar Ashutosh
Rohit Girdhar
Lorenzo Torresani
Kristen Grauman
VLM
AI4TS
113
59
0
05 Jan 2023
Efficient Movie Scene Detection using State-Space Transformers
Md. Mohaiminul Islam
Mahmudul Hasan
Kishan Athrey
Tony Braskich
Gedas Bertasius
ViT
68
45
0
29 Dec 2022
MIST: Multi-modal Iterative Spatial-Temporal Transformer for Long-form Video Question Answering
Difei Gao
Luowei Zhou
Lei Ji
Linchao Zhu
Yezhou Yang
Mike Zheng Shou
87
60
0
19 Dec 2022
Spatio-Temporal Crop Aggregation for Video Representation Learning
Sepehr Sameni
Simon Jenni
Paolo Favaro
94
3
0
30 Nov 2022
S4ND: Modeling Images and Videos as Multidimensional Signals Using State Spaces
Eric N. D. Nguyen
Karan Goel
Albert Gu
Gordon W. Downs
Preey Shah
Tri Dao
S. Baccus
Christopher Ré
VLM
93
40
0
12 Oct 2022
Temporally Consistent Transformers for Video Generation
Wilson Yan
Danijar Hafner
Stephen James
Pieter Abbeel
DiffM
94
31
0
05 Oct 2022
CONE: An Efficient COarse-to-fiNE Alignment Framework for Long Video Temporal Grounding
Zhijian Hou
Wanjun Zhong
Lei Ji
Difei Gao
Kun Yan
W. Chan
Chong-Wah Ngo
Zheng Shou
Nan Duan
AI4TS
118
26
0
22 Sep 2022
Movies2Scenes: Using Movie Metadata to Learn Scene Representation
Shixing Chen
Chundi Liu
Xiang Hao
Xiaohan Nie
Maxim Arap
Raffay Hamid
63
17
0
22 Feb 2022