ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2503.09590
  4. Cited By
v1v2 (latest)

BIMBA: Selective-Scan Compression for Long-Range Video Question Answering

12 March 2025
Md. Mohaiminul Islam
Tushar Nagarajan
Huiyu Wang
Gedas Bertasius
Lorenzo Torresani
ArXiv (abs)PDFHTML

Papers citing "BIMBA: Selective-Scan Compression for Long-Range Video Question Answering"

35 / 35 papers shown
Title
VideoEval-Pro: Robust and Realistic Long Video Understanding Evaluation
VideoEval-Pro: Robust and Realistic Long Video Understanding Evaluation
Wentao Ma
Weiming Ren
Yiming Jia
Zhuofeng Li
Ping Nie
Ge Zhang
Wenhu Chen
66
1
0
20 May 2025
Tarsier: Recipes for Training and Evaluating Large Video Description
  Models
Tarsier: Recipes for Training and Evaluating Large Video Description Models
Jiawei Wang
Liping Yuan
Yuchen Zhang
93
67
0
30 Jun 2024
Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis
Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis
Chaoyou Fu
Yuhan Dai
Yondong Luo
Lei Li
Shuhuai Ren
...
Xiawu Zheng
Enhong Chen
Caifeng Shan
Xing Sun
Xing Sun
VLMMLLM
152
418
0
31 May 2024
VideoTree: Adaptive Tree-based Video Representation for LLM Reasoning on Long Videos
VideoTree: Adaptive Tree-based Video Representation for LLM Reasoning on Long Videos
Ziyang Wang
Shoubin Yu
Elias Stengel-Eskin
Jaehong Yoon
Feng Cheng
Gedas Bertasius
Mohit Bansal
125
69
0
29 May 2024
STAR: A Benchmark for Situated Reasoning in Real-World Videos
STAR: A Benchmark for Situated Reasoning in Real-World Videos
Bo Wu
Shoubin Yu
Zhenfang Chen
Joshua B. Tenenbaum
Chuang Gan
124
195
0
15 May 2024
CinePile: A Long Video Question Answering Dataset and Benchmark
CinePile: A Long Video Question Answering Dataset and Benchmark
Ruchit Rawal
Khalid Saifullah
Ronen Basri
David Jacobs
Gowthami Somepalli
Tom Goldstein
87
55
0
14 May 2024
Simba: Mamba augmented U-ShiftGCN for Skeletal Action Recognition in
  Videos
Simba: Mamba augmented U-ShiftGCN for Skeletal Action Recognition in Videos
S. Chaudhuri
Saumik Bhattacharya
Mamba
76
9
0
11 Apr 2024
Cobra: Extending Mamba to Multi-Modal Large Language Model for Efficient Inference
Cobra: Extending Mamba to Multi-Modal Large Language Model for Efficient Inference
Han Zhao
Min Zhang
Wei Zhao
Pengxiang Ding
Siteng Huang
Donglin Wang
Mamba
101
74
0
21 Mar 2024
InternLM-XComposer2: Mastering Free-form Text-Image Composition and
  Comprehension in Vision-Language Large Model
InternLM-XComposer2: Mastering Free-form Text-Image Composition and Comprehension in Vision-Language Large Model
Xiao-wen Dong
Pan Zhang
Yuhang Zang
Yuhang Cao
Bin Wang
...
Conghui He
Xingcheng Zhang
Yu Qiao
Dahua Lin
Jiaqi Wang
VLMMLLM
148
267
0
29 Jan 2024
A Simple LLM Framework for Long-Range Video Question-Answering
A Simple LLM Framework for Long-Range Video Question-Answering
Ce Zhang
Taixi Lu
Md. Mohaiminul Islam
Ziyang Wang
Shoubin Yu
Mohit Bansal
Gedas Bertasius
148
92
0
28 Dec 2023
EgoSchema: A Diagnostic Benchmark for Very Long-form Video Language
  Understanding
EgoSchema: A Diagnostic Benchmark for Very Long-form Video Language Understanding
K. Mangalam
Raiymbek Akshulakov
Jitendra Malik
113
308
0
17 Aug 2023
Perception Test: A Diagnostic Benchmark for Multimodal Video Models
Perception Test: A Diagnostic Benchmark for Multimodal Video Models
Viorica Puatruaucean
Lucas Smaira
Ankush Gupta
Adrià Recasens Continente
L. Markeeva
...
Y. Aytar
Simon Osindero
Dima Damen
Andrew Zisserman
João Carreira
VLM
171
179
0
23 May 2023
VideoLLM: Modeling Video Sequence with Large Language Models
VideoLLM: Modeling Video Sequence with Large Language Models
Guo Chen
Yin-Dong Zheng
Jiahao Wang
Jilan Xu
Yifei Huang
...
Yi Wang
Yali Wang
Yu Qiao
Tong Lu
Limin Wang
MLLM
126
84
0
22 May 2023
mPLUG-Owl: Modularization Empowers Large Language Models with
  Multimodality
mPLUG-Owl: Modularization Empowers Large Language Models with Multimodality
Qinghao Ye
Haiyang Xu
Guohai Xu
Jiabo Ye
Ming Yan
...
Junfeng Tian
Qiang Qi
Ji Zhang
Feiyan Huang
Jingren Zhou
VLMMLLM
286
955
0
27 Apr 2023
Visual Instruction Tuning
Visual Instruction Tuning
Haotian Liu
Chunyuan Li
Qingyang Wu
Yong Jae Lee
SyDaVLMMLLM
560
4,910
0
17 Apr 2023
Sigmoid Loss for Language Image Pre-Training
Sigmoid Loss for Language Image Pre-Training
Xiaohua Zhai
Basil Mustafa
Alexander Kolesnikov
Lucas Beyer
CLIPVLM
229
1,200
0
27 Mar 2023
GPT-4 Technical Report
GPT-4 Technical Report
OpenAI OpenAI
OpenAI Josh Achiam
Steven Adler
Sandhini Agarwal
Lama Ahmad
...
Shengjia Zhao
Tianhao Zheng
Juntang Zhuang
William Zhuk
Barret Zoph
LLMAGMLLM
1.5K
14,699
0
15 Mar 2023
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image
  Encoders and Large Language Models
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models
Junnan Li
Dongxu Li
Silvio Savarese
Steven C. H. Hoi
VLMMLLM
429
4,641
0
30 Jan 2023
Efficient Movie Scene Detection using State-Space Transformers
Efficient Movie Scene Detection using State-Space Transformers
Md. Mohaiminul Islam
Mahmudul Hasan
Kishan Athrey
Tony Braskich
Gedas Bertasius
ViT
59
45
0
29 Dec 2022
On the Parameterization and Initialization of Diagonal State Space
  Models
On the Parameterization and Initialization of Diagonal State Space Models
Albert Gu
Ankit Gupta
Karan Goel
Christopher Ré
86
324
0
23 Jun 2022
Flamingo: a Visual Language Model for Few-Shot Learning
Flamingo: a Visual Language Model for Few-Shot Learning
Jean-Baptiste Alayrac
Jeff Donahue
Pauline Luc
Antoine Miech
Iain Barr
...
Mikolaj Binkowski
Ricardo Barreira
Oriol Vinyals
Andrew Zisserman
Karen Simonyan
MLLMVLM
418
3,585
0
29 Apr 2022
Diagonal State Spaces are as Effective as Structured State Spaces
Diagonal State Spaces are as Effective as Structured State Spaces
Ankit Gupta
Albert Gu
Jonathan Berant
119
311
0
27 Mar 2022
Training language models to follow instructions with human feedback
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLMALM
883
13,148
0
04 Mar 2022
Efficiently Modeling Long Sequences with Structured State Spaces
Efficiently Modeling Long Sequences with Structured State Spaces
Albert Gu
Karan Goel
Christopher Ré
217
1,814
0
31 Oct 2021
Ego4D: Around the World in 3,000 Hours of Egocentric Video
Ego4D: Around the World in 3,000 Hours of Egocentric Video
Kristen Grauman
Andrew Westbury
Eugene Byrne
Zachary Chavis
Antonino Furnari
...
Mike Zheng Shou
Antonio Torralba
Lorenzo Torresani
Mingfei Yan
Jitendra Malik
EgoV
399
1,107
0
13 Oct 2021
LoRA: Low-Rank Adaptation of Large Language Models
LoRA: Low-Rank Adaptation of Large Language Models
J. E. Hu
Yelong Shen
Phillip Wallis
Zeyuan Allen-Zhu
Yuanzhi Li
Shean Wang
Lu Wang
Weizhu Chen
OffRLAI4TSAI4CEALMAIMat
483
10,496
0
17 Jun 2021
NExT-QA:Next Phase of Question-Answering to Explaining Temporal Actions
NExT-QA:Next Phase of Question-Answering to Explaining Temporal Actions
Junbin Xiao
Xindi Shang
Angela Yao
Tat-Seng Chua
92
498
0
18 May 2021
Perceiver: General Perception with Iterative Attention
Perceiver: General Perception with Iterative Attention
Andrew Jaegle
Felix Gimeno
Andrew Brock
Andrew Zisserman
Oriol Vinyals
João Carreira
VLMViTMDE
204
1,022
0
04 Mar 2021
Learning Transferable Visual Models From Natural Language Supervision
Learning Transferable Visual Models From Natural Language Supervision
Alec Radford
Jong Wook Kim
Chris Hallacy
Aditya A. Ramesh
Gabriel Goh
...
Amanda Askell
Pamela Mishkin
Jack Clark
Gretchen Krueger
Ilya Sutskever
CLIPVLM
967
29,731
0
26 Feb 2021
An Image is Worth 16x16 Words: Transformers for Image Recognition at
  Scale
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
Alexey Dosovitskiy
Lucas Beyer
Alexander Kolesnikov
Dirk Weissenborn
Xiaohua Zhai
...
Matthias Minderer
G. Heigold
Sylvain Gelly
Jakob Uszkoreit
N. Houlsby
ViT
667
41,369
0
22 Oct 2020
HiPPO: Recurrent Memory with Optimal Polynomial Projections
HiPPO: Recurrent Memory with Optimal Polynomial Projections
Albert Gu
Tri Dao
Stefano Ermon
Atri Rudra
Christopher Ré
123
532
0
17 Aug 2020
Language Models are Few-Shot Learners
Language Models are Few-Shot Learners
Tom B. Brown
Benjamin Mann
Nick Ryder
Melanie Subbiah
Jared Kaplan
...
Christopher Berner
Sam McCandlish
Alec Radford
Ilya Sutskever
Dario Amodei
BDL
841
42,332
0
28 May 2020
CLEVRER: CoLlision Events for Video REpresentation and Reasoning
CLEVRER: CoLlision Events for Video REpresentation and Reasoning
Kexin Yi
Yuta Saito
Yunzhu Li
Pushmeet Kohli
Jiajun Wu
Antonio Torralba
J. Tenenbaum
NAI
123
475
0
03 Oct 2019
Decoupled Weight Decay Regularization
Decoupled Weight Decay Regularization
I. Loshchilov
Frank Hutter
OffRL
149
2,150
0
14 Nov 2017
Towards Automatic Learning of Procedures from Web Instructional Videos
Towards Automatic Learning of Procedures from Web Instructional Videos
Luowei Zhou
Chenliang Xu
Jason J. Corso
EgoV
75
830
0
28 Mar 2017
1