Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
1806.06157
Cited By
Object Level Visual Reasoning in Videos
16 June 2018
Fabien Baradel
Natalia Neverova
Christian Wolf
J. Mille
Greg Mori
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Object Level Visual Reasoning in Videos"
47 / 47 papers shown
Title
Memory-efficient Streaming VideoLLMs for Real-time Procedural Video Understanding
Dibyadip Chatterjee
Edoardo Remelli
Yale Song
Bugra Tekin
Abhay Mittal
...
Shreyas Hampali
Eric Sauser
Shugao Ma
Angela Yao
Fadime Sener
VLM
51
0
0
10 Apr 2025
Motion-Grounded Video Reasoning: Understanding and Perceiving Motion at Pixel Level
Andong Deng
Tongjia Chen
Shoubin Yu
Taojiannan Yang
Lincoln Spencer
Yapeng Tian
Ajmal Mian
Joey Tianyi Zhou
Chen Chen
LRM
68
1
0
15 Nov 2024
3VL: Using Trees to Improve Vision-Language Models' Interpretability
Nir Yellinek
Leonid Karlinsky
Raja Giryes
CoGe
VLM
51
4
0
28 Dec 2023
An Outlook into the Future of Egocentric Vision
Chiara Plizzari
Gabriele Goletto
Antonino Furnari
Siddhant Bansal
Francesco Ragusa
G. Farinella
Dima Damen
Tatiana Tommasi
EgoV
40
38
0
14 Aug 2023
Difference-Masking: Choosing What to Mask in Continued Pretraining
Alex Wilf
Syeda Nahida Akter
Leena Mathur
Paul Pu Liang
Sheryl Mathew
Mengrou Shou
Eric Nyberg
Louis-Philippe Morency
CLL
SSL
32
4
0
23 May 2023
End-to-End Spatio-Temporal Action Localisation with Video Transformers
A. Gritsenko
Xuehan Xiong
Josip Djolonga
Mostafa Dehghani
Chen Sun
Mario Lucic
Cordelia Schmid
Anurag Arnab
ViT
42
13
0
24 Apr 2023
Unbiased Scene Graph Generation in Videos
Sayak Nag
Kyle Min
Subarna Tripathi
A. Roy-Chowdhury
34
29
0
03 Apr 2023
EgoLoc: Revisiting 3D Object Localization from Egocentric Videos with Visual Queries
Jinjie Mai
Abdullah Hamdi
Silvio Giancola
Chen Zhao
Guohao Li
EgoV
40
14
0
14 Dec 2022
Egocentric Video Task Translation
Zihui Xue
Yale Song
Kristen Grauman
Lorenzo Torresani
EgoV
33
12
0
13 Dec 2022
PromptonomyViT: Multi-Task Prompt Learning Improves Video Transformers using Synthetic Scene Data
Roei Herzig
Ofir Abramovich
Elad Ben-Avraham
Assaf Arbelle
Leonid Karlinsky
Ariel Shamir
Trevor Darrell
Amir Globerson
43
16
0
08 Dec 2022
Teaching Structured Vision&Language Concepts to Vision&Language Models
Sivan Doveh
Assaf Arbelle
Sivan Harary
Yikang Shen
Roei Herzig
...
Donghyun Kim
Raja Giryes
Rogerio Feris
S. Ullman
Leonid Karlinsky
VLM
CoGe
56
71
0
21 Nov 2022
Discovering A Variety of Objects in Spatio-Temporal Human-Object Interactions
Yong-Lu Li
Hongwei Fan
Zuoyu Qiu
Yiming Dou
Liang Xu
...
Peiyang Guo
Haisheng Su
Dongliang Wang
Wei Wu
Cewu Lu
35
7
0
14 Nov 2022
Video Event Extraction via Tracking Visual States of Arguments
Guang Yang
Manling Li
Jiajie Zhang
Xudong Lin
Shih-Fu Chang
Heng Ji
32
9
0
03 Nov 2022
EPIC-KITCHENS VISOR Benchmark: VIdeo Segmentations and Object Relations
Ahmad Darkhalil
Dandan Shan
Bin Zhu
Jian Ma
Amlan Kar
Richard E. L. Higgins
Sanja Fidler
David Fouhey
Dima Damen
VOS
55
98
0
26 Sep 2022
Is an Object-Centric Video Representation Beneficial for Transfer?
Chuhan Zhang
Ankush Gupta
Andrew Zisserman
ViT
37
27
0
20 Jul 2022
Beyond Transfer Learning: Co-finetuning for Action Localisation
Anurag Arnab
Xuehan Xiong
A. Gritsenko
Rob Romijnders
Josip Djolonga
Mostafa Dehghani
Chen Sun
Mario Lucic
Cordelia Schmid
38
8
0
08 Jul 2022
Beyond RGB: Scene-Property Synthesis with Neural Radiance Fields
Mingtong Zhang
Shuhong Zheng
Zhipeng Bao
M. Hebert
Yu-xiong Wang
29
12
0
09 Jun 2022
Hierarchical Self-supervised Representation Learning for Movie Understanding
Fanyi Xiao
Kaustav Kundu
Joseph Tighe
Davide Modolo
SSL
44
24
0
06 Apr 2022
How Do You Do It? Fine-Grained Action Understanding with Pseudo-Adverbs
Hazel Doughty
Cees G. M. Snoek
35
19
0
23 Mar 2022
Distillation of Human-Object Interaction Contexts for Action Recognition
Muna Almushyti
Frederick W. Li
34
3
0
17 Dec 2021
Object-Region Video Transformers
Roei Herzig
Elad Ben-Avraham
K. Mangalam
Amir Bar
Gal Chechik
Anna Rohrbach
Trevor Darrell
Amir Globerson
ViT
30
82
0
13 Oct 2021
Hierarchical Object-oriented Spatio-Temporal Reasoning for Video Question Answering
Long Hoang Dang
T. Le
Vuong Le
T. Tran
30
60
0
25 Jun 2021
Towards Long-Form Video Understanding
Chaoxia Wu
Philipp Krahenbuhl
VLM
ViT
54
166
0
21 Jun 2021
Modeling long-term interactions to enhance action recognition
Alejandro Cartas
Petia Radeva
Mariella Dimiccoli
EgoV
27
6
0
23 Apr 2021
Ego-Exo: Transferring Visual Representations from Third-person to First-person Videos
Yanghao Li
Tushar Nagarajan
Bo Xiong
Kristen Grauman
EgoV
51
84
0
16 Apr 2021
Coarse-Fine Networks for Temporal Activity Detection in Videos
Kumara Kahatapitiya
Michael S. Ryoo
AI4TS
53
38
0
01 Mar 2021
Learning to Recognize Actions on Objects in Egocentric Video with Attention Dictionaries
Swathikiran Sudhakaran
Sergio Escalera
Oswald Lanz
EgoV
27
15
0
16 Feb 2021
AssembleNet++: Assembling Modality Representations via Attention Connections
Michael S. Ryoo
A. Piergiovanni
Juhana Kangaspunta
A. Angelova
15
44
0
18 Aug 2020
Detecting Human-Object Interactions with Action Co-occurrence Priors
Dong-Jin Kim
Xiao Sun
Jinsoo Choi
Stephen Lin
In So Kweon
24
124
0
17 Jul 2020
Understanding Human Hands in Contact at Internet Scale
Dandan Shan
Jiaqi Geng
Michelle Shu
David Fouhey
42
319
0
11 Jun 2020
Dynamic Language Binding in Relational Visual Reasoning
T. Le
Vuong Le
Svetha Venkatesh
T. Tran
NAI
26
19
0
30 Apr 2020
Audiovisual SlowFast Networks for Video Recognition
Fanyi Xiao
Yong Jae Lee
Kristen Grauman
Jitendra Malik
Christoph Feichtenhofer
197
207
0
23 Jan 2020
Self-supervising Action Recognition by Statistical Moment and Subspace Descriptors
Lei Wang
Piotr Koniusz
24
50
0
14 Jan 2020
Action Genome: Actions as Composition of Spatio-temporal Scene Graphs
Jingwei Ji
Ranjay Krishna
Li Fei-Fei
Juan Carlos Niebles
39
336
0
15 Dec 2019
Weak Supervision helps Emergence of Word-Object Alignment and improves Vision-Language Tasks
Corentin Kervadec
G. Antipov
M. Baccouche
Christian Wolf
21
15
0
06 Dec 2019
Action Anticipation for Collaborative Environments: The Impact of Contextual Information and Uncertainty-Based Prediction
Clebeson Canuto dos Santos
Plinio Moreno
J. L. A. Samatelo
R. Vassallo
J. Santos-Victor
25
7
0
01 Oct 2019
Grouped Spatial-Temporal Aggregation for Efficient Action Recognition
Chenxu Luo
Alan Yuille
130
150
0
28 Sep 2019
Baidu-UTS Submission to the EPIC-Kitchens Action Recognition Challenge 2019
Xiaohan Wang
Yu Wu
Linchao Zhu
Yi Yang
24
19
0
22 Jun 2019
What Makes Training Multi-Modal Classification Networks Hard?
Weiyao Wang
Du Tran
Matt Feiszli
34
442
0
29 May 2019
Representation Learning on Visual-Symbolic Graphs for Video Understanding
E. Mavroudi
Benjamín Béjar Haro
René Vidal
27
8
0
17 May 2019
Egocentric Hand Track and Object-based Human Action Recognition
G. Kapidis
R. Poppe
E. V. Dam
L. Noldus
R. Veltkamp
EgoV
21
36
0
02 May 2019
H+O: Unified Egocentric Recognition of 3D Hand-Object Poses and Interactions
Bugra Tekin
Federica Bogo
Marc Pollefeys
EgoV
27
252
0
10 Apr 2019
Long-Term Feature Banks for Detailed Video Understanding
Chao-Yuan Wu
Christoph Feichtenhofer
Haoqi Fan
Kaiming He
Philipp Krahenbuhl
Ross B. Girshick
62
477
0
12 Dec 2018
A Structured Model For Action Detection
Yubo Zhang
P. Tokmakov
M. Hebert
Cordelia Schmid
28
101
0
09 Dec 2018
Video Action Transformer Network
Rohit Girdhar
João Carreira
Carl Doersch
Andrew Zisserman
ViT
28
702
0
06 Dec 2018
Interaction-aware Spatio-temporal Pyramid Attention Networks for Action Classification
Yang Du
Chunfen Yuan
Bing Li
Lili Zhao
Yangxi Li
Weiming Hu
81
79
0
03 Aug 2018
Interaction Networks for Learning about Objects, Relations and Physics
Peter W. Battaglia
Razvan Pascanu
Matthew Lai
Danilo Jimenez Rezende
Koray Kavukcuoglu
AI4CE
OCL
PINN
GNN
283
1,401
0
01 Dec 2016
1