Object Level Visual Reasoning in Videos

16 June 2018

Natalia Neverova

Papers citing "Object Level Visual Reasoning in Videos"

47 / 47 papers shown

Title
Memory-efficient Streaming VideoLLMs for Real-time Procedural Video Understanding Dibyadip Chatterjee Edoardo Remelli Yale Song Bugra Tekin Abhay Mittal ... Shreyas Hampali Eric Sauser Shugao Ma Angela Yao Fadime Sener VLM 51 0 0 10 Apr 2025
Motion-Grounded Video Reasoning: Understanding and Perceiving Motion at Pixel Level Andong Deng Tongjia Chen Shoubin Yu Taojiannan Yang Lincoln Spencer Yapeng Tian Ajmal Mian Joey Tianyi Zhou Chen Chen LRM 68 1 0 15 Nov 2024
3VL: Using Trees to Improve Vision-Language Models' Interpretability Nir Yellinek Leonid Karlinsky Raja Giryes CoGe VLM 51 4 0 28 Dec 2023
An Outlook into the Future of Egocentric Vision Chiara Plizzari Gabriele Goletto Antonino Furnari Siddhant Bansal Francesco Ragusa G. Farinella Dima Damen Tatiana Tommasi EgoV 40 38 0 14 Aug 2023
Difference-Masking: Choosing What to Mask in Continued Pretraining Alex Wilf Syeda Nahida Akter Leena Mathur Paul Pu Liang Sheryl Mathew Mengrou Shou Eric Nyberg Louis-Philippe Morency CLL SSL 32 4 0 23 May 2023
End-to-End Spatio-Temporal Action Localisation with Video Transformers A. Gritsenko Xuehan Xiong Josip Djolonga Mostafa Dehghani Chen Sun Mario Lucic Cordelia Schmid Anurag Arnab ViT 42 13 0 24 Apr 2023
Unbiased Scene Graph Generation in Videos Sayak Nag Kyle Min Subarna Tripathi A. Roy-Chowdhury 34 29 0 03 Apr 2023
EgoLoc: Revisiting 3D Object Localization from Egocentric Videos with Visual Queries Jinjie Mai Abdullah Hamdi Silvio Giancola Chen Zhao Guohao Li EgoV 40 14 0 14 Dec 2022
Egocentric Video Task Translation Zihui Xue Yale Song Kristen Grauman Lorenzo Torresani EgoV 33 12 0 13 Dec 2022
PromptonomyViT: Multi-Task Prompt Learning Improves Video Transformers using Synthetic Scene Data Roei Herzig Ofir Abramovich Elad Ben-Avraham Assaf Arbelle Leonid Karlinsky Ariel Shamir Trevor Darrell Amir Globerson 43 16 0 08 Dec 2022
Teaching Structured Vision&Language Concepts to Vision&Language Models Sivan Doveh Assaf Arbelle Sivan Harary Yikang Shen Roei Herzig ... Donghyun Kim Raja Giryes Rogerio Feris S. Ullman Leonid Karlinsky VLM CoGe 56 71 0 21 Nov 2022
Discovering A Variety of Objects in Spatio-Temporal Human-Object Interactions Yong-Lu Li Hongwei Fan Zuoyu Qiu Yiming Dou Liang Xu ... Peiyang Guo Haisheng Su Dongliang Wang Wei Wu Cewu Lu 35 7 0 14 Nov 2022
Video Event Extraction via Tracking Visual States of Arguments Guang Yang Manling Li Jiajie Zhang Xudong Lin Shih-Fu Chang Heng Ji 32 9 0 03 Nov 2022
EPIC-KITCHENS VISOR Benchmark: VIdeo Segmentations and Object Relations Ahmad Darkhalil Dandan Shan Bin Zhu Jian Ma Amlan Kar Richard E. L. Higgins Sanja Fidler David Fouhey Dima Damen VOS 55 98 0 26 Sep 2022
Is an Object-Centric Video Representation Beneficial for Transfer? Chuhan Zhang Ankush Gupta Andrew Zisserman ViT 37 27 0 20 Jul 2022
Beyond Transfer Learning: Co-finetuning for Action Localisation Anurag Arnab Xuehan Xiong A. Gritsenko Rob Romijnders Josip Djolonga Mostafa Dehghani Chen Sun Mario Lucic Cordelia Schmid 38 8 0 08 Jul 2022
Beyond RGB: Scene-Property Synthesis with Neural Radiance Fields Mingtong Zhang Shuhong Zheng Zhipeng Bao M. Hebert Yu-xiong Wang 29 12 0 09 Jun 2022
Hierarchical Self-supervised Representation Learning for Movie Understanding Fanyi Xiao Kaustav Kundu Joseph Tighe Davide Modolo SSL 44 24 0 06 Apr 2022
How Do You Do It? Fine-Grained Action Understanding with Pseudo-Adverbs Hazel Doughty Cees G. M. Snoek 35 19 0 23 Mar 2022
Distillation of Human-Object Interaction Contexts for Action Recognition Muna Almushyti Frederick W. Li 34 3 0 17 Dec 2021
Object-Region Video Transformers Roei Herzig Elad Ben-Avraham K. Mangalam Amir Bar Gal Chechik Anna Rohrbach Trevor Darrell Amir Globerson ViT 30 82 0 13 Oct 2021
Hierarchical Object-oriented Spatio-Temporal Reasoning for Video Question Answering Long Hoang Dang T. Le Vuong Le T. Tran 30 60 0 25 Jun 2021
Towards Long-Form Video Understanding Chaoxia Wu Philipp Krahenbuhl VLM ViT 54 166 0 21 Jun 2021
Modeling long-term interactions to enhance action recognition Alejandro Cartas Petia Radeva Mariella Dimiccoli EgoV 27 6 0 23 Apr 2021
Ego-Exo: Transferring Visual Representations from Third-person to First-person Videos Yanghao Li Tushar Nagarajan Bo Xiong Kristen Grauman EgoV 51 84 0 16 Apr 2021
Coarse-Fine Networks for Temporal Activity Detection in Videos Kumara Kahatapitiya Michael S. Ryoo AI4TS 53 38 0 01 Mar 2021
Learning to Recognize Actions on Objects in Egocentric Video with Attention Dictionaries Swathikiran Sudhakaran Sergio Escalera Oswald Lanz EgoV 27 15 0 16 Feb 2021
AssembleNet++: Assembling Modality Representations via Attention Connections Michael S. Ryoo A. Piergiovanni Juhana Kangaspunta A. Angelova 15 44 0 18 Aug 2020
Detecting Human-Object Interactions with Action Co-occurrence Priors Dong-Jin Kim Xiao Sun Jinsoo Choi Stephen Lin In So Kweon 24 124 0 17 Jul 2020
Understanding Human Hands in Contact at Internet Scale Dandan Shan Jiaqi Geng Michelle Shu David Fouhey 42 319 0 11 Jun 2020
Dynamic Language Binding in Relational Visual Reasoning T. Le Vuong Le Svetha Venkatesh T. Tran NAI 26 19 0 30 Apr 2020
Audiovisual SlowFast Networks for Video Recognition Fanyi Xiao Yong Jae Lee Kristen Grauman Jitendra Malik Christoph Feichtenhofer 197 207 0 23 Jan 2020
Self-supervising Action Recognition by Statistical Moment and Subspace Descriptors Lei Wang Piotr Koniusz 24 50 0 14 Jan 2020
Action Genome: Actions as Composition of Spatio-temporal Scene Graphs Jingwei Ji Ranjay Krishna Li Fei-Fei Juan Carlos Niebles 39 336 0 15 Dec 2019
Weak Supervision helps Emergence of Word-Object Alignment and improves Vision-Language Tasks Corentin Kervadec G. Antipov M. Baccouche Christian Wolf 21 15 0 06 Dec 2019
Action Anticipation for Collaborative Environments: The Impact of Contextual Information and Uncertainty-Based Prediction Clebeson Canuto dos Santos Plinio Moreno J. L. A. Samatelo R. Vassallo J. Santos-Victor 25 7 0 01 Oct 2019
Grouped Spatial-Temporal Aggregation for Efficient Action Recognition Chenxu Luo Alan Yuille 130 150 0 28 Sep 2019
Baidu-UTS Submission to the EPIC-Kitchens Action Recognition Challenge 2019 Xiaohan Wang Yu Wu Linchao Zhu Yi Yang 24 19 0 22 Jun 2019
What Makes Training Multi-Modal Classification Networks Hard? Weiyao Wang Du Tran Matt Feiszli 34 442 0 29 May 2019
Representation Learning on Visual-Symbolic Graphs for Video Understanding E. Mavroudi Benjamín Béjar Haro René Vidal 27 8 0 17 May 2019
Egocentric Hand Track and Object-based Human Action Recognition G. Kapidis R. Poppe E. V. Dam L. Noldus R. Veltkamp EgoV 21 36 0 02 May 2019
H+O: Unified Egocentric Recognition of 3D Hand-Object Poses and Interactions Bugra Tekin Federica Bogo Marc Pollefeys EgoV 27 252 0 10 Apr 2019
Long-Term Feature Banks for Detailed Video Understanding Chao-Yuan Wu Christoph Feichtenhofer Haoqi Fan Kaiming He Philipp Krahenbuhl Ross B. Girshick 62 477 0 12 Dec 2018
A Structured Model For Action Detection Yubo Zhang P. Tokmakov M. Hebert Cordelia Schmid 28 101 0 09 Dec 2018
Video Action Transformer Network Rohit Girdhar João Carreira Carl Doersch Andrew Zisserman ViT 28 702 0 06 Dec 2018
Interaction-aware Spatio-temporal Pyramid Attention Networks for Action Classification Yang Du Chunfen Yuan Bing Li Lili Zhao Yangxi Li Weiming Hu 81 79 0 03 Aug 2018
Interaction Networks for Learning about Objects, Relations and Physics Peter W. Battaglia Razvan Pascanu Matthew Lai Danilo Jimenez Rezende Koray Kavukcuoglu AI4CE OCL PINN GNN 283 1,401 0 01 Dec 2016