M&M Mix: A Multimodal Multiview Transformer Ensemble

20 June 2022

Papers citing "M&M Mix: A Multimodal Multiview Transformer Ensemble"

Title | Authors | Tags | Counts | Date
X-MIC: Cross-Modal Instance Conditioning for Egocentric Action Generalization | Anna Kukleva, Fadime Sener, Edoardo Remelli, Bugra Tekin, Eric Sauser, Bernt Schiele, Shugao Ma | VLM, EgoV | 42 1 0 | 28 Mar 2024
Training a Large Video Model on a Single Machine in a Day | Yue Zhao, Philipp Krahenbuhl | VLM | 34 15 0 | 28 Sep 2023
IndGIC: Supervised Action Recognition under Low Illumination | Jing-Teng Zeng | - | 29 1 0 | 29 Aug 2023
An Outlook into the Future of Egocentric Vision | Chiara Plizzari, Gabriele Goletto, Antonino Furnari, Siddhant Bansal, Francesco Ragusa, G. Farinella, Dima Damen, Tatiana Tommasi | EgoV | 40 38 0 | 14 Aug 2023
Multimodal Distillation for Egocentric Action Recognition | Gorjan Radevski, Dusan Grujicic, Marie-Francine Moens, Matthew Blaschko, Tinne Tuytelaars | EgoV | 23 23 0 | 14 Jul 2023
Epic-Sounds: A Large-scale Dataset of Actions That Sound | Jaesung Huh, Jacob Chalk, Evangelos Kazakos, Dima Damen, Andrew Zisserman | EgoV | 18 41 0 | 01 Feb 2023
Vision Transformers for Action Recognition: A Survey | Anwaar Ulhaq, Naveed Akhtar, Ganna Pogrebna, Ajmal Saeed Mian | ViT | 19 44 0 | 13 Sep 2022
Omnivore: A Single Model for Many Visual Modalities | Rohit Girdhar, Mannat Singh, Nikhil Ravi, L. V. D. van der Maaten, Armand Joulin, Ishan Misra | - | 223 225 0 | 20 Jan 2022
SCENIC: A JAX Library for Computer Vision Research and Beyond | Mostafa Dehghani, A. Gritsenko, Anurag Arnab, Matthias Minderer, Yi Tay | - | 46 68 0 | 18 Oct 2021
Multi-modal Transformer for Video Retrieval | Valentin Gabeur, Chen Sun, Alahari Karteek, Cordelia Schmid | ViT | 424 596 0 | 21 Jul 2020