ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2206.09852
  4. Cited By
M&M Mix: A Multimodal Multiview Transformer Ensemble

M&M Mix: A Multimodal Multiview Transformer Ensemble

20 June 2022
Xuehan Xiong
Anurag Arnab
Arsha Nagrani
Cordelia Schmid
    ViT
ArXivPDFHTML

Papers citing "M&M Mix: A Multimodal Multiview Transformer Ensemble"

10 / 10 papers shown
Title
X-MIC: Cross-Modal Instance Conditioning for Egocentric Action
  Generalization
X-MIC: Cross-Modal Instance Conditioning for Egocentric Action Generalization
Anna Kukleva
Fadime Sener
Edoardo Remelli
Bugra Tekin
Eric Sauser
Bernt Schiele
Shugao Ma
VLM
EgoV
42
1
0
28 Mar 2024
Training a Large Video Model on a Single Machine in a Day
Training a Large Video Model on a Single Machine in a Day
Yue Zhao
Philipp Krahenbuhl
VLM
34
15
0
28 Sep 2023
IndGIC: Supervised Action Recognition under Low Illumination
IndGIC: Supervised Action Recognition under Low Illumination
Jing-Teng Zeng
29
1
0
29 Aug 2023
An Outlook into the Future of Egocentric Vision
An Outlook into the Future of Egocentric Vision
Chiara Plizzari
Gabriele Goletto
Antonino Furnari
Siddhant Bansal
Francesco Ragusa
G. Farinella
Dima Damen
Tatiana Tommasi
EgoV
40
38
0
14 Aug 2023
Multimodal Distillation for Egocentric Action Recognition
Multimodal Distillation for Egocentric Action Recognition
Gorjan Radevski
Dusan Grujicic
Marie-Francine Moens
Matthew Blaschko
Tinne Tuytelaars
EgoV
23
23
0
14 Jul 2023
Epic-Sounds: A Large-scale Dataset of Actions That Sound
Epic-Sounds: A Large-scale Dataset of Actions That Sound
Jaesung Huh
Jacob Chalk
Evangelos Kazakos
Dima Damen
Andrew Zisserman
EgoV
18
41
0
01 Feb 2023
Vision Transformers for Action Recognition: A Survey
Vision Transformers for Action Recognition: A Survey
Anwaar Ulhaq
Naveed Akhtar
Ganna Pogrebna
Ajmal Saeed Mian
ViT
19
44
0
13 Sep 2022
Omnivore: A Single Model for Many Visual Modalities
Omnivore: A Single Model for Many Visual Modalities
Rohit Girdhar
Mannat Singh
Nikhil Ravi
L. V. D. van der Maaten
Armand Joulin
Ishan Misra
223
225
0
20 Jan 2022
SCENIC: A JAX Library for Computer Vision Research and Beyond
SCENIC: A JAX Library for Computer Vision Research and Beyond
Mostafa Dehghani
A. Gritsenko
Anurag Arnab
Matthias Minderer
Yi Tay
46
68
0
18 Oct 2021
Multi-modal Transformer for Video Retrieval
Multi-modal Transformer for Video Retrieval
Valentin Gabeur
Chen Sun
Alahari Karteek
Cordelia Schmid
ViT
424
596
0
21 Jul 2020
1