
EgoAdapt: Adaptive Multisensory Distillation and Policy Learning for Efficient Egocentric Perception

Sanjoy Chowdhury
Subrata Biswas
Sayan Nag
Tushar Nagarajan
Calvin Murdock
Ishwarya Ananthabhotla
Yijun Qian
Vamsi Krishna Ithapu
Dinesh Manocha
Ruohan Gao
Main: 8 pages · Bibliography: 6 pages · Appendix: 11 pages · 13 figures · 15 tables
Abstract

Modern perception models, particularly those designed for multisensory egocentric tasks, achieve remarkable performance but often come with substantial computational costs. These high demands pose challenges for real-world deployment, especially in resource-constrained environments. In this paper, we introduce EgoAdapt, a framework that adaptively performs cross-modal distillation and policy learning to enable efficient inference across different egocentric perception tasks, including egocentric action recognition, active speaker localization, and behavior anticipation. Our proposed policy module is adaptable to task-specific action spaces, making it broadly applicable. Experimental results on three challenging egocentric datasets (EPIC-Kitchens, EasyCom, and Aria Everyday Activities) demonstrate that our method significantly enhances efficiency, reducing GMACs by up to 89.09%, parameters by up to 82.02%, and energy consumption by up to 9.6x, while performing on par with, and in many cases outperforming, corresponding state-of-the-art models.
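The abstract names two ingredients, cross-modal distillation and a policy module over task-specific action spaces, without detailing either. As a rough illustration only, the hypothetical PyTorch sketch below pairs a Gumbel-softmax modality gate (a stand-in "policy" whose actions are per-modality keep/skip decisions) with a standard knowledge-distillation loss against a multisensory teacher. All names here (ModalityPolicy, distillation_loss) and design choices are assumptions for exposition, not the paper's actual method.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ModalityPolicy(nn.Module):
    """Lightweight gate deciding, per input, which modalities to process.

    Hypothetical sketch: the "action" is a binary keep/skip per modality,
    sampled with hard Gumbel-softmax so the decision stays differentiable
    via the straight-through estimator.
    """
    def __init__(self, feat_dim: int, num_modalities: int):
        super().__init__()
        # Two logits (keep, skip) per modality.
        self.gate = nn.Linear(feat_dim, num_modalities * 2)

    def forward(self, cheap_features: torch.Tensor, tau: float = 1.0) -> torch.Tensor:
        logits = self.gate(cheap_features).view(-1, 2)        # (B*M, 2)
        decisions = F.gumbel_softmax(logits, tau=tau, hard=True)[:, 0]
        return decisions.view(cheap_features.size(0), -1)     # (B, M) in {0, 1}

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature: float = 4.0, alpha: float = 0.5):
    """Standard KD objective: soft targets from a multisensory teacher
    plus the usual cross-entropy on ground-truth labels."""
    soft = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard

if __name__ == "__main__":
    B, M, D, C = 4, 3, 128, 10              # batch, modalities, feat dim, classes
    policy = ModalityPolicy(D, M)
    cheap = torch.randn(B, D)                # cheap features driving the gate
    mask = policy(cheap)                     # (B, M) binary keep/skip decisions
    feats = torch.randn(B, M, D)             # per-modality features
    fused = (mask.unsqueeze(-1) * feats).mean(dim=1)
    head = nn.Linear(D, C)
    loss = distillation_loss(head(fused), torch.randn(B, C),
                             torch.randint(0, C, (B,)))
    (loss + 1e-2 * mask.mean()).backward()   # efficiency penalty on gate usage
```

In a real deployment the gate would presumably run on cheap, always-available features, with a cost penalty on gate activations trading accuracy against compute; how EgoAdapt actually formulates this is specified in the paper itself, not here.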
