arXiv:2506.02550 (v2, latest)

Technical Report for Ego4D Long-Term Action Anticipation Challenge 2025

3 June 2025
Qiaohui Chu, Haoyu Zhang, Yisen Feng, Meng Liu, Weili Guan, Yaowei Wang, Liqiang Nie
Main: 3 pages, 3 figures; bibliography: 2 pages, 2 tables
Abstract

In this report, we present a novel three-stage framework developed for the Ego4D Long-Term Action Anticipation (LTA) task. Inspired by recent advances in foundation models, our method consists of three stages: feature extraction, action recognition, and long-term action anticipation. First, visual features are extracted using a high-performance visual encoder. The features are then fed into a Transformer to predict verbs and nouns, with a verb-noun co-occurrence matrix incorporated to enhance recognition accuracy. Finally, the predicted verb-noun pairs are formatted as textual prompts and input into a fine-tuned large language model (LLM) to anticipate future action sequences. Our framework achieves first place in this challenge at CVPR 2025, establishing a new state-of-the-art in long-term action prediction. Our code will be released at this https URL.
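The second stage above rescores verb and noun predictions with a verb–noun co-occurrence matrix. As a minimal sketch of that idea (all sizes, counts, and logits below are hypothetical stand-ins, not the authors' actual model or the real Ego4D vocabularies), one can combine independent verb and noun classifier outputs with an empirical P(noun | verb) prior and pick the most likely joint pair:

```python
import numpy as np

# Hypothetical vocabulary sizes; the real Ego4D taxonomies are much larger.
NUM_VERBS, NUM_NOUNS = 4, 5

rng = np.random.default_rng(0)

# Stand-ins for the Transformer's per-clip classifier logits.
verb_logits = rng.normal(size=NUM_VERBS)
noun_logits = rng.normal(size=NUM_NOUNS)

# Co-occurrence counts C[v, n]: how often verb v appears with noun n in the
# training annotations (hypothetical; +1 smoothing avoids empty rows).
cooc = rng.integers(0, 50, size=(NUM_VERBS, NUM_NOUNS)).astype(float) + 1.0
cooc_prob = cooc / cooc.sum(axis=1, keepdims=True)  # empirical P(noun | verb)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# Rescore: weight the independent predictions by the co-occurrence prior,
# then take the jointly most likely verb-noun pair.
verb_p = softmax(verb_logits)
noun_p = softmax(noun_logits)
joint = verb_p[:, None] * noun_p[None, :] * cooc_prob

v_idx, n_idx = np.unravel_index(joint.argmax(), joint.shape)
print(f"predicted verb={v_idx}, noun={n_idx}")
```

This illustrates only the rescoring principle: implausible pairs (e.g. a verb rarely seen with a given noun) are suppressed even when each label scores well on its own, which is how a co-occurrence prior can improve recognition accuracy.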
