ResearchTrend.AI

arXiv:2512.14442
A4-Agent: An Agentic Framework for Zero-Shot Affordance Reasoning

16 December 2025
Zixin Zhang
Kanghao Chen
Hanqing Wang
Hongfei Zhang
Harold Haodong Chen
Chenfei Liao
Litao Guo
Ying-Cong Chen
Main: 8 pages · 14 figures · Bibliography: 3 pages · 8 tables · Appendix: 8 pages
Abstract

Affordance prediction, which identifies interaction regions on objects based on language instructions, is critical for embodied AI. Prevailing end-to-end models couple high-level reasoning and low-level grounding into a single monolithic pipeline and rely on training over annotated datasets, which leads to poor generalization to novel objects and unseen environments. In this paper, we move beyond this paradigm by proposing A4-Agent, a training-free agentic framework that decouples affordance prediction into a three-stage pipeline. Our framework coordinates specialized foundation models at test time: (1) a Dreamer that employs generative models to visualize how an interaction would look; (2) a Thinker that utilizes large vision-language models to decide what object part to interact with; and (3) a Spotter that orchestrates vision foundation models to precisely locate where the interaction area is. By leveraging the complementary strengths of pre-trained models without any task-specific fine-tuning, our zero-shot framework significantly outperforms state-of-the-art supervised methods across multiple benchmarks and demonstrates robust generalization to real-world settings.
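The Dreamer/Thinker/Spotter decomposition described in the abstract can be sketched as a minimal pipeline. This is an illustrative assumption, not the authors' implementation: the class name `A4Agent`, the stage interfaces, and the stub models below are all hypothetical placeholders standing in for the pre-trained generative model, vision-language model, and vision foundation model the paper coordinates at test time.

```python
# Hypothetical sketch of the three-stage A4-Agent pipeline from the abstract.
# All names and interfaces here are illustrative assumptions.
from dataclasses import dataclass
from typing import Callable, Tuple

Image = str                        # placeholder for image data
Mask = Tuple[int, int, int, int]   # placeholder: a box (x0, y0, x1, y1)

@dataclass
class A4Agent:
    dreamer: Callable[[Image, str], Image]  # generative model: imagine the interaction
    thinker: Callable[[Image, str], str]    # VLM: name the object part to act on
    spotter: Callable[[Image, str], Mask]   # vision model: localize that part

    def predict_affordance(self, scene: Image, instruction: str) -> Mask:
        # (1) Dreamer: visualize HOW the interaction would look.
        imagined = self.dreamer(scene, instruction)
        # (2) Thinker: decide WHAT object part to interact with.
        part = self.thinker(imagined, instruction)
        # (3) Spotter: locate WHERE the interaction area is in the scene.
        return self.spotter(scene, part)

# Stub models standing in for the pre-trained components (no fine-tuning).
def stub_dreamer(scene: Image, instruction: str) -> Image:
    return f"{scene}+imagined({instruction})"

def stub_thinker(imagined: Image, instruction: str) -> str:
    return "handle" if "open" in instruction else "surface"

def stub_spotter(scene: Image, part: str) -> Mask:
    return (10, 20, 30, 40)  # fixed box, for illustration only

agent = A4Agent(stub_dreamer, stub_thinker, stub_spotter)
print(agent.predict_affordance("kitchen.jpg", "open the drawer"))
```

Because each stage is a swappable callable, the decoupling the paper argues for falls out naturally: any of the three components can be replaced by a different pre-trained model without retraining the others.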
