
AutoGLM: Autonomous Foundation Agents for GUIs

28 October 2024
Xiao Liu
Bo Qin
Dongzhu Liang
Guang Dong
Hanyu Lai
Hanchen Zhang
Hanlin Zhao
Iat Long Iong
Jiadai Sun
Jiaqi Wang
Junjie Gao
Junjun Shan
Kangning Liu
Shudan Zhang
Shuntian Yao
Siyi Cheng
Wentao Yao
Wenyi Zhao
Xinghan Liu
Xinyi Liu
Xinying Chen
X. J. Yang
Yang Yang
Yifan Xu
Yu Yang
Yujia Wang
Yongjun Xu
Zehan Qi
Yuxiao Dong
Jie Tang
Abstract

We present AutoGLM, a new series in the ChatGLM family, designed to serve as foundation agents for autonomous control of digital devices through Graphical User Interfaces (GUIs). While foundation models excel at acquiring human knowledge, they often struggle with decision-making in dynamic real-world environments, limiting their progress toward artificial general intelligence. This limitation underscores the importance of developing foundation agents capable of learning through autonomous environmental interactions by reinforcing existing models. Focusing on the web browser and the phone as representative GUI scenarios, we have developed AutoGLM as a practical foundation agent system for real-world GUI interactions. Our approach integrates a comprehensive suite of techniques and infrastructures to create deployable agent systems suitable for user delivery. Through this development, we have derived two key insights. First, the design of an appropriate "intermediate interface" for GUI control is crucial, enabling the separation of planning and grounding behaviors, which require distinct optimization for flexibility and accuracy, respectively. Second, we have developed a novel progressive training framework that enables self-evolving online curriculum reinforcement learning for AutoGLM. Our evaluations demonstrate AutoGLM's effectiveness across multiple domains. For web browsing, AutoGLM achieves a 55.2% success rate on VAB-WebArena-Lite (improving to 59.1% with a second attempt) and 96.2% on OpenTable evaluation tasks. In Android device control, AutoGLM attains a 36.2% success rate on AndroidLab (VAB-Mobile) and 89.7% on common tasks in popular Chinese apps.
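
The separation of planning and grounding through an intermediate interface can be illustrated with a toy sketch. This is not AutoGLM's implementation: the abstract does not specify the interface, so every name below (`PlannedAction`, `GroundedAction`, `plan`, `ground`, and the `screen` map) is a hypothetical stand-in. It assumes the planner emits an action with a natural-language description of the target element, and a separate grounder resolves that description to screen coordinates.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class PlannedAction:
    """Hypothetical intermediate-interface record: what to do, in words."""
    operation: str            # e.g. "click" or "type"
    target_description: str   # natural-language description of the element
    text: Optional[str] = None


@dataclass(frozen=True)
class GroundedAction:
    """Hypothetical executable action: the operation bound to coordinates."""
    operation: str
    x: int
    y: int
    text: Optional[str] = None


def plan(task: str) -> PlannedAction:
    """Toy planner: a rule-based stand-in for a model choosing the next step."""
    if "search" in task.lower():
        return PlannedAction("click", "search box")
    return PlannedAction("click", "submit button")


def ground(action: PlannedAction,
           screen: Dict[str, Tuple[int, int]]) -> GroundedAction:
    """Toy grounder: resolves the description via an assumed screen map."""
    x, y = screen[action.target_description]
    return GroundedAction(action.operation, x, y, action.text)


# Assumed screen layout, purely illustrative.
screen = {"search box": (320, 48), "submit button": (600, 410)}
planned = plan("Search for a restaurant")
grounded = ground(planned, screen)
print(planned)
print(grounded)
```

Because the planner never sees coordinates and the grounder never sees the task, each side could in principle be trained or replaced independently, which is the kind of distinct optimization the abstract describes.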
