UFO2: The Desktop AgentOS

Abstract
Recent Computer-Using Agents (CUAs), powered by multimodal large language models (LLMs), offer a promising direction for automating complex desktop workflows through natural language. However, most existing CUAs remain conceptual prototypes, hindered by shallow OS integration, fragile screenshot-based interaction, and disruptive execution.
@article{zhang2025_2504.14603,
  title={UFO2: The Desktop AgentOS},
  author={Chaoyun Zhang and He Huang and Chiming Ni and Jian Mu and Si Qin and Shilin He and Lu Wang and Fangkai Yang and Pu Zhao and Chao Du and Liqun Li and Yu Kang and Zhao Jiang and Suzhen Zheng and Rujia Wang and Jiaxu Qian and Minghua Ma and Jian-Guang Lou and Qingwei Lin and Saravan Rajmohan and Dongmei Zhang},
  journal={arXiv preprint arXiv:2504.14603},
  year={2025}
}