ResearchTrend.AI
GUI-R1 : A Generalist R1-Style Vision-Language Action Model For GUI Agents

14 April 2025
Run Luo
Lu Wang
Wanwei He
Xiaobo Xia
    LLMAG

Papers citing "GUI-R1 : A Generalist R1-Style Vision-Language Action Model For GUI Agents"

41 / 41 papers shown
Title
Table-R1: Inference-Time Scaling for Table Reasoning
Zheyuan Yang
Lyuhao Chen
Arman Cohan
Yilun Zhao
LMTD, ReLM, LRM
81
2
0
29 May 2025
ZeroGUI: Automating Online GUI Learning at Zero Human Cost
Chenyu Yang
Shiqian Su
Shi-Qi Liu
Xuan Dong
Yue Yu
...
Hao Li
Wenhai Wang
Yu Qiao
Xizhou Zhu
Jifeng Dai
OffRL
125
0
0
29 May 2025
UI-Evol: Automatic Knowledge Evolving for Computer Use Agents
Ziyun Zhang
Xinyi Liu
Xiaoyi Zhang
Jun Wang
Gang Chen
Yan Lu
LLMAG
92
0
0
28 May 2025
Large Language Models for Planning: A Comprehensive and Systematic Survey
Pengfei Cao
Tianyi Men
Wencan Liu
Jingwen Zhang
Xuzhao Li
Xixun Lin
Dianbo Sui
Yanan Cao
Kang Liu
Jun Zhao
LLMAG, LM&Ro, OffRL, ELM, LRM
79
0
0
26 May 2025
VerIPO: Cultivating Long Reasoning in Video-LLMs via Verifier-Gudied Iterative Policy Optimization
Yunxin Li
Xinyu Chen
Zitao Li
Zhenyu Liu
L. Wang
Wenhan Luo
Baotian Hu
Min Zhang
OffRL, LRM
117
0
0
25 May 2025
Reinforcement Fine-Tuning Powers Reasoning Capability of Multimodal Large Language Models
Haoyuan Sun
Jiaqi Wu
Bo Xia
Yifu Luo
Yifei Zhao
Kai Qin
Xufei Lv
Tiantian Zhang
Yongzhe Chang
Xueqian Wang
OffRL, LRM
196
0
0
24 May 2025
ProgRM: Build Better GUI Agents with Progress Rewards
Danyang Zhang
Situo Zhang
Ziyue Yang
Zichen Zhu
Zihan Zhao
Ruisheng Cao
Lu Chen
Kai Yu
66
0
0
23 May 2025
Think or Not? Selective Reasoning via Reinforcement Learning for Vision-Language Models
Jiaqi Wang
Kevin Qinghong Lin
James Cheng
Mike Zheng Shou
OffRL, ReLM, LRM
94
0
0
22 May 2025
SophiaVL-R1: Reinforcing MLLMs Reasoning with Thinking Reward
Kaixuan Fan
Kaituo Feng
Haoming Lyu
Dongzhan Zhou
Xiangyu Yue
ReLM, LRM
106
0
0
22 May 2025
NOVER: Incentive Training for Language Models via Verifier-Free Reinforcement Learning
Wei Liu
Siya Qi
Xinyu Wang
Chen Qian
Yali Du
Yulan He
OffRL, LRM
74
0
0
21 May 2025
An Empirical Study on Reinforcement Learning for Reasoning-Search Interleaved LLM Agents
Bowen Jin
Jinsung Yoon
Priyanka Kargupta
Sercan O. Arik
Jiawei Han
LRM
109
1
0
21 May 2025
ReGUIDE: Data Efficient GUI Grounding via Spatial Reasoning and Search
Hyunseok Lee
Jeonghoon Kim
Beomjun Kim
Jihoon Tack
Chansong Jo
Jaehong Lee
Cheonbok Park
Sookyo In
Jinwoo Shin
Kang Min Yoo
90
0
0
21 May 2025
Toward Effective Reinforcement Learning Fine-Tuning for Medical VQA in Vision-Language Models
Wenhui Zhu
Xuanzhao Dong
Xin Li
Peijie Qiu
Xiwen Chen
Abolfazl Razi
Aris Sotiras
Yi Su
Yalin Wang
OffRL, LM&MA
95
0
0
20 May 2025
UIShift: Enhancing VLM-based GUI Agents through Self-supervised Reinforcement Learning
Longxi Gao
Li Zhang
Mengwei Xu
61
1
0
18 May 2025
Enhancing Visual Grounding for GUI Agents via Self-Evolutionary Reinforcement Learning
Xinbin Yuan
Jian Zhang
K. Li
Zhuoxuan Cai
Lujian Yao
...
Enguang Wang
Qibin Hou
Jinwei Chen
Peng-Tao Jiang
Bo Li
98
1
0
18 May 2025
Divide, Optimize, Merge: Fine-Grained LLM Agent Optimization at Scale
Jiale Liu
Yifan Zeng
Shaokun Zhang
Chi Zhang
Malte Højmark-Bertelsen
Marie Normann Gadeberg
Hongru Wang
Qingyun Wu
77
1
0
06 May 2025
Reinforced MLLM: A Survey on RL-Based Reasoning in Multimodal Large Language Models
Guanghao Zhou
Panjia Qiu
Chong Chen
Jiadong Wang
Zheming Yang
Jian Xu
Minghui Qiu
OffRL, LRM
166
5
0
30 Apr 2025
A Survey on GUI Agents with Foundation Models Enhanced by Reinforcement Learning
Jiahao Li
Kaer Huang
LLMAG, LM&Ro, 3DV
130
0
0
29 Apr 2025
VCM: Vision Concept Modeling Based on Implicit Contrastive Learning with Vision-Language Instruction Fine-Tuning
Run Luo
Renke Shan
Longze Chen
Ziqiang Liu
Lu Wang
Min Yang
Xiaobo Xia
MLLM, VLM
242
1
0
28 Apr 2025
InfiGUI-R1: Advancing Multimodal GUI Agents from Reactive Actors to Deliberative Reasoners
Yuhang Liu
Pengxiang Li
C. Xie
Xavier Hu
Xiaotian Han
Shengyu Zhang
Hongxia Yang
Fei Wu
LLMAG, LM&Ro, LRM, AI4CE
133
12
0
19 Apr 2025
ScreenSpot-Pro: GUI Grounding for Professional High-Resolution Computer Use
Kaixin Li
Ziyang Meng
Hongzhan Lin
Ziyang Luo
Yuchen Tian
Jing Ma
Zhiyong Huang
Tat-Seng Chua
86
20
0
04 Apr 2025
MetaSpatial: Reinforcing 3D Spatial Reasoning in VLMs for the Metaverse
Zhenyu Pan
Han Liu
OffRL, LRM
107
7
0
24 Mar 2025
Vision-R1: Incentivizing Reasoning Capability in Multimodal Large Language Models
Wenxuan Huang
Bohan Jia
Zijie Zhai
Shaosheng Cao
Zheyu Ye
Fei Zhao
Zhe Xu
Yao Hu
Shaohui Lin
MU, OffRL, LRM, MLLM, ReLM, VLM
135
105
0
09 Mar 2025
Visual-RFT: Visual Reinforcement Fine-Tuning
Ziyu Liu
Zeyi Sun
Yuhang Zang
Xiaoyi Dong
Yuhang Cao
Haodong Duan
Dahua Lin
Jiaqi Wang
ObjD, VLM, LRM
130
96
0
03 Mar 2025
Beware of Your Po! Measuring and Mitigating AI Safety Risks in Role-Play Fine-Tuning of LLMs
Weixiang Zhao
Yulin Hu
Yang Deng
Jiahe Guo
Xingyu Sui
...
An Zhang
Yanyan Zhao
Bing Qin
Tat-Seng Chua
Ting Liu
143
7
0
28 Feb 2025
Qwen2.5-VL Technical Report
S. Bai
Keqin Chen
Xuejing Liu
Jialin Wang
Wenbin Ge
...
Zesen Cheng
Hang Zhang
Zhibo Yang
Haiyang Xu
Junyang Lin
VLM
319
546
0
20 Feb 2025
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
DeepSeek-AI
Daya Guo
Dejian Yang
Haowei Zhang
Junxiao Song
...
Shiyu Wang
S. Yu
Shunfeng Zhou
Shuting Pan
S.S. Li
ReLM, VLM, OffRL, AI4TS, LRM
373
1,692
0
22 Jan 2025
UI-TARS: Pioneering Automated GUI Interaction with Native Agents
Yujia Qin
Yining Ye
Junjie Fang
Han Wang
Shihao Liang
...
Haifeng Liu
F. Lin
Tao Peng
Xin Liu
Guang Shi
LLMAG, LM&Ro
98
56
0
21 Jan 2025
OS-ATLAS: A Foundation Action Model for Generalist GUI Agents
Zhiyong Wu
Zhenyu Wu
Fangzhi Xu
Yian Wang
Qiushi Sun
...
Kanzhi Cheng
Zichen Ding
Lixing Chen
Paul Pu Liang
Yu Qiao
74
72
0
30 Oct 2024
Navigating the Digital World as Humans Do: Universal Visual Grounding for GUI Agents
Boyu Gou
Ruohan Wang
Boyuan Zheng
Yanan Xie
Cheng Chang
Yiheng Shu
Huan Sun
Yu Su
LM&Ro, LLMAG
180
89
0
07 Oct 2024
AMEX: Android Multi-annotation Expo Dataset for Mobile GUI Agents
Yuxiang Chai
Siyuan Huang
Yazhe Niu
Han Xiao
Liang Liu
Dingyu Zhang
Shuai Ren
Hongsheng Li
LLMAG
102
36
0
03 Jul 2024
The FineWeb Datasets: Decanting the Web for the Finest Text Data at Scale
Guilherme Penedo
Hynek Kydlícek
Loubna Ben Allal
Anton Lozhkov
Margaret Mitchell
Colin Raffel
Leandro von Werra
Thomas Wolf
117
248
0
25 Jun 2024
GUICourse: From General Vision Language Models to Versatile GUI Agents
Wentong Chen
Junbo Cui
Jinyi Hu
Yujia Qin
Junjie Fang
...
Yupeng Huo
Yuan Yao
Yankai Lin
Zhiyuan Liu
Maosong Sun
LLMAG
90
41
0
17 Jun 2024
GUI Odyssey: A Comprehensive Dataset for Cross-App GUI Navigation on Mobile Devices
Quanfeng Lu
Wenqi Shao
Zitao Liu
Fanqing Meng
Boxuan Li
Botong Chen
Siyuan Huang
Kaipeng Zhang
Yu Qiao
Ping Luo
97
43
0
12 Jun 2024
LlamaFactory: Unified Efficient Fine-Tuning of 100+ Language Models
Yaowei Zheng
Richong Zhang
Junhao Zhang
Yanhan Ye
Zheyan Luo
Zhangchi Feng
Yongqiang Ma
144
523
0
20 Mar 2024
OmniACT: A Dataset and Benchmark for Enabling Multimodal Generalist Autonomous Agents for Desktop and Web
Raghav Kapoor
Y. Butala
M. Russak
Jing Yu Koh
Kiran Kamble
Waseem Alshikh
Ruslan Salakhutdinov
LLMAG
106
56
0
27 Feb 2024
DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models
Zhihong Shao
Peiyi Wang
Qihao Zhu
Runxin Xu
Jun-Mei Song
...
Haowei Zhang
Mingchuan Zhang
Yiming Li
Yu-Huan Wu
Daya Guo
ReLM, LRM
138
1,119
0
05 Feb 2024
SeeClick: Harnessing GUI Grounding for Advanced Visual GUI Agents
Kanzhi Cheng
Qiushi Sun
Yougang Chu
Fangzhi Xu
Yantao Li
Jianbing Zhang
Zhiyong Wu
LLMAG
237
179
0
17 Jan 2024
Corex: Pushing the Boundaries of Complex Reasoning through Multi-Model Collaboration
Qiushi Sun
Zhangyue Yin
Xiang Li
Zhiyong Wu
Xipeng Qiu
Lingpeng Kong
LRM, LLMAG
78
47
0
30 Sep 2023
Cognitive Architectures for Language Agents
T. Sumers
Shunyu Yao
Karthik Narasimhan
Thomas Griffiths
LLMAG, LM&Ro
121
171
0
05 Sep 2023
A Survey on Large Language Model based Autonomous Agents
Lei Wang
Chengbang Ma
Xueyang Feng
Zeyu Zhang
Hao-ran Yang
...
Xu Chen
Yankai Lin
Wayne Xin Zhao
Zhewei Wei
Ji-Rong Wen
LLMAG, AI4CE, LM&Ro
89
1,275
0
22 Aug 2023