2101.04893
Cited By
Screen Recognition: Creating Accessibility Metadata for Mobile Applications from Pixels
13 January 2021
Xiaoyi Zhang
Lilian de Greef
Amanda Swearngin
Samuel White
Kyle I. Murray
Lisa Yu
Qi Shan
Jeffrey Nichols
Jason Wu
Chris Fleizach
Aaron Everitt
Jeffrey P. Bigham
Papers citing
"Screen Recognition: Creating Accessibility Metadata for Mobile Applications from Pixels"
23 / 23 papers shown
Title
MobileSteward: Integrating Multiple App-Oriented Agents with Self-Evolution to Automate Cross-App Instructions
Yuxuan Liu
Hongda Sun
Wei Liu
Jian Luan
Bo Du
Rui Yan
55
2
0
24 Feb 2025
GUI Agents with Foundation Models: A Comprehensive Survey
Shuai Wang
Wei Liu
Jingxuan Chen
Weinan Gan
Xingshan Zeng
...
Bin Wang
Chuhan Wu
Yasheng Wang
Ruiming Tang
Jianye Hao
LLMAG
73
14
0
07 Nov 2024
MUD: Towards a Large-Scale and Noise-Filtered UI Dataset for Modern Style UI Modeling
Sidong Feng
Suyu Ma
Han Wang
David Kong
Chunyang Chen
36
9
0
11 May 2024
Ferret-UI: Grounded Mobile UI Understanding with Multimodal LLMs
Keen You
Haotian Zhang
E. Schoop
Floris Weers
Amanda Swearngin
Jeffrey Nichols
Yinfei Yang
Zhe Gan
MLLM
47
82
0
08 Apr 2024
Android in the Zoo: Chain-of-Action-Thought for GUI Agents
Jiwen Zhang
Jihao Wu
Yihua Teng
Minghui Liao
Nuo Xu
Xiao Xiao
Zhongyu Wei
Duyu Tang
LLMAG
LM&Ro
34
50
0
05 Mar 2024
AI Assistance for UX: A Literature Review Through Human-Centered AI
Yuwen Lu
Yuewen Yang
Qinyi Zhao
Chengzhi Zhang
Toby Jia-Jun Li
19
16
0
08 Feb 2024
Intelligent Virtual Assistants with LLM-based Process Automation
Yanchu Guan
Dong Wang
Zhixuan Chu
Shiyu Wang
Feiyue Ni
Ruihua Song
Longfei Li
Jinjie Gu
Chenyi Zhuang
27
20
0
04 Dec 2023
Igniting Language Intelligence: The Hitchhiker's Guide From Chain-of-Thought Reasoning to Language Agents
Zhuosheng Zhang
Yao Yao
Aston Zhang
Xiangru Tang
Xinbei Ma
...
Yiming Wang
Mark B. Gerstein
Rui Wang
Gongshen Liu
Hai Zhao
LLMAG
LM&Ro
LRM
36
53
0
20 Nov 2023
EGFE: End-to-end Grouping of Fragmented Elements in UI Designs with Multimodal Learning
Liuqing Chen
Yunnong Chen
Shuhong Xiao
Yaxuan Song
Lingyun Sun
Yankun Zhen
Tingting Zhou
Yan-fang Chang
41
4
0
18 Sep 2023
Video2Action: Reducing Human Interactions in Action Annotation of App Tutorial Videos
Sidong Feng
Chunyang Chen
Zhenchang Xing
24
11
0
07 Aug 2023
Android in the Wild: A Large-Scale Dataset for Android Device Control
Christopher Rawles
Alice Li
Daniel Rodriguez
Oriana Riva
Timothy Lillicrap
LM&Ro
28
139
0
19 Jul 2023
MenuCraft: Interactive Menu System Design with Large Language Models
Amir Hossein Kargaran
Nafiseh Nikeghbal
Abbas Heydarnoori
Hinrich Schütze
LLMAG
28
4
0
08 Mar 2023
WebUI: A Dataset for Enhancing Visual UI Understanding with Web Semantics
Jason Wu
Siyan Wang
Siman Shen
Yi-Hao Peng
Jeffrey Nichols
Jeffrey P. Bigham
21
68
0
30 Jan 2023
Screen Correspondence: Mapping Interchangeable Elements between UIs
Jason Wu
Amanda Swearngin
Xiaoyi Zhang
Jeffrey Nichols
Jeffrey P. Bigham
37
7
0
20 Jan 2023
Enabling Conversational Interaction with Mobile UI using Large Language Models
Bryan Wang
Gang Li
Yang Li
175
132
0
18 Sep 2022
UI Layers Merger: Merging UI layers via Visual Learning and Boundary Prior
Yunnong Chen
Yankun Zhen
Chuhao Shi
Jiazhi Li
Liuqing Chen
Z. Li
Lingyun Sun
Tingting Zhou
Yan-fang Chang
31
4
0
18 Jun 2022
Psychologically-Inspired, Unsupervised Inference of Perceptual Groups of GUI Widgets from GUI Images
Mulong Xie
Zhenchang Xing
Sidong Feng
Chunyang Chen
Liming Zhu
Xiwei Xu
24
28
0
15 Jun 2022
Predicting and Explaining Mobile UI Tappability with Vision Modeling and Saliency Analysis
E. Schoop
Xin Zhou
Gang Li
Zhourong Chen
Björn Hartmann
Yang Li
HAI
FAtt
32
32
0
05 Apr 2022
"I Shake The Package To Check If It's Mine": A Study of Package Fetching Practices and Challenges of Blind and Low Vision People in China
Wentao Lei
Mingming Fan
Juliann Thang
19
9
0
06 Feb 2022
Learning to Denoise Raw Mobile UI Layouts for Improving Datasets at Scale
Gang Li
Gilles Baechler
Manuel Tragut
Yang Li
18
49
0
11 Jan 2022
VUT: Versatile UI Transformer for Multi-Modal Multi-Task User Interface Modeling
Yang Li
Gang Li
Xin Zhou
Mostafa Dehghani
A. Gritsenko
MLLM
29
35
0
10 Dec 2021
Screen Parsing: Towards Reverse Engineering of UI Models from Screenshots
Jason Wu
Xiaoyi Zhang
Jeffrey Nichols
Jeffrey P. Bigham
3DV
163
71
0
17 Sep 2021
Multimodal Icon Annotation For Mobile Applications
Xiaoxue Zang
Ying Xu
Jindong Chen
14
19
0
09 Jul 2021