ResearchTrend.AI
Screen2Words: Automatic Mobile UI Summarization with Multimodal Learning


7 August 2021
Bryan Wang
Gang Li
Xin Zhou
Zhourong Chen
Tovi Grossman
Yang Li

Papers citing "Screen2Words: Automatic Mobile UI Summarization with Multimodal Learning"

27 / 27 papers shown
MLLM-Selector: Necessity and Diversity-driven High-Value Data Selection for Enhanced Visual Instruction Tuning
Yiwei Ma
Guohai Xu
Xiaoshuai Sun
Jiayi Ji
Jie Lou
Debing Zhang
Rongrong Ji
95
0
0
26 Mar 2025
SpiritSight Agent: Advanced GUI Agent with One Look
Zhiyuan Huang
Ziming Cheng
Junting Pan
Zhaohui Hou
Mingjie Zhan
LLMAG
101
2
0
05 Mar 2025
A Token-level Text Image Foundation Model for Document Understanding
Tongkun Guan
Zining Wang
Pei Fu
Zhengtao Guo
Wei-Ming Shen
...
Chen Duan
Hao Sun
Qianyi Jiang
Junfeng Luo
Xiaokang Yang
VLM
45
0
0
04 Mar 2025
MobileSteward: Integrating Multiple App-Oriented Agents with Self-Evolution to Automate Cross-App Instructions
Yuxuan Liu
Hongda Sun
Wei Liu
Jian Luan
Bo Du
Rui Yan
55
2
0
24 Feb 2025
Parameter Efficient Merging for Multimodal Large Language Models with Complementary Parameter Adaptation
Fanhu Zeng
Haiyang Guo
Fei Zhu
Li Shen
Hao Tang
MoMe
54
1
0
24 Feb 2025
Granite Vision: a lightweight, open-source multimodal model for enterprise Intelligence
Granite Vision Team
Leonid Karlinsky
Assaf Arbelle
Abraham Daniels
A. Nassar
...
Sriram Raghavan
T. Syeda-Mahmood
Peter W. J. Staar
Tal Drory
Rogerio Feris
VLM
AI4TS
114
0
0
14 Feb 2025
GUI Agents with Foundation Models: A Comprehensive Survey
Shuai Wang
W. Liu
Jingxuan Chen
Weinan Gan
Xingshan Zeng
...
Bin Wang
Chuhan Wu
Yasheng Wang
Ruiming Tang
Jianye Hao
LLMAG
68
14
0
07 Nov 2024
MLLMGuard: A Multi-dimensional Safety Evaluation Suite for Multimodal Large Language Models
Tianle Gu
Zeyang Zhou
Kexin Huang
Dandan Liang
Yixu Wang
...
Keqing Wang
Yujiu Yang
Yan Teng
Yu Qiao
Yingchun Wang
ELM
47
12
0
11 Jun 2024
MUD: Towards a Large-Scale and Noise-Filtered UI Dataset for Modern Style UI Modeling
Sidong Feng
Suyu Ma
Han Wang
David Kong
Chunyang Chen
34
9
0
11 May 2024
Android in the Zoo: Chain-of-Action-Thought for GUI Agents
Jiwen Zhang
Jihao Wu
Yihua Teng
Minghui Liao
Nuo Xu
Xiao Xiao
Zhongyu Wei
Duyu Tang
LLMAG
LM&Ro
32
50
0
05 Mar 2024
Enhancing Vision-Language Pre-training with Rich Supervisions
Yuan Gao
Kunyu Shi
Pengkai Zhu
Edouard Belval
Oren Nuriel
Srikar Appalaraju
Shabnam Ghadar
Vijay Mahadevan
Zhuowen Tu
Stefano Soatto
VLM
CLIP
67
12
0
05 Mar 2024
AI Assistance for UX: A Literature Review Through Human-Centered AI
Yuwen Lu
Yuewen Yang
Qinyi Zhao
Chengzhi Zhang
Toby Jia-Jun Li
16
16
0
08 Feb 2024
Designing with Language: Wireframing UI Design Intent with Generative Large Language Models
Sidong Feng
Mingyue Yuan
Jieshan Chen
Zhenchang Xing
Chunyang Chen
AI4CE
3DV
19
7
0
12 Dec 2023
ECHO: An Automated Contextual Inquiry Framework for Anonymous Qualitative Studies using Conversational Assistants
Rishika Dwaraghanath
Rahul Majethia
Sanjana Gautam
15
1
0
10 Dec 2023
BLIP-Adapter: Parameter-Efficient Transfer Learning for Mobile Screenshot Captioning
Ching-Yu Chiang
I-Hua Chang
Shih-Wei Liao
44
1
0
26 Sep 2023
PaLI-X: On Scaling up a Multilingual Vision and Language Model
Xi Chen
Josip Djolonga
Piotr Padlewski
Basil Mustafa
Soravit Changpinyo
...
Mojtaba Seyedhosseini
A. Angelova
Xiaohua Zhai
N. Houlsby
Radu Soricut
VLM
51
187
0
29 May 2023
Evaluation of Sketch-Based and Semantic-Based Modalities for Mockup Generation
Tommaso Calò
Luigi De Russis
15
0
0
22 Mar 2023
WebUI: A Dataset for Enhancing Visual UI Understanding with Web Semantics
Jason Wu
Siyan Wang
Siman Shen
Yi-Hao Peng
Jeffrey Nichols
Jeffrey P. Bigham
19
68
0
30 Jan 2023
Screen Correspondence: Mapping Interchangeable Elements between UIs
Jason Wu
Amanda Swearngin
Xiaoyi Zhang
Jeffrey Nichols
Jeffrey P. Bigham
33
7
0
20 Jan 2023
Enabling Conversational Interaction with Mobile UI using Large Language Models
Bryan Wang
Gang Li
Yang Li
175
132
0
18 Sep 2022
ScreenQA: Large-Scale Question-Answer Pairs over Mobile App Screenshots
Yu-Chung Hsiao
Fedir Zubach
Maria Wang
Jindong Chen
Victor Carbune
Jason Lin
Yun Zhu
RALM
157
25
0
16 Sep 2022
Beyond Text Generation: Supporting Writers with Continuous Automatic Text Summaries
Hai Dang
Karim Benharrak
Florian Lehmann
Daniel Buschek
24
82
0
19 Aug 2022
SummaryLens -- A Smartphone App for Exploring Interactive Use of Automated Text Summarization in Everyday Life
Karim Benharrak
Florian Lehmann
Hai Dang
Daniel Buschek
14
6
0
04 Feb 2022
Learning to Denoise Raw Mobile UI Layouts for Improving Datasets at Scale
Gang Li
Gilles Baechler
Manuel Tragut
Yang Li
16
49
0
11 Jan 2022
VUT: Versatile UI Transformer for Multi-Modal Multi-Task User Interface Modeling
Yang Li
Gang Li
Xin Zhou
Mostafa Dehghani
A. Gritsenko
MLLM
27
35
0
10 Dec 2021
Creating User Interface Mock-ups from High-Level Text Descriptions with Deep-Learning Models
Forrest Huang
Gang Li
Xin Zhou
John F. Canny
Yang Li
DiffM
31
19
0
14 Oct 2021
Screen Recognition: Creating Accessibility Metadata for Mobile Applications from Pixels
Xiaoyi Zhang
Lilian de Greef
Amanda Swearngin
Samuel White
Kyle I. Murray
...
Jeffrey Nichols
Jason Wu
Chris Fleizach
Aaron Everitt
Jeffrey P. Bigham
194
167
0
13 Jan 2021