Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2404.16048
Cited By
GUIDE: Graphical User Interface Data for Execution
9 April 2024
Rajat Chawla
Adarsh Jha
Muskaan Kumar
NS Mukunda
Ishaan Bhola
LLMAG
Re-assign community
ArXiv (abs)
PDF
HTML
Papers citing
"GUIDE: Graphical User Interface Data for Execution"
14 / 14 papers shown
Title
Ferret-UI 2: Mastering Universal User Interface Understanding Across Platforms
Zhangheng Li
Keen You
Hao Zhang
Di Feng
Harsh Agrawal
Xiujun Li
Mohana Prasad Sathya Moorthy
Jeff Nichols
Yue Yang
Zhe Gan
MLLM
110
21
0
24 Oct 2024
MLLM-Tool: A Multimodal Large Language Model For Tool Agent Learning
Chenyu Wang
Weixin Luo
Qianyu Chen
Haonan Mai
Jindi Guo
Sixun Dong
Xiaohua Xuan
MLLM
LLMAG
107
20
0
19 Jan 2024
BLIVA: A Simple Multimodal LLM for Better Handling of Text-Rich Visual Questions
Wenbo Hu
Y. Xu
Yuante Li
W. Li
Zhe Chen
Zhuowen Tu
MLLM
VLM
82
133
0
19 Aug 2023
DeeperDive: The Unreasonable Effectiveness of Weak Supervision in Document Understanding A Case Study in Collaboration with UiPath Inc
Emad Elwany
Allison Hegel
Marina Shah
Brendan Roof
Genevieve Peaslee
Quentin Rivet
32
2
0
17 Aug 2022
A-OKVQA: A Benchmark for Visual Question Answering using World Knowledge
Dustin Schwenk
Apoorv Khandelwal
Christopher Clark
Kenneth Marino
Roozbeh Mottaghi
67
551
0
03 Jun 2022
Robotic Process Automation -- A Systematic Literature Review and Assessment Framework
Judith Wewerka
M. Reichert
34
26
0
22 Dec 2020
TAPAS: Weakly Supervised Table Parsing via Pre-training
Jonathan Herzig
Pawel Krzysztof Nowak
Thomas Müller
Francesco Piccinno
Julian Martin Eisenschlos
LMTD
RALM
107
655
0
05 Apr 2020
TextCaps: a Dataset for Image Captioning with Reading Comprehension
Oleksii Sidorov
Ronghang Hu
Marcus Rohrbach
Amanpreet Singh
87
418
0
24 Mar 2020
LXMERT: Learning Cross-Modality Encoder Representations from Transformers
Hao Hao Tan
Joey Tianyi Zhou
VLM
MLLM
247
2,488
0
20 Aug 2019
RoBERTa: A Robustly Optimized BERT Pretraining Approach
Yinhan Liu
Myle Ott
Naman Goyal
Jingfei Du
Mandar Joshi
Danqi Chen
Omer Levy
M. Lewis
Luke Zettlemoyer
Veselin Stoyanov
AIMat
674
24,528
0
26 Jul 2019
VizWiz Grand Challenge: Answering Visual Questions from Blind People
Danna Gurari
Qing Li
Abigale Stangl
Anhong Guo
Chi Lin
Kristen Grauman
Jiebo Luo
Jeffrey P. Bigham
CoGe
106
858
0
22 Feb 2018
CLEVR: A Diagnostic Dataset for Compositional Language and Elementary Visual Reasoning
Justin Johnson
B. Hariharan
Laurens van der Maaten
Li Fei-Fei
C. L. Zitnick
Ross B. Girshick
CoGe
311
2,386
0
20 Dec 2016
VQA: Visual Question Answering
Aishwarya Agrawal
Jiasen Lu
Stanislaw Antol
Margaret Mitchell
C. L. Zitnick
Dhruv Batra
Devi Parikh
CoGe
214
5,497
0
03 May 2015
Microsoft COCO: Common Objects in Context
Nayeon Lee
Michael Maire
Serge J. Belongie
Lubomir Bourdev
Ross B. Girshick
James Hays
Pietro Perona
Deva Ramanan
C. L. Zitnick
Piotr Dollár
ObjD
422
43,777
0
01 May 2014
1