WebVoyager: Building an End-to-End Web Agent with Large Multimodal Models

25 January 2024
Hongliang He
Wenlin Yao
Kaixin Ma
Wenhao Yu
Yong Dai
Hongming Zhang
Zhenzhong Lan
Dong Yu
    LLMAG

Papers citing "WebVoyager: Building an End-to-End Web Agent with Large Multimodal Models"

30 / 30 papers shown
Rethinking Agent Design: From Top-Down Workflows to Bottom-Up Skill Evolution
Jiawei Du
Jinlong Wu
Yuzheng Chen
Yucheng Hu
Bing Li
Joey Tianyi Zhou
76
0
0
23 May 2025
lmgame-Bench: How Good are LLMs at Playing Games?
Lanxiang Hu
Mingjia Huo
Yu Zhang
Haoyang Yu
Eric P. Xing
Ion Stoica
Tajana Rosing
Haojian Jin
Hao Zhang
58
1
0
21 May 2025
WASP: Benchmarking Web Agent Security Against Prompt Injection Attacks
Ivan Evtimov
Arman Zharmagambetov
Aaron Grattafiori
Chuan Guo
Kamalika Chaudhuri
AAML
47
2
0
22 Apr 2025
TongUI: Building Generalized GUI Agents by Learning from Multimodal Web Tutorials
Bofei Zhang
Zirui Shang
Zhi Gao
Wang Zhang
Rui Xie
Xiaojian Ma
Tao Yuan
Xinxiao Wu
Song-Chun Zhu
Qing Li
LLMAG
62
3
0
17 Apr 2025
An Illusion of Progress? Assessing the Current State of Web Agents
Tianci Xue
Weijian Qi
Tianneng Shi
Chan Hee Song
Boyu Gou
D. Song
Huan Sun
Yu Su
LLMAG
ELM
Presented at ResearchTrend Connect | LLMAG on 21 May 2025
160
8
1
02 Apr 2025
Plan-and-Act: Improving Planning of Agents for Long-Horizon Tasks
Lutfi Eren Erdogan
Nicholas Lee
Sehoon Kim
Suhong Moon
Hiroki Furuta
Gopala Anumanchipalli
Kemal Kurniawan
Amir Gholami
LLMAG
LM&Ro
AIFin
121
2
0
12 Mar 2025
AppAgentX: Evolving GUI Agents as Proficient Smartphone Users
Wenjia Jiang
Yangyang Zhuang
Chenxi Song
Xu Yang
Chi Zhang
Chi Zhang
LLMAG
114
3
0
04 Mar 2025
Explorer: Scaling Exploration-driven Web Trajectory Synthesis for Multimodal Web Agents
Vardaan Pahuja
Yadong Lu
Corby Rosset
Boyu Gou
Arindam Mitra
Spencer Whitehead
Yu Su
Ahmed Awadallah
LLMAG
LM&Ro
Presented at ResearchTrend Connect | LLMAG on 14 Mar 2025
180
5
1
17 Feb 2025
InSTA: Towards Internet-Scale Training For Agents
Brandon Trabucco
Gunnar Sigurdsson
Robinson Piramuthu
Ruslan Salakhutdinov
ALM
111
2
0
10 Feb 2025
WebRL: Training LLM Web Agents via Self-Evolving Online Curriculum Reinforcement Learning
Zehan Qi
Xiao-Chang Liu
Iat Long Iong
Hanyu Lai
Xingwu Sun
...
Shuntian Yao
Tianjie Zhang
Wei Xu
J. Tang
Yuxiao Dong
118
27
0
28 Jan 2025
Mobile-Agent-E: Self-Evolving Mobile Assistant for Complex Tasks
Zhenhailong Wang
Haiyang Xu
Junyang Wang
Xi Zhang
Ming Yan
Junxuan Zhang
Fei Huang
Heng Ji
90
16
0
20 Jan 2025
WebWalker: Benchmarking LLMs in Web Traversal
Jialong Wu
Wenbiao Yin
Yong Jiang
Zhenglin Wang
Zekun Xi
...
Linhai Zhang
Yulan He
Deyu Zhou
Pengjun Xie
Fei Huang
70
11
0
13 Jan 2025
Cognitive Kernel: An Open-source Agent System towards Generalist Autopilots
Han Zhang
Xiaoman Pan
Hongwei Wang
Kaixin Ma
Wenhao Yu
Dong Yu
LLMAG
80
4
0
03 Jan 2025
TheAgentCompany: Benchmarking LLM Agents on Consequential Real World Tasks
Frank F. Xu
Yufan Song
Boxuan Li
Yuxuan Tang
Kritanjali Jain
...
Wayne Chi
Lawrence Jang
Yiqing Xie
Shuyan Zhou
Graham Neubig
LLMAG
154
29
0
18 Dec 2024
The BrowserGym Ecosystem for Web Agent Research
Thibault Le Sellier De Chezelles
Maxime Gasse
Alexandre Lacoste
Alexandre Drouin
Massimo Caccia
...
Siva Reddy
Quentin Cappart
Graham Neubig
Ruslan Salakhutdinov
Nicolas Chapados
LLMAG
124
12
0
06 Dec 2024
Ferret-UI 2: Mastering Universal User Interface Understanding Across Platforms
Zhangheng Li
Keen You
Hao Zhang
Di Feng
Harsh Agrawal
Xiujun Li
Mohana Prasad Sathya Moorthy
Jeff Nichols
Yue Yang
Zhe Gan
MLLM
78
19
0
24 Oct 2024
Large Language Models Empowered Personalized Web Agents
Hongru Cai
Yongqi Li
Wenjie Wang
Fengbin Zhu
Xiaoyu Shen
Wenjie Li
Tat-Seng Chua
LLMAG
94
12
0
22 Oct 2024
Beyond Browsing: API-Based Web Agents
Yueqi Song
Frank F. Xu
Shuyan Zhou
Graham Neubig
76
16
0
21 Oct 2024
AgentOccam: A Simple Yet Strong Baseline for LLM-Based Web Agents
Ke Yang
Yao Liu
Sapana Chaudhary
Rasool Fakoor
Pratik Chaudhari
George Karypis
Huzefa Rangwala
LLMAG
LM&Ro
78
20
0
17 Oct 2024
Teams of LLM Agents can Exploit Zero-Day Vulnerabilities
Richard Fang
Antony Kellermann
Akul Gupta
Qiusi Zhan
Richard Fang
R. Bindu
Daniel Kang
LLMAG
45
32
0
02 Jun 2024
AndroidWorld: A Dynamic Benchmarking Environment for Autonomous Agents
Christopher Rawles
Sarah Clinckemaillie
Yifan Chang
Jonathan Waltz
Gabrielle Lau
...
Daniel Toyama
Robert Berry
Divya Tyamagundlu
Timothy Lillicrap
Oriana Riva
LLMAG
85
53
0
23 May 2024
Tur[k]ingBench: A Challenge Benchmark for Web Agents
Kevin Xu
Yeganeh Kordi
Kate Sanders
Yizhong Wang
Adam Byerly
Jingyu Zhang
Benjamin Van Durme
Daniel Khashabi
LLMAG
82
6
0
18 Mar 2024
SPHINX-X: Scaling Data and Parameters for a Family of Multi-modal Large Language Models
Chris Liu
Renrui Zhang
Longtian Qiu
Siyuan Huang
Weifeng Lin
...
Hao Shao
Pan Lu
Hongsheng Li
Yu Qiao
Peng Gao
MLLM
147
112
0
08 Feb 2024
GPT-4V in Wonderland: Large Multimodal Models for Zero-Shot Smartphone GUI Navigation
An Yan
Zhengyuan Yang
Wanrong Zhu
Kevin Qinghong Lin
Linjie Li
...
Yiwu Zhong
Julian McAuley
Jianfeng Gao
Zicheng Liu
Lijuan Wang
LLMAG
LM&Ro
88
105
0
13 Nov 2023
WebArena: A Realistic Web Environment for Building Autonomous Agents
Shuyan Zhou
Frank F. Xu
Hao Zhu
Xuhui Zhou
Robert Lo
...
Tianyue Ou
Yonatan Bisk
Daniel Fried
Uri Alon
Graham Neubig
LLMAG
66
420
0
25 Jul 2023
A Real-World WebAgent with Planning, Long Context Understanding, and Program Synthesis
Izzeddin Gur
Hiroki Furuta
Austin Huang
Mustafa Safdari
Yutaka Matsuo
Douglas Eck
Aleksandra Faust
LM&Ro
LLMAG
78
208
0
24 Jul 2023
SimVLM: Simple Visual Language Model Pretraining with Weak Supervision
Zirui Wang
Jiahui Yu
Adams Wei Yu
Zihang Dai
Yulia Tsvetkov
Yuan Cao
VLM
MLLM
91
789
0
24 Aug 2021
VisualBERT: A Simple and Performant Baseline for Vision and Language
Liunian Harold Li
Mark Yatskar
Da Yin
Cho-Jui Hsieh
Kai-Wei Chang
VLM
96
1,939
0
09 Aug 2019
From Recognition to Cognition: Visual Commonsense Reasoning
Rowan Zellers
Yonatan Bisk
Ali Farhadi
Yejin Choi
LRM
BDL
OCL
ReLM
121
873
0
27 Nov 2018
Making the V in VQA Matter: Elevating the Role of Image Understanding in Visual Question Answering
Yash Goyal
Tejas Khot
D. Summers-Stay
Dhruv Batra
Devi Parikh
CoGe
279
3,187
0
02 Dec 2016