2202.00217
Cited By
WebFormer: The Web-page Transformer for Structure Information Extraction
1 February 2022
Qifan Wang
Yi Fang
Anirudh Ravula
Fuli Feng
Xiaojun Quan
Dongfang Liu
ViT
Papers citing
"WebFormer: The Web-page Transformer for Structure Information Extraction"
20 / 20 papers shown
Title
ReaderLM-v2: Small Language Model for HTML to Markdown and JSON
Feng Wang
Zesheng Shi
Bo Wang
Nan Wang
Han Xiao
RALM
81
1
0
03 Mar 2025
From Context to Action: Analysis of the Impact of State Representation and Context on the Generalization of Multi-Turn Web Navigation Agents
Nalin Tiwary
Vardhan Dongre
Sanil Arun Chawla
Ashwin Lamani
Dilek Hakkani-Tur
LLMAG
23
0
0
31 Oct 2024
A Survey on Retrieval-Augmented Text Generation for Large Language Models
Yizheng Huang
Jimmy X. Huang
3DV
RALM
66
46
0
17 Apr 2024
Coarse-Tuning for Ad-hoc Document Retrieval Using Pre-trained Language Models
Atsushi Keyaki
Ribeka Keyaki
39
0
0
25 Mar 2024
Hypertext Entity Extraction in Webpage
Yifei Yang
Tianqiao Liu
Bo Shao
Hai Zhao
Linjun Shou
Ming Gong
Daxin Jiang
44
0
0
04 Mar 2024
Cleaner Pretraining Corpus Curation with Neural Web Scraping
Zhipeng Xu
Zhenghao Liu
Yukun Yan
Zhiyuan Liu
Ge Yu
Chenyan Xiong
CLIP
OnRL
27
4
0
22 Feb 2024
WebLINX: Real-World Website Navigation with Multi-Turn Dialogue
Xing Han Lù
Zdeněk Kasner
Siva Reddy
34
59
0
08 Feb 2024
High-throughput Biomedical Relation Extraction for Semi-Structured Web Articles Empowered by Large Language Models
Songchi Zhou
Sheng Yu
23
0
0
13 Dec 2023
A Real-World WebAgent with Planning, Long Context Understanding, and Program Synthesis
Izzeddin Gur
Hiroki Furuta
Austin Huang
Mustafa Safdari
Yutaka Matsuo
Douglas Eck
Aleksandra Faust
LM&Ro
LLMAG
39
198
0
24 Jul 2023
OBELICS: An Open Web-Scale Filtered Dataset of Interleaved Image-Text Documents
Hugo Laurençon
Lucile Saulnier
Léo Tronchon
Stas Bekman
Amanpreet Singh
...
Siddharth Karamcheti
Alexander M. Rush
Douwe Kiela
Matthieu Cord
Victor Sanh
25
230
0
21 Jun 2023
PIVOINE: Instruction Tuning for Open-world Information Extraction
Keming Lu
Xiaoman Pan
Kaiqiang Song
Hongming Zhang
Dong Yu
Jianshu Chen
28
10
0
24 May 2023
Towards Zero-shot Relation Extraction in Web Mining: A Multimodal Approach with Relative XML Path
Zilong Wang
Jingbo Shang
49
0
0
23 May 2023
Multimodal Web Navigation with Instruction-Finetuned Foundation Models
Hiroki Furuta
Kuang-Huei Lee
Ofir Nachum
Yutaka Matsuo
Aleksandra Faust
S. Gu
Izzeddin Gur
LM&Ro
36
92
0
19 May 2023
FormLM: Recommending Creation Ideas for Online Forms by Modelling Semantic and Structural Information
Yijia Shao
Mengyu Zhou
Yifan Zhong
Tao Wu
Hongwei Han
Shi Han
Gideon Huang
Dongmei Zhang
3DV
17
2
0
10 Nov 2022
Cross-domain Generalization for AMR Parsing
Xuefeng Bai
Sen Yang
Leyang Cui
Linfeng Song
Yue Zhang
49
2
0
22 Oct 2022
Pix2Struct: Screenshot Parsing as Pretraining for Visual Language Understanding
Kenton Lee
Mandar Joshi
Iulia Turc
Hexiang Hu
Fangyu Liu
Julian Martin Eisenschlos
Urvashi Khandelwal
Peter Shaw
Ming-Wei Chang
Kristina Toutanova
CLIP
VLM
169
263
0
07 Oct 2022
XDoc: Unified Pre-training for Cross-Format Document Understanding
Jingye Chen
Tengchao Lv
Lei Cui
Changrong Zhang
Furu Wei
50
13
0
06 Oct 2022
The Klarna Product Page Dataset: Web Element Nomination with Graph Neural Networks and Large Language Models
A. Hotti
Riccardo Sven Risuleo
Stefan Magureanu
Aref Moradi
J. Lagergren
27
4
0
03 Nov 2021
Simplified DOM Trees for Transferable Attribute Extraction from the Web
Yichao Zhou
Ying Sheng
N. Vo
Nick Edmonds
Sandeep Tata
124
28
0
07 Jan 2021
LayoutLMv2: Multi-modal Pre-training for Visually-Rich Document Understanding
Yang Xu
Yiheng Xu
Tengchao Lv
Lei Cui
Furu Wei
...
D. Florêncio
Cha Zhang
Wanxiang Che
Min Zhang
Lidong Zhou
ViT
MLLM
153
498
0
29 Dec 2020