2106.00676
Cited By
VILA: Improving Structured Content Extraction from Scientific PDFs Using Visual Layout Groups
1 June 2021
Zejiang Shen
Kyle Lo
Lucy Lu Wang
Bailey Kuehl
Daniel S. Weld
Doug Downey
VLM
Papers citing
"VILA: Improving Structured Content Extraction from Scientific PDFs Using Visual Layout Groups"
20 / 20 papers shown
Title
olmOCR: Unlocking Trillions of Tokens in PDFs with Vision Language Models
Jake Poznanski
Jon Borchardt
Jason Dunkelberger
Regan Huff
Daniel Lin
Aman Rangapur
Christopher Wilhelm
Kyle Lo
Luca Soldaini
97
2
0
25 Feb 2025
Uncovering the New Accessibility Crisis in Scholarly PDFs
Anukriti Kumar
Lucy Lu Wang
18
3
0
03 Oct 2024
READoc: A Unified Benchmark for Realistic Document Structured Extraction
Zichao Li
Aizier Abulaiti
Yaojie Lu
Xuanang Chen
Jia Zheng
Hongyu Lin
Xianpei Han
Le Sun
41
5
0
08 Sep 2024
Position: Measure Dataset Diversity, Don't Just Claim It
Dora Zhao
Jerone T. A. Andrews
Orestis Papakyriakopoulos
Alice Xiang
66
14
0
11 Jul 2024
DistilDoc: Knowledge Distillation for Visually-Rich Document Applications
Jordy Van Landeghem
Subhajit Maity
Ayan Banerjee
Matthew Blaschko
Marie-Francine Moens
Josep Lladós
Sanket Biswas
52
2
0
12 Jun 2024
Needle In A Multimodal Haystack
Weiyun Wang
Shuibo Zhang
Yiming Ren
Yuchen Duan
Tiantong Li
...
Ping Luo
Yu Qiao
Jifeng Dai
Wenqi Shao
Wenhai Wang
VLM
59
17
0
11 Jun 2024
Know Your Audience: The benefits and pitfalls of generating plain language summaries beyond the "general" audience
Tal August
Kyle Lo
Noah A. Smith
Katharina Reinecke
37
11
0
08 Mar 2024
CASIMIR: A Corpus of Scientific Articles enhanced with Multiple Author-Integrated Revisions
Léane Jourdan
Florian Boudin
Nicolas Hernandez
Richard Dufour
37
4
0
01 Mar 2024
A Scalable Framework for Table of Contents Extraction from Complex ESG Annual Reports
Xinyu Wang
Lin Gui
Yulan He
LMTD
31
2
0
27 Oct 2023
The Rise of Open Science: Tracking the Evolution and Perceived Value of Data and Methods Link-Sharing Practices
Hancheng Cao
Jesse Dodge
Kyle Lo
Daniel A. McFarland
Lucy Lu Wang
AI4CE
32
5
0
04 Oct 2023
appjsonify: An Academic Paper PDF-to-JSON Conversion Toolkit
Atsuki Yamaguchi
Terufumi Morishita
24
1
0
02 Oct 2023
Papeos: Augmenting Research Papers with Talk Videos
Tae Soo Kim
Matt Latzke
Jonathan Bragg
Amy X. Zhang
Joseph Chee Chang
22
10
0
29 Aug 2023
Are Layout-Infused Language Models Robust to Layout Distribution Shifts? A Case Study with Scientific Documents
Catherine Chen
Zejiang Shen
Dan Klein
Gabriel Stanovsky
Doug Downey
Kyle Lo
32
2
0
01 Jun 2023
The Semantic Reader Project: Augmenting Scholarly Documents through AI-Powered Interactive Reading Interfaces
Kyle Lo
Joseph Chee Chang
Andrew Head
Jonathan Bragg
Amy X. Zhang
...
Caroline M Wu
Jiangjiang Yang
Angele Zamarron
Marti A. Hearst
Daniel S. Weld
34
19
0
25 Mar 2023
Relatedly: Scaffolding Literature Reviews with Existing Related Work Sections
Srishti Palani
Aakanksha Naik
Doug Downey
Amy X. Zhang
Jonathan Bragg
Joseph Chee Chang
24
34
0
13 Feb 2023
Evaluating TCFD Reporting: A New Application of Zero-Shot Analysis to Climate-Related Financial Disclosures
Alix Auzepy
Elena Tönjes
David Lenz
C. Funk
33
5
0
01 Feb 2023
The Semantic Scholar Open Data Platform
Rodney Michael Kinney
Chloe Anastasiades
Russell Authur
Iz Beltagy
Jonathan Bragg
...
Caroline M Wu
Jiangjiang Yang
Angele Zamarron
Madeleine van Zuylen
Daniel S. Weld
32
93
0
24 Jan 2023
An Inclusive Notion of Text
Ilia Kuznetsov
Iryna Gurevych
35
0
0
10 Nov 2022
LayoutLMv2: Multi-modal Pre-training for Visually-Rich Document Understanding
Yang Xu
Yiheng Xu
Tengchao Lv
Lei Cui
Furu Wei
...
D. Florêncio
Cha Zhang
Wanxiang Che
Min Zhang
Lidong Zhou
ViT
MLLM
153
501
0
29 Dec 2020
PySBD: Pragmatic Sentence Boundary Disambiguation
Nipun Sadvilkar
Mark Neumann
58
78
0
19 Oct 2020