SelfDoc: Self-Supervised Document Representation Learning

7 June 2021

Jiuxiang Gu

Papers citing "SelfDoc: Self-Supervised Document Representation Learning"

Title | Authors | Tags | Counts | Date
DocMamba: Efficient Document Pre-training with State Space Model | Pengfei Hu, Zhenrong Zhang, Jiefeng Ma, Shuhang Liu, Jun Du, Jianshu Zhang | Mamba | 42 1 0 | 18 Sep 2024
GOPT: Generalizable Online 3D Bin Packing via Transformer-based Deep Reinforcement Learning | Heng Xiong, Changrong Guo, Jian Peng, Kai Ding, Wenjie Chen, Xuchong Qiu, Long Bai, Jianfeng Xu | OffRL | 26 4 0 | 09 Sep 2024
ProcTag: Process Tagging for Assessing the Efficacy of Document Instruction Data | Yufan Shen, Chuwei Luo, Zhaoqing Zhu, Yang Chen, Qi Zheng, Zhi Yu, Jiajun Bu, Cong Yao | - | 42 2 0 | 17 Jul 2024
Memorizing Documents with Guidance in Large Language Models | Bumjin Park, Jaesik Choi | KELM, RALM | 36 1 0 | 23 Jun 2024
DistilDoc: Knowledge Distillation for Visually-Rich Document Applications | Jordy Van Landeghem, Subhajit Maity, Ayan Banerjee, Matthew Blaschko, Marie-Francine Moens, Josep Lladós, Sanket Biswas | - | 50 2 0 | 12 Jun 2024
Improve Academic Query Resolution through BERT-based Question Extraction from Images | Nidhi Kamal, Saurabh Yadav, Jorawar Singh, Aditi Avasthi | - | 31 0 0 | 28 Apr 2024
Noise-Aware Training of Layout-Aware Language Models | Ritesh Sarkhel, Xiaoqi Ren, Lauro Beltrao Costa, Guolong Su, Vincent Perot, Yanan Xie, Emmanouil Koukoumidis, Arnab Nandi | VLM | 44 0 0 | 30 Mar 2024
DocGraphLM: Documental Graph Language Model for Information Extraction | Dongsheng Wang, Zhiqiang Ma, Armineh Nourbakhsh, Kang Gu, Sameena Shah | - | 36 8 0 | 05 Jan 2024
SCOB: Universal Text Understanding via Character-wise Supervised Contrastive Learning with Online Text Rendering for Bridging Domain Gap | Daehee Kim, Yoon Kim, Donghyun Kim, Yumin Lim, Geewook Kim, Taeho Kil | - | 31 3 0 | 21 Sep 2023
On Evaluation of Document Classification using RVL-CDIP | Stefan Larson, Gordon Lim, Kevin Leach | - | 26 3 0 | 21 Jun 2023
Multimodal Web Navigation with Instruction-Finetuned Foundation Models | Hiroki Furuta, Kuang-Huei Lee, Ofir Nachum, Yutaka Matsuo, Aleksandra Faust, S. Gu, Izzeddin Gur | LM&Ro | 36 92 0 | 19 May 2023
Language Independent Neuro-Symbolic Semantic Parsing for Form Understanding | Bhanu Prakash Voutharoja, Lizhen Qu, Fatemeh Shiri | - | 27 1 0 | 08 May 2023
ChartReader: A Unified Framework for Chart Derendering and Comprehension without Heuristic Rules | Zhi-Qi Cheng, Qianwen Dai, Siyao Li, Jingdong Sun, Teruko Mitamura, Alexander G. Hauptmann | - | 29 21 0 | 05 Apr 2023
Modeling Entities as Semantic Points for Visual Information Extraction in the Wild | Zhibo Yang, Rujiao Long, Pengfei Wang, Sibo Song, Humen Zhong, Wenqing Cheng, X. Bai, Cong Yao | - | 34 19 0 | 23 Mar 2023
Multimodal Data Integration for Oncology in the Era of Deep Neural Networks: A Review | Asim Waqas, Aakash Tripathi, Ravichandran Ramachandran, Paul Stewart, Ghulam Rasool | AI4CE | 37 31 0 | 11 Mar 2023
ICL-D3IE: In-Context Learning with Diverse Demonstrations Updating for Document Information Extraction | Jiabang He, Lei Wang, Yingpeng Hu, Ning Liu, Hui-juan Liu, Xingdong Xu, Hengtao Shen | MLLM | 6 47 0 | 09 Mar 2023
ST-KeyS: Self-Supervised Transformer for Keyword Spotting in Historical Handwritten Documents | Sana Khamekhem Jemni, Sourour Ammar, Mohamed Ali Souibgui, Yousri Kessentini, A. Cheddad | - | 23 3 0 | 06 Mar 2023
Unifying Vision, Text, and Layout for Universal Document Processing | Zineng Tang, Ziyi Yang, Guoxin Wang, Yuwei Fang, Yang Liu, Chenguang Zhu, Michael Zeng, Chao-Yue Zhang, Joey Tianyi Zhou | VLM | 32 105 0 | 05 Dec 2022
ObjCAViT: Improving Monocular Depth Estimation Using Natural Language Models And Image-Object Cross-Attention | Dylan Auty, K. Mikolajczyk | VLM | 22 3 0 | 30 Nov 2022
Alignment-Enriched Tuning for Patch-Level Pre-trained Document Image Models | Lei Wang, Jian He, Xingdong Xu, Ning Liu, Hui-juan Liu | - | 36 2 0 | 27 Nov 2022
ERNIE-mmLayout: Multi-grained MultiModal Transformer for Document Understanding | Wenjin Wang, Zhengjie Huang, Bin Luo, Qianglong Chen, Qiming Peng, ..., Weichong Yin, Shi Feng, Yu Sun, Dianhai Yu, Yin Zhang | ViT | 30 11 0 | 18 Sep 2022
One-Shot Doc Snippet Detection: Powering Search in Document Beyond Text | Abhinav Java, Shripad Deshmukh, Milan Aggarwal, Surgan Jandial, Mausoom Sarkar, Balaji Krishnamurthy | - | 32 3 0 | 12 Sep 2022
Knowing Where and What: Unified Word Block Pretraining for Document Understanding | Song Tao, Zijian Wang, Tiantian Fan, Canjie Luo, Can Huang | SSL | 38 2 0 | 28 Jul 2022
Test-Time Adaptation for Visual Document Understanding | Sayna Ebrahimi, Sercan Ö. Arik, Tomas Pfister | OOD | 33 6 0 | 15 Jun 2022
Exploiting Temporal Relations on Radar Perception for Autonomous Driving | Peizhao Li, Puzuo Wang, K. Berntorp, Hongfu Liu | - | 21 43 0 | 03 Apr 2022
XYLayoutLM: Towards Layout-Aware Multimodal Networks For Visually-Rich Document Understanding | Zhangxuan Gu, Changhua Meng, Ke Wang, Jun Lan, Weiqiang Wang, Ming Gu, Liqing Zhang | - | 34 76 0 | 14 Mar 2022
DiT: Self-supervised Pre-training for Document Image Transformer | Junlong Li, Yiheng Xu, Tengchao Lv, Lei Cui, Chaoxi Zhang, Furu Wei | ViT, VLM | 35 159 0 | 04 Mar 2022
LiLT: A Simple yet Effective Language-Independent Layout Transformer for Structured Document Understanding | Jiapeng Wang, Lianwen Jin, Kai Ding | VLM | 30 138 0 | 28 Feb 2022
OCR-IDL: OCR Annotations for Industry Document Library Dataset | Ali Furkan Biten, Rubèn Pérez Tito, Lluís Gómez, Ernest Valveny, Dimosthenis Karatzas | - | 25 26 0 | 25 Feb 2022
DocSegTr: An Instance-Level End-to-End Document Image Segmentation Transformer | Sanket Biswas, Ayan Banerjee, Josep Lladós, Umapada Pal | ViT | 16 23 0 | 27 Jan 2022
DocEnTr: An End-to-End Document Image Enhancement Transformer | Mohamed Ali Souibgui, Sanket Biswas, Sana Khamekhem Jemni, Yousri Kessentini, Alicia Fornés, Josep Lladós, Umapada Pal | ViT | 58 45 0 | 25 Jan 2022
OCR-free Document Understanding Transformer | Geewook Kim, Teakgyu Hong, Moonbin Yim, Jeongyeon Nam, Jinyoung Park, Jinyeong Yim, Wonseok Hwang, Sangdoo Yun, Dongyoon Han, Seunghyun Park | ViT | 50 262 0 | 30 Nov 2021
Document AI: Benchmarks, Models and Applications | Lei Cui, Yiheng Xu, Tengchao Lv, Furu Wei | VLM | 21 69 0 | 16 Nov 2021
MarkupLM: Pre-training of Text and Markup Language for Visually-rich Document Understanding | Junlong Li, Yiheng Xu, Lei Cui, Furu Wei | VLM, 3DGS | 28 59 0 | 16 Oct 2021
BROS: A Pre-trained Language Model Focusing on Text and Layout for Better Key Information Extraction from Documents | Teakgyu Hong, Donghyun Kim, Mingi Ji, Wonseok Hwang, Daehyun Nam, Sungrae Park | VLM | 34 150 0 | 10 Aug 2021
VILA: Improving Structured Content Extraction from Scientific PDFs Using Visual Layout Groups | Zejiang Shen, Kyle Lo, Lucy Lu Wang, Bailey Kuehl, Daniel S. Weld, Doug Downey | VLM | 16 34 0 | 01 Jun 2021
FUNSD: A Dataset for Form Understanding in Noisy Scanned Documents | Guillaume Jaume, H. K. Ekenel, Jean-Philippe Thiran | - | 134 355 0 | 27 May 2019
Aggregated Residual Transformations for Deep Neural Networks | Saining Xie, Ross B. Girshick, Piotr Dollár, Z. Tu, Kaiming He | - | 297 10,220 0 | 16 Nov 2016