StructFormer: Document Structure-based Masked Attention and its Impact on Language Model Pre-Training

25 November 2024
Kaustubh Ponkshe
Venkatapathy Subramanian
Natwar Modani
Ganesh Ramakrishnan

Papers citing "StructFormer: Document Structure-based Masked Attention and its Impact on Language Model Pre-Training"

9 / 9 papers shown
Title
HEGEL: Hypergraph Transformer for Long Document Summarization
Haopeng Zhang
Xiao Liu
Jiawei Zhang
71
45
0
09 Oct 2022
Should You Mask 15% in Masked Language Modeling?
Alexander Wettig
Tianyu Gao
Zexuan Zhong
Danqi Chen
CVBM
93
166
0
16 Feb 2022
Big Bird: Transformers for Longer Sequences
Manzil Zaheer
Guru Guruganesh
Kumar Avinava Dubey
Joshua Ainslie
Chris Alberti
...
Philip Pham
Anirudh Ravula
Qifan Wang
Li Yang
Amr Ahmed
VLM
582
2,105
0
28 Jul 2020
Longformer: The Long-Document Transformer
Iz Beltagy
Matthew E. Peters
Arman Cohan
RALM
VLM
187
4,105
0
10 Apr 2020
PubLayNet: largest dataset ever for document layout analysis
Xu Zhong
Jianbin Tang
Antonio Jimeno Yepes
54
462
0
16 Aug 2019
A Multiscale Visualization of Attention in the Transformer Model
Jesse Vig
ViT
81
583
0
12 Jun 2019
What Does BERT Look At? An Analysis of BERT's Attention
Kevin Clark
Urvashi Khandelwal
Omer Levy
Christopher D. Manning
MILM
235
1,607
0
11 Jun 2019
Generating Long Sequences with Sparse Transformers
R. Child
Scott Gray
Alec Radford
Ilya Sutskever
136
1,919
0
23 Apr 2019
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
Alex Wang
Amanpreet Singh
Julian Michael
Felix Hill
Omer Levy
Samuel R. Bowman
ELM
1.1K
7,201
0
20 Apr 2018