Blockwise Self-Attention for Long Document Understanding

7 November 2019
J. Qiu, Hao Ma, Omer Levy, Scott Yih, Sinong Wang, Jie Tang

Papers citing "Blockwise Self-Attention for Long Document Understanding"

Showing 50 of 75 citing papers.
Surface Vision Mamba: Leveraging Bidirectional State Space Model for Efficient Spherical Manifold Representation
Rongzhao He, Weihao Zheng, Leilei Zhao, Ying Wang, Dalin Zhu, Dan Wu, Bin Hu
Mamba · 102 · 0 · 0 · 21 Feb 2025

Rethinking Transformer for Long Contextual Histopathology Whole Slide Image Analysis
Honglin Li, Yunlong Zhang, Pingyi Chen, Zhongyi Shui, Chenglu Zhu, Lin Yang
MedIm · 62 · 5 · 0 · 18 Oct 2024

ELASTIC: Efficient Linear Attention for Sequential Interest Compression
Jiaxin Deng, Shiyao Wang, Song Lu, Yinfeng Li, Xinchen Luo, Yuanjun Liu, Peixing Xu, Guorui Zhou
52 · 0 · 0 · 18 Aug 2024

Ex3: Automatic Novel Writing by Extracting, Excelsior and Expanding
Lei Huang, Jiaming Guo, Guanhua He, Xishan Zhang, Rui Zhang, Shaohui Peng, Shaoli Liu, Tianshi Chen
41 · 2 · 0 · 16 Aug 2024

DeepGate3: Towards Scalable Circuit Representation Learning
Zhengyuan Shi, Ziyang Zheng, Sadaf Khan, Jianyuan Zhong, Min Li, Qiang Xu
GNN, AI4CE · 63 · 9 · 0 · 15 Jul 2024

Let the Code LLM Edit Itself When You Edit the Code
Zhenyu He, Jun Zhang, Shengjie Luo, Jingjing Xu, Zongzhang Zhang, Di He
KELM · 50 · 1 · 0 · 03 Jul 2024

Short-Long Convolutions Help Hardware-Efficient Linear Attention to Focus on Long Sequences
Zicheng Liu, Siyuan Li, Li Wang, Zedong Wang, Yunfan Liu, Stan Z. Li
43 · 8 · 0 · 12 Jun 2024

Orchid: Flexible and Data-Dependent Convolution for Sequence Modeling
Mahdi Karami, Ali Ghodsi
VLM · 66 · 6 · 0 · 28 Feb 2024

Multimodal Transformer With a Low-Computational-Cost Guarantee
Sungjin Park, Edward Choi
54 · 1 · 0 · 23 Feb 2024

Zebra: Extending Context Window with Layerwise Grouped Local-Global Attention
Kaiqiang Song, Xiaoyang Wang, Sangwoo Cho, Xiaoman Pan, Dong Yu
47 · 7 · 0 · 14 Dec 2023

SCCA: Shifted Cross Chunk Attention for long contextual semantic expansion
Yuxiang Guo
27 · 0 · 0 · 12 Dec 2023

Long-MIL: Scaling Long Contextual Multiple Instance Learning for Histopathology Whole Slide Image Analysis
Honglin Li, Yunlong Zhang, Chenglu Zhu, Jiatong Cai, Sunyi Zheng, Lin Yang
VLM · 58 · 4 · 0 · 21 Nov 2023

Transformer-VQ: Linear-Time Transformers via Vector Quantization
Albert Mohwald
41 · 15 · 0 · 28 Sep 2023

Associative Transformer
Yuwei Sun, H. Ochiai, Zhirong Wu, Stephen Lin, Ryota Kanai
ViT · 72 · 0 · 0 · 22 Sep 2023

LongLoRA: Efficient Fine-tuning of Long-Context Large Language Models
Yukang Chen, Shengju Qian, Haotian Tang, Xin Lai, Zhijian Liu, Song Han, Jiaya Jia
64 · 155 · 0 · 21 Sep 2023

Video-FocalNets: Spatio-Temporal Focal Modulation for Video Action Recognition
Syed Talal Wasim, Muhammad Uzair Khattak, Muzammal Naseer, Salman Khan, M. Shah, Fahad Shahbaz Khan
ViT · 55 · 19 · 0 · 13 Jul 2023

Fourier Transformer: Fast Long Range Modeling by Removing Sequence Redundancy with FFT Operator
Ziwei He, Meng Yang, Minwei Feng, Jingcheng Yin, Xiang Wang, Jingwen Leng, Zhouhan Lin
ViT · 51 · 13 · 0 · 24 May 2023

CageViT: Convolutional Activation Guided Efficient Vision Transformer
Hao Zheng, Jinbao Wang, Xiantong Zhen, Hao Chen, Jingkuan Song, Feng Zheng
ViT · 37 · 0 · 0 · 17 May 2023

A Survey on Long Text Modeling with Transformers
Zican Dong, Tianyi Tang, Lunyi Li, Wayne Xin Zhao
VLM · 46 · 55 · 0 · 28 Feb 2023

Transformer-based Models for Long-Form Document Matching: Challenges and Empirical Analysis
Akshita Jha, Adithya Samavedhi, Vineeth Rakesh, J. Chandrashekar, Chandan K. Reddy
32 · 0 · 0 · 07 Feb 2023

Efficient Long Sequence Modeling via State Space Augmented Transformer
Simiao Zuo, Xiaodong Liu, Jian Jiao, Denis Xavier Charles, Eren Manavoglu, Tuo Zhao, Jianfeng Gao
140 · 36 · 0 · 15 Dec 2022

Full Contextual Attention for Multi-resolution Transformers in Semantic Segmentation
Loic Themyr, Clément Rambour, Nicolas Thome, Toby Collins, Alexandre Hostettler
ViT · 32 · 10 · 0 · 15 Dec 2022

Perceiver-VL: Efficient Vision-and-Language Modeling with Iterative Latent Attention
Zineng Tang, Jaemin Cho, Jie Lei, Joey Tianyi Zhou
VLM · 29 · 9 · 0 · 21 Nov 2022

SeDR: Segment Representation Learning for Long Documents Dense Retrieval
Junying Chen, Qingcai Chen, Dongfang Li, Yutao Huang
33 · 6 · 0 · 20 Nov 2022

Language models are good pathologists: using attention-based sequence reduction and text-pretrained transformers for efficient WSI classification
Juan Pisula, Katarzyna Bozek
VLM, MedIm · 50 · 3 · 0 · 14 Nov 2022

How Long Is Enough? Exploring the Optimal Intervals of Long-Range Clinical Note Language Modeling
Samuel Cahyawijaya, Bryan Wilie, Holy Lovenia, Huang Zhong, Mingqian Zhong, Yuk-Yu Nancy Ip, Pascale Fung
LM&MA · 38 · 2 · 0 · 25 Oct 2022

ViTCoD: Vision Transformer Acceleration via Dedicated Algorithm and Accelerator Co-Design
Haoran You, Zhanyi Sun, Huihong Shi, Zhongzhi Yu, Yang Zhao, Yongan Zhang, Chaojian Li, Baopu Li, Yingyan Lin
ViT · 30 · 81 · 0 · 18 Oct 2022

Bird-Eye Transformers for Text Generation Models
Lei Sha, Yuhang Song, Yordan Yordanov, Tommaso Salvatori, Thomas Lukasiewicz
35 · 0 · 0 · 08 Oct 2022

Sparse Attention Acceleration with Synergistic In-Memory Pruning and On-Chip Recomputation
Amir Yazdanbakhsh, Ashkan Moradifirouzabadi, Zheng Li, Mingu Kang
40 · 32 · 0 · 01 Sep 2022

Deep is a Luxury We Don't Have
Ahmed Taha, Yen Nhi Truong Vu, Brent Mombourquette, Thomas P. Matthews, Jason Su, Sadanand Singh
ViT, MedIm · 31 · 2 · 0 · 11 Aug 2022

Momentum Transformer: Closing the Performance Gap Between Self-attention and Its Linearization
T. Nguyen, Richard G. Baraniuk, Robert M. Kirby, Stanley J. Osher, Bao Wang
70 · 9 · 0 · 01 Aug 2022

Neural Architecture Search on Efficient Transformers and Beyond
Zexiang Liu, Dong Li, Kaiyue Lu, Zhen Qin, Weixuan Sun, Jiacheng Xu, Yiran Zhong
35 · 19 · 0 · 28 Jul 2022

Test2Vec: An Execution Trace Embedding for Test Case Prioritization
E. Jabbar, Soheila Zangeneh, Hadi Hemmati, R. Feldt
61 · 5 · 0 · 28 Jun 2022

Long Range Language Modeling via Gated State Spaces
Harsh Mehta, Ankit Gupta, Ashok Cutkosky, Behnam Neyshabur
Mamba · 44 · 232 · 0 · 27 Jun 2022

Separable Self-attention for Mobile Vision Transformers
Sachin Mehta, Mohammad Rastegari
ViT, MQ · 39 · 253 · 0 · 06 Jun 2022

Leveraging Locality in Abstractive Text Summarization
Yixin Liu, Ansong Ni, Linyong Nan, Budhaditya Deb, Chenguang Zhu, Ahmed Hassan Awadallah, Dragomir R. Radev
43 · 18 · 0 · 25 May 2022

ClusterGNN: Cluster-based Coarse-to-Fine Graph Neural Network for Efficient Feature Matching
Yanxing Shi, Junxiong Cai, Yoli Shavit, Tai-Jiang Mu, Wensen Feng, Kai Zhang
GNN · 32 · 77 · 0 · 25 Apr 2022

Revisiting Transformer-based Models for Long Document Classification
Xiang Dai, Ilias Chalkidis, S. Darkner, Desmond Elliott
VLM · 30 · 68 · 0 · 14 Apr 2022

A Call for Clarity in Beam Search: How It Works and When It Stops
Jungo Kasai, Keisuke Sakaguchi, Ronan Le Bras, Dragomir R. Radev, Yejin Choi, Noah A. Smith
34 · 7 · 0 · 11 Apr 2022

Accelerating Attention through Gradient-Based Learned Runtime Pruning
Zheng Li, Soroush Ghodrati, Amir Yazdanbakhsh, H. Esmaeilzadeh, Mingu Kang
32 · 17 · 0 · 07 Apr 2022

Pyramid-BERT: Reducing Complexity via Successive Core-set based Token Selection
Xin Huang, A. Khetan, Rene Bidart, Zohar Karnin
26 · 14 · 0 · 27 Mar 2022

Diagonal State Spaces are as Effective as Structured State Spaces
Ankit Gupta, Albert Gu, Jonathan Berant
64 · 293 · 0 · 27 Mar 2022

Linearizing Transformer with Key-Value Memory
Yizhe Zhang, Deng Cai
38 · 5 · 0 · 23 Mar 2022

ERNIE-SPARSE: Learning Hierarchical Efficient Transformer Through Regularized Self-Attention
Yang Liu, Jiaxiang Liu, L. Chen, Yuxiang Lu, Shi Feng, Zhida Feng, Yu Sun, Hao Tian, Huancheng Wu, Hai-feng Wang
36 · 9 · 0 · 23 Mar 2022

BOAT: Bilateral Local Attention Vision Transformer
Tan Yu, Gangming Zhao, Ping Li, Yizhou Yu
ViT · 38 · 27 · 0 · 31 Jan 2022

Fast Monte-Carlo Approximation of the Attention Mechanism
Hyunjun Kim, Jeonggil Ko
27 · 2 · 0 · 30 Jan 2022

FEDformer: Frequency Enhanced Decomposed Transformer for Long-term Series Forecasting
Tian Zhou, Ziqing Ma, Qingsong Wen, Xue Wang, Liang Sun, Rong Jin
AI4TS · 35 · 1,329 · 0 · 30 Jan 2022

Efficient Visual Tracking with Exemplar Transformers
Philippe Blatter, Menelaos Kanakis, Martin Danelljan, Luc Van Gool
ViT · 30 · 80 · 0 · 17 Dec 2021

Self-attention Does Not Need $O(n^2)$ Memory
M. Rabe, Charles Staats
LRM · 31 · 144 · 0 · 10 Dec 2021

Couplformer: Rethinking Vision Transformer with Coupling Attention Map
Hai Lan, Xihao Wang, Xian Wei
ViT · 39 · 3 · 0 · 10 Dec 2021