2305.04241
Cited By
Vcc: Scaling Transformers to 128K Tokens or More by Prioritizing Important Tokens
7 May 2023
Zhanpeng Zeng
Cole Hawkins
Min-Fong Hong
Aston Zhang
Nikolaos Pappas
Vikas Singh
Shuai Zheng
Papers citing
"Vcc: Scaling Transformers to 128K Tokens or More by Prioritizing Important Tokens"
25 / 25 papers shown
Title
Efficient Transformers with Dynamic Token Pooling
Piotr Nawrot
J. Chorowski
Adrian Łańcucki
Edoardo Ponti
81
46
0
17 Nov 2022
Memorizing Transformers
Yuhuai Wu
M. Rabe
DeLesley S. Hutchins
Christian Szegedy
RALM
106
178
0
16 Mar 2022
SCROLLS: Standardized CompaRison Over Long Language Sequences
Uri Shaham
Elad Segal
Maor Ivgi
Avia Efrat
Ori Yoran
...
Ankit Gupta
Wenhan Xiong
Mor Geva
Jonathan Berant
Omer Levy
RALM
109
139
0
10 Jan 2022
LongT5: Efficient Text-To-Text Transformer for Long Sequences
Mandy Guo
Joshua Ainslie
David C. Uthus
Santiago Ontanon
Jianmo Ni
Yun-hsuan Sung
Yinfei Yang
VLM
81
316
0
15 Dec 2021
You Only Sample (Almost) Once: Linear Cost Self-Attention Via Bernoulli Sampling
Zhanpeng Zeng
Yunyang Xiong
Sathya Ravi
Shailesh Acharya
G. Fung
Vikas Singh
72
19
0
18 Nov 2021
QMSum: A New Benchmark for Query-based Multi-domain Meeting Summarization
Ming Zhong
Da Yin
Tao Yu
A. Zaidi
Mutethia Mutuma
...
Ahmed Hassan Awadallah
Asli Celikyilmaz
Yang Liu
Xipeng Qiu
Dragomir R. Radev
RALM
91
339
0
13 Apr 2021
Efficient Attentions for Long Document Summarization
L. Huang
Shuyang Cao
Nikolaus Nova Parulian
Heng Ji
Lu Wang
133
289
0
05 Apr 2021
MediaSum: A Large-scale Media Interview Dataset for Dialogue Summarization
Chenguang Zhu
Yang Liu
Jie Mei
Michael Zeng
85
137
0
11 Mar 2021
The Pile: An 800GB Dataset of Diverse Text for Language Modeling
Leo Gao
Stella Biderman
Sid Black
Laurence Golding
Travis Hoppe
...
Horace He
Anish Thite
Noa Nabeshima
Shawn Presser
Connor Leahy
AIMat
478
2,123
0
31 Dec 2020
Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers
Sixiao Zheng
Jiachen Lu
Hengshuang Zhao
Xiatian Zhu
Zekun Luo
...
Yanwei Fu
Jianfeng Feng
Tao Xiang
Philip Torr
Li Zhang
ViT
198
2,912
0
31 Dec 2020
Long Range Arena: A Benchmark for Efficient Transformers
Yi Tay
Mostafa Dehghani
Samira Abnar
Yikang Shen
Dara Bahri
Philip Pham
J. Rao
Liu Yang
Sebastian Ruder
Donald Metzler
163
730
0
08 Nov 2020
Rethinking Attention with Performers
K. Choromanski
Valerii Likhosherstov
David Dohan
Xingyou Song
Andreea Gane
...
Afroz Mohiuddin
Lukasz Kaiser
David Belanger
Lucy J. Colwell
Adrian Weller
186
1,604
0
30 Sep 2020
Big Bird: Transformers for Longer Sequences
Manzil Zaheer
Guru Guruganesh
Kumar Avinava Dubey
Joshua Ainslie
Chris Alberti
...
Philip Pham
Anirudh Ravula
Qifan Wang
Li Yang
Amr Ahmed
VLM
582
2,105
0
28 Jul 2020
Funnel-Transformer: Filtering out Sequential Redundancy for Efficient Language Processing
Zihang Dai
Guokun Lai
Yiming Yang
Quoc V. Le
98
236
0
05 Jun 2020
End-to-End Object Detection with Transformers
Nicolas Carion
Francisco Massa
Gabriel Synnaeve
Nicolas Usunier
Alexander Kirillov
Sergey Zagoruyko
ViT
3DV
PINN
456
13,153
0
26 May 2020
Longformer: The Long-Document Transformer
Iz Beltagy
Matthew E. Peters
Arman Cohan
RALM
VLM
187
4,105
0
10 Apr 2020
RoBERTa: A Robustly Optimized BERT Pretraining Approach
Yinhan Liu
Myle Ott
Naman Goyal
Jingfei Du
Mandar Joshi
Danqi Chen
Omer Levy
M. Lewis
Luke Zettlemoyer
Veselin Stoyanov
AIMat
703
24,572
0
26 Jul 2019
XLNet: Generalized Autoregressive Pretraining for Language Understanding
Zhilin Yang
Zihang Dai
Yiming Yang
J. Carbonell
Ruslan Salakhutdinov
Quoc V. Le
AI4CE
236
8,455
0
19 Jun 2019
Analysing Mathematical Reasoning Abilities of Neural Models
D. Saxton
Edward Grefenstette
Felix Hill
Pushmeet Kohli
LRM
212
431
0
02 Apr 2019
HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering
Zhilin Yang
Peng Qi
Saizheng Zhang
Yoshua Bengio
William W. Cohen
Ruslan Salakhutdinov
Christopher D. Manning
RALM
217
2,703
0
25 Sep 2018
A Discourse-Aware Attention Model for Abstractive Summarization of Long Documents
Arman Cohan
Franck Dernoncourt
Doo Soon Kim
Trung Bui
Seokhwan Kim
W. Chang
Nazli Goharian
493
763
0
16 Apr 2018
The NarrativeQA Reading Comprehension Challenge
Tomáš Kočiský
Jonathan Richard Schwarz
Phil Blunsom
Chris Dyer
Karl Moritz Hermann
Gábor Melis
Edward Grefenstette
142
787
0
19 Dec 2017
Constructing Datasets for Multi-hop Reading Comprehension Across Documents
Johannes Welbl
Pontus Stenetorp
Sebastian Riedel
SyDa
RALM
116
514
0
17 Oct 2017
Get To The Point: Summarization with Pointer-Generator Networks
A. See
Peter J. Liu
Christopher D. Manning
3DPC
313
4,031
0
14 Apr 2017
Layer Normalization
Jimmy Lei Ba
J. Kiros
Geoffrey E. Hinton
437
10,548
0
21 Jul 2016