2203.08913
Memorizing Transformers
16 March 2022
Yuhuai Wu
M. Rabe
DeLesley S. Hutchins
Christian Szegedy
RALM
Papers citing "Memorizing Transformers"
50 / 140 papers shown
Goodtriever: Adaptive Toxicity Mitigation with Retrieval-augmented Models
Luiza Amador Pozzobon
B. Ermiş
Patrick Lewis
Sara Hooker
30
20
0
11 Oct 2023
How Do Large Language Models Capture the Ever-changing World Knowledge? A Review of Recent Advances
Zihan Zhang
Meng Fang
Lingxi Chen
Mohammad-Reza Namazi-Rad
Jun Wang
KELM
24
21
0
11 Oct 2023
CacheGen: KV Cache Compression and Streaming for Fast Language Model Serving
Yuhan Liu
Hanchen Li
Yihua Cheng
Siddhant Ray
Yuyang Huang
...
Ganesh Ananthanarayanan
Michael Maire
Henry Hoffmann
Ari Holtzman
Junchen Jiang
50
41
0
11 Oct 2023
Walking Down the Memory Maze: Beyond Context Limit through Interactive Reading
Howard Chen
Ramakanth Pasunuru
Jason Weston
Asli Celikyilmaz
RALM
68
72
0
08 Oct 2023
Scaling Laws for Associative Memories
Vivien A. Cabannes
Elvis Dohmatob
A. Bietti
24
19
0
04 Oct 2023
Memoria: Resolving Fateful Forgetting Problem through Human-Inspired Memory Architecture
Sangjun Park
Jinyeong Bak
CLL
21
5
0
04 Oct 2023
Resolving Knowledge Conflicts in Large Language Models
Yike Wang
Shangbin Feng
Heng Wang
Weijia Shi
Vidhisha Balachandran
Tianxing He
Yulia Tsvetkov
53
12
0
02 Oct 2023
Transformer-VQ: Linear-Time Transformers via Vector Quantization
Albert Mohwald
28
15
0
28 Sep 2023
Attention Sorting Combats Recency Bias In Long Context Language Models
A. Peysakhovich
Adam Lerer
LRM
RALM
34
42
0
28 Sep 2023
LongLoRA: Efficient Fine-tuning of Long-Context Large Language Models
Yukang Chen
Shengju Qian
Haotian Tang
Xin Lai
Zhijian Liu
Song Han
Jiaya Jia
42
152
0
21 Sep 2023
PoSE: Efficient Context Window Extension of LLMs via Positional Skip-wise Training
Dawei Zhu
Nan Yang
Liang Wang
Yifan Song
Wenhao Wu
Furu Wei
Sujian Li
73
78
0
19 Sep 2023
LM-Infinite: Zero-Shot Extreme Length Generalization for Large Language Models
Chi Han
Qifan Wang
Hao Peng
Wenhan Xiong
Yu Chen
Heng Ji
Sinong Wang
47
49
0
30 Aug 2023
MEMORY-VQ: Compression for Tractable Internet-Scale Memory
Yury Zemlyanskiy
Michiel de Jong
Luke Vilnis
Santiago Ontañón
William W. Cohen
Sumit Sanghai
Joshua Ainslie
RALM
MQ
35
0
0
28 Aug 2023
LongBench: A Bilingual, Multitask Benchmark for Long Context Understanding
Yushi Bai
Xin Lv
Jiajie Zhang
Hong Lyu
Jiankai Tang
...
Aohan Zeng
Lei Hou
Yuxiao Dong
Jie Tang
Juanzi Li
LLMAG
RALM
31
496
0
28 Aug 2023
With a Little Help from your own Past: Prototypical Memory Networks for Image Captioning
Manuele Barraco
Sara Sarto
Marcella Cornia
Lorenzo Baraldi
Rita Cucchiara
VLM
55
19
0
23 Aug 2023
WeaverBird: Empowering Financial Decision-Making with Large Language Model, Knowledge Base, and Search Engine
Siqiao Xue
Fan Zhou
Y. Xu
Ming Jin
Qingsong Wen
...
Jun Zhou
Shuo Xie
D. Xiu
James Y. Zhang
Hongyuan Mei
RALM
AIFin
31
15
0
10 Aug 2023
In-context Autoencoder for Context Compression in a Large Language Model
Tao Ge
Jing Hu
Lei Wang
Xun Wang
Si-Qing Chen
Furu Wei
RALM
32
66
0
13 Jul 2023
Focused Transformer: Contrastive Training for Context Scaling
Szymon Tworkowski
Konrad Staniszewski
Mikolaj Pacek
Yuhuai Wu
Henryk Michalewski
Piotr Miłoś
29
136
0
06 Jul 2023
RecallM: An Adaptable Memory Mechanism with Temporal Understanding for Large Language Models
Brandon Kynoch
Hugo Latapie
Dwane van der Sluis
CLL
LLMAG
KELM
25
2
0
06 Jul 2023
LongNet: Scaling Transformers to 1,000,000,000 Tokens
Jiayu Ding
Shuming Ma
Li Dong
Xingxing Zhang
Shaohan Huang
Wenhui Wang
Nanning Zheng
Furu Wei
CLL
41
151
0
05 Jul 2023
LeanDojo: Theorem Proving with Retrieval-Augmented Language Models
Kaiyu Yang
Aidan M. Swope
Alex Gu
Rahul Chalamala
Peiyang Song
Shixing Yu
Saad Godil
R. Prenger
Anima Anandkumar
RALM
19
208
0
27 Jun 2023
Extending Context Window of Large Language Models via Positional Interpolation
Shouyuan Chen
Sherman Wong
Liangjian Chen
Yuandong Tian
12
494
0
27 Jun 2023
Long-range Language Modeling with Self-retrieval
Ohad Rubin
Jonathan Berant
RALM
KELM
19
18
0
23 Jun 2023
GLIMMER: generalized late-interaction memory reranker
Michiel de Jong
Yury Zemlyanskiy
Nicholas FitzGerald
Sumit Sanghai
William W. Cohen
Joshua Ainslie
RALM
46
4
0
17 Jun 2023
Block-State Transformers
Mahan Fathi
Jonathan Pilault
Orhan Firat
C. Pal
Pierre-Luc Bacon
Ross Goroshin
34
17
0
15 Jun 2023
Retrieving-to-Answer: Zero-Shot Video Question Answering with Frozen Large Language Models
Junting Pan
Ziyi Lin
Yuying Ge
Xiatian Zhu
Renrui Zhang
Yi Wang
Yu Qiao
Hongsheng Li
MLLM
24
26
0
15 Jun 2023
Recurrent Action Transformer with Memory
A. Staroverov
A. Bessonov
Dmitry A. Yudin
A. Kovalev
Aleksandr I. Panov
OffRL
33
4
0
15 Jun 2023
Retrieval-Enhanced Contrastive Vision-Text Models
Ahmet Iscen
Mathilde Caron
Alireza Fathi
Cordelia Schmid
CLIP
VLM
31
26
0
12 Jun 2023
ChatDB: Augmenting LLMs with Databases as Their Symbolic Memory
Chenxu Hu
Jie Fu
Chenzhuang Du
Simian Luo
J. Zhao
Hang Zhao
KELM
LLMAG
27
105
0
06 Jun 2023
Exposing Attention Glitches with Flip-Flop Language Modeling
Bingbin Liu
Jordan T. Ash
Surbhi Goel
A. Krishnamurthy
Cyril Zhang
LRM
27
46
0
01 Jun 2023
Landmark Attention: Random-Access Infinite Context Length for Transformers
Amirkeivan Mohtashami
Martin Jaggi
LLMAG
19
149
0
25 May 2023
Surface-Based Retrieval Reduces Perplexity of Retrieval-Augmented Language Models
Ehsan Doostmohammadi
Tobias Norlund
Marco Kuhlmann
Richard Johansson
RALM
25
7
0
25 May 2023
AWESOME: GPU Memory-constrained Long Document Summarization using Memory Mechanism and Global Salient Content
Shuyang Cao
Lu Wang
27
5
0
24 May 2023
KNN-LM Does Not Improve Open-ended Text Generation
Shufan Wang
Yixiao Song
Andrew Drozdov
Aparna Garimella
Varun Manjunatha
Mohit Iyyer
RALM
25
8
0
24 May 2023
RET-LLM: Towards a General Read-Write Memory for Large Language Models
Ali Modarressi
Ayyoob Imani
Mohsen Fayyaz
Hinrich Schütze
KELM
LLMAG
11
33
0
23 May 2023
Vcc: Scaling Transformers to 128K Tokens or More by Prioritizing Important Tokens
Zhanpeng Zeng
Cole Hawkins
Min-Fong Hong
Aston Zhang
Nikolaos Pappas
Vikas Singh
Shuai Zheng
21
6
0
07 May 2023
Unlimiformer: Long-Range Transformers with Unlimited Length Input
Amanda Bertsch
Uri Alon
Graham Neubig
Matthew R. Gormley
RALM
96
122
0
02 May 2023
Analogy-Forming Transformers for Few-Shot 3D Parsing
N. Gkanatsios
M. Singh
Zhaoyuan Fang
Shubham Tulsiani
Katerina Fragkiadaki
3DPC
3DV
21
2
0
27 Apr 2023
Scaling Transformer to 1M tokens and beyond with RMT
Aydar Bulatov
Yuri Kuratov
Yermek Kapushev
Mikhail Burtsev
LRM
25
87
0
19 Apr 2023
Learning to Compress Prompts with Gist Tokens
Jesse Mu
Xiang Lisa Li
Noah D. Goodman
VLM
44
206
0
17 Apr 2023
Improving Image Recognition by Retrieving from Web-Scale Image-Text Data
Ahmet Iscen
Alireza Fathi
Cordelia Schmid
VLM
3DV
33
25
0
11 Apr 2023
Editable User Profiles for Controllable Text Recommendation
Sheshera Mysore
Mahmood Jasim
Andrew McCallum
Hamed Zamani
17
16
0
09 Apr 2023
MoViT: Memorizing Vision Transformers for Medical Image Analysis
Yiqing Shen
Pengfei Guo
Jinpu Wu
Qi Huang
Nhat Le
Jinyuan Zhou
Shanshan Jiang
Mathias Unberath
ViT
MedIm
31
10
0
27 Mar 2023
Magnushammer: A Transformer-Based Approach to Premise Selection
Maciej Mikuła
Szymon Tworkowski
Szymon Antoniak
Bartosz Piotrowski
Albert Qiaochu Jiang
Jinyi Zhou
Christian Szegedy
Łukasz Kuciński
Piotr Miłoś
Yuhuai Wu
47
42
0
08 Mar 2023
Semiparametric Language Models Are Scalable Continual Learners
Guangyue Peng
Tao Ge
Si-Qing Chen
Furu Wei
Houfeng Wang
KELM
44
10
0
02 Mar 2023
ProofNet: Autoformalizing and Formally Proving Undergraduate-Level Mathematics
Zhangir Azerbayev
Bartosz Piotrowski
Hailey Schoelkopf
Edward W. Ayers
Dragomir R. Radev
J. Avigad
AIMat
11
67
0
24 Feb 2023
Simple Hardware-Efficient Long Convolutions for Sequence Modeling
Daniel Y. Fu
Elliot L. Epstein
Eric N. D. Nguyen
A. Thomas
Michael Zhang
Tri Dao
Atri Rudra
Christopher Ré
16
52
0
13 Feb 2023
A Unified View of Long-Sequence Models towards Modeling Million-Scale Dependencies
Hongyu Hè
Marko Kabić
25
2
0
13 Feb 2023
One Transformer for All Time Series: Representing and Training with Time-Dependent Heterogeneous Tabular Data
Simone Luetto
Fabrizio Garuti
E. Sangineto
L. Forni
Rita Cucchiara
LMTD
AI4TS
87
10
0
13 Feb 2023
The Power of External Memory in Increasing Predictive Model Capacity
Cenk Baykal
D. Cutler
Nishanth Dikkala
Nikhil Ghosh
Rina Panigrahy
Xin Wang
KELM
18
0
0
31 Jan 2023