ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2203.08913
  4. Cited By
Memorizing Transformers

Memorizing Transformers

16 March 2022
Yuhuai Wu
M. Rabe
DeLesley S. Hutchins
Christian Szegedy
    RALM
ArXivPDFHTML

Papers citing "Memorizing Transformers"

50 / 140 papers shown
Title
Cocktail: Chunk-Adaptive Mixed-Precision Quantization for Long-Context LLM Inference
Cocktail: Chunk-Adaptive Mixed-Precision Quantization for Long-Context LLM Inference
Wei Tao
Bin Zhang
Xiaoyang Qu
Jiguang Wan
Jianzong Wang
39
1
0
30 Mar 2025
MALT Diffusion: Memory-Augmented Latent Transformers for Any-Length Video Generation
MALT Diffusion: Memory-Augmented Latent Transformers for Any-Length Video Generation
Sihyun Yu
Meera Hahn
Dan Kondratyuk
Jinwoo Shin
Agrim Gupta
José Lezama
Irfan Essa
David A. Ross
Jonathan Huang
DiffM
VGen
77
0
0
18 Feb 2025
Associative Recurrent Memory Transformer
Associative Recurrent Memory Transformer
Ivan Rodkin
Yuri Kuratov
Aydar Bulatov
Mikhail Burtsev
68
2
0
17 Feb 2025
Memorizing SAM: 3D Medical Segment Anything Model with Memorizing
  Transformer
Memorizing SAM: 3D Medical Segment Anything Model with Memorizing Transformer
Xinyuan Shao
Yiqing Shen
Mathias Unberath
MedIm
82
0
0
18 Dec 2024
Emotional RAG: Enhancing Role-Playing Agents through Emotional Retrieval
Emotional RAG: Enhancing Role-Playing Agents through Emotional Retrieval
Le Huang
Hengzhi Lan
Zijun Sun
Chuan Shi
Ting Bai
148
0
0
30 Oct 2024
HART: Efficient Visual Generation with Hybrid Autoregressive Transformer
HART: Efficient Visual Generation with Hybrid Autoregressive Transformer
Haotian Tang
Yecheng Wu
Shang Yang
Enze Xie
Junsong Chen
Junyu Chen
Zhuoyang Zhang
Han Cai
Yaojie Lu
Song Han
71
33
0
14 Oct 2024
MELODI: Exploring Memory Compression for Long Contexts
MELODI: Exploring Memory Compression for Long Contexts
Yinpeng Chen
DeLesley Hutchins
Aren Jansen
Andrey Zhmoginov
David Racz
Jesper Andersen
38
2
0
04 Oct 2024
Beyond Prompts: Dynamic Conversational Benchmarking of Large Language
  Models
Beyond Prompts: Dynamic Conversational Benchmarking of Large Language Models
David Castillo-Bolado
Joseph Davidson
Finlay Gray
Marek Rosa
34
3
0
30 Sep 2024
CSPS: A Communication-Efficient Sequence-Parallelism based Serving
  System for Transformer based Models with Long Prompts
CSPS: A Communication-Efficient Sequence-Parallelism based Serving System for Transformer based Models with Long Prompts
Zeyu Zhang
Haiying Shen
VLM
29
0
0
23 Sep 2024
Towards LifeSpan Cognitive Systems
Towards LifeSpan Cognitive Systems
Yu Wang
Chi Han
Tongtong Wu
Xiaoxin He
Wangchunshu Zhou
...
Zexue He
Wei Wang
Gholamreza Haffari
Heng Ji
Julian McAuley
KELM
CLL
144
1
0
20 Sep 2024
Schrodinger's Memory: Large Language Models
Schrodinger's Memory: Large Language Models
Wei Wang
Qing Li
37
1
0
16 Sep 2024
Introducing Gating and Context into Temporal Action Detection
Introducing Gating and Context into Temporal Action Detection
Aglind Reka
Diana Laura Borza
Dominick Reilly
Michal Balazia
Francois Bremond
23
0
0
06 Sep 2024
QEDCartographer: Automating Formal Verification Using Reward-Free
  Reinforcement Learning
QEDCartographer: Automating Formal Verification Using Reward-Free Reinforcement Learning
Alex Sanchez-Stern
Abhishek Varghese
Zhanna Kaufman
Dylan Zhang
Talia Ringer
Yuriy Brun
18
2
0
17 Aug 2024
Towards flexible perception with visual memory
Towards flexible perception with visual memory
Robert Geirhos
P. Jaini
Austin Stone
Sourabh Medapati
Xi Yi
G. Toderici
Abhijit Ogale
Jonathon Shlens
42
1
0
15 Aug 2024
Human-like Episodic Memory for Infinite Context LLMs
Human-like Episodic Memory for Infinite Context LLMs
Z. Fountas
Martin A Benfeghoul
Adnan Oomerjee
Fenia Christopoulou
Gerasimos Lampouras
Haitham Bou-Ammar
Jun Wang
31
18
0
12 Jul 2024
$\text{Memory}^3$: Language Modeling with Explicit Memory
Memory3\text{Memory}^3Memory3: Language Modeling with Explicit Memory
Hongkang Yang
Zehao Lin
Wenjin Wang
Hao Wu
Zhiyu Li
...
Yu Yu
Kai Chen
Feiyu Xiong
Linpeng Tang
Weinan E
50
11
0
01 Jul 2024
BABILong: Testing the Limits of LLMs with Long Context
  Reasoning-in-a-Haystack
BABILong: Testing the Limits of LLMs with Long Context Reasoning-in-a-Haystack
Yuri Kuratov
Aydar Bulatov
Petr Anokhin
Ivan Rodkin
Dmitry Sorokin
Artyom Sorokin
Mikhail Burtsev
RALM
ALM
LRM
ReLM
ELM
49
59
0
14 Jun 2024
Samba: Simple Hybrid State Space Models for Efficient Unlimited Context Language Modeling
Samba: Simple Hybrid State Space Models for Efficient Unlimited Context Language Modeling
Liliang Ren
Yang Liu
Yadong Lu
Yelong Shen
Chen Liang
Weizhu Chen
Mamba
74
56
0
11 Jun 2024
Memorization in deep learning: A survey
Memorization in deep learning: A survey
Jiaheng Wei
Yanjun Zhang
Leo Yu Zhang
Ming Ding
Chao Chen
Kok-Leong Ong
Jun Zhang
Yang Xiang
47
6
0
06 Jun 2024
Leveraging Visual Tokens for Extended Text Contexts in Multi-Modal
  Learning
Leveraging Visual Tokens for Extended Text Contexts in Multi-Modal Learning
Alex Jinpeng Wang
Linjie Li
Yiqi Lin
Min Li
Lijuan Wang
Mike Zheng Shou
VLM
25
3
0
04 Jun 2024
Extended Mind Transformers
Extended Mind Transformers
Phoebe Klett
Thomas Ahle
RALM
21
0
0
04 Jun 2024
Reverse Image Retrieval Cues Parametric Memory in Multimodal LLMs
Reverse Image Retrieval Cues Parametric Memory in Multimodal LLMs
Jialiang Xu
Michael Moor
J. Leskovec
32
2
0
29 May 2024
XL3M: A Training-free Framework for LLM Length Extension Based on
  Segment-wise Inference
XL3M: A Training-free Framework for LLM Length Extension Based on Segment-wise Inference
Shengnan Wang
Youhui Bai
Lin Zhang
Pingyi Zhou
Shixiong Zhao
Gong Zhang
Sen Wang
Renhai Chen
Hua Xu
Hongwei Sun
31
3
0
28 May 2024
SelfCP: Compressing Over-Limit Prompt via the Frozen Large Language
  Model Itself
SelfCP: Compressing Over-Limit Prompt via the Frozen Large Language Model Itself
Jun Gao
Ziqiang Cao
Wenjie Li
25
4
0
27 May 2024
Reducing Transformer Key-Value Cache Size with Cross-Layer Attention
Reducing Transformer Key-Value Cache Size with Cross-Layer Attention
William Brandon
Mayank Mishra
Aniruddha Nrusimha
Rameswar Panda
Jonathan Ragan-Kelley
MQ
44
40
0
21 May 2024
MemLLM: Finetuning LLMs to Use An Explicit Read-Write Memory
MemLLM: Finetuning LLMs to Use An Explicit Read-Write Memory
Ali Modarressi
Abdullatif Köksal
Ayyoob Imani
Mohsen Fayyaz
Hinrich Schütze
KELM
109
9
0
17 Apr 2024
Hierarchical Context Merging: Better Long Context Understanding for
  Pre-trained LLMs
Hierarchical Context Merging: Better Long Context Understanding for Pre-trained LLMs
Woomin Song
Seunghyuk Oh
Sangwoo Mo
Jaehyung Kim
Sukmin Yun
Jung-Woo Ha
Jinwoo Shin
30
14
0
16 Apr 2024
kNN-CLIP: Retrieval Enables Training-Free Segmentation on Continually
  Expanding Large Vocabularies
kNN-CLIP: Retrieval Enables Training-Free Segmentation on Continually Expanding Large Vocabularies
Zhongrui Gui
Shuyang Sun
Runjia Li
Jianhao Yuan
Zhaochong An
Karsten Roth
Ameya Prabhu
Philip H. S. Torr
VLM
CLL
32
6
0
15 Apr 2024
TransformerFAM: Feedback attention is working memory
TransformerFAM: Feedback attention is working memory
Dongseong Hwang
Weiran Wang
Zhuoyuan Huo
K. Sim
P. M. Mengibar
32
12
0
14 Apr 2024
Leave No Context Behind: Efficient Infinite Context Transformers with
  Infini-attention
Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention
Tsendsuren Munkhdalai
Manaal Faruqui
Siddharth Gopal
LRM
LLMAG
CLL
91
102
0
10 Apr 2024
Streaming Dense Video Captioning
Streaming Dense Video Captioning
Xingyi Zhou
Anurag Arnab
Shyamal Buch
Shen Yan
Austin Myers
Xuehan Xiong
Arsha Nagrani
Cordelia Schmid
VLM
41
32
0
01 Apr 2024
NovelQA: Benchmarking Question Answering on Documents Exceeding 200K Tokens
NovelQA: Benchmarking Question Answering on Documents Exceeding 200K Tokens
Cunxiang Wang
Ruoxi Ning
Boqi Pan
Tonghui Wu
Qipeng Guo
...
Guangsheng Bao
Xiangkun Hu
Zheng Zhang
Qian Wang
Yue Zhang
RALM
106
3
0
18 Mar 2024
Online Adaptation of Language Models with a Memory of Amortized Contexts
Online Adaptation of Language Models with a Memory of Amortized Contexts
Jihoon Tack
Jaehyung Kim
Eric Mitchell
Jinwoo Shin
Yee Whye Teh
Jonathan Richard Schwarz
KELM
47
18
0
07 Mar 2024
Reliable, Adaptable, and Attributable Language Models with Retrieval
Reliable, Adaptable, and Attributable Language Models with Retrieval
Akari Asai
Zexuan Zhong
Danqi Chen
Pang Wei Koh
Luke Zettlemoyer
Hanna Hajishirzi
Wen-tau Yih
KELM
RALM
46
53
0
05 Mar 2024
Retrieval-Augmented Generation for AI-Generated Content: A Survey
Retrieval-Augmented Generation for AI-Generated Content: A Survey
Penghao Zhao
Hailin Zhang
Qinhan Yu
Zhengren Wang
Yunteng Geng
Fangcheng Fu
Ling Yang
Wentao Zhang
Jie Jiang
Bin Cui
3DV
115
228
0
29 Feb 2024
In Search of Needles in a 11M Haystack: Recurrent Memory Finds What LLMs
  Miss
In Search of Needles in a 11M Haystack: Recurrent Memory Finds What LLMs Miss
Yuri Kuratov
Aydar Bulatov
Petr Anokhin
Dmitry Sorokin
Artyom Sorokin
Mikhail Burtsev
RALM
119
33
0
16 Feb 2024
A Human-Inspired Reading Agent with Gist Memory of Very Long Contexts
A Human-Inspired Reading Agent with Gist Memory of Very Long Contexts
Kuang-Huei Lee
Xinyun Chen
Hiroki Furuta
John F. Canny
Ian S. Fischer
RALM
55
29
0
15 Feb 2024
Changes by Butterflies: Farsighted Forecasting with Group Reservoir
  Transformer
Changes by Butterflies: Farsighted Forecasting with Group Reservoir Transformer
Md Kowsher
Abdul Rafae Khan
Jia Xu
29
0
0
14 Feb 2024
Entropy-Regularized Token-Level Policy Optimization for Language Agent
  Reinforcement
Entropy-Regularized Token-Level Policy Optimization for Language Agent Reinforcement
Muning Wen
Junwei Liao
Cheng Deng
Jun Wang
Weinan Zhang
Ying Wen
28
1
0
09 Feb 2024
LV-Eval: A Balanced Long-Context Benchmark with 5 Length Levels Up to
  256K
LV-Eval: A Balanced Long-Context Benchmark with 5 Length Levels Up to 256K
Tao Yuan
Xuefei Ning
Dong Zhou
Zhijie Yang
Shiyao Li
...
Dahua Lin
Boxun Li
Guohao Dai
Shengen Yan
Yu-Xiang Wang
ALM
36
34
0
06 Feb 2024
UniMem: Towards a Unified View of Long-Context Large Language Models
UniMem: Towards a Unified View of Long-Context Large Language Models
Junjie Fang
Likai Tang
Hongzhe Bi
Yujia Qin
Si Sun
...
Xiaodong Shi
Sen Song
Yankai Lin
Zhiyuan Liu
Maosong Sun
19
3
0
05 Feb 2024
Flexibly Scaling Large Language Models Contexts Through Extensible
  Tokenization
Flexibly Scaling Large Language Models Contexts Through Extensible Tokenization
Ninglu Shao
Shitao Xiao
Zheng Liu
Peitian Zhang
28
4
0
15 Jan 2024
Human-Instruction-Free LLM Self-Alignment with Limited Samples
Human-Instruction-Free LLM Self-Alignment with Limited Samples
Hongyi Guo
Yuanshun Yao
Wei Shen
Jiaheng Wei
Xiaoying Zhang
Zhaoran Wang
Yang Liu
95
20
0
06 Jan 2024
COSMO: COntrastive Streamlined MultimOdal Model with Interleaved
  Pre-Training
COSMO: COntrastive Streamlined MultimOdal Model with Interleaved Pre-Training
Alex Jinpeng Wang
Linjie Li
K. Lin
Jianfeng Wang
Kevin Lin
Zhengyuan Yang
Lijuan Wang
Mike Zheng Shou
VLM
VGen
29
12
0
01 Jan 2024
Structured Packing in LLM Training Improves Long Context Utilization
Structured Packing in LLM Training Improves Long Context Utilization
Konrad Staniszewski
Szymon Tworkowski
Sebastian Jaszczur
Yu Zhao
Henryk Michalewski
Lukasz Kuciñski
Piotr Milo's
41
13
0
28 Dec 2023
Compressed Context Memory For Online Language Model Interaction
Compressed Context Memory For Online Language Model Interaction
Jang-Hyun Kim
Junyoung Yeom
Sangdoo Yun
Hyun Oh Song
KELM
44
14
1
06 Dec 2023
TextGenSHAP: Scalable Post-hoc Explanations in Text Generation with Long
  Documents
TextGenSHAP: Scalable Post-hoc Explanations in Text Generation with Long Documents
James Enouen
Hootan Nakhost
Sayna Ebrahimi
Sercan Ö. Arik
Yan Liu
Tomas Pfister
33
5
0
03 Dec 2023
Advancing Transformer Architecture in Long-Context Large Language
  Models: A Comprehensive Survey
Advancing Transformer Architecture in Long-Context Large Language Models: A Comprehensive Survey
Yunpeng Huang
Jingwei Xu
Junyu Lai
Zixu Jiang
Taolue Chen
...
Xiaoxing Ma
Lijuan Yang
Zhou Xin
Shupeng Li
Penghao Zhao
LLMAG
KELM
36
54
0
21 Nov 2023
LILO: Learning Interpretable Libraries by Compressing and Documenting
  Code
LILO: Learning Interpretable Libraries by Compressing and Documenting Code
Gabriel Grand
L. Wong
Matthew Bowers
Theo X. Olausson
Muxin Liu
Joshua B. Tenenbaum
Jacob Andreas
21
21
0
30 Oct 2023
Heterogenous Memory Augmented Neural Networks
Heterogenous Memory Augmented Neural Networks
Zihan Qiu
Zhen Liu
Shuicheng Yan
Shanghang Zhang
Jie Fu
15
0
0
17 Oct 2023
123
Next