How Well Can a Long Sequence Model Model Long Sequences? Comparing Architectural Inductive Biases on Long-Context Abilities

11 July 2024
Jerry Huang

Papers citing "How Well Can a Long Sequence Model Model Long Sequences? Comparing Architectural Inductive Biases on Long-Context Abilities"

36 papers shown
Resona: Improving Context Copying in Linear Recurrence Models with Retrieval
Xinyu Wang
Linrui Ma
Jerry Huang
Peng Lu
Prasanna Parthasarathi
Xiao-Wen Chang
Boxing Chen
Yufei Cui
KELM
101
1
0
28 Mar 2025
Training Language Models for Social Deduction with Multi-Agent Reinforcement Learning
Bidipta Sarkar
Warren Xia
C. Karen Liu
Dorsa Sadigh
LLMAG
76
4
0
09 Feb 2025
Do Robot Snakes Dream like Electric Sheep? Investigating the Effects of Architectural Inductive Biases on Hallucination
Jerry Huang
Prasanna Parthasarathi
Mehdi Rezagholizadeh
Boxing Chen
Sarath Chandar
112
0
0
22 Oct 2024
Enhancing Scientific Reproducibility Through Automated BioCompute Object Creation Using Retrieval-Augmented Generation from Publications
Sean Kim
Raja Mazumder
42
0
0
23 Sep 2024
Retrieval Or Holistic Understanding? Dolce: Differentiate Our Long Context Evaluation Tasks
Zi Yang
35
2
0
10 Sep 2024
What are the Essential Factors in Crafting Effective Long Context Multi-Hop Instruction Datasets? Insights and Best Practices
Zhi Chen
Qiguang Chen
Libo Qin
Qipeng Guo
Haijun Lv
Yicheng Zou
Wanxiang Che
Hang Yan
Kai Chen
Dahua Lin
SyDa
78
4
0
03 Sep 2024
Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality
Tri Dao
Albert Gu
Mamba
68
489
0
31 May 2024
The Illusion of State in State-Space Models
William Merrill
Jackson Petty
Ashish Sabharwal
64
54
0
12 Apr 2024
RecurrentGemma: Moving Past Transformers for Efficient Open Language Models
Aleksandar Botev
Soham De
Samuel L. Smith
Anushan Fernando
George-Christian Muraru
...
Koray Kavukcuoglu
Demis Hassabis
R. Hadsell
Yee Whye Teh
Nando de Freitas
VLM
RALM
60
29
0
11 Apr 2024
Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models
Soham De
Samuel L. Smith
Anushan Fernando
Aleksandar Botev
George-Christian Muraru
...
David Budden
Yee Whye Teh
Razvan Pascanu
Nando de Freitas
Çağlar Gülçehre
Mamba
89
129
0
29 Feb 2024
Can Mamba Learn How to Learn? A Comparative Study on In-Context Learning Tasks
Jongho Park
Jaeseung Park
Zheyang Xiong
Nayoung Lee
Jaewoong Cho
Samet Oymak
Kangwook Lee
Dimitris Papailiopoulos
73
72
0
06 Feb 2024
Repeat After Me: Transformers are Better than State Space Models at Copying
Samy Jelassi
David Brandfonbrener
Sham Kakade
Eran Malach
118
91
0
01 Feb 2024
Mamba: Linear-Time Sequence Modeling with Selective State Spaces
Albert Gu
Tri Dao
Mamba
118
2,630
0
01 Dec 2023
Sheared LLaMA: Accelerating Language Model Pre-training via Structured Pruning
Mengzhou Xia
Tianyu Gao
Zhiyuan Zeng
Danqi Chen
98
295
0
10 Oct 2023
LongBench: A Bilingual, Multitask Benchmark for Long Context Understanding
Yushi Bai
Xin Lv
Jiajie Zhang
Hong Lyu
Jiankai Tang
...
Aohan Zeng
Lei Hou
Yuxiao Dong
Jie Tang
Juanzi Li
LLMAG
RALM
69
581
0
28 Aug 2023
FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning
Tri Dao
LRM
105
1,269
0
17 Jul 2023
Lost in the Middle: How Language Models Use Long Contexts
Nelson F. Liu
Kevin Lin
John Hewitt
Ashwin Paranjape
Michele Bevilacqua
Fabio Petroni
Percy Liang
RALM
84
1,570
0
06 Jul 2023
Landmark Attention: Random-Access Infinite Context Length for Transformers
Amirkeivan Mohtashami
Martin Jaggi
LLMAG
105
163
0
25 May 2023
RWKV: Reinventing RNNs for the Transformer Era
Bo Peng
Eric Alcaide
Quentin G. Anthony
Alon Albalak
Samuel Arcadinho
...
Qihang Zhao
P. Zhou
Qinghua Zhou
Jian Zhu
Rui-Jie Zhu
179
590
0
22 May 2023
State Spaces Aren't Enough: Machine Translation Needs Attention
Ali Vardasbi
Telmo Pires
Robin M. Schmidt
Stephan Peitz
42
10
0
25 Apr 2023
Resurrecting Recurrent Neural Networks for Long Sequences
Antonio Orvieto
Samuel L. Smith
Albert Gu
Anushan Fernando
Çağlar Gülçehre
Razvan Pascanu
Soham De
252
287
0
11 Mar 2023
Hyena Hierarchy: Towards Larger Convolutional Language Models
Michael Poli
Stefano Massaroli
Eric Q. Nguyen
Daniel Y. Fu
Tri Dao
S. Baccus
Yoshua Bengio
Stefano Ermon
Christopher Ré
VLM
65
294
0
21 Feb 2023
Efficient Long-Text Understanding with Short-Text Models
Maor Ivgi
Uri Shaham
Jonathan Berant
VLM
48
83
0
01 Aug 2022
FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness
Tri Dao
Daniel Y. Fu
Stefano Ermon
Atri Rudra
Christopher Ré
VLM
185
2,199
0
27 May 2022
SCROLLS: Standardized CompaRison Over Long Language Sequences
Uri Shaham
Elad Segal
Maor Ivgi
Avia Efrat
Ori Yoran
...
Ankit Gupta
Wenhan Xiong
Mor Geva
Jonathan Berant
Omer Levy
RALM
76
137
0
10 Jan 2022
Combining Recurrent, Convolutional, and Continuous-time Models with Linear State-Space Layers
Albert Gu
Isys Johnson
Karan Goel
Khaled Kamal Saab
Tri Dao
Atri Rudra
Christopher Ré
97
587
0
26 Oct 2021
Train Short, Test Long: Attention with Linear Biases Enables Input Length Extrapolation
Ofir Press
Noah A. Smith
M. Lewis
297
745
0
27 Aug 2021
RoFormer: Enhanced Transformer with Rotary Position Embedding
Jianlin Su
Yu Lu
Shengfeng Pan
Ahmed Murtadha
Bo Wen
Yunfeng Liu
193
2,415
0
20 Apr 2021
HiPPO: Recurrent Memory with Optimal Polynomial Projections
Albert Gu
Tri Dao
Stefano Ermon
Atri Rudra
Christopher Ré
84
512
0
17 Aug 2020
Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention
Angelos Katharopoulos
Apoorv Vyas
Nikolaos Pappas
François Fleuret
166
1,755
0
29 Jun 2020
Generating Long Sequences with Sparse Transformers
R. Child
Scott Gray
Alec Radford
Ilya Sutskever
88
1,894
0
23 Apr 2019
Know What You Don't Know: Unanswerable Questions for SQuAD
Pranav Rajpurkar
Robin Jia
Percy Liang
RALM
ELM
230
2,835
0
11 Jun 2018
Attention Is All You Need
Ashish Vaswani
Noam M. Shazeer
Niki Parmar
Jakob Uszkoreit
Llion Jones
Aidan Gomez
Lukasz Kaiser
Illia Polosukhin
3DV
573
130,942
0
12 Jun 2017
Neural Machine Translation by Jointly Learning to Align and Translate
Dzmitry Bahdanau
Kyunghyun Cho
Yoshua Bengio
AIMat
435
27,260
0
01 Sep 2014
Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation
Kyunghyun Cho
B. V. Merrienboer
Çağlar Gülçehre
Dzmitry Bahdanau
Fethi Bougares
Holger Schwenk
Yoshua Bengio
AIMat
791
23,311
0
03 Jun 2014
On the difficulty of training Recurrent Neural Networks
Razvan Pascanu
Tomas Mikolov
Yoshua Bengio
ODL
168
5,334
0
21 Nov 2012