Exposing Attention Glitches with Flip-Flop Language Modeling

1 June 2023

Papers citing "Exposing Attention Glitches with Flip-Flop Language Modeling"

23 / 23 papers shown

Title
TRA: Better Length Generalisation with Threshold Relative Attention Mattia Opper Roland Fernandez P. Smolensky Jianfeng Gao 46 0 0 29 Mar 2025
Smoothing Out Hallucinations: Mitigating LLM Hallucination with Smoothed Knowledge Distillation Hieu Nguyen Zihao He Shoumik Atul Gandre Ujjwal Pasupulety Sharanya Kumari Shivakumar Kristina Lerman HILM 59 1 0 16 Feb 2025
Reasoning in Large Language Models: A Geometric Perspective Romain Cosentino Sarath Shekkizhar LRM 44 2 0 02 Jul 2024
Separations in the Representational Capabilities of Transformers and Recurrent Architectures S. Bhattamishra Michael Hahn Phil Blunsom Varun Kanade GNN 44 9 0 13 Jun 2024
We Have a Package for You! A Comprehensive Analysis of Package Hallucinations by Code Generating LLMs Joseph Spracklen Raveen Wijewickrama A. H. M. N. Sakib Anindya Maiti Murtuza Jadliwala Murtuza Jadliwala 48 10 0 12 Jun 2024
Explicitly Encoding Structural Symmetry is Key to Length Generalization in Arithmetic Tasks Mahdi Sabbaghi George Pappas Hamed Hassani Surbhi Goel 43 4 0 04 Jun 2024
Investigating Recurrent Transformers with Dynamic Halt Jishnu Ray Chowdhury Cornelia Caragea 39 1 0 01 Feb 2024
Hallucination is Inevitable: An Innate Limitation of Large Language Models Ziwei Xu Sanjay Jain Mohan S. Kankanhalli HILM LRM 71 218 0 22 Jan 2024
How Language Model Hallucinations Can Snowball Muru Zhang Ofir Press William Merrill Alisa Liu Noah A. Smith HILM LRM 88 257 0 22 May 2023
Unlimiformer: Long-Range Transformers with Unlimited Length Input Amanda Bertsch Uri Alon Graham Neubig Matthew R. Gormley RALM 108 122 0 02 May 2023
Learning to Reason and Memorize with Self-Notes Jack Lanchantin Shubham Toshniwal Jason Weston Arthur Szlam Sainbayar Sukhbaatar ReLM LRM LLMAG 98 29 0 01 May 2023
Do Transformers Parse while Predicting the Masked Word? Haoyu Zhao A. Panigrahi Rong Ge Sanjeev Arora 76 31 0 14 Mar 2023
Resurrecting Recurrent Neural Networks for Long Sequences Antonio Orvieto Samuel L. Smith Albert Gu Anushan Fernando Çağlar Gülçehre Razvan Pascanu Soham De 88 268 0 11 Mar 2023
How Do Transformers Learn Topic Structure: Towards a Mechanistic Understanding Yuchen Li Yuan-Fang Li Andrej Risteski 120 61 0 07 Mar 2023
Language Models Are Greedy Reasoners: A Systematic Formal Analysis of Chain-of-Thought Abulhair Saparov He He ELM LRM ReLM 123 277 0 03 Oct 2022
Self-Consistency Improves Chain of Thought Reasoning in Language Models Xuezhi Wang Jason W. Wei Dale Schuurmans Quoc Le Ed H. Chi Sharan Narang Aakanksha Chowdhery Denny Zhou ReLM BDL LRM AI4CE 314 3,273 0 21 Mar 2022
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models Jason W. Wei Xuezhi Wang Dale Schuurmans Maarten Bosma Brian Ichter F. Xia Ed H. Chi Quoc Le Denny Zhou LM&Ro LRM AI4CE ReLM 398 8,559 0 28 Jan 2022
Entity-Based Knowledge Conflicts in Question Answering Shayne Longpre Kartik Perisetla Anthony Chen Nikhil Ramesh Chris DuBois Sameer Singh HILM 245 239 0 10 Sep 2021
Train Short, Test Long: Attention with Linear Biases Enables Input Length Extrapolation Ofir Press Noah A. Smith M. Lewis 253 698 0 27 Aug 2021
LIME: Learning Inductive Bias for Primitives of Mathematical Reasoning Yuhuai Wu M. Rabe Wenda Li Jimmy Ba Roger C. Grosse Christian Szegedy AIMat LRM 75 51 0 15 Jan 2021
Language Models as Knowledge Bases? Fabio Petroni Tim Rocktaschel Patrick Lewis A. Bakhtin Yuxiang Wu Alexander H. Miller Sebastian Riedel KELM AI4MH 425 2,588 0 03 Sep 2019
Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation Yonghui Wu M. Schuster Z. Chen Quoc V. Le Mohammad Norouzi ... Alex Rudnick Oriol Vinyals G. Corrado Macduff Hughes J. Dean AIMat 716 6,746 0 26 Sep 2016
Effective Approaches to Attention-based Neural Machine Translation Thang Luong Hieu H. Pham Christopher D. Manning 218 7,926 0 17 Aug 2015