Rope to Nope and Back Again: A New Hybrid Attention Strategy

30 January 2025

Papers citing "Rope to Nope and Back Again: A New Hybrid Attention Strategy"


Title | Authors | Tags | Counts | Date
Hardware-Efficient Attention for Fast Decoding | Ted Zadouri, Hubert Strauss, Tri Dao | | 46 2 0 | 27 May 2025
Mechanistic Interpretability of GPT-like Models on Summarization Tasks | Anurag Mishra | MILM | 35 0 0 | 20 May 2025
Language Models, Graph Searching, and Supervision Adulteration: When More Supervision is Less and How to Make More More | Arvid Frydenlund | LRM | 142 0 0 | 13 Mar 2025
Baichuan-M1: Pushing the Medical Capability of Large Language Models | Binghai Wang, Haizhou Zhao, Huozhi Zhou, Liang Song, Mingyu Xu, ..., Yan Zhang, Yifei Duan, Yuyan Zhou, Zhi-Ming Ma, Zhikai Wu | LM&MA, ELM, AI4MH | 107 9 0 | 18 Feb 2025