Title |
---|
![]() Improving Transformers with Probabilistic Attention Keys Tam Nguyen T. Nguyen Dung D. Le Duy Khuong Nguyen Viet-Anh Tran Richard G. Baraniuk Nhat Ho Stanley J. Osher |
![]() MoEfication: Transformer Feed-forward Layers are Mixtures of Experts Zhengyan Zhang Yankai Lin Zhiyuan Liu Peng Li Maosong Sun Jie Zhou |