arXiv:2406.16495
OTCE: Hybrid SSM and Attention with Cross Domain Mixture of Experts to construct Observer-Thinker-Conceiver-Expresser
24 June 2024
Jingze Shi, Ting Xie, Bingheng Wu, Chunjun Zheng, Kai Wang

Papers citing "OTCE: Hybrid SSM and Attention with Cross Domain Mixture of Experts to construct Observer-Thinker-Conceiver-Expresser" (7 of 7 papers shown):

An Empirical Study of Mamba-based Language Models
R. Waleffe, Wonmin Byeon, Duncan Riach, Brandon Norick, V. Korthikanti, ..., Vartika Singh, Jared Casper, Jan Kautz, Mohammad Shoeybi, Bryan Catanzaro
12 Jun 2024

Mixtral of Experts
Albert Q. Jiang, Alexandre Sablayrolles, Antoine Roux, A. Mensch, Blanche Savary, ..., Théophile Gervet, Thibaut Lavril, Thomas Wang, Timothée Lacroix, William El Sayed
08 Jan 2024

GLM: General Language Model Pretraining with Autoregressive Blank Infilling
Zhengxiao Du, Yujie Qian, Xiao Liu, Ming Ding, J. Qiu, Zhilin Yang, Jie Tang
18 Mar 2021

Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention
Angelos Katharopoulos, Apoorv Vyas, Nikolaos Pappas, François Fleuret
29 Jun 2020

CLUE: A Chinese Language Understanding Evaluation Benchmark
Liang Xu, Hai Hu, Xuanwei Zhang, Lu Li, Chenjie Cao, ..., Cong Yue, Xinrui Zhang, Zhen-Yi Yang, Kyle Richardson, Zhenzhong Lan
13 Apr 2020

HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering
Zhilin Yang, Peng Qi, Saizheng Zhang, Yoshua Bengio, William W. Cohen, Ruslan Salakhutdinov, Christopher D. Manning
25 Sep 2018

The NarrativeQA Reading Comprehension Challenge
Tomáš Kočiský, Jonathan Richard Schwarz, Phil Blunsom, Chris Dyer, Karl Moritz Hermann, Gábor Melis, Edward Grefenstette
19 Dec 2017