Scaling Laws for Neural Language Models

arXiv:2001.08361, 23 January 2020
Jared Kaplan, Sam McCandlish, T. Henighan, Tom B. Brown, B. Chess, R. Child, Scott Gray, Alec Radford, Jeff Wu, Dario Amodei

Papers citing "Scaling Laws for Neural Language Models"

50 of 982 citing papers are shown below.

Mixture-of-Experts with Expert Choice Routing (18 Feb 2022)
Yan-Quan Zhou, Tao Lei, Han-Chu Liu, Nan Du, Yanping Huang, Vincent Zhao, Andrew M. Dai, Zhifeng Chen, Quoc V. Le, James Laudon
Tags: MoE

ST-MoE: Designing Stable and Transferable Sparse Expert Models (17 Feb 2022)
Barret Zoph, Irwan Bello, Sameer Kumar, Nan Du, Yanping Huang, J. Dean, Noam M. Shazeer, W. Fedus
Tags: MoE

Scaling Laws Under the Microscope: Predicting Transformer Performance from Small Scale Experiments (13 Feb 2022)
Maor Ivgi, Y. Carmon, Jonathan Berant

Compute Trends Across Three Eras of Machine Learning (11 Feb 2022)
J. Sevilla, Lennart Heim, A. Ho, T. Besiroglu, Marius Hobbhahn, Pablo Villalobos

Competition-Level Code Generation with AlphaCode (08 Feb 2022)
Yujia Li, David Choi, Junyoung Chung, Nate Kushman, Julian Schrittwieser, ..., Esme Sutherland Robson, Pushmeet Kohli, Nando de Freitas, Koray Kavukcuoglu, Oriol Vinyals

Cedille: A large autoregressive French language model (07 Feb 2022)
Martin Müller, Florian Laurent

Adaptive Fine-Tuning of Transformer-Based Language Models for Named Entity Recognition (05 Feb 2022)
Felix Stollenwerk

Formal Mathematics Statement Curriculum Learning (03 Feb 2022)
Stanislas Polu, Jesse Michael Han, Kunhao Zheng, Mantas Baksys, Igor Babuschkin, Ilya Sutskever
Tags: AIMat

Harmony: Overcoming the Hurdles of GPU Memory Capacity to Train Massive DNN Models on Commodity Servers (02 Feb 2022)
Youjie Li, Amar Phanishayee, D. Murray, Jakub Tarnawski, N. Kim

Unified Scaling Laws for Routed Language Models (02 Feb 2022)
Aidan Clark, Diego de Las Casas, Aurelia Guy, A. Mensch, Michela Paganini, ..., Oriol Vinyals, Jack W. Rae, Erich Elsen, Koray Kavukcuoglu, Karen Simonyan
Tags: MoE

Auto-Compressing Subset Pruning for Semantic Image Segmentation (26 Jan 2022)
Konstantin Ditschuneit, Johannes Otterbach

CM3: A Causal Masked Multimodal Model of the Internet (19 Jan 2022)
Armen Aghajanyan, Po-Yao (Bernie) Huang, Candace Ross, Vladimir Karpukhin, Hu Xu, ..., Dmytro Okhonko, Mandar Joshi, Gargi Ghosh, M. Lewis, Luke Zettlemoyer

DeepSpeed-MoE: Advancing Mixture-of-Experts Inference and Training to Power Next-Generation AI Scale (14 Jan 2022)
Samyam Rajbhandari, Conglong Li, Z. Yao, Minjia Zhang, Reza Yazdani Aminabadi, A. A. Awan, Jeff Rasley, Yuxiong He

The Effects of Reward Misspecification: Mapping and Mitigating Misaligned Models (10 Jan 2022)
Alexander Pan, Kush S. Bhatia, Jacob Steinhardt

MERLOT Reserve: Neural Script Knowledge through Vision and Language and Sound (07 Jan 2022)
Rowan Zellers, Jiasen Lu, Ximing Lu, Youngjae Yu, Yanpeng Zhao, Mohammadreza Salehi, Aditya Kusupati, Jack Hessel, Ali Farhadi, Yejin Choi

EvoMoE: An Evolutional Mixture-of-Experts Training Framework via Dense-To-Sparse Gate (29 Dec 2021)
Xiaonan Nie, Xupeng Miao, Shijie Cao, Lingxiao Ma, Qibin Liu, Jilong Xue, Youshan Miao, Yi Liu, Zhi-Xin Yang, Bin Cui
Tags: MoMe, MoE

Efficient Hierarchical Domain Adaptation for Pretrained Language Models (16 Dec 2021)
Alexandra Chronopoulou, Matthew E. Peters, Jesse Dodge

Can Multilinguality benefit Non-autoregressive Machine Translation? (16 Dec 2021)
Sweta Agrawal, Julia Kreutzer, Colin Cherry
Tags: AI4CE

LongT5: Efficient Text-To-Text Transformer for Long Sequences (15 Dec 2021)
Mandy Guo, Joshua Ainslie, David C. Uthus, Santiago Ontanon, Jianmo Ni, Yun-hsuan Sung, Yinfei Yang
Tags: VLM

Computational bioacoustics with deep learning: a review and roadmap (13 Dec 2021)
D. Stowell

Improving language models by retrieving from trillions of tokens (08 Dec 2021)
Sebastian Borgeaud, A. Mensch, Jordan Hoffmann, Trevor Cai, Eliza Rutherford, ..., Simon Osindero, Karen Simonyan, Jack W. Rae, Erich Elsen, Laurent Sifre
Tags: KELM, RALM

Creating Multimodal Interactive Agents with Imitation and Self-Supervised Learning (07 Dec 2021)
DeepMind Interactive Agents Team: Josh Abramson, Arun Ahuja, Arthur Brussee, Federico Carnevale, ..., Tamara von Glehn, Greg Wayne, Nathaniel Wong, Chen Yan, Rui Zhu
Tags: LM&Ro

Human Parity on CommonsenseQA: Augmenting Self-Attention with External Attention (06 Dec 2021)
Yichong Xu, Chenguang Zhu, Shuohang Wang, Siqi Sun, Hao Cheng, Xiaodong Liu, Jianfeng Gao, Pengcheng He, Michael Zeng, Xuedong Huang
Tags: LRM

Co-domain Symmetry for Complex-Valued Deep Learning (02 Dec 2021)
Utkarsh Singhal, Yifei Xing, Stella X. Yu

Pixelated Butterfly: Simple and Efficient Sparse training for Neural Network Models (30 Nov 2021)
Tri Dao, Beidi Chen, Kaizhao Liang, Jiaming Yang, Zhao-quan Song, Atri Rudra, Christopher Ré

Scaling Up Vision-Language Pre-training for Image Captioning (24 Nov 2021)
Xiaowei Hu, Zhe Gan, Jianfeng Wang, Zhengyuan Yang, Zicheng Liu, Yumao Lu, Lijuan Wang
Tags: MLLM, VLM

Swin Transformer V2: Scaling Up Capacity and Resolution (18 Nov 2021)
Ze Liu, Han Hu, Yutong Lin, Zhuliang Yao, Zhenda Xie, ..., Yue Cao, Zheng-Wei Zhang, Li Dong, Furu Wei, B. Guo
Tags: ViT

Merging Models with Fisher-Weighted Averaging (18 Nov 2021)
Michael Matena, Colin Raffel
Tags: FedML, MoMe

How much do language models copy from their training data? Evaluating linguistic novelty in text generation using RAVEN (18 Nov 2021)
R. Thomas McCoy, P. Smolensky, Tal Linzen, Jianfeng Gao, Asli Celikyilmaz
Tags: SyDa

Few-Shot Self-Rationalization with Natural Language Prompts (16 Nov 2021)
Ana Marasović, Iz Beltagy, Doug Downey, Matthew E. Peters
Tags: LRM

Scaling Law for Recommendation Models: Towards General-purpose User Representations (15 Nov 2021)
Kyuyong Shin, Hanock Kwak, KyungHyun Kim, Max Nihlén Ramström, Jisu Jeong, Jung-Woo Ha, S. Kim
Tags: ELM

Turing-Universal Learners with Optimal Scaling Laws (09 Nov 2021)
Preetum Nakkiran

LAION-400M: Open Dataset of CLIP-Filtered 400 Million Image-Text Pairs (03 Nov 2021)
Christoph Schuhmann, Richard Vencu, Romain Beaumont, R. Kaczmarczyk, Clayton Mullis, Aarush Katta, Theo Coombes, J. Jitsev, Aran Komatsuzaki
Tags: VLM, MLLM, CLIP

Learning Size and Shape of Calabi-Yau Spaces (02 Nov 2021)
Magdalena Larfors, A. Lukas, Fabian Ruehle, Robin Schneider

Recent Advances in Natural Language Processing via Large Pre-Trained Language Models: A Survey (01 Nov 2021)
Bonan Min, Hayley L Ross, Elior Sulem, Amir Pouran Ben Veyseh, Thien Huu Nguyen, Oscar Sainz, Eneko Agirre, Ilana Heinz, Dan Roth
Tags: LM&MA, VLM, AI4CE

Towards Language Modelling in the Speech Domain Using Sub-word Linguistic Units (31 Oct 2021)
Anurag Katakkar, A. Black
Tags: AuLLM

Training Verifiers to Solve Math Word Problems (27 Oct 2021)
K. Cobbe, V. Kosaraju, Mohammad Bavarian, Mark Chen, Heewoo Jun, ..., Jerry Tworek, Jacob Hilton, Reiichiro Nakano, Christopher Hesse, John Schulman
Tags: ReLM, OffRL, LRM

The Efficiency Misnomer (25 Oct 2021)
Daoyuan Chen, Liuyi Yao, Dawei Gao, Ashish Vaswani, Yaliang Li

Practical Galaxy Morphology Tools from Deep Supervised Representation Learning (25 Oct 2021)
Mike Walmsley, Anna M. M. Scaife, Chris J. Lintott, Michelle Lochner, Yu Zhu, ..., Xibo Ma, Sandor Kruk, Zhen Lei, G. Guo, B. Simmons

Model, sample, and epoch-wise descents: exact solution of gradient flow in the random feature model (22 Oct 2021)
A. Bodin, N. Macris

Behavioral Experiments for Understanding Catastrophic Forgetting (20 Oct 2021)
Samuel J. Bell, Neil D. Lawrence

Improving Compositional Generalization with Self-Training for Data-to-Text Generation (16 Oct 2021)
Sanket Vaibhav Mehta, J. Rao, Yi Tay, Mihir Kale, Ankur P. Parikh, Emma Strubell
Tags: AI4CE

Towards More Effective and Economic Sparsely-Activated Model (14 Oct 2021)
Hao Jiang, Ke Zhan, Jianwei Qu, Yongkang Wu, Zhaoye Fei, ..., Enrui Hu, Yinxia Zhang, Yantao Jia, Fan Yu, Zhao Cao
Tags: MoE

Automated Essay Scoring Using Transformer Models (13 Oct 2021)
Sabrina Ludwig, Christian W. F. Mayer, Christopher Hansen, Kerstin Eilers, Steffen Brandt

Unsupervised Neural Machine Translation with Generative Language Models Only (11 Oct 2021)
Jesse Michael Han, Igor Babuschkin, Harrison Edwards, Arvind Neelakantan, Tao Xu, ..., Alex Ray, Pranav Shyam, Aditya A. Ramesh, Alec Radford, Ilya Sutskever

Yuan 1.0: Large-Scale Pre-trained Language Model in Zero-Shot and Few-Shot Learning (10 Oct 2021)
Shaohua Wu, Xudong Zhao, Tong Yu, Rongguo Zhang, C. Shen, ..., Feng Li, Hong Zhu, Jiangang Luo, Liang Xu, Xuanwei Zhang
Tags: ALM

8-bit Optimizers via Block-wise Quantization (06 Oct 2021)
Tim Dettmers, M. Lewis, Sam Shleifer, Luke Zettlemoyer
Tags: MQ

Language Modeling using LMUs: 10x Better Data Efficiency or Improved Scaling Compared to Transformers (05 Oct 2021)
Narsimha Chilkuri, Eric Hunsberger, Aaron R. Voelker, G. Malik, C. Eliasmith

Exploring the Limits of Large Scale Pre-training (05 Oct 2021)
Samira Abnar, Mostafa Dehghani, Behnam Neyshabur, Hanie Sedghi
Tags: AI4CE

LexGLUE: A Benchmark Dataset for Legal Language Understanding in English (03 Oct 2021)
Ilias Chalkidis, Abhik Jana, D. Hartung, M. Bommarito, Ion Androutsopoulos, Daniel Martin Katz, Nikolaos Aletras
Tags: AILaw, ELM
