First Activations Matter: Training-Free Methods for Dynamic Activation in Large Language Models

21 August 2024

Yujie Wang

Papers citing "First Activations Matter: Training-Free Methods for Dynamic Activation in Large Language Models"

| Title | Authors | Tags | Counts | Date |
|---|---|---|---|---|
| Dense Training, Sparse Inference: Rethinking Training of Mixture-of-Experts Language Models | Bowen Pan, Songlin Yang, Haokun Liu, Mayank Mishra, Gaoyuan Zhang, Aude Oliva, Colin Raffel, Yikang Shen | MoE | 67 / 21 / 0 | 08 Apr 2024 |
| Model-Based Control with Sparse Neural Dynamics | Ziang Liu, Genggeng Zhou, Jeff He, Tobia Marcucci, Fei-Fei Li, Jiajun Wu, Yunzhu Li | AI4CE | 74 / 18 / 0 | 20 Dec 2023 |
| Merge, Then Compress: Demystify Efficient SMoE with Hints from Its Routing Policy | Pingzhi Li, Zhenyu Zhang, Prateek Yadav, Yi-Lin Sung, Yu Cheng, Mohit Bansal, Tianlong Chen | MoMe | 69 / 39 / 0 | 02 Oct 2023 |
| SCROLLS: Standardized CompaRison Over Long Language Sequences | Uri Shaham, Elad Segal, Maor Ivgi, Avia Efrat, Ori Yoran, ..., Ankit Gupta, Wenhan Xiong, Mor Geva, Jonathan Berant, Omer Levy | RALM | 102 / 139 / 0 | 10 Jan 2022 |
| MoEfication: Transformer Feed-forward Layers are Mixtures of Experts | Zhengyan Zhang, Yankai Lin, Zhiyuan Liu, Peng Li, Maosong Sun, Jie Zhou | MoE | 91 / 128 / 0 | 05 Oct 2021 |
| Proving the Lottery Ticket Hypothesis: Pruning is All You Need | Eran Malach, Gilad Yehudai, Shai Shalev-Shwartz, Ohad Shamir | | 109 / 276 / 0 | 03 Feb 2020 |
| BoolQ: Exploring the Surprising Difficulty of Natural Yes/No Questions | Christopher Clark, Kenton Lee, Ming-Wei Chang, Tom Kwiatkowski, Michael Collins, Kristina Toutanova | | 244 / 1,560 / 0 | 24 May 2019 |
| Don't Give Me the Details, Just the Summary! Topic-Aware Convolutional Neural Networks for Extreme Summarization | Shashi Narayan, Shay B. Cohen, Mirella Lapata | AILaw | 149 / 1,684 / 0 | 27 Aug 2018 |
| CoQA: A Conversational Question Answering Challenge | Siva Reddy, Danqi Chen, Christopher D. Manning | RALM, HAI | 114 / 1,212 / 0 | 21 Aug 2018 |
| The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks | Jonathan Frankle, Michael Carbin | | 269 / 3,488 / 0 | 09 Mar 2018 |