
First Activations Matter: Training-Free Methods for Dynamic Activation in Large Language Models
Papers citing "First Activations Matter: Training-Free Methods for Dynamic Activation in Large Language Models"
10 citing papers (10 shown)
| Title |
|---|
| MoEfication: Transformer Feed-forward Layers are Mixtures of Experts — Zhengyan Zhang, Yankai Lin, Zhiyuan Liu, Peng Li, Maosong Sun, Jie Zhou |