Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2408.11393
Cited By
First Activations Matter: Training-Free Methods for Dynamic Activation in Large Language Models
21 August 2024
Chi Ma
Mincong Huang
Ying Zhang
Chao Wang
Yujie Wang
Lei Yu
Chuan Liu
Wei Lin
AI4CE
LLMSV
Re-assign community
ArXiv (abs)
PDF
HTML
Papers citing
"First Activations Matter: Training-Free Methods for Dynamic Activation in Large Language Models"
10 / 10 papers shown
Title
Dense Training, Sparse Inference: Rethinking Training of Mixture-of-Experts Language Models
Bowen Pan
Songlin Yang
Haokun Liu
Mayank Mishra
Gaoyuan Zhang
Aude Oliva
Colin Raffel
Yikang Shen
MoE
67
21
0
08 Apr 2024
Model-Based Control with Sparse Neural Dynamics
Ziang Liu
Genggeng Zhou
Jeff He
Tobia Marcucci
Fei-Fei Li
Jiajun Wu
Yunzhu Li
AI4CE
74
18
0
20 Dec 2023
Merge, Then Compress: Demystify Efficient SMoE with Hints from Its Routing Policy
Pingzhi Li
Zhenyu Zhang
Prateek Yadav
Yi-Lin Sung
Yu Cheng
Mohit Bansal
Tianlong Chen
MoMe
69
39
0
02 Oct 2023
SCROLLS: Standardized CompaRison Over Long Language Sequences
Uri Shaham
Elad Segal
Maor Ivgi
Avia Efrat
Ori Yoran
...
Ankit Gupta
Wenhan Xiong
Mor Geva
Jonathan Berant
Omer Levy
RALM
102
139
0
10 Jan 2022
MoEfication: Transformer Feed-forward Layers are Mixtures of Experts
Zhengyan Zhang
Yankai Lin
Zhiyuan Liu
Peng Li
Maosong Sun
Jie Zhou
MoE
91
128
0
05 Oct 2021
Proving the Lottery Ticket Hypothesis: Pruning is All You Need
Eran Malach
Gilad Yehudai
Shai Shalev-Shwartz
Ohad Shamir
109
276
0
03 Feb 2020
BoolQ: Exploring the Surprising Difficulty of Natural Yes/No Questions
Christopher Clark
Kenton Lee
Ming-Wei Chang
Tom Kwiatkowski
Michael Collins
Kristina Toutanova
244
1,560
0
24 May 2019
Don't Give Me the Details, Just the Summary! Topic-Aware Convolutional Neural Networks for Extreme Summarization
Shashi Narayan
Shay B. Cohen
Mirella Lapata
AILaw
149
1,684
0
27 Aug 2018
CoQA: A Conversational Question Answering Challenge
Siva Reddy
Danqi Chen
Christopher D. Manning
RALM
HAI
114
1,212
0
21 Aug 2018
The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks
Jonathan Frankle
Michael Carbin
269
3,488
0
09 Mar 2018
1