Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
1911.11641
Cited By
PIQA: Reasoning about Physical Commonsense in Natural Language
26 November 2019
Yonatan Bisk
Rowan Zellers
Ronan Le Bras
Jianfeng Gao
Yejin Choi
OOD
LRM
Re-assign community
ArXiv (abs)
PDF
HTML
Papers citing
"PIQA: Reasoning about Physical Commonsense in Natural Language"
50 / 1,393 papers shown
Title
Photon: Federated LLM Pre-Training
Lorenzo Sani
Alex Iacob
Zeyu Cao
Royson Lee
Bill Marino
...
Dongqi Cai
Zexi Li
Wanru Zhao
Xinchi Qiu
Nicholas D. Lane
AI4CE
88
9
0
05 Nov 2024
Hunyuan-Large: An Open-Source MoE Model with 52 Billion Activated Parameters by Tencent
Xingwu Sun
Yanfeng Chen
Yanwen Huang
Ruobing Xie
Jiaqi Zhu
...
Zhanhui Kang
Yong Yang
Yuhong Liu
Di Wang
Jie Jiang
MoE
ALM
ELM
165
34
0
04 Nov 2024
Sparsing Law: Towards Large Language Models with Greater Activation Sparsity
Yuqi Luo
Chenyang Song
Xu Han
Yuxiao Chen
Chaojun Xiao
Zhiyuan Liu
Maosong Sun
Jiansheng Wei
Zhiyuan Liu
Maosong Sun
147
7
0
04 Nov 2024
TODO: Enhancing LLM Alignment with Ternary Preferences
Yuxiang Guo
Lu Yin
Bo Jiang
Jiaqi Zhang
125
3
0
02 Nov 2024
Multilingual Pretraining Using a Large Corpus Machine-Translated from a Single Source Language
Jiayi Wang
Yao Lu
Maurice Weber
Max Ryabinin
Yihong Chen
Raphael Tang
Pontus Stenetorp
LRM
104
1
0
31 Oct 2024
BitStack: Any-Size Compression of Large Language Models in Variable Memory Environments
Xinghao Wang
Pengyu Wang
Bo Wang
Dong Zhang
Yunhua Zhou
Xipeng Qiu
MQ
71
1
0
31 Oct 2024
Can Models Help Us Create Better Models? Evaluating LLMs as Data Scientists
Michał Pietruszka
Łukasz Borchmann
Aleksander Jędrosz
Paweł Morawiecki
ELM
42
1
0
30 Oct 2024
MALoRA: Mixture of Asymmetric Low-Rank Adaptation for Enhanced Multi-Task Learning
Xujia Wang
Haiyan Zhao
Shuo Wang
Hanqing Wang
Zhiyuan Liu
MoMe
MoE
63
1
0
30 Oct 2024
BLAST: Block-Level Adaptive Structured Matrices for Efficient Deep Neural Network Inference
Changwoo Lee
Soo Min Kwon
Qing Qu
Hun-Seok Kim
90
0
0
28 Oct 2024
LLMCBench: Benchmarking Large Language Model Compression for Efficient Deployment
Ge Yang
Changyi He
Jinpei Guo
Jianyu Wu
Yifu Ding
Aishan Liu
Haotong Qin
Pengliang Ji
Xianglong Liu
MQ
96
7
0
28 Oct 2024
Rephrasing natural text data with different languages and quality levels for Large Language Model pre-training
Michael Pieler
Marco Bellagente
H. Teufel
Duy Phung
Nathan Cooper
...
Reshinth Adithyan
Zaid Alyafeai
Nikhil Pinnaparaju
Maksym Zhuravinskyi
Carlos Riquelme
74
1
0
28 Oct 2024
Relaxed Recursive Transformers: Effective Parameter Sharing with Layer-wise LoRA
Sangmin Bae
Adam Fisch
Hrayr Harutyunyan
Ziwei Ji
Seungyeon Kim
Tal Schuster
KELM
137
7
0
28 Oct 2024
Long Sequence Modeling with Attention Tensorization: From Sequence to Tensor Learning
Aosong Feng
Rex Ying
Leandros Tassiulas
56
2
0
28 Oct 2024
RARe: Retrieval Augmented Retrieval with In-Context Examples
Atula Tejaswi
Yoonsang Lee
Sujay Sanghavi
Eunsol Choi
RALM
LRM
62
1
0
26 Oct 2024
TesseraQ: Ultra Low-Bit LLM Post-Training Quantization with Block Reconstruction
Yuhang Li
Priyadarshini Panda
MQ
73
1
0
24 Oct 2024
On the Crucial Role of Initialization for Matrix Factorization
Bingcong Li
Liang Zhang
Aryan Mokhtari
Niao He
163
6
0
24 Oct 2024
A Little Help Goes a Long Way: Efficient LLM Training by Leveraging Small LMs
A. S. Rawat
Veeranjaneyulu Sadhanala
Afshin Rostamizadeh
Ayan Chakrabarti
Wittawat Jitkrittum
...
Rakesh Shivanna
Sashank J. Reddi
A. Menon
Rohan Anil
Sanjiv Kumar
146
3
0
24 Oct 2024
Taipan: Efficient and Expressive State Space Language Models with Selective Attention
Chien Van Nguyen
Huy Huu Nguyen
Thang M. Pham
Ruiyi Zhang
Hanieh Deilamsalehy
...
Ryan A. Rossi
Trung Bui
Viet Dac Lai
Franck Dernoncourt
Thien Huu Nguyen
Mamba
RALM
58
1
0
24 Oct 2024
KVSharer: Efficient Inference via Layer-Wise Dissimilar KV Cache Sharing
Yifei Yang
Zouying Cao
Qiguang Chen
L. Qin
Dongjie Yang
Hai Zhao
Zhi Chen
66
6
0
24 Oct 2024
Scaling up Masked Diffusion Models on Text
Shen Nie
Fengqi Zhu
Chao Du
Tianyu Pang
Qian Liu
Guangtao Zeng
Min Lin
Chongxuan Li
AI4CE
211
30
0
24 Oct 2024
Mixture of Parrots: Experts improve memorization more than reasoning
Samy Jelassi
Clara Mohri
David Brandfonbrener
Alex Gu
Nikhil Vyas
Nikhil Anand
David Alvarez-Melis
Yuanzhi Li
Sham Kakade
Eran Malach
MoE
113
5
0
24 Oct 2024
MiLoRA: Efficient Mixture of Low-Rank Adaptation for Large Language Models Fine-tuning
Jingfan Zhang
Yi Zhao
Dan Chen
Xing Tian
Huanran Zheng
Wei Zhu
MoE
134
17
0
23 Oct 2024
Scaling Diffusion Language Models via Adaptation from Autoregressive Models
Shansan Gong
Shivam Agarwal
Yizhe Zhang
Jiacheng Ye
Lin Zheng
...
Peilin Zhao
W. Bi
Jiawei Han
Hao Peng
Dianbo Sui
AI4CE
140
27
0
23 Oct 2024
WAGLE: Strategic Weight Attribution for Effective and Modular Unlearning in Large Language Models
Jinghan Jia
Jiancheng Liu
Yihua Zhang
Parikshit Ram
Nathalie Baracaldo
Sijia Liu
MU
158
8
0
23 Oct 2024
Beware of Calibration Data for Pruning Large Language Models
Yixin Ji
Yang Xiang
Juntao Li
Qingrong Xia
Ping Li
Xinyu Duan
Zhefeng Wang
Min Zhang
94
2
0
23 Oct 2024
Scalable Influence and Fact Tracing for Large Language Model Pretraining
Tyler A. Chang
Dheeraj Rajagopal
Tolga Bolukbasi
Lucas Dixon
Ian Tenney
TDI
94
5
0
22 Oct 2024
PLDR-LLM: Large Language Model from Power Law Decoder Representations
Burc Gokden
59
1
0
22 Oct 2024
MiniPLM: Knowledge Distillation for Pre-Training Language Models
Yuxian Gu
Hao Zhou
Fandong Meng
Jie Zhou
Minlie Huang
210
7
0
22 Oct 2024
Self-calibration for Language Model Quantization and Pruning
Miles Williams
G. Chrysostomou
Nikolaos Aletras
MQ
492
0
0
22 Oct 2024
Controlled Low-Rank Adaptation with Subspace Regularization for Continued Training on Large Language Models
Yuheng Lu
Bingshuo Qian
Caixia Yuan
Huixing Jiang
Xiaojie Wang
CLL
88
0
0
22 Oct 2024
Math Neurosurgery: Isolating Language Models' Math Reasoning Abilities Using Only Forward Passes
Bryan R Christ
Zack Gottesman
Jonathan Kropko
Thomas Hartvigsen
LRM
138
4
0
22 Oct 2024
Susu Box or Piggy Bank: Assessing Cultural Commonsense Knowledge between Ghana and the U.S
Christabel Acquaye
Haozhe An
Rachel Rudinger
76
5
0
21 Oct 2024
Revealing and Mitigating the Local Pattern Shortcuts of Mamba
Wangjie You
Zecheng Tang
Juntao Li
Lili Yao
Min Zhang
Mamba
61
0
0
21 Oct 2024
CartesianMoE: Boosting Knowledge Sharing among Experts via Cartesian Product Routing in Mixture-of-Experts
Zhenpeng Su
Xing Wu
Zijia Lin
Yizhe Xiong
Minxuan Lv
Guangyuan Ma
Hui Chen
Songlin Hu
Guiguang Ding
MoE
120
4
0
21 Oct 2024
BIG5-CHAT: Shaping LLM Personalities Through Training on Human-Grounded Data
Wenkai Li
Jiarui Liu
Andy Liu
Xuhui Zhou
Mona Diab
Maarten Sap
156
11
0
21 Oct 2024
Lossless KV Cache Compression to 2%
Zhen Yang
Jizong Han
Kan Wu
Ruobing Xie
An Wang
Xingwu Sun
Zhanhui Kang
VLM
MQ
75
2
0
20 Oct 2024
A Systematic Study of Cross-Layer KV Sharing for Efficient LLM Inference
You Wu
Haoyi Wu
Kewei Tu
81
3
0
18 Oct 2024
VL-GLUE: A Suite of Fundamental yet Challenging Visuo-Linguistic Reasoning Tasks
Shailaja Keyur Sampat
Mutsumi Nakamura
Shankar Kailas
Kartik Aggarwal
Mandy Zhou
Yezhou Yang
Chitta Baral
MLLM
CoGe
ReLM
VLM
LRM
78
0
0
17 Oct 2024
ActionCOMET: A Zero-shot Approach to Learn Image-specific Commonsense Concepts about Actions
Shailaja Keyur Sampat
Yezhou Yang
Chitta Baral
LM&Ro
83
0
0
17 Oct 2024
MoR: Mixture of Ranks for Low-Rank Adaptation Tuning
Chuanyu Tang
Yilong Chen
Zhenyu Zhang
Junyuan Shang
Wenyuan Zhang
Yong Huang
Tingwen Liu
MoE
59
0
0
17 Oct 2024
Harnessing Your DRAM and SSD for Sustainable and Accessible LLM Inference with Mixed-Precision and Multi-level Caching
Jie Peng
Zhang Cao
Huaizhi Qu
Zhengyu Zhang
Chang Guo
Yanyong Zhang
Zhichao Cao
Tianlong Chen
104
2
0
17 Oct 2024
Breaking Chains: Unraveling the Links in Multi-Hop Knowledge Unlearning
Minseok Choi
C. Park
Dohyun Lee
Jaegul Choo
KELM
MU
53
1
0
17 Oct 2024
Router-Tuning: A Simple and Effective Approach for Enabling Dynamic-Depth in Transformers
Shwai He
Tao Ge
Guoheng Sun
Bowei Tian
Xiaoyang Wang
Ang Li
MoE
131
1
0
17 Oct 2024
Rethinking Token Reduction for State Space Models
Zheng Zhan
Yushu Wu
Zhenglun Kong
Changdi Yang
Yifan Gong
Xuan Shen
Xue Lin
Pu Zhao
Yanzhi Wang
Mamba
85
6
0
16 Oct 2024
MatryoshkaKV: Adaptive KV Compression via Trainable Orthogonal Projection
Bokai Lin
Zihao Zeng
Zipeng Xiao
Siqi Kou
Tianqi Hou
Xiaofeng Gao
Hao Zhang
Zhijie Deng
88
6
0
16 Oct 2024
MoE-Pruner: Pruning Mixture-of-Experts Large Language Model using the Hints from Its Router
Yanyue Xie
Zhi Zhang
Ding Zhou
Cong Xie
Ziang Song
Xin Liu
Yanzhi Wang
Xue Lin
An Xu
LLMAG
89
5
0
15 Oct 2024
DISP-LLM: Dimension-Independent Structural Pruning for Large Language Models
Shangqian Gao
Chi-Heng Lin
Ting Hua
Tang Zheng
Yilin Shen
Hongxia Jin
Yen-Chang Hsu
66
10
0
15 Oct 2024
Adaptive Data Optimization: Dynamic Sample Selection with Scaling Laws
Yiding Jiang
Allan Zhou
Zhili Feng
Sadhika Malladi
J. Zico Kolter
100
22
0
15 Oct 2024
Layer-wise Importance Matters: Less Memory for Better Performance in Parameter-efficient Fine-tuning of Large Language Models
Kai Yao
P. Gao
Lichun Li
Yuan Zhao
Xiaofeng Wang
Wei Wang
Jianke Zhu
56
2
0
15 Oct 2024
MoH: Multi-Head Attention as Mixture-of-Head Attention
Peng Jin
Bo Zhu
Li Yuan
Shuicheng Yan
MoE
103
18
0
15 Oct 2024
Previous
1
2
3
...
8
9
10
...
26
27
28
Next