
Reducing Transformer Depth on Demand with Structured Dropout
Papers citing "Reducing Transformer Depth on Demand with Structured Dropout"
50 / 406 papers shown
Title | Authors |
---|---|
Towards Efficient NLP: A Standard Evaluation and A Strong Baseline | Xiangyang Liu, Tianxiang Sun, Junliang He, Jiawen Wu, Lingling Wu, Xinyu Zhang, Hao Jiang, Bo Zhao, Xuanjing Huang, Xipeng Qiu |
Pre-Trained Models: Past, Present and Future | Xu Han, Zhengyan Zhang, Ning Ding, Yuxian Gu, Xiao Liu, ... Jie Tang, Ji-Rong Wen, Jinhui Yuan, Wayne Xin Zhao, Jun Zhu |