arXiv:2104.06022
Lessons on Parameter Sharing across Layers in Transformers
13 April 2021
Sho Takase
Shun Kiyono
Papers citing
"Lessons on Parameter Sharing across Layers in Transformers"
16 / 16 papers shown
Adaptive Additive Parameter Updates of Vision Transformers for Few-Shot Continual Learning
Kyle Stein
A. Mahyari
Guillermo Francia III
Eman El-Sheikh
CLL
65
0
0
11 Apr 2025
Merging Feed-Forward Sublayers for Compressed Transformers
Neha Verma
Kenton W. Murray
Kevin Duh
AI4CE
50
0
0
10 Jan 2025
Relaxed Recursive Transformers: Effective Parameter Sharing with Layer-wise LoRA
Sangmin Bae
Adam Fisch
Hrayr Harutyunyan
Ziwei Ji
Seungyeon Kim
Tal Schuster
KELM
81
5
0
28 Oct 2024
On Expressive Power of Looped Transformers: Theoretical Analysis and Enhancement via Timestep Encoding
Kevin Xu
Issei Sato
39
3
0
02 Oct 2024
KALE-LM: Unleash The Power Of AI For Science Via Knowledge And Logic Enhanced Large Model
Weichen Dai
Yezeng Chen
Zijie Dai
Zhijie Huang
Yong-Jin Liu
...
Chengli Zhong
Xinhe Li
Zeyu Wang
Zhuoying Feng
Yi Zhou
35
0
0
27 Sep 2024
MALT: Multi-scale Action Learning Transformer for Online Action Detection
Zhipeng Yang
Ruoyu Wang
Yang Tan
Liping Xie
OffRL
43
1
0
31 May 2024
Enhancing Context Through Contrast
Kshitij Ambilduke
Aneesh Shetye
Diksha Bagade
Rishika Bhagwatkar
Khurshed Fitter
P. Vagdargi
Shital S. Chiddarwar
26
0
0
06 Jan 2024
MobileNMT: Enabling Translation in 15MB and 30ms
Ye Lin
Xiaohui Wang
Zhexi Zhang
Mingxuan Wang
Tong Xiao
Jingbo Zhu
MQ
30
1
0
07 Jun 2023
Semi-supervised Neural Machine Translation with Consistency Regularization for Low-Resource Languages
Viet H. Pham
Thang M. Pham
Giang Nguyen
Long H. B. Nguyen
D. Dinh
19
0
0
02 Apr 2023
Visualizing the Obvious: A Concreteness-based Ensemble Model for Noun Property Prediction
Yue Yang
Artemis Panagopoulou
Marianna Apidianaki
Mark Yatskar
Chris Callison-Burch
23
2
0
24 Oct 2022
Spiking Neural Networks for event-based action recognition: A new task to understand their advantage
Alex Vicente-Sola
D. L. Manna
Paul Kirkland
G. D. Caterina
Trevor Bihl
21
8
0
29 Sep 2022
Streaming parallel transducer beam search with fast-slow cascaded encoders
Jay Mahadeokar
Yangyang Shi
Ke Li
Duc Le
Jiedan Zhu
Vikas Chandra
Ozlem Kalinli
M. Seltzer
27
15
0
29 Mar 2022
EdgeFormer: A Parameter-Efficient Transformer for On-Device Seq2seq Generation
Tao Ge
Si-Qing Chen
Furu Wei
MoE
26
21
0
16 Feb 2022
Interpreting Deep Learning Models in Natural Language Processing: A Review
Xiaofei Sun
Diyi Yang
Xiaoya Li
Tianwei Zhang
Yuxian Meng
Han Qiu
Guoyin Wang
Eduard H. Hovy
Jiwei Li
17
44
0
20 Oct 2021
Is Attention always needed? A Case Study on Language Identification from Speech
A. Mandal
Santanu Pal
Indranil Dutta
Mahidas Bhattacharya
S. Naskar
27
6
0
05 Oct 2021
On Compositional Generalization of Neural Machine Translation
Yafu Li
Yongjing Yin
Yulong Chen
Yue Zhang
156
44
0
31 May 2021