FL-Tuning: Layer Tuning for Feed-Forward Network in Transformer
arXiv:2206.15312 · 30 June 2022
Jingping Liu, Yuqiu Song, Kui Xue, Hongli Sun, Chao Wang, Lihan Chen, Haiyun Jiang, Jiaqing Liang, Tong Ruan
Papers citing "FL-Tuning: Layer Tuning for Feed-Forward Network in Transformer" (31 papers):
- "Kformer: Knowledge Injection in Transformer Feed-Forward Layers". Yunzhi Yao, Shaohan Huang, Li Dong, Furu Wei, Huajun Chen, Ningyu Zhang. Tags: KELM, MedIm. Citations: 42. 15 Jan 2022.
- "P-Tuning v2: Prompt Tuning Can Be Comparable to Fine-tuning Universally Across Scales and Tasks". Xiao Liu, Kaixuan Ji, Yicheng Fu, Weng Lam Tam, Zhengxiao Du, Zhilin Yang, Jie Tang. Tags: VLM. Citations: 860. 14 Oct 2021.
- "PTR: Prompt Tuning with Rules for Text Classification". Xu Han, Weilin Zhao, Ning Ding, Zhiyuan Liu, Maosong Sun. Tags: VLM. Citations: 526. 24 May 2021.
- "RoFormer: Enhanced Transformer with Rotary Position Embedding". Jianlin Su, Yu Lu, Shengfeng Pan, Ahmed Murtadha, Bo Wen, Yunfeng Liu. Citations: 2,500. 20 Apr 2021.
- "The Power of Scale for Parameter-Efficient Prompt Tuning". Brian Lester, Rami Al-Rfou, Noah Constant. Tags: VPVLM. Citations: 4,077. 18 Apr 2021.
- "Learning How to Ask: Querying LMs with Mixtures of Soft Prompts". Guanghui Qin, J. Eisner. Citations: 547. 14 Apr 2021.
- "Mask Attention Networks: Rethinking and Strengthen Transformer". Zhihao Fan, Yeyun Gong, Dayiheng Liu, Zhongyu Wei, Siyuan Wang, Jian Jiao, Nan Duan, Ruofei Zhang, Xuanjing Huang. Citations: 75. 25 Mar 2021.
- "GPT Understands, Too". Xiao Liu, Yanan Zheng, Zhengxiao Du, Ming Ding, Yujie Qian, Zhilin Yang, Jie Tang. Tags: VLM. Citations: 1,179. 18 Mar 2021.
- "Prefix-Tuning: Optimizing Continuous Prompts for Generation". Xiang Lisa Li, Percy Liang. Citations: 4,298. 01 Jan 2021.
- "Making Pre-trained Language Models Better Few-shot Learners". Tianyu Gao, Adam Fisch, Danqi Chen. Citations: 1,971. 31 Dec 2020.
- "Transformer Feed-Forward Layers Are Key-Value Memories". Mor Geva, R. Schuster, Jonathan Berant, Omer Levy. Tags: KELM. Citations: 840. 29 Dec 2020.
- "Fast Transformers with Clustered Attention". Apoorv Vyas, Angelos Katharopoulos, François Fleuret. Citations: 154. 09 Jul 2020.
- "Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention". Angelos Katharopoulos, Apoorv Vyas, Nikolaos Pappas, François Fleuret. Citations: 1,786. 29 Jun 2020.
- "Linformer: Self-Attention with Linear Complexity". Sinong Wang, Belinda Z. Li, Madian Khabsa, Han Fang, Hao Ma. Citations: 1,713. 08 Jun 2020.
- "Language Models are Few-Shot Learners". Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, ..., Christopher Berner, Sam McCandlish, Alec Radford, Ilya Sutskever, Dario Amodei. Tags: BDL. Citations: 42,332. 28 May 2020.
- "Recall and Learn: Fine-tuning Deep Pretrained Language Models with Less Forgetting". Sanyuan Chen, Yutai Hou, Yiming Cui, Wanxiang Che, Ting Liu, Xiangzhan Yu. Tags: KELM, CLL. Citations: 224. 27 Apr 2020.
- "CLUE: A Chinese Language Understanding Evaluation Benchmark". Liang Xu, Hai Hu, Xuanwei Zhang, Lu Li, Chenjie Cao, ..., Cong Yue, Xinrui Zhang, Zhen-Yi Yang, Kyle Richardson, Zhenzhong Lan. Tags: ELM. Citations: 386. 13 Apr 2020.
- "Talking-Heads Attention". Noam M. Shazeer, Zhenzhong Lan, Youlong Cheng, Nan Ding, L. Hou. Citations: 80. 05 Mar 2020.
- "How Can We Know What Language Models Know?". Zhengbao Jiang, Frank F. Xu, Jun Araki, Graham Neubig. Tags: KELM. Citations: 1,409. 28 Nov 2019.
- "Improving Transformer Models by Reordering their Sublayers". Ofir Press, Noah A. Smith, Omer Levy. Citations: 87. 10 Nov 2019.
- "Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer". Colin Raffel, Noam M. Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, Peter J. Liu. Tags: AIMat. Citations: 20,298. 23 Oct 2019.
- "Language Models as Knowledge Bases?". Fabio Petroni, Tim Rocktäschel, Patrick Lewis, A. Bakhtin, Yuxiang Wu, Alexander H. Miller, Sebastian Riedel. Tags: KELM, AI4MH. Citations: 2,673. 03 Sep 2019.
- "NEZHA: Neural Contextualized Representation for Chinese Language Understanding". Junqiu Wei, Xiaozhe Ren, Xiaoguang Li, Wenyong Huang, Yi-Lun Liao, Yasheng Wang, Jianghao Lin, Xin Jiang, Xiao Chen, Qun Liu. Citations: 116. 31 Aug 2019.
- "RoBERTa: A Robustly Optimized BERT Pretraining Approach". Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, M. Lewis, Luke Zettlemoyer, Veselin Stoyanov. Tags: AIMat. Citations: 24,528. 26 Jul 2019.
- "How to Fine-Tune BERT for Text Classification?". Chi Sun, Xipeng Qiu, Yige Xu, Xuanjing Huang. Citations: 1,525. 14 May 2019.
- "SuperGLUE: A Stickier Benchmark for General-Purpose Language Understanding Systems". Alex Wang, Yada Pruksachatkun, Nikita Nangia, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, Samuel R. Bowman. Tags: ELM. Citations: 2,323. 02 May 2019.
- "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding". Jacob Devlin, Ming-Wei Chang, Kenton Lee, Kristina Toutanova. Tags: VLM, SSL, SSeg. Citations: 95,114. 11 Oct 2018.
- "Accelerating Neural Transformer via an Average Attention Network". Biao Zhang, Deyi Xiong, Jinsong Su. Citations: 120. 02 May 2018.
- "GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding". Alex Wang, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, Samuel R. Bowman. Tags: ELM. Citations: 7,182. 20 Apr 2018.
- "Deep contextualized word representations". Matthew E. Peters, Mark Neumann, Mohit Iyyer, Matt Gardner, Christopher Clark, Kenton Lee, Luke Zettlemoyer. Tags: NAI. Citations: 11,565. 15 Feb 2018.
- "Attention Is All You Need". Ashish Vaswani, Noam M. Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan Gomez, Lukasz Kaiser, Illia Polosukhin. Tags: 3DV. Citations: 132,199. 12 Jun 2017.