What the Weight?! A Unified Framework for Zero-Shot Knowledge Composition

23 January 2024 · arXiv:2401.12756
Carolin Holtermann, Markus Frohmann, Navid Rekabsaz, Anne Lauscher
MoMe

Papers citing "What the Weight?! A Unified Framework for Zero-Shot Knowledge Composition"

All 26 citing papers are shown below. Each entry lists the title, the authors, and the topic tags (where present) followed by the site's three per-paper counters and the publication date.

AdaMix: Mixture-of-Adaptations for Parameter-efficient Model Tuning
  Yaqing Wang, Sahaj Agarwal, Subhabrata Mukherjee, Xiaodong Liu, Jing Gao, Ahmed Hassan Awadallah, Jianfeng Gao
  MoE | 77 | 133 | 0 | 31 Oct 2022

Branch-Train-Merge: Embarrassingly Parallel Training of Expert Language Models
  Margaret Li, Suchin Gururangan, Tim Dettmers, M. Lewis, Tim Althoff, Noah A. Smith, Luke Zettlemoyer
  MoMe | 84 | 148 | 0 | 05 Aug 2022

No Language Left Behind: Scaling Human-Centered Machine Translation
  NLLB Team, Marta R. Costa-jussà, James Cross, Onur Çelebi, Maha Elbayad, ..., Alexandre Mourachko, C. Ropers, Safiyyah Saleem, Holger Schwenk, Jeff Wang
  MoE | 220 | 1,260 | 0 | 11 Jul 2022

Fewer Errors, but More Stereotypes? The Effect of Model Size on Gender Bias
  Yarden Tal, Inbal Magar, Roy Schwartz
  50 | 35 | 0 | 20 Jun 2022

Towards Climate Awareness in NLP Research
  Daniel Hershcovich, Nicolas Webersinke, Mathias Kraus, J. Bingler, Markus Leippold
  72 | 33 | 0 | 10 May 2022

Fair and Argumentative Language Modeling for Computational Argumentation
  Carolin Holtermann, Anne Lauscher, Simone Paolo Ponzetto
  39 | 21 | 0 | 08 Apr 2022

Beyond Distillation: Task-level Mixture-of-Experts for Efficient Inference
  Sneha Kudugunta, Yanping Huang, Ankur Bapna, M. Krikun, Dmitry Lepikhin, Minh-Thang Luong, Orhan Firat
  MoE | 245 | 109 | 0 | 24 Sep 2021

Efficient Test Time Adapter Ensembling for Low-resource Language Varieties
  Xinyi Wang, Yulia Tsvetkov, Sebastian Ruder, Graham Neubig
  48 | 35 | 0 | 10 Sep 2021

Sustainable Modular Debiasing of Language Models
  Anne Lauscher, Tobias Lüken, Goran Glavaš
  106 | 121 | 0 | 08 Sep 2021

DEMix Layers: Disentangling Domains for Modular Language Modeling
  Suchin Gururangan, Michael Lewis, Ari Holtzman, Noah A. Smith, Luke Zettlemoyer
  KELM, MoE | 91 | 134 | 0 | 11 Aug 2021

Towards Understanding and Mitigating Social Biases in Language Models
  Paul Pu Liang, Chiyu Wu, Louis-Philippe Morency, Ruslan Salakhutdinov
  93 | 388 | 0 | 24 Jun 2021

Compacter: Efficient Low-Rank Hypercomplex Adapter Layers
  Rabeeh Karimi Mahabadi, James Henderson, Sebastian Ruder
  MoE | 100 | 485 | 0 | 08 Jun 2021

Documenting Large Webtext Corpora: A Case Study on the Colossal Clean Crawled Corpus
  Jesse Dodge, Maarten Sap, Ana Marasović, William Agnew, Gabriel Ilharco, Dirk Groeneveld, Margaret Mitchell, Matt Gardner
  AILaw | 115 | 446 | 0 | 18 Apr 2021

Prefix-Tuning: Optimizing Continuous Prompts for Generation
  Xiang Lisa Li, Percy Liang
  223 | 4,254 | 0 | 01 Jan 2021

Parameter-Efficient Transfer Learning with Diff Pruning
  Demi Guo, Alexander M. Rush, Yoon Kim
  74 | 400 | 0 | 14 Dec 2020

GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding
  Dmitry Lepikhin, HyoukJoong Lee, Yuanzhong Xu, Dehao Chen, Orhan Firat, Yanping Huang, M. Krikun, Noam M. Shazeer, Zhifeng Chen
  MoE | 89 | 1,162 | 0 | 30 Jun 2020

DeBERTa: Decoding-enhanced BERT with Disentangled Attention
  Pengcheng He, Xiaodong Liu, Jianfeng Gao, Weizhu Chen
  AAML | 137 | 2,731 | 0 | 05 Jun 2020

Language (Technology) is Power: A Critical Survey of "Bias" in NLP
  Su Lin Blodgett, Solon Barocas, Hal Daumé, Hanna M. Wallach
  155 | 1,236 | 0 | 28 May 2020

AdapterFusion: Non-Destructive Task Composition for Transfer Learning
  Jonas Pfeiffer, Aishwarya Kamath, Andreas Rücklé, Kyunghyun Cho, Iryna Gurevych
  CLL, MoMe | 129 | 849 | 0 | 01 May 2020

K-Adapter: Infusing Knowledge into Pre-Trained Models with Adapters
  Ruize Wang, Duyu Tang, Nan Duan, Zhongyu Wei, Xuanjing Huang, Jianshu Ji, Guihong Cao, Daxin Jiang, Ming Zhou
  KELM | 87 | 553 | 0 | 05 Feb 2020

Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
  Colin Raffel, Noam M. Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, Peter J. Liu
  AIMat | 419 | 20,127 | 0 | 23 Oct 2019

Simple, Scalable Adaptation for Neural Machine Translation
  Ankur Bapna, N. Arivazhagan, Orhan Firat
  AI4CE | 100 | 417 | 0 | 18 Sep 2019

Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks
  Nils Reimers, Iryna Gurevych
  1.3K | 12,193 | 0 | 27 Aug 2019

BERT Rediscovers the Classical NLP Pipeline
  Ian Tenney, Dipanjan Das, Ellie Pavlick
  MILM, SSeg | 133 | 1,471 | 0 | 15 May 2019

Learning multiple visual domains with residual adapters
  Sylvestre-Alvise Rebuffi, Hakan Bilen, Andrea Vedaldi
  OOD | 160 | 933 | 0 | 22 May 2017

What to do about non-standard (or non-canonical) language in NLP
  Barbara Plank
  41 | 96 | 0 | 28 Aug 2016