Regularizing Transformers With Deep Probabilistic Layers
arXiv: 2108.10764 · 23 August 2021
Aurora Cobo Aguilera, Pablo Martínez Olmos, Antonio Artés-Rodríguez, Fernando Pérez-Cruz
Papers citing "Regularizing Transformers With Deep Probabilistic Layers" (8 of 8 papers shown)
AttentionDrop: A Novel Regularization Method for Transformer Models
Mirza Samad Ahmed Baig, Syeda Anshrah Gillani, Abdul Akbar Khan, Shahid Munir Shah
16 Apr 2025

Faraday: Synthetic Smart Meter Generator for the smart grid
Sheng Chai, Gus Chadney
05 Apr 2024

ERNIE-M: Enhanced Multilingual Representation by Aligning Cross-lingual Semantics with Monolingual Corpora
Ouyang Xuan, Shuohuan Wang, Chao Pang, Yu Sun, Hao Tian, Hua-Hong Wu, Haifeng Wang
31 Dec 2020

Big Bird: Transformers for Longer Sequences
Manzil Zaheer, Guru Guruganesh, Kumar Avinava Dubey, Joshua Ainslie, Chris Alberti, ..., Philip Pham, Anirudh Ravula, Qifan Wang, Li Yang, Amr Ahmed
28 Jul 2020

Language GANs Falling Short
Massimo Caccia, Lucas Caccia, W. Fedus, Hugo Larochelle, Joelle Pineau, Laurent Charlin
06 Nov 2018

GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
Alex Jinpeng Wang, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, Samuel R. Bowman
20 Apr 2018

Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation
Yonghui Wu, M. Schuster, Z. Chen, Quoc V. Le, Mohammad Norouzi, ..., Alex Rudnick, Oriol Vinyals, G. Corrado, Macduff Hughes, J. Dean
26 Sep 2016

Effective Approaches to Attention-based Neural Machine Translation
Thang Luong, Hieu H. Pham, Christopher D. Manning
17 Aug 2015