ResearchTrend.AI

Add a SideNet to your MainNet

14 July 2020
Adrien Morisot
arXiv: 2007.13512

Papers citing "Add a SideNet to your MainNet"

21 / 21 papers shown
Language Models are Few-Shot Learners
Tom B. Brown
Benjamin Mann
Nick Ryder
Melanie Subbiah
Jared Kaplan
...
Christopher Berner
Sam McCandlish
Alec Radford
Ilya Sutskever
Dario Amodei
BDL
469
41,106
0
28 May 2020
Calibration of Pre-trained Transformers
Shrey Desai
Greg Durrett
UQLM
266
295
0
17 Mar 2020
Controlling Computation versus Quality for Neural Sequence Models
Ankur Bapna
N. Arivazhagan
Orhan Firat
40
30
0
17 Feb 2020
Emergent Properties of Finetuned Language Representation Models
Alexandre Matton
Luke de Oliveira
SSL
40
1
0
23 Oct 2019
DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter
Victor Sanh
Lysandre Debut
Julien Chaumond
Thomas Wolf
121
7,437
0
02 Oct 2019
What Does BERT Look At? An Analysis of BERT's Attention
Kevin Clark
Urvashi Khandelwal
Omer Levy
Christopher D. Manning
MILM
186
1,586
0
11 Jun 2019
SCAN: A Scalable Neural Networks Framework Towards Compact and Efficient Models
Linfeng Zhang
Zhanhong Tan
Jiebo Song
Jingwei Chen
Chenglong Bao
Kaisheng Ma
28
71
0
27 May 2019
Streaming End-to-end Speech Recognition For Mobile Devices
Yanzhang He
Tara N. Sainath
Rohit Prabhavalkar
Ian McGraw
R. Álvarez
...
K. Sim
Tom Bagby
Shuo-yiin Chang
Kanishka Rao
A. Gruenstein
68
624
0
15 Nov 2018
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Jacob Devlin
Ming-Wei Chang
Kenton Lee
Kristina Toutanova
VLM
SSL
SSeg
943
93,936
0
11 Oct 2018
Deep Learning Scaling is Predictable, Empirically
Joel Hestness
Sharan Narang
Newsha Ardalani
G. Diamos
Heewoo Jun
Hassan Kianinejad
Md. Mostofa Ali Patwary
Yang Yang
Yanqi Zhou
80
728
0
01 Dec 2017
On Calibration of Modern Neural Networks
Chuan Guo
Geoff Pleiss
Yu Sun
Kilian Q. Weinberger
UQCV
195
5,774
0
14 Jun 2017
Adaptive Neural Networks for Efficient Inference
Tolga Bolukbasi
Joseph Wang
O. Dekel
Venkatesh Saligrama
41
354
0
25 Feb 2017
Deep Residual Learning for Image Recognition
Kaiming He
Xiangyu Zhang
Shaoqing Ren
Jian Sun
MedIm
1.4K
192,638
0
10 Dec 2015
Conditional Computation in Neural Networks for faster models
Emmanuel Bengio
Pierre-Luc Bacon
Joelle Pineau
Doina Precup
AI4CE
94
320
0
19 Nov 2015
Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding
Song Han
Huizi Mao
W. Dally
3DGS
194
8,793
0
01 Oct 2015
Distilling the Knowledge in a Neural Network
Geoffrey E. Hinton
Oriol Vinyals
J. Dean
FedML
236
19,523
0
09 Mar 2015
Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift
Sergey Ioffe
Christian Szegedy
OOD
298
43,154
0
11 Feb 2015
Adam: A Method for Stochastic Optimization
Diederik P. Kingma
Jimmy Ba
ODL
806
149,474
0
22 Dec 2014
Explaining and Harnessing Adversarial Examples
Ian Goodfellow
Jonathon Shlens
Christian Szegedy
AAML
GAN
161
18,922
0
20 Dec 2014
Going Deeper with Convolutions
Christian Szegedy
Wei Liu
Yangqing Jia
P. Sermanet
Scott E. Reed
Dragomir Anguelov
D. Erhan
Vincent Vanhoucke
Andrew Rabinovich
299
43,511
0
17 Sep 2014
Very Deep Convolutional Networks for Large-Scale Image Recognition
Karen Simonyan
Andrew Zisserman
FAtt
MDE
923
99,991
0
04 Sep 2014