arXiv:2107.12460
Don't Sweep your Learning Rate under the Rug: A Closer Look at Cross-modal Transfer of Pretrained Transformers
26 July 2021
Dan Rothermel
Margaret Li
Tim Rocktäschel
Jakob N. Foerster
Papers citing
"Don't Sweep your Learning Rate under the Rug: A Closer Look at Cross-modal Transfer of Pretrained Transformers"
13 / 13 papers shown
Revisiting Random Walks for Learning on Graphs
Jinwoo Kim
Olga Zaghen
Ayhan Suleymanzade
Youngmin Ryou
Seunghoon Hong
116
1
0
01 Jul 2024
Pretrained Transformers as Universal Computation Engines
Kevin Lu
Aditya Grover
Pieter Abbeel
Igor Mordatch
54
221
0
09 Mar 2021
Long Range Arena: A Benchmark for Efficient Transformers
Yi Tay
Mostafa Dehghani
Samira Abnar
Yikang Shen
Dara Bahri
Philip Pham
J. Rao
Liu Yang
Sebastian Ruder
Donald Metzler
147
720
0
08 Nov 2020
On Empirical Comparisons of Optimizers for Deep Learning
Dami Choi
Christopher J. Shallue
Zachary Nado
Jaehoon Lee
Chris J. Maddison
George E. Dahl
78
260
0
11 Oct 2019
Language Models as Knowledge Bases?
Fabio Petroni
Tim Rocktäschel
Patrick Lewis
A. Bakhtin
Yuxiang Wu
Alexander H. Miller
Sebastian Riedel
KELM
AI4MH
571
2,670
0
03 Sep 2019
Evaluating Protein Transfer Learning with TAPE
Roshan Rao
Nicholas Bhattacharya
Neil Thomas
Yan Duan
Xi Chen
John F. Canny
Pieter Abbeel
Yun S. Song
SSL
94
803
0
19 Jun 2019
SuperGLUE: A Stickier Benchmark for General-Purpose Language Understanding Systems
Alex Wang
Yada Pruksachatkun
Nikita Nangia
Amanpreet Singh
Julian Michael
Felix Hill
Omer Levy
Samuel R. Bowman
ELM
271
2,315
0
02 May 2019
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
Alex Wang
Amanpreet Singh
Julian Michael
Felix Hill
Omer Levy
Samuel R. Bowman
ELM
1.1K
7,182
0
20 Apr 2018
ListOps: A Diagnostic Dataset for Latent Tree Learning
Nikita Nangia
Samuel R. Bowman
57
138
0
17 Apr 2018
Differentiable plasticity: training plastic neural networks with backpropagation
Thomas Miconi
Jeff Clune
Kenneth O. Stanley
AI4CE
64
154
0
06 Apr 2018
DeepSF: deep convolutional neural network for mapping protein sequences to folds
Jie Hou
B. Adhikari
Jianlin Cheng
63
200
0
04 Jun 2017
One Billion Word Benchmark for Measuring Progress in Statistical Language Modeling
Ciprian Chelba
Tomas Mikolov
M. Schuster
Qi Ge
T. Brants
P. Koehn
T. Robinson
181
1,108
0
11 Dec 2013
Distributed Representations of Words and Phrases and their Compositionality
Tomas Mikolov
Ilya Sutskever
Kai Chen
G. Corrado
J. Dean
NAI
OCL
397
33,550
0
16 Oct 2013