ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2003.04887
  4. Cited By
ReZero is All You Need: Fast Convergence at Large Depth

ReZero is All You Need: Fast Convergence at Large Depth

10 March 2020
Thomas C. Bachlechner
Bodhisattwa Prasad Majumder
H. H. Mao
G. Cottrell
Julian McAuley
    AI4CE
ArXivPDFHTML

Papers citing "ReZero is All You Need: Fast Convergence at Large Depth"

50 / 72 papers shown
Title
ISDrama: Immersive Spatial Drama Generation through Multimodal Prompting
ISDrama: Immersive Spatial Drama Generation through Multimodal Prompting
Wenjie Qu
Wenxiang Guo
Changhao Pan
Zehan Zhu
Tao Jin
Zhou Zhao
VGen
54
1
0
29 Apr 2025
Versatile Framework for Song Generation with Prompt-based Control
Versatile Framework for Song Generation with Prompt-based Control
Wenjie Qu
Wenxiang Guo
Changhao Pan
Zehan Zhu
Ruiqi Li
...
Rongjie Huang
Ruiyuan Zhang
Zhiqing Hong
Ziyue Jiang
Zhou Zhao
77
2
0
27 Apr 2025
Missing Target-Relevant Information Prediction with World Model for Accurate Zero-Shot Composed Image Retrieval
Missing Target-Relevant Information Prediction with World Model for Accurate Zero-Shot Composed Image Retrieval
Yuanmin Tang
Jing Yu
Keke Gai
Jiamin Zhuang
Gang Xiong
Gaopeng Gou
Qi Wu
VGen
54
1
0
21 Mar 2025
MIM-Refiner: A Contrastive Learning Boost from Intermediate Pre-Trained Representations
MIM-Refiner: A Contrastive Learning Boost from Intermediate Pre-Trained Representations
Benedikt Alkin
Lukas Miklautz
Sepp Hochreiter
Johannes Brandstetter
VLM
76
8
0
24 Feb 2025
Optimizing Job Allocation using Reinforcement Learning with Graph Neural Networks
Optimizing Job Allocation using Reinforcement Learning with Graph Neural Networks
Lars C.P.M. Quaedvlieg
63
0
0
31 Jan 2025
Merino: Entropy-driven Design for Generative Language Models on IoT Devices
Merino: Entropy-driven Design for Generative Language Models on IoT Devices
Youpeng Zhao
Ming Lin
Huadong Tang
Qiang Wu
Jun Wang
83
0
0
28 Jan 2025
GraphXForm: Graph transformer for computer-aided molecular design
GraphXForm: Graph transformer for computer-aided molecular design
Jonathan Pirnay
Jan G. Rittig
Alexander B. Wolf
Martin Grohe
Jakob Burger
Alexander Mitsos
D. G. Grimm
AI4CE
58
1
0
03 Nov 2024
Lambda-Skip Connections: the architectural component that prevents Rank Collapse
Lambda-Skip Connections: the architectural component that prevents Rank Collapse
Federico Arangath Joseph
Jerome Sieber
Melanie Zeilinger
Carmen Amo Alonso
33
0
0
14 Oct 2024
Robust Weight Initialization for Tanh Neural Networks with Fixed Point Analysis
Robust Weight Initialization for Tanh Neural Networks with Fixed Point Analysis
Hyunwoo Lee
Hayoung Choi
Hyunju Kim
39
1
0
03 Oct 2024
On Expressive Power of Looped Transformers: Theoretical Analysis and Enhancement via Timestep Encoding
On Expressive Power of Looped Transformers: Theoretical Analysis and Enhancement via Timestep Encoding
Kevin Xu
Issei Sato
39
3
0
02 Oct 2024
GOAL: A Generalist Combinatorial Optimization Agent Learner
GOAL: A Generalist Combinatorial Optimization Agent Learner
Darko Drakulic
Sofia Michel
J. Andreoli
39
7
0
21 Jun 2024
Beyond the Frontier: Predicting Unseen Walls from Occupancy Grids by
  Learning from Floor Plans
Beyond the Frontier: Predicting Unseen Walls from Occupancy Grids by Learning from Floor Plans
Ludvig Ericson
Patric Jensfelt
42
7
0
13 Jun 2024
Understanding and Minimising Outlier Features in Neural Network Training
Understanding and Minimising Outlier Features in Neural Network Training
Bobby He
Lorenzo Noci
Daniele Paliotta
Imanol Schlag
Thomas Hofmann
42
3
0
29 May 2024
Principled Architecture-aware Scaling of Hyperparameters
Principled Architecture-aware Scaling of Hyperparameters
Wuyang Chen
Junru Wu
Zhangyang Wang
Boris Hanin
AI4CE
49
0
0
27 Feb 2024
Latent assimilation with implicit neural representations for unknown
  dynamics
Latent assimilation with implicit neural representations for unknown dynamics
Zhuoyuan Li
Bin Dong
Pingwen Zhang
AI4CE
24
3
0
18 Sep 2023
LightGrad: Lightweight Diffusion Probabilistic Model for Text-to-Speech
LightGrad: Lightweight Diffusion Probabilistic Model for Text-to-Speech
Jing Chen
Xingcheng Song
Zhendong Peng
Binbin Zhang
Fuping Pan
Zhiyong Wu
DiffM
29
16
0
31 Aug 2023
Multiplicative update rules for accelerating deep learning training and
  increasing robustness
Multiplicative update rules for accelerating deep learning training and increasing robustness
Manos Kirtas
Nikolaos Passalis
Anastasios Tefas
AAML
OOD
36
2
0
14 Jul 2023
No Train No Gain: Revisiting Efficient Training Algorithms For
  Transformer-based Language Models
No Train No Gain: Revisiting Efficient Training Algorithms For Transformer-based Language Models
Jean Kaddour
Oscar Key
Piotr Nawrot
Pasquale Minervini
Matt J. Kusner
22
41
0
12 Jul 2023
A Semi-Autoregressive Graph Generative Model for Dependency Graph
  Parsing
A Semi-Autoregressive Graph Generative Model for Dependency Graph Parsing
Ye Ma
Mingming Sun
P. Li
GNN
28
1
0
21 Jun 2023
TransRef: Multi-Scale Reference Embedding Transformer for Reference-Guided Image Inpainting
TransRef: Multi-Scale Reference Embedding Transformer for Reference-Guided Image Inpainting
Taorong Liu
Liang Liao
Delin Chen
Jing Xiao
Zheng Wang
Chia-Wen Lin
Shiníchi Satoh
ViT
DiffM
39
6
0
20 Jun 2023
QuickSRNet: Plain Single-Image Super-Resolution Architecture for Faster
  Inference on Mobile Platforms
QuickSRNet: Plain Single-Image Super-Resolution Architecture for Faster Inference on Mobile Platforms
Guillaume Berger
Manik Dhingra
Antoine Mercier
Yash Savani
Sunny Panchal
Fatih Porikli
SupR
20
5
0
08 Mar 2023
On the Ideal Number of Groups for Isometric Gradient Propagation
On the Ideal Number of Groups for Isometric Gradient Propagation
Bum Jun Kim
Hyeyeon Choi
Hyeonah Jang
Sang Woo Kim
32
1
0
07 Feb 2023
Implicit Geometry and Interaction Embeddings Improve Few-Shot Molecular
  Property Prediction
Implicit Geometry and Interaction Embeddings Improve Few-Shot Molecular Property Prediction
Christopher Fifty
Joseph M. Paggi
Ehsan Amid
J. Leskovec
R. Dror
AI4CE
22
0
0
04 Feb 2023
A Survey on Efficient Training of Transformers
A Survey on Efficient Training of Transformers
Bohan Zhuang
Jing Liu
Zizheng Pan
Haoyu He
Yuetian Weng
Chunhua Shen
31
47
0
02 Feb 2023
Expected Gradients of Maxout Networks and Consequences to Parameter
  Initialization
Expected Gradients of Maxout Networks and Consequences to Parameter Initialization
Hanna Tseran
Guido Montúfar
ODL
30
0
0
17 Jan 2023
Asymptotic Analysis of Deep Residual Networks
Asymptotic Analysis of Deep Residual Networks
R. Cont
Alain Rossier
Renyuan Xu
27
4
0
15 Dec 2022
NoMorelization: Building Normalizer-Free Models from a Sample's
  Perspective
NoMorelization: Building Normalizer-Free Models from a Sample's Perspective
Chang-Shu Liu
Yuwen Yang
Yue Ding
Hongtao Lu
34
2
0
13 Oct 2022
ZITS++: Image Inpainting by Improving the Incremental Transformer on
  Structural Priors
ZITS++: Image Inpainting by Improving the Incremental Transformer on Structural Priors
Chenjie Cao
Qiaole Dong
Yanwei Fu
38
30
0
12 Oct 2022
Dynamical Isometry for Residual Networks
Dynamical Isometry for Residual Networks
Advait Gadhikar
R. Burkholz
ODL
AI4CE
40
2
0
05 Oct 2022
On the Factory Floor: ML Engineering for Industrial-Scale Ads
  Recommendation Models
On the Factory Floor: ML Engineering for Industrial-Scale Ads Recommendation Models
Rohan Anil
S. Gadanho
Danya Huang
Nijith Jacob
Zhuoshu Li
...
Cristina Pop
Kevin Regan
G. Shamir
Rakesh Shivanna
Qiqi Yan
3DV
29
41
0
12 Sep 2022
Learning an Efficient Multimodal Depth Completion Model
Learning an Efficient Multimodal Depth Completion Model
Dewang Hou
Yuanyuan Du
Kai Zhao
Yang Zhao
25
5
0
23 Aug 2022
Learning Prior Feature and Attention Enhanced Image Inpainting
Learning Prior Feature and Attention Enhanced Image Inpainting
Chenjie Cao
Qiaole Dong
Yanwei Fu
DiffM
33
25
0
03 Aug 2022
Removing Batch Normalization Boosts Adversarial Training
Removing Batch Normalization Boosts Adversarial Training
Haotao Wang
Aston Zhang
Shuai Zheng
Xingjian Shi
Mu Li
Zhangyang Wang
40
41
0
04 Jul 2022
AutoInit: Automatic Initialization via Jacobian Tuning
AutoInit: Automatic Initialization via Jacobian Tuning
Tianyu He
Darshil Doshi
Andrey Gromov
19
4
0
27 Jun 2022
Scaling ResNets in the Large-depth Regime
Scaling ResNets in the Large-depth Regime
Pierre Marion
Adeline Fermanian
Gérard Biau
Jean-Philippe Vert
26
16
0
14 Jun 2022
Learning What and Where: Disentangling Location and Identity Tracking
  Without Supervision
Learning What and Where: Disentangling Location and Identity Tracking Without Supervision
Manuel Traub
S. Otte
Tobias Menge
Matthias Karlbauer
Jannik Thummel
Martin Volker Butz
34
20
0
26 May 2022
Hypercomplex Image-to-Image Translation
Hypercomplex Image-to-Image Translation
Eleonora Grassucci
Luigi Sigillo
A. Uncini
Danilo Comminiello
29
7
0
04 May 2022
Flamingo: a Visual Language Model for Few-Shot Learning
Flamingo: a Visual Language Model for Few-Shot Learning
Jean-Baptiste Alayrac
Jeff Donahue
Pauline Luc
Antoine Miech
Iain Barr
...
Mikolaj Binkowski
Ricardo Barreira
Oriol Vinyals
Andrew Zisserman
Karen Simonyan
MLLM
VLM
46
3,360
0
29 Apr 2022
Automated Progressive Learning for Efficient Training of Vision
  Transformers
Automated Progressive Learning for Efficient Training of Vision Transformers
Changlin Li
Bohan Zhuang
Guangrun Wang
Xiaodan Liang
Xiaojun Chang
Yi Yang
28
46
0
28 Mar 2022
Image Super-Resolution With Deep Variational Autoencoders
Image Super-Resolution With Deep Variational Autoencoders
Darius Chira
Ilian Haralampiev
Ole Winther
Andrea Dittadi
Valentin Liévin
DRL
35
32
0
17 Mar 2022
Deep Learning without Shortcuts: Shaping the Kernel with Tailored
  Rectifiers
Deep Learning without Shortcuts: Shaping the Kernel with Tailored Rectifiers
Guodong Zhang
Aleksandar Botev
James Martens
OffRL
39
26
0
15 Mar 2022
Tensor Programs V: Tuning Large Neural Networks via Zero-Shot
  Hyperparameter Transfer
Tensor Programs V: Tuning Large Neural Networks via Zero-Shot Hyperparameter Transfer
Greg Yang
J. E. Hu
Igor Babuschkin
Szymon Sidor
Xiaodong Liu
David Farhi
Nick Ryder
J. Pachocki
Weizhu Chen
Jianfeng Gao
26
148
0
07 Mar 2022
FloorGenT: Generative Vector Graphic Model of Floor Plans for Robotics
FloorGenT: Generative Vector Graphic Model of Floor Plans for Robotics
Ludvig Ericson
Patric Jensfelt
3DV
14
2
0
07 Mar 2022
Incremental Transformer Structure Enhanced Image Inpainting with Masking
  Positional Encoding
Incremental Transformer Structure Enhanced Image Inpainting with Masking Positional Encoding
Qiaole Dong
Chenjie Cao
Yanwei Fu
CLL
30
138
0
02 Mar 2022
DeepNet: Scaling Transformers to 1,000 Layers
DeepNet: Scaling Transformers to 1,000 Layers
Hongyu Wang
Shuming Ma
Li Dong
Shaohan Huang
Dongdong Zhang
Furu Wei
MoE
AI4CE
26
156
0
01 Mar 2022
Hierarchical Graph-Convolutional Variational AutoEncoding for Generative
  Modelling of Human Motion
Hierarchical Graph-Convolutional Variational AutoEncoding for Generative Modelling of Human Motion
Anthony Bourached
Robert J. Gray
Xiaodong Guan
Ryan-Rhys Griffiths
A. Jha
P. Nachev
3DH
DRL
14
1
0
24 Nov 2021
NormFormer: Improved Transformer Pretraining with Extra Normalization
NormFormer: Improved Transformer Pretraining with Extra Normalization
Sam Shleifer
Jason Weston
Myle Ott
AI4CE
33
74
0
18 Oct 2021
AutoInit: Analytic Signal-Preserving Weight Initialization for Neural
  Networks
AutoInit: Analytic Signal-Preserving Weight Initialization for Neural Networks
G. Bingham
Risto Miikkulainen
ODL
24
4
0
18 Sep 2021
A comparison of combined data assimilation and machine learning methods
  for offline and online model error correction
A comparison of combined data assimilation and machine learning methods for offline and online model error correction
A. Farchi
Marc Bocquet
P. Laloyaux
Massimo Bonavita
Quentin Malartic
OffRL
25
35
0
23 Jul 2021
MedGPT: Medical Concept Prediction from Clinical Narratives
MedGPT: Medical Concept Prediction from Clinical Narratives
Z. Kraljevic
Anthony Shek
D. Bean
R. Bendayan
J. Teo
Richard J. B. Dobson
LM&MA
AI4TS
MedIm
25
39
0
07 Jul 2021
12
Next