ResearchTrend.AI
Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour
arXiv:1706.02677 (latest version: v2)

8 June 2017
Priya Goyal
Piotr Dollár
Ross B. Girshick
P. Noordhuis
Lukasz Wesolowski
Aapo Kyrola
Andrew Tulloch
Yangqing Jia
Kaiming He
    3DH

Papers citing "Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour"

50 / 2,054 papers shown
Learn2Mix: Training Neural Networks Using Adaptive Data Integration
Shyam Venkatasubramanian
Vahid Tarokh
174
0
0
17 Feb 2025
Direct Preference Optimization-Enhanced Multi-Guided Diffusion Model for Traffic Scenario Generation
Seungjun Yu
Kisung Kim
Daejung Kim
Haewook Han
Jinhan Lee
124
1
0
14 Feb 2025
From Pixels to Components: Eigenvector Masking for Visual Representation Learning
Alice Bizeul
Thomas M. Sutter
Alain Ryser
Bernhard Schölkopf
Julius von Kügelgen
Julia E. Vogt
197
2
0
10 Feb 2025
Online Gradient Boosting Decision Tree: In-Place Updates for Efficient Adding/Deleting Data
Huawei Lin
Jun Woo Chung
Yingjie Lao
Weijie Zhao
77
0
0
03 Feb 2025
BrainOOD: Out-of-distribution Generalizable Brain Network Analysis
Jiaxing Xu
Yongqiang Chen
Xia Dong
Mengcheng Lan
Tiancheng Huang
Qingtian Bian
James Cheng
Yiping Ke
OOD
119
2
0
02 Feb 2025
Unleashing the Potential of Large Language Models as Prompt Optimizers: Analogical Analysis with Gradient-based Model Optimizers
Xinyu Tang
Xiaolei Wang
Wayne Xin Zhao
Siyuan Lu
Yaliang Li
Ji-Rong Wen
LRM
131
18
0
28 Jan 2025
On the use of neural networks for the structural characterization of polymeric porous materials
Jorge Torre
Suset Barroso-Solares
M.A. Rodríguez-Pérez
Javier Pinto
116
6
0
25 Jan 2025
Celo: Training Versatile Learned Optimizers on a Compute Diet
A. Moudgil
Boris Knyazev
Guillaume Lajoie
Eugene Belilovsky
446
0
0
22 Jan 2025
Increasing Batch Size Improves Convergence of Stochastic Gradient Descent with Momentum
Keisuke Kamo
Hideaki Iiduka
129
0
0
15 Jan 2025
Self-supervised Transformation Learning for Equivariant Representations
Jaemyung Yu
Jaehyun Choi
Dong-Jae Lee
H. Hong
Junmo Kim
90
1
0
15 Jan 2025
Time Transfer: On Optimal Learning Rate and Batch Size In The Infinite Data Limit
Oleg Filatov
Jan Ebert
Jiangtao Wang
Stefan Kesselheim
115
4
0
10 Jan 2025
Vim-F: Visual State Space Model Benefiting from Learning in the Frequency Domain
Juntao Zhang
Kun Bian
Peng Cheng
You Zhou
Jianning Liu
Wenbo An
Jun Zhou
Kun Shao
Mamba
130
3
0
08 Jan 2025
Normalizing Batch Normalization for Long-Tailed Recognition
Yuxiang Bao
Guoliang Kang
Linlin Yang
Xiaoyue Duan
Bo Zhao
Baochang Zhang
MQ
111
0
0
06 Jan 2025
Human Gaze Boosts Object-Centered Representation Learning
Timothy Schaumlöffel
A. Aubret
Gemma Roig
Jochen Triesch
119
0
0
06 Jan 2025
Towards Precise Scaling Laws for Video Diffusion Transformers
Yuanyang Yin
Yaqi Zhao
Mingwu Zheng
Ke Lin
Jiarong Ou
...
Pengfei Wan
Di Zhang
Baoqun Yin
Wentao Zhang
Kun Gai
197
3
0
03 Jan 2025
Adaptive Batch Size Schedules for Distributed Training of Language Models with Data and Model Parallelism
Tim Tsz-Kit Lau
Weijian Li
Chenwei Xu
Han Liu
Mladen Kolar
464
0
0
30 Dec 2024
Unveiling the Secret Recipe: A Guide For Supervised Fine-Tuning Small LLMs
Aldo Pareja
Nikhil Shivakumar Nayak
Hao Wang
Krishnateja Killamsetty
Shivchander Sudalairaj
...
Guangxuan Xu
Kai Xu
Ligong Han
Luke Inglis
Akash Srivastava
197
7
0
17 Dec 2024
Echo: Simulating Distributed Training At Scale
Yicheng Feng
Yuetao Chen
Kaiwen Chen
Jingzong Li
Tianyuan Wu
Peng Cheng
Chuan Wu
Wei Wang
Tsung-Yi Ho
Hong Xu
129
2
0
17 Dec 2024
Reversing the Damage: A QP-Aware Transformer-Diffusion Approach for 8K Video Restoration under Codec Compression
Ali Mollaahmadi Dehaghi
Reza Razavi
Mohammad Moshirpour
123
1
0
12 Dec 2024
PR-MIM: Delving Deeper into Partial Reconstruction in Masked Image Modeling
Zhong-Yu Li
Yunheng Li
Deng-Ping Fan
Ming-Ming Cheng
177
0
0
24 Nov 2024
Lie-Equivariant Quantum Graph Neural Networks
Jogi Suda Neto
Roy T. Forestano
S. Gleyzer
K. Kong
Konstantin T. Matchev
Katia Matcheva
146
0
0
22 Nov 2024
Content-Aware Preserving Image Generation
Giang H. Le
Anh Q. Nguyen
Byeongkeun Kang
Yeejin Lee
DiffM
138
0
0
15 Nov 2024
On the Surprising Effectiveness of Attention Transfer for Vision Transformers
Alexander C. Li
Yuandong Tian
Bin Chen
Deepak Pathak
Xinlei Chen
75
3
0
14 Nov 2024
Pay Attention to the Keys: Visual Piano Transcription Using Transformers
Uros Zivanovic
Ivan Pilkov
Carlos Eduardo Cancino-Chacón
ViT
44
0
0
13 Nov 2024
Artificial Intelligence for Biomedical Video Generation
Linyuan Li
Jianing Qiu
Anujit Saha
Lin Li
Poyuan Li
Mengxian He
Ziyu Guo
Wu Yuan
VGen
177
0
0
12 Nov 2024
Client Contribution Normalization for Enhanced Federated Learning
Mayank Kumar Kundalwal
Anurag Saraswat
Ishan Mishra
Deepak Mishra
FedML
62
0
0
10 Nov 2024
Golden Touchstone: A Comprehensive Bilingual Benchmark for Evaluating Financial Large Language Models
Xiaojun Wu
Junxi Liu
Huanyi Su
Zhouchi Lin
Yiyan Qi
...
Fuwei Wang
Saizhuo Wang
Fengrui Hua
Jia Li
Jian Guo
109
2
0
09 Nov 2024
Adaptive Consensus Gradients Aggregation for Scaled Distributed Training
Yoni Choukroun
Shlomi Azoulay
P. Kisilev
83
0
0
06 Nov 2024
Hunyuan-Large: An Open-Source MoE Model with 52 Billion Activated Parameters by Tencent
Xingwu Sun
Yanfeng Chen
Yanwen Huang
Ruobing Xie
Jiaqi Zhu
...
Zhanhui Kang
Yong Yang
Yuhong Liu
Di Wang
Jie Jiang
MoE, ALM, ELM
165
34
0
04 Nov 2024
Unified Speech Recognition: A Single Model for Auditory, Visual, and Audiovisual Inputs
A. Haliassos
Rodrigo Mira
Honglie Chen
Zoe Landgraf
Stavros Petridis
Maja Pantic
SSL
86
7
0
04 Nov 2024
Analyzing & Reducing the Need for Learning Rate Warmup in GPT Training
Atli Kosson
Bettina Messmer
Martin Jaggi
AI4CE
71
5
0
31 Oct 2024
How Does Critical Batch Size Scale in Pre-training?
Hanlin Zhang
Depen Morwani
Nikhil Vyas
Jingfeng Wu
Difan Zou
Udaya Ghai
Dean Phillips Foster
Sham Kakade
187
18
0
29 Oct 2024
Accelerating Augmentation Invariance Pretraining
Jinhong Lin
Cheng-En Wu
Yibing Wei
Pedro Morgado
ViT
81
1
0
27 Oct 2024
Enhancing pretraining efficiency for medical image segmentation via transferability metrics
Gábor Hidy
Bence Bakos
András Lukács
86
1
0
24 Oct 2024
Breaking the Memory Barrier: Near Infinite Batch Size Scaling for Contrastive Loss
Zesen Cheng
Hang Zhang
Kehan Li
Sicong Leng
Zhiqiang Hu
Fei Wu
Deli Zhao
Xin Li
Lidong Bing
72
2
0
22 Oct 2024
Griffon-G: Bridging Vision-Language and Vision-Centric Tasks via Large Multimodal Models
Yufei Zhan
Hongyin Zhao
Yousong Zhu
Fan Yang
Ming Tang
Jinqiao Wang
MLLM
90
1
0
21 Oct 2024
Pipeline Gradient-based Model Training on Analog In-memory Accelerators
Zhaoxian Wu
Quan-Wu Xiao
Tayfun Gokmen
H. Tsai
Kaoutar El Maghraoui
Tianyi Chen
69
1
0
19 Oct 2024
CLaMP 2: Multimodal Music Information Retrieval Across 101 Languages Using Large Language Models
Shangda Wu
Yashan Wang
Ruibin Yuan
Zhancheng Guo
Xu Tan
...
Yuanliang Dong
Jiafeng Liu
Xiaobing Li
Feng Yu
Maosong Sun
215
5
0
17 Oct 2024
The Ingredients for Robotic Diffusion Transformers
Sudeep Dasari
Oier Mees
Sebastian Zhao
Mohan Kumar Srirama
Sergey Levine
118
24
0
14 Oct 2024
What Does It Mean to Be a Transformer? Insights from a Theoretical Hessian Analysis
Weronika Ormaniec
Felix Dangel
Sidak Pal Singh
141
7
0
14 Oct 2024
Learning General Representation of 12-Lead Electrocardiogram with a Joint-Embedding Predictive Architecture
Sehun Kim
64
2
0
11 Oct 2024
Packing Analysis: Packing Is More Appropriate for Large Models or Datasets in Supervised Fine-tuning
Shuhe Wang
Guoyin Wang
Yucheng Wang
Jiwei Li
Eduard H. Hovy
Chen Guo
121
4
0
10 Oct 2024
Self-Supervised Learning for Real-World Object Detection: a Survey
Alina Ciocarlan
Sidonie Lefebvre
S. L. Hégarat-Mascle
Arnaud Woiselle
ObjD
94
1
0
09 Oct 2024
Scaling Laws Across Model Architectures: A Comparative Analysis of Dense and MoE Models in Large Language Models
Siqi Wang
Zhengyu Chen
Bei Li
Keqing He
Min Zhang
Jingang Wang
104
2
0
08 Oct 2024
Swift Sampler: Efficient Learning of Sampler by 10 Parameters
Jiawei Yao
Chuming Li
Canran Xiao
96
6
0
08 Oct 2024
RoWeeder: Unsupervised Weed Mapping through Crop-Row Detection
Pasquale De Marinis
Gennaro Vessio
Giovanna Castellano
64
0
0
07 Oct 2024
Residual Kolmogorov-Arnold Network for Enhanced Deep Learning
Ray Congrui Yu
Sherry Wu
Jiang Gui
113
1
0
07 Oct 2024
MindFlayer SGD: Efficient Parallel SGD in the Presence of Heterogeneous and Random Worker Compute Times
Artavazd Maranjyan
Omar Shaikh Omar
Peter Richtárik
84
4
0
05 Oct 2024
Designing Concise ConvNets with Columnar Stages
Ashish Kumar
Jaesik Park
MQ
116
0
0
05 Oct 2024
TPN: Transferable Proto-Learning Network towards Few-shot Document-Level Relation Extraction
Yu Zhang
Zhao Kang
ViT
79
1
0
01 Oct 2024