ResearchTrend.AI
Mixed Precision Training

10 October 2017
Paulius Micikevicius
Sharan Narang
Jonah Alben
G. Diamos
Erich Elsen
David García
Boris Ginsburg
Michael Houston
Oleksii Kuchaiev
Ganesh Venkatesh
Hao Wu

Papers citing "Mixed Precision Training"

50 / 366 papers shown
GlobalGeoTree: A Multi-Granular Vision-Language Dataset for Global Tree Species Classification
Yang Mu
Zhitong Xiong
Yi Wang
Muhammad Shahzad
Franz Essl
Mark van Kleunen
Xiao Xiang Zhu
VLM
2
0
0
18 May 2025
Simple Semi-supervised Knowledge Distillation from Vision-Language Models via $\mathbf{\texttt{D}}$ual-$\mathbf{\texttt{H}}$ead $\mathbf{\texttt{O}}$ptimization
Seongjae Kang
Dong Bok Lee
Hyungjoon Jang
Sung Ju Hwang
VLM
57
0
0
12 May 2025
CogniSNN: A First Exploration to Random Graph Architecture based Spiking Neural Networks with Enhanced Expandability and Neuroplasticity
Yongsheng Huang
Peibo Duan
Zhipeng Liu
Kai Sun
Changsheng Zhang
Bin Zhang
Mingkun Xu
GNN
50
0
0
09 May 2025
ALFEE: Adaptive Large Foundation Model for EEG Representation
Wei Xiong
Junming Lin
Jiangtong Li
Jie Li
Changjun Jiang
33
0
0
07 May 2025
RayZer: A Self-supervised Large View Synthesis Model
Hanwen Jiang
Hao Tan
Peng Wang
Haian Jin
Yue Zhao
...
Kai Zhang
Fujun Luan
Kalyan Sunkavalli
Qixing Huang
Georgios Pavlakos
68
0
0
01 May 2025
Pushing the Limits of Low-Bit Optimizers: A Focus on EMA Dynamics
Cong Xu
Wenbin Liang
Mo Yu
Anan Liu
Kaipeng Zhang
Lizhuang Ma
Yufei Guo
Jun Wang
Wenbo Zhang
MQ
57
0
0
01 May 2025
Trends in AI Supercomputers
Konstantin Pilz
James Sanders
Robi Rahman
Lennart Heim
GNN
ELM
29
0
0
22 Apr 2025
Pychop: Emulating Low-Precision Arithmetic in Numerical Methods and Neural Networks
Erin Carson
Xinye Chen
54
0
0
10 Apr 2025
STEP: Staged Parameter-Efficient Pre-training for Large Language Models
Kazuki Yano
Takumi Ito
Jun Suzuki
LRM
47
1
0
05 Apr 2025
Scaling Video-Language Models to 10K Frames via Hierarchical Differential Distillation
Chuanqi Cheng
Jian Guan
Wei Wu
Rui Yan
VLM
52
0
0
03 Apr 2025
Towards Hierarchical Multi-Step Reward Models for Enhanced Reasoning in Large Language Models
Teng Wang
Zhangyi Jiang
Zhenqi He
Wenhan Yang
Yanan Zheng
Zeyu Li
Zifan He
Shenyang Tong
Hailei Gong
LRM
90
1
0
16 Mar 2025
MSConv: Multiplicative and Subtractive Convolution for Face Recognition
Si Zhou
Yain-Whar Si
Xiaochen Yuan
Xiaofan Li
Xiaoxiang Liu
Xinyuan Zhang
Cong Lin
Xueyuan Gong
CVBM
78
0
0
08 Mar 2025
Stable-SPAM: How to Train in 4-Bit More Stably than 16-Bit Adam
Tianjin Huang
Haotian Hu
Zhenyu (Allen) Zhang
Gaojie Jin
Xianrui Li
...
Tianlong Chen
Lu Liu
Qingsong Wen
Zhangyang Wang
Shiwei Liu
MQ
39
0
0
24 Feb 2025
SpikeRL: A Scalable and Energy-efficient Framework for Deep Spiking Reinforcement Learning
Tokey Tahmid
Mark Gates
P. Luszczek
Catherine D. Schuman
36
0
0
21 Feb 2025
A Large Recurrent Action Model: xLSTM enables Fast Inference for Robotics Tasks
Thomas Schmied
Thomas Adler
Vihang Patil
M. Beck
Korbinian Poppel
Johannes Brandstetter
G. Klambauer
Razvan Pascanu
Sepp Hochreiter
75
5
0
21 Feb 2025
Understanding Silent Data Corruption in LLM Training
Jeffrey Ma
Hengzhi Pei
Leonard Lausen
George Karypis
42
0
0
17 Feb 2025
An Efficient Row-Based Sparse Fine-Tuning
Cen-Jhih Li
Aditya Bhaskara
56
0
0
17 Feb 2025
TokenSynth: A Token-based Neural Synthesizer for Instrument Cloning and Text-to-Instrument
Kyungsu Kim
Junghyun Koo
Sungho Lee
Haesun Joung
Kyogu Lee
58
0
0
13 Feb 2025
GoRA: Gradient-driven Adaptive Low Rank Adaptation
Haonan He
Peng Ye
Yuchen Ren
Yuan Yuan
Lei Chen
AI4TS
AI4CE
178
0
0
13 Feb 2025
DejAIvu: Identifying and Explaining AI Art on the Web in Real-Time with Saliency Maps
Jocelyn Dzuong
96
0
0
12 Feb 2025
Spectral-factorized Positive-definite Curvature Learning for NN Training
Wu Lin
Felix Dangel
Runa Eschenhagen
Juhan Bae
Richard E. Turner
Roger B. Grosse
47
0
0
10 Feb 2025
DAGNet: A Dual-View Attention-Guided Network for Efficient X-ray Security Inspection
Shilong Hong
Yanzhou Zhou
Weichao Xu
84
0
0
03 Feb 2025
Enhanced Retrieval of Long Documents: Leveraging Fine-Grained Block Representations with Large Language Models
Minghan Li
Eric Gaussier
Guodong Zhou
RALM
68
0
0
28 Jan 2025
Grounding Text-to-Image Diffusion Models for Controlled High-Quality Image Generation
Ahmad Süleyman
Göksel Biricik
52
2
0
15 Jan 2025
EmoNeXt: an Adapted ConvNeXt for Facial Emotion Recognition
Yassine El Boudouri
Amine Bohi
73
15
0
14 Jan 2025
Learning Motion and Temporal Cues for Unsupervised Video Object Segmentation
Yunzhi Zhuge
Hongyu Gu
Lu Zhang
Jinqing Qi
Huchuan Lu
VOS
69
2
0
14 Jan 2025
When LLMs Struggle: Reference-less Translation Evaluation for Low-resource Languages
Archchana Sindhujan
Diptesh Kanojia
Constantin Orasan
Shenbin Qian
38
2
0
08 Jan 2025
Wonderland: Navigating 3D Scenes from a Single Image
Hanwen Liang
Junli Cao
Vidit Goel
Guocheng Qian
Sergei Korolev
Demetri Terzopoulos
Konstantinos N. Plataniotis
Sergey Tulyakov
Jian Ren
VGen
128
11
0
16 Dec 2024
CLIP-PING: Boosting Lightweight Vision-Language Models with Proximus Intrinsic Neighbors Guidance
Chu Myaet Thwal
Ye Lin Tun
Minh N. H. Nguyen
Eui-nam Huh
Choong Seon Hong
VLM
74
0
0
05 Dec 2024
COAP: Memory-Efficient Training with Correlation-Aware Gradient Projection
Jinqi Xiao
S. Sang
Tiancheng Zhi
Jing Liu
Qing Yan
Linjie Luo
Bo Yuan
VLM
86
1
0
26 Nov 2024
Hardware Scaling Trends and Diminishing Returns in Large-Scale Distributed Training
Jared Fernandez
Luca Wehrstedt
Leonid Shamis
Mostafa Elhoushi
Kalyan Saladi
Yonatan Bisk
Emma Strubell
Jacob Kahn
209
3
0
20 Nov 2024
Hysteresis Activation Function for Efficient Inference
Moshe Kimhi
Idan Kashani
A. Mendelson
Chaim Baskin
LLMSV
40
0
0
15 Nov 2024
Navigating Extremes: Dynamic Sparsity in Large Output Spaces
Nasib Ullah
Erik Schultheis
Mike Lasby
Yani Andrew Ioannou
Rohit Babbar
35
0
0
05 Nov 2024
COAT: Compressing Optimizer states and Activation for Memory-Efficient FP8 Training
Haocheng Xi
Han Cai
Ligeng Zhu
Yunfan LU
Kurt Keutzer
Jianfei Chen
Song Han
MQ
75
9
0
25 Oct 2024
CompAct: Compressed Activations for Memory-Efficient LLM Training
Yara Shamshoum
Nitzan Hodos
Yuval Sieradzki
Assaf Schuster
MQ
VLM
50
0
0
20 Oct 2024
CLaMP 2: Multimodal Music Information Retrieval Across 101 Languages Using Large Language Models
Shangda Wu
Yashan Wang
Ruibin Yuan
Zhancheng Guo
Xu Tan
...
Yuanliang Dong
Jiafeng Liu
Xiaobing Li
Feng Yu
Maosong Sun
36
3
0
17 Oct 2024
Breaking the Memory Wall for Heterogeneous Federated Learning via Model Splitting
Chunlin Tian
Li Li
Kahou Tam
Yebo Wu
Chengzhong Xu
FedML
29
1
0
12 Oct 2024
Detecting Bias and Enhancing Diagnostic Accuracy in Large Language Models for Healthcare
Pardis Sadat Zahraei
Zahra Shakeri
LM&MA
26
0
0
09 Oct 2024
On Importance of Pruning and Distillation for Efficient Low Resource NLP
Aishwarya Mirashi
Purva Lingayat
Srushti Sonavane
Tejas Padhiyar
Raviraj Joshi
Geetanjali Kale
34
1
0
21 Sep 2024
DeMansia: Mamba Never Forgets Any Tokens
Ricky Fang
Mamba
24
0
0
04 Aug 2024
u-$\mu$P: The Unit-Scaled Maximal Update Parametrization
Charlie Blake
C. Eichenberg
Josef Dean
Lukas Balles
Luke Y. Prince
Bjorn Deiseroth
Andres Felipe Cruz Salinas
Carlo Luschi
Samuel Weinbach
Douglas Orr
58
9
0
24 Jul 2024
Reducing Fine-Tuning Memory Overhead by Approximate and Memory-Sharing Backpropagation
Yuchen Yang
Yingdong Shi
Cheems Wang
Xiantong Zhen
Yuxuan Shi
Jun Xu
40
1
0
24 Jun 2024
ProTrain: Efficient LLM Training via Memory-Aware Techniques
Hanmei Yang
Jin Zhou
Yao Fu
Xiaoqun Wang
Ramine Roane
Hui Guan
Tongping Liu
VLM
33
0
0
12 Jun 2024
Sustainable self-supervised learning for speech representations
Luis Lugo
Valentin Vielzeuf
31
2
0
11 Jun 2024
Effective Interplay between Sparsity and Quantization: From Theory to Practice
Simla Burcu Harma
Ayan Chakraborty
Elizaveta Kostenok
Danila Mishin
Dongho Ha
...
Martin Jaggi
Ming Liu
Yunho Oh
Suvinay Subramanian
Amir Yazdanbakhsh
MQ
44
6
0
31 May 2024
Improving the Training of Rectified Flows
Sangyun Lee
Zinan Lin
Giulia Fanti
41
19
0
30 May 2024
CHARP: Conversation History AwaReness Probing for Knowledge-grounded Dialogue Systems
Abbas Ghaddar
David Alfonso-Hermelo
Philippe Langlais
Mehdi Rezagholizadeh
Boxing Chen
Prasanna Parthasarathi
39
0
0
24 May 2024
TerDiT: Ternary Diffusion Models with Transformers
Xudong Lu
Aojun Zhou
Ziyi Lin
Qi Liu
Yuhui Xu
Renrui Zhang
Yafei Wen
Shuai Ren
Peng Gao
Junchi Yan
MQ
55
2
0
23 May 2024
Granite Code Models: A Family of Open Foundation Models for Code Intelligence
Mayank Mishra
Matt Stallone
Gaoyuan Zhang
Songlin Yang
Aditya Prasad
...
Amith Singhee
Nirmit Desai
David D. Cox
Ruchir Puri
Yikang Shen
AI4TS
56
55
0
07 May 2024
SwiftRL: Towards Efficient Reinforcement Learning on Real Processing-In-Memory Systems
Kailash Gogineni
Sai Santosh Dayapule
Juan Gómez Luna
Karthikeya Gogineni
Peng Wei
Tian-Shing Lan
Mohammad Sadrosadati
Onur Mutlu
Guru Venkataramani
50
10
0
07 May 2024