Nemotron-4 340B Technical Report

17 June 2024 · arXiv:2406.11704
NVIDIA:
Bo Adler
Niket Agarwal
Ashwath Aithal
Dong H. Anh
Pallab Bhattacharya
Annika Brundyn
Jared Casper
Bryan Catanzaro
Sharon Clay
Jonathan Cohen
Sirshak Das
Ayush Dattagupta
Olivier Delalleau
Leon Derczynski
Yi Dong
Daniel Egert
Ellie Evans
Aleksander Ficek
Denys Fridman
Shaona Ghosh
Boris Ginsburg
Igor Gitman
Tomasz Grzegorzek
R. Hero
Jining Huang
Vibhu Jawa
Joseph Jennings
Aastha Jhunjhunwala
John Kamalu
Sadaf Khan
Oleksii Kuchaiev
P. LeGresley
Hui Li
Jiwei Liu
Zihan Liu
E. Long
Ameya Mahabaleshwarkar
Somshubra Majumdar
James Maki
Miguel Martinez
Maer Rodrigues de Melo
Ivan Moshkov
Deepak Narayanan
Sean Narenthiran
J. Navarro
Phong Nguyen
Osvald Nitski
Vahid Noroozi
Guruprasad Nutheti
Christopher Parisien
Jupinder Parmar
M. Patwary
Krzysztof Pawelec
Wei Ping
Shrimai Prabhumoye
Rajarshi Roy
Trisha Saar
Vasanth Rao Naik Sabavat
S. Satheesh
Jane Polak Scowcroft
J. Sewall
Pavel Shamis
Gerald Shen
M. Shoeybi
Dave Sizer
Misha Smelyanskiy
Felipe Soares
Makesh Narsimhan Sreedhar
Dan Su
Sandeep Subramanian
Shengyang Sun
Shubham Toshniwal
Hao Wang
Zhilin Wang
Jiaxuan You
Jiaqi Zeng
Jimmy Zhang
Jing Zhang
Vivienne Zhang
Yian Zhang
Chen Zhu
Abstract

We release the Nemotron-4 340B model family, including Nemotron-4-340B-Base, Nemotron-4-340B-Instruct, and Nemotron-4-340B-Reward. Our models are open access under the NVIDIA Open Model License Agreement, a permissive model license that allows distribution, modification, and use of the models and their outputs. These models perform competitively with other open-access models on a wide range of evaluation benchmarks, and were sized to fit on a single DGX H100 with 8 GPUs when deployed in FP8 precision. We believe the community can benefit from these models in various research studies and commercial applications, especially for generating synthetic data to train smaller language models. Notably, over 98% of the data used in our model alignment process is synthetically generated, showcasing the effectiveness of these models in generating synthetic data. To further support open research and facilitate model development, we are also open-sourcing the synthetic data generation pipeline used in our model alignment process.
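The claim that the models were "sized to fit on a single DGX H100 with 8 GPUs when deployed in FP8 precision" follows from simple memory arithmetic. The sketch below is not from the paper: it assumes 80 GB of HBM per H100 and counts only the weight footprint (ignoring KV cache, activations, and framework overhead), which is enough to show why FP8 (1 byte per parameter) fits on one node while BF16 (2 bytes per parameter) would not.

```python
# Back-of-the-envelope check (an illustration, not from the paper):
# weight memory of a 340B-parameter model vs. one DGX H100 node.

PARAMS = 340e9        # model parameters
GPUS = 8              # GPUs in one DGX H100
GPU_MEM_GB = 80       # assumed HBM per H100, in GB

def weight_memory_gb(params: float, bytes_per_param: float) -> float:
    """Memory needed for the weights alone, in GB."""
    return params * bytes_per_param / 1e9

node_mem = GPUS * GPU_MEM_GB  # 640 GB per node

for name, bytes_per_param in [("FP8", 1.0), ("BF16", 2.0)]:
    need = weight_memory_gb(PARAMS, bytes_per_param)
    verdict = "fits" if need < node_mem else "does not fit"
    print(f"{name}: weights need ~{need:.0f} GB, node provides {node_mem} GB -> {verdict}")
```

Under these assumptions the FP8 weights occupy roughly 340 GB against 640 GB of aggregate HBM, leaving headroom for KV cache and activations, whereas BF16 weights alone would need about 680 GB and exceed a single node.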
