MosaicBERT: A Bidirectional Encoder Optimized for Fast Pretraining

29 December 2023

Papers citing "MosaicBERT: A Bidirectional Encoder Optimized for Fast Pretraining"

Title
Nomic Embed: Training a Reproducible Long Context Text Embedder Zach Nussbaum John X. Morris Brandon Duderstadt Andriy Mulyar 27 95 0 02 Feb 2024
Reduce, Reuse, Recycle: Improving Training Efficiency with Distillation Cody Blakeney Jessica Zosa Forde Jonathan Frankle Ziliang Zong Matthew L. Leavitt VLM 30 4 0 01 Nov 2022
Pre-train or Annotate? Domain Adaptation with a Constrained Budget Fan Bai Alan Ritter Wei Xu 66 31 0 10 Sep 2021
Train Short, Test Long: Attention with Linear Biases Enables Input Length Extrapolation Ofir Press Noah A. Smith M. Lewis 253 695 0 27 Aug 2021
Code and Named Entity Recognition in StackOverflow Jeniya Tabassum Mounica Maddela Wei Xu Alan Ritter 59 114 0 04 May 2020
Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism M. Shoeybi M. Patwary Raul Puri P. LeGresley Jared Casper Bryan Catanzaro MoE 245 1,821 0 17 Sep 2019
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding Alex Wang Amanpreet Singh Julian Michael Felix Hill Omer Levy Samuel R. Bowman ELM 297 6,959 0 20 Apr 2018