Jamba-1.5: Hybrid Transformer-Mamba Models at Scale

22 August 2024
Jamba Team: Barak Lenz, Alan Arazi, Amir Bergman, Avshalom Manevich, Barak Peleg, Ben Aviram, Chen Almagor, Clara Fridman, Dan Padnos, Daniel Gissin, Daniel Jannai, Dor Muhlgay, Dor Zimberg, E. Gerber, Elad Dolev, Eran Krakovsky, Erez Safahi, Erez Schwartz, Gal Cohen, Gal Shachaf, Haim Rozenblum, Hofit Bata, I. Blass, Inbal Magar, Itay Dalmedigos, Jhonathan Osin, Julie Fadlon, Maria Rozman, Matan Danos, Michael Gokhman, Mor Zusman, N. Gidron, Nir Ratner, Noam Gat, N. Rozen, Oded Fried, Ohad Leshno, Omer Antverg, Omri Abend, Opher Lieber, Or Dagan, Orit Cohavi, Raz Alon, Roí Belson, Roi Cohen, Rom Gilad, Roman Glozman, S. Lev, S. Meirom, Tal Delbari, Tal Ness, Tomer Asida, Tom Ben Gal, Tom Braude, Uriya Pumerantz, Yehoshua Cohen, Yonatan Belinkov, Y. Globerson, Yuval Peleg Levy, Y. Shoham
arXiv:2408.12570
Abstract

We present Jamba-1.5, new instruction-tuned large language models based on our Jamba architecture. Jamba is a hybrid Transformer-Mamba mixture-of-experts architecture, providing high throughput and low memory usage across context lengths, while retaining quality on par with or better than Transformer models. We release two model sizes: Jamba-1.5-Large, with 94B active parameters, and Jamba-1.5-Mini, with 12B active parameters. Both models are fine-tuned for a variety of conversational and instruction-following capabilities, and have an effective context length of 256K tokens, the largest amongst open-weight models. To support cost-effective inference, we introduce ExpertsInt8, a novel quantization technique that allows fitting Jamba-1.5-Large on a machine with eight 80GB GPUs when processing 256K-token contexts without loss of quality. When evaluated on a battery of academic and chatbot benchmarks, Jamba-1.5 models achieve excellent results while providing high throughput and outperforming other open-weight models on long-context benchmarks. The model weights for both sizes are publicly available under the Jamba Open Model License and we release ExpertsInt8 as open source.
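The abstract describes ExpertsInt8 only at a high level. The sketch below illustrates the general idea behind INT8 expert-weight quantization: quantize each MoE expert's weight matrix to INT8 with per-output-channel scales, and dequantize on the fly right before the expert matmul so that only the (memory-dominant) expert weights shrink. This is a minimal, self-contained PyTorch sketch, not the authors' released implementation; the function names, tensor shapes, expert count, and BF16 compute type are all assumptions for illustration.

```python
# Illustrative sketch of per-channel INT8 expert-weight quantization
# (the general idea behind ExpertsInt8-style schemes; NOT the released code).
import torch

def quantize_expert_int8(w: torch.Tensor):
    """Quantize one expert weight matrix [out_dim, in_dim] to INT8 with per-row scales."""
    scale = w.abs().amax(dim=1, keepdim=True) / 127.0               # one scale per output channel
    q = torch.clamp((w / scale).round(), -127, 127).to(torch.int8)
    return q, scale.to(torch.bfloat16)

def expert_matmul_int8(x: torch.Tensor, q: torch.Tensor, scale: torch.Tensor):
    """Dequantize to BF16 just before the matmul: y = x @ W^T."""
    w = q.to(torch.bfloat16) * scale                                # broadcast [out, 1] over [out, in]
    return x @ w.t()

# Hypothetical usage: quantize every expert of an MoE layer once at load time,
# keep activations in BF16, and store only INT8 weights plus small scale vectors.
experts = [torch.randn(4096, 4096) for _ in range(16)]              # 16 assumed experts, toy dimensions
quantized = [quantize_expert_int8(w) for w in experts]
x = torch.randn(2, 4096, dtype=torch.bfloat16)                      # toy batch of activations
y = expert_matmul_int8(x, *quantized[0])                            # route to expert 0 for illustration
print(y.shape, y.dtype)                                             # torch.Size([2, 4096]) torch.bfloat16
```

In an actual serving stack, the dequantize-and-matmul step would live inside a fused MoE kernel rather than materializing the BF16 weights as done here; the snippet only shows the quantization arithmetic.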
