MEGABYTE: Predicting Million-byte Sequences with Multiscale Transformers
L. Yu, Daniel Simig, Colin Flaherty, Armen Aghajanyan, Luke Zettlemoyer, M. Lewis
12 May 2023 · arXiv:2305.07185

Papers citing "MEGABYTE: Predicting Million-byte Sequences with Multiscale Transformers"

27 of 77 citing papers shown.

Toucan: Token-Aware Character Level Language Modeling
William Fleshman, Benjamin Van Durme (15 Nov 2023)

Ultra-Long Sequence Distributed Transformer
Xiao Wang, Isaac Lyngaas, A. Tsaris, Peng Chen, Sajal Dash, Mayanka Chandra Shekar, Tao Luo, Hong-Jun Yoon, M. Wahib, John P. Gounley (04 Nov 2023)

Learn Your Tokens: Word-Pooled Tokenization for Language Modeling
Avijit Thawani, Saurabh Ghanekar, Xiaoyuan Zhu, Jay Pujara (17 Oct 2023)

Tokenizer Choice For LLM Training: Negligible or Crucial?
Mehdi Ali, Michael Fromm, Klaudia Thellmann, Richard Rutmann, Max Lübbering, ..., Malte Ostendorff, Samuel Weinbach, R. Sifa, Stefan Kesselheim, Nicolas Flores-Herr (12 Oct 2023)

Learning Stackable and Skippable LEGO Bricks for Efficient, Reconfigurable, and Variable-Resolution Diffusion Modeling
Huangjie Zheng, Zhendong Wang, Jianbo Yuan, Guanghan Ning, Pengcheng He, Quanzeng You, Hongxia Yang, Mingyuan Zhou (10 Oct 2023)

SIP: Injecting a Structural Inductive Bias into a Seq2Seq Model by Simulation
Matthias Lindemann, Alexander Koller, Ivan Titov (01 Oct 2023)

UniAudio: An Audio Foundation Model Toward Universal Audio Generation
Dongchao Yang, Jinchuan Tian, Xuejiao Tan, Rongjie Huang, Songxiang Liu, ..., Jiang Bian, Xixin Wu, Zhou Zhao, Shinji Watanabe, Helen M. Meng (01 Oct 2023)

Transformer-VQ: Linear-Time Transformers via Vector Quantization
Albert Mohwald (28 Sep 2023)

Code Llama: Open Foundation Models for Code
Baptiste Rozière, Jonas Gehring, Fabian Gloeckle, Sten Sootla, Itai Gat, ..., Hugo Touvron, Louis Martin, Nicolas Usunier, Thomas Scialom, Gabriel Synnaeve (24 Aug 2023)

Generative Models as a Complex Systems Science: How can we make sense of large language model behavior?
Ari Holtzman, Peter West, Luke Zettlemoyer (31 Jul 2023)

Lost in the Middle: How Language Models Use Long Contexts
Nelson F. Liu, Kevin Lin, John Hewitt, Ashwin Paranjape, Michele Bevilacqua, Fabio Petroni, Percy Liang (06 Jul 2023)

LeanDojo: Theorem Proving with Retrieval-Augmented Language Models
Kaiyu Yang, Aidan M. Swope, Alex Gu, Rahul Chalamala, Peiyang Song, Shixing Yu, Saad Godil, R. Prenger, Anima Anandkumar (27 Jun 2023)

LM-VC: Zero-shot Voice Conversion via Speech Generation based on Language Models
Zhichao Wang, Yuan-Jui Chen, Linfu Xie, Qiao Tian, Yuping Wang (18 Jun 2023)

AutoML in the Age of Large Language Models: Current Challenges, Future Opportunities and Risks
Alexander Tornede, Difan Deng, Theresa Eimer, Joseph Giovanelli, Aditya Mohan, ..., Sarah Segel, Daphne Theodorakopoulos, Tanja Tornede, Henning Wachsmuth, Marius Lindauer (13 Jun 2023)

Hierarchical Attention Encoder Decoder
Asier Mujika (01 Jun 2023)

Bytes Are All You Need: Transformers Operating Directly On File Bytes
Maxwell Horton, Sachin Mehta, Ali Farhadi, Mohammad Rastegari (31 May 2023)

Focus Your Attention (with Adaptive IIR Filters)
Shahar Lutati, Itamar Zimerman, Lior Wolf (24 May 2023)

FIT: Far-reaching Interleaved Transformers
Ting-Li Chen, Lala Li (22 May 2023)

Are Character-level Translations Worth the Wait? Comparing ByT5 and mT5 for Machine Translation
Lukas Edman, Gabriele Sarti, Antonio Toral, Gertjan van Noord, Arianna Bisazza (28 Feb 2023)

Temporal Latent Bottleneck: Synthesis of Fast and Slow Processing Mechanisms in Sequence Learning
Aniket Didolkar, Kshitij Gupta, Anirudh Goyal, Nitesh B. Gundavarapu, Alex Lamb, Nan Rosemary Ke, Yoshua Bengio (30 May 2022)

Combiner: Full Attention Transformer with Sparse Computation Cost
Hongyu Ren, H. Dai, Zihang Dai, Mengjiao Yang, J. Leskovec, Dale Schuurmans, Bo Dai (12 Jul 2021)

Zero-Shot Text-to-Image Generation
Aditya A. Ramesh, Mikhail Pavlov, Gabriel Goh, Scott Gray, Chelsea Voss, Alec Radford, Mark Chen, Ilya Sutskever (24 Feb 2021)

The Pile: An 800GB Dataset of Diverse Text for Language Modeling
Leo Gao, Stella Biderman, Sid Black, Laurence Golding, Travis Hoppe, ..., Horace He, Anish Thite, Noa Nabeshima, Shawn Presser, Connor Leahy (31 Dec 2020)

Shortformer: Better Language Modeling using Shorter Inputs
Ofir Press, Noah A. Smith, M. Lewis (31 Dec 2020)

Efficient Content-Based Sparse Attention with Routing Transformers
Aurko Roy, M. Saffar, Ashish Vaswani, David Grangier (12 Mar 2020)

Scaling Laws for Neural Language Models
Jared Kaplan, Sam McCandlish, T. Henighan, Tom B. Brown, B. Chess, R. Child, Scott Gray, Alec Radford, Jeff Wu, Dario Amodei (23 Jan 2020)

Pixel Recurrent Neural Networks
Aaron van den Oord, Nal Kalchbrenner, Koray Kavukcuoglu (25 Jan 2016)