Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2411.12372
Cited By
RedPajama: an Open Dataset for Training Large Language Models
19 November 2024
Maurice Weber
Daniel Y. Fu
Quentin Anthony
Yonatan Oren
S. Adams
Anton Alexandrov
Xiaozhong Lyu
Huu Nguyen
Xiaozhe Yao
Virginia Adams
Ben Athiwaratkun
Rahul Chalamala
Kezhen Chen
Max Ryabinin
Tri Dao
Percy Liang
Christopher Ré
Irina Rish
Ce Zhang
Re-assign community
ArXiv
PDF
HTML
Papers citing
"RedPajama: an Open Dataset for Training Large Language Models"
42 / 42 papers shown
Title
Addition is almost all you need: Compressing neural networks with double binary factorization
Vladimír Boža
Vladimír Macko
MQ
17
0
0
16 May 2025
Ultra-FineWeb: Efficient Data Filtering and Verification for High-Quality LLM Training Data
Yishuo Wang
Z. Fu
Jie Cai
Peijun Tang
Hongya Lyu
...
Jie Zhou
Guoyang Zeng
Chaojun Xiao
Xu Han
Zhiyuan Liu
49
0
0
08 May 2025
Unified Multimodal Understanding and Generation Models: Advances, Challenges, and Opportunities
Xuzhi Zhang
Jintao Guo
Shanshan Zhao
Minghao Fu
Lunhao Duan
Guo-Hua Wang
Qing-Guo Chen
Zhao Xu
Weihua Luo
Kaifu Zhang
DiffM
74
0
0
05 May 2025
NoWag: A Unified Framework for Shape Preserving Compression of Large Language Models
Lawrence Liu
Inesh Chakrabarti
Yixiao Li
Mengdi Wang
Tuo Zhao
Lin F. Yang
MQ
33
0
0
20 Apr 2025
Can Pre-training Indicators Reliably Predict Fine-tuning Outcomes of LLMs?
Hansi Zeng
Kai Hui
Honglei Zhuang
Zhen Qin
Zhenrui Yue
Hamed Zamani
Dana Alon
35
0
0
16 Apr 2025
Forecasting from Clinical Textual Time Series: Adaptations of the Encoder and Decoder Language Model Families
Shahriar Noroozizadeh
Sayantan Kumar
Jeremy C. Weiss
AI4TS
28
0
0
14 Apr 2025
ModernBERT or DeBERTaV3? Examining Architecture and Data Influence on Transformer Encoder Models Performance
Wissam Antoun
B. Sagot
Djamé Seddah
MQ
40
0
0
11 Apr 2025
Position: Beyond Euclidean -- Foundation Models Should Embrace Non-Euclidean Geometries
Neil He
Jiahong Liu
Buze Zhang
N. Bui
Ali Maatouk
Menglin Yang
Irwin King
Melanie Weber
Rex Ying
29
0
0
11 Apr 2025
The KL3M Data Project: Copyright-Clean Training Resources for Large Language Models
Michael J Bommarito II
Jillian Bommarito
Daniel Martin Katz
AILaw
59
0
0
10 Apr 2025
Register Always Matters: Analysis of LLM Pretraining Data Through the Lens of Language Variation
A. Myntti
Erik Henriksson
Veronika Laippala
S. Pyysalo
31
0
0
02 Apr 2025
WinoWhat: A Parallel Corpus of Paraphrased WinoGrande Sentences with Common Sense Categorization
I. Gevers
Victor De Marez
Luna De Bruyne
Walter Daelemans
37
0
0
31 Mar 2025
The Lucie-7B LLM and the Lucie Training Dataset: Open resources for multilingual language generation
Olivier Gouvert
Julie Hunter
Jérôme Louradour
Christophe Cerisara
Evan Dufraisse
Yaya Sy
Laura Rivière
Jean-Pierre Lorré
OpenLLM-France community
164
0
0
15 Mar 2025
Mixture of Experts Made Intrinsically Interpretable
Xingyi Yang
Constantin Venhoff
Ashkan Khakzar
Christian Schroeder de Witt
P. Dokania
Adel Bibi
Philip H. S. Torr
MoE
49
0
0
05 Mar 2025
CoSMoEs: Compact Sparse Mixture of Experts
Patrick Huber
Akshat Shrivastava
Ernie Chang
Chinnadhurai Sankar
Ahmed Aly
Adithya Sagar
MoE
34
0
0
28 Feb 2025
Identifying Sensitive Weights via Post-quantization Integral
Yuezhou Hu
Weiyu Huang
Zichen Liang
Cheng Chen
Jintao Zhang
Jun Zhu
Jianfei Chen
MQ
44
2
0
28 Feb 2025
Citrus: Leveraging Expert Cognitive Pathways in a Medical Language Model for Advanced Medical Decision Support
G. Wang
Minyu Gao
Shuai Yang
Ya Zhang
Lizhi He
...
Yexuan Zhang
Wanyue Li
Lu Chen
Jintao Fei
Xin Li
113
1
0
25 Feb 2025
LongAttn: Selecting Long-context Training Data via Token-level Attention
Longyun Wu
Dawei Zhu
Guangxiang Zhao
Zhuocheng Yu
Junfeng Ran
Xiangyu Wong
Lin Sun
Sujian Li
43
0
0
24 Feb 2025
GneissWeb: Preparing High Quality Data for LLMs at Scale
Hajar Emami-Gohari
S. Kadhe
Syed Yousaf Shah. Constantin Adam
Abdulhamid A. Adebayo
Praneet Adusumilli
...
Issei Yoshida
Syed Zawad
Petros Zerfos
Yi Zhou
Bishwaranjan Bhattacharjee
52
1
0
19 Feb 2025
Can Your Uncertainty Scores Detect Hallucinated Entity?
Min-Hsuan Yeh
Max Kamachee
Seongheon Park
Yixuan Li
HILM
55
1
0
17 Feb 2025
EfficientLLM: Scalable Pruning-Aware Pretraining for Architecture-Agnostic Edge Language Models
Xingrun Xing
Zheng Liu
Shitao Xiao
Boyan Gao
Yiming Liang
Wanpeng Zhang
Haokun Lin
Guoqi Li
Jiajun Zhang
LRM
64
1
0
10 Feb 2025
Soup-of-Experts: Pretraining Specialist Models via Parameters Averaging
Pierre Ablin
Angelos Katharopoulos
Skyler Seto
David Grangier
MoMe
50
0
0
03 Feb 2025
Vision-centric Token Compression in Large Language Model
Ling Xing
Alex Jinpeng Wang
Rui Yan
J. Tang
Jinhui Tang
VLM
60
0
0
02 Feb 2025
Rethinking Evaluation of Sparse Autoencoders through the Representation of Polysemous Words
Gouki Minegishi
Hiroki Furuta
Yusuke Iwasawa
Y. Matsuo
49
1
0
09 Jan 2025
A Toolkit for Virtual Reality Data Collection
Tim Rolff
Niklas Hypki
Markus Lappe
Frank Steinicke
26
0
0
23 Dec 2024
Speech Recognition Rescoring with Large Speech-Text Foundation Models
Prashanth Gurunath Shivakumar
J. Kolehmainen
Aditya Gourav
Yi Gu
Ankur Gandhe
Ariya Rastrow
I. Bulyko
AuLLM
28
0
0
25 Sep 2024
Retro-li: Small-Scale Retrieval Augmented Generation Supporting Noisy Similarity Searches and Domain Shift Generalization
Gentiana Rashiti
G. Karunaratne
Mrinmaya Sachan
Abu Sebastian
Abbas Rahimi
RALM
39
0
0
12 Sep 2024
Re-Mix: Optimizing Data Mixtures for Large Scale Imitation Learning
Joey Hejna
Chethan Bhateja
Yichen Jian
Karl Pertsch
Dorsa Sadigh
25
15
0
26 Aug 2024
DDK: Distilling Domain Knowledge for Efficient Large Language Models
Jiaheng Liu
Chenchen Zhang
Jinyang Guo
Yuanxing Zhang
Haoran Que
...
Congnan Liu
Wenbo Su
Jiamang Wang
Lin Qu
Bo Zheng
45
3
0
23 Jul 2024
SPIN: Hierarchical Segmentation with Subpart Granularity in Natural Images
Josh Myers-Dean
Jarek Reynolds
Brian Price
Yifei Fan
Danna Gurari
46
2
0
12 Jul 2024
A Review of the Challenges with Massive Web-mined Corpora Used in Large Language Models Pre-Training
Michał Perełkiewicz
Rafał Poświata
45
1
0
10 Jul 2024
LoCo: Low-Bit Communication Adaptor for Large-scale Model Training
Xingyu Xie
Zhijie Lin
Kim-Chuan Toh
Pan Zhou
32
2
0
05 Jul 2024
A Survey on Large Language Models from General Purpose to Medical Applications: Datasets, Methodologies, and Evaluations
Jinqiang Wang
Huansheng Ning
Yi Peng
Qikai Wei
Daniel Tesfai
Wenwei Mao
Tao Zhu
Runhe Huang
LM&MA
AI4MH
ELM
44
5
0
14 Jun 2024
MixEval: Deriving Wisdom of the Crowd from LLM Benchmark Mixtures
Jinjie Ni
Fuzhao Xue
Xiang Yue
Yuntian Deng
Mahir Shah
Kabir Jain
Graham Neubig
Yang You
ELM
32
37
0
03 Jun 2024
PV-Tuning: Beyond Straight-Through Estimation for Extreme LLM Compression
Vladimir Malinovskii
Denis Mazur
Ivan Ilin
Denis Kuznedelev
Konstantin Burlachenko
Kai Yi
Dan Alistarh
Peter Richtárik
MQ
37
19
0
23 May 2024
OpenBezoar: Small, Cost-Effective and Open Models Trained on Mixes of Instruction Data
Chandeepa Dissanayake
Lahiru Lowe
Sachith Gunasekara
Yasiru Ratnayake
MoE
ALM
32
1
0
18 Apr 2024
HLAT: High-quality Large Language Model Pre-trained on AWS Trainium
Haozheng Fan
Hao Zhou
Guangtai Huang
Parameswaran Raman
Xinwei Fu
Gaurav Gupta
Dhananjay Ram
Yida Wang
Jun Huan
45
5
0
16 Apr 2024
Me LLaMA: Foundation Large Language Models for Medical Applications
Qianqian Xie
Qingyu Chen
Aokun Chen
C.A.I. Peng
Yan Hu
...
Huan He
Lucila Ohno-Machido
Yonghui Wu
Hua Xu
Jiang Bian
LM&MA
AI4MH
70
4
0
20 Feb 2024
PAL: Proxy-Guided Black-Box Attack on Large Language Models
Chawin Sitawarin
Norman Mu
David A. Wagner
Alexandre Araujo
ELM
24
29
0
15 Feb 2024
Benchmarking and Building Long-Context Retrieval Models with LoCo and M2-BERT
Jon Saad-Falcon
Daniel Y. Fu
Simran Arora
Neel Guha
Christopher Ré
RALM
34
16
0
12 Feb 2024
Hardware Phi-1.5B: A Large Language Model Encodes Hardware Domain Specific Knowledge
Weimin Fu
Shijie Li
Yifang Zhao
Haocheng Ma
R. Dutta
Xuan Zhang
Kaichen Yang
Yier Jin
Xiaolong Guo
ALM
34
10
0
27 Jan 2024
Optimizing Distributed Training on Frontier for Large Language Models
Sajal Dash
Isaac Lyngaas
Junqi Yin
Xiao Wang
Romain Egele
Guojing Cong
Feiyi Wang
Prasanna Balaprakash
ALM
MoE
83
13
0
20 Dec 2023
Adan: Adaptive Nesterov Momentum Algorithm for Faster Optimizing Deep Models
Xingyu Xie
Pan Zhou
Huan Li
Zhouchen Lin
Shuicheng Yan
ODL
35
148
0
13 Aug 2022
1