U-shaped and Inverted-U Scaling behind Emergent Abilities of Large Language Models

2 October 2024
Tung-Yu Wu
Pei-Yu Lo
ReLM, LRM

Papers citing "U-shaped and Inverted-U Scaling behind Emergent Abilities of Large Language Models"

40 / 40 papers shown
Title
Large Language Models Must Be Taught to Know What They Don't Know
Sanyam Kapoor
Nate Gruver
Manley Roberts
Katherine Collins
Arka Pal
Umang Bhatt
Adrian Weller
Samuel Dooley
Micah Goldblum
Andrew Gordon Wilson
84
25
0
12 Jun 2024
Language models scale reliably with over-training and on downstream tasks
S. Gadre
Georgios Smyrnis
Vaishaal Shankar
Suchin Gururangan
Mitchell Wortsman
...
Y. Carmon
Achal Dave
Reinhard Heckel
Niklas Muennighoff
Ludwig Schmidt
ALM, ELM, LRM
134
45
0
13 Mar 2024
Gemma: Open Models Based on Gemini Research and Technology
Gemma Team
Thomas Mesnard
Cassidy Hardin
Robert Dadashi
Surya Bhupatiraju
...
Armand Joulin
Noah Fiedel
Evan Senter
Alek Andreev
Kathleen Kenealy
VLM, LLMAG
195
493
0
13 Mar 2024
Yi: Open Foundation Models by 01.AI
01.AI
Alex Young
Bei Chen
Chao Li
...
Yue Wang
Yuxuan Cai
Zhenyu Gu
Zhiyuan Liu
Zonghong Dai
OSLM, LRM
263
562
0
07 Mar 2024
DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models
Damai Dai
Chengqi Deng
Chenggang Zhao
R. X. Xu
Huazuo Gao
...
Panpan Huang
Fuli Luo
Chong Ruan
Zhifang Sui
W. Liang
MoE
90
292
0
11 Jan 2024
How predictable is language model benchmark performance?
David Owen
ELM, LRM
50
20
0
09 Jan 2024
Mixtral of Experts
Albert Q. Jiang
Alexandre Sablayrolles
Antoine Roux
A. Mensch
Blanche Savary
...
Théophile Gervet
Thibaut Lavril
Thomas Wang
Timothée Lacroix
William El Sayed
MoE, LLMAG
151
1,085
0
08 Jan 2024
LLM360: Towards Fully Transparent Open-Source LLMs
Zhengzhong Liu
Aurick Qiao
Willie Neiswanger
Hongyi Wang
Bowen Tan
...
Zhiting Hu
Mark Schulze
Preslav Nakov
Timothy Baldwin
Eric Xing
109
78
0
11 Dec 2023
The Falcon Series of Open Language Models
Ebtesam Almazrouei
Hamza Alobeidli
Abdulaziz Alshamsi
Alessandro Cappelli
Ruxandra-Aimée Cojocaru
...
Quentin Malartic
Daniele Mazzotta
Badreddine Noune
B. Pannier
Guilherme Penedo
AI4TS, ALM
150
443
0
28 Nov 2023
Predicting Emergent Abilities with Infinite Resolution Evaluation
Shengding Hu
Xin Liu
Xu Han
Xinrong Zhang
Chaoqun He
...
Ning Ding
Zebin Ou
Guoyang Zeng
Zhiyuan Liu
Maosong Sun
ELM, LRM
63
14
0
05 Oct 2023
Qwen Technical Report
Jinze Bai
Shuai Bai
Yunfei Chu
Zeyu Cui
Kai Dang
...
Zhenru Zhang
Chang Zhou
Jingren Zhou
Xiaohuan Zhou
Tianhang Zhu
OSLM
262
1,827
0
28 Sep 2023
Textbooks Are All You Need II: phi-1.5 technical report
Yuanzhi Li
Sébastien Bubeck
Ronen Eldan
Allison Del Giorno
Suriya Gunasekar
Yin Tat Lee
ALM, LRM
161
474
0
11 Sep 2023
Are Emergent Abilities in Large Language Models just In-Context Learning?
Sheng Lu
Irina Bigoulaeva
Rachneet Sachdeva
Harish Tayyar Madabushi
Iryna Gurevych
LRM, ELM, ReLM
113
100
0
04 Sep 2023
Challenges and Applications of Large Language Models
Jean Kaddour
J. Harris
Maximilian Mozes
Herbie Bradley
Roberta Raileanu
R. McHardy
UQCV, ALM, AAML
67
309
0
19 Jul 2023
Llama 2: Open Foundation and Fine-Tuned Chat Models
Hugo Touvron
Louis Martin
Kevin R. Stone
Peter Albert
Amjad Almahairi
...
Sharan Narang
Aurelien Rodriguez
Robert Stojnic
Sergey Edunov
Thomas Scialom
AI4MH, ALM
299
11,894
0
18 Jul 2023
A Comprehensive Overview of Large Language Models
Humza Naveed
Asad Ullah Khan
Shi Qiu
Muhammad Saqib
Saeed Anwar
Muhammad Usman
Naveed Akhtar
Nick Barnes
Ajmal Mian
OffRL
92
595
0
12 Jul 2023
Inverse Scaling: When Bigger Isn't Better
I. R. McKenzie
Alexander Lyzhov
Michael Pieler
Alicia Parrish
Aaron Mueller
...
Yuhui Zhang
Zhengping Zhou
Najoung Kim
Sam Bowman
Ethan Perez
67
139
0
15 Jun 2023
From Text to MITRE Techniques: Exploring the Malicious Use of Large Language Models for Generating Cyber Attack Payloads
P. Charan
Hrushikesh Chunduri
P. Anand
S. Shukla
62
44
0
24 May 2023
How Predictable Are Large Language Model Capabilities? A Case Study on BIG-bench
Qinyuan Ye
Harvey Yiyun Fu
Xiang Ren
Robin Jia
ELM
59
23
0
24 May 2023
Are Emergent Abilities of Large Language Models a Mirage?
Rylan Schaeffer
Brando Miranda
Oluwasanmi Koyejo
LRM
107
429
0
28 Apr 2023
Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling
Stella Biderman
Hailey Schoelkopf
Quentin G. Anthony
Herbie Bradley
Kyle O'Brien
...
USVSN Sai Prashanth
Edward Raff
Aviya Skowron
Lintang Sutawika
Oskar van der Wal
103
1,285
0
03 Apr 2023
The Quantization Model of Neural Scaling
Eric J. Michaud
Ziming Liu
Uzay Girit
Max Tegmark
MILM
89
85
0
23 Mar 2023
GPT-4 Technical Report
OpenAI
Josh Achiam
Steven Adler
Sandhini Agarwal
Lama Ahmad
...
Shengjia Zhao
Tianhao Zheng
Juntang Zhuang
William Zhuk
Barret Zoph
LLMAG, MLLM
1.4K
14,359
0
15 Mar 2023
LLaMA: Open and Efficient Foundation Language Models
Hugo Touvron
Thibaut Lavril
Gautier Izacard
Xavier Martinet
Marie-Anne Lachaux
...
Faisal Azhar
Aurelien Rodriguez
Armand Joulin
Edouard Grave
Guillaume Lample
ALM, PILM
1.5K
13,247
0
27 Feb 2023
BLOOM: A 176B-Parameter Open-Access Multilingual Language Model
BigScience Workshop
Teven Le Scao
Angela Fan
Christopher Akiki
...
Zhongli Xie
Zifan Ye
M. Bras
Younes Belkada
Thomas Wolf
VLM
386
2,388
0
09 Nov 2022
Inverse scaling can become U-shaped
Jason W. Wei
Najoung Kim
Yi Tay
Quoc V. Le
LRM
52
64
0
03 Nov 2022
Emergent Abilities of Large Language Models
Jason W. Wei
Yi Tay
Rishi Bommasani
Colin Raffel
Barret Zoph
...
Tatsunori Hashimoto
Oriol Vinyals
Percy Liang
J. Dean
W. Fedus
ELM, ReLM, LRM
279
2,480
0
15 Jun 2022
OPT: Open Pre-trained Transformer Language Models
Susan Zhang
Stephen Roller
Naman Goyal
Mikel Artetxe
Moya Chen
...
Daniel Simig
Punit Singh Koura
Anjali Sridhar
Tianlu Wang
Luke Zettlemoyer
VLM, OSLM, AI4CE
330
3,667
0
02 May 2022
CodeGen: An Open Large Language Model for Code with Multi-Turn Program Synthesis
Erik Nijkamp
Bo Pang
Hiroaki Hayashi
Lifu Tu
Haiquan Wang
Yingbo Zhou
Silvio Savarese
Caiming Xiong
ELM
150
1,023
0
25 Mar 2022
Few-shot Learning with Multilingual Language Models
Xi Lin
Todor Mihaylov
Mikel Artetxe
Tianlu Wang
Shuohui Chen
...
Luke Zettlemoyer
Zornitsa Kozareva
Mona T. Diab
Ves Stoyanov
Xian Li
BDL, ELM, LRM
103
307
0
20 Dec 2021
TruthfulQA: Measuring How Models Mimic Human Falsehoods
Stephanie C. Lin
Jacob Hilton
Owain Evans
HILM
140
1,904
0
08 Sep 2021
Learning Transferable Visual Models From Natural Language Supervision
Alec Radford
Jong Wook Kim
Chris Hallacy
Aditya A. Ramesh
Gabriel Goh
...
Amanda Askell
Pamela Mishkin
Jack Clark
Gretchen Krueger
Ilya Sutskever
CLIP, VLM
929
29,436
0
26 Feb 2021
Measuring Massive Multitask Language Understanding
Dan Hendrycks
Collin Burns
Steven Basart
Andy Zou
Mantas Mazeika
D. Song
Jacob Steinhardt
ELM, RALM
176
4,434
0
07 Sep 2020
Language Models are Few-Shot Learners
Tom B. Brown
Benjamin Mann
Nick Ryder
Melanie Subbiah
Jared Kaplan
...
Christopher Berner
Sam McCandlish
Alec Radford
Ilya Sutskever
Dario Amodei
BDL
795
42,055
0
28 May 2020
Scaling Laws for Neural Language Models
Jared Kaplan
Sam McCandlish
T. Henighan
Tom B. Brown
B. Chess
R. Child
Scott Gray
Alec Radford
Jeff Wu
Dario Amodei
605
4,822
0
23 Jan 2020
Deep Double Descent: Where Bigger Models and More Data Hurt
Preetum Nakkiran
Gal Kaplun
Yamini Bansal
Tristan Yang
Boaz Barak
Ilya Sutskever
121
942
0
04 Dec 2019
HellaSwag: Can a Machine Really Finish Your Sentence?
Rowan Zellers
Ari Holtzman
Yonatan Bisk
Ali Farhadi
Yejin Choi
172
2,485
0
19 May 2019
Can a Suit of Armor Conduct Electricity? A New Dataset for Open Book Question Answering
Todor Mihaylov
Peter Clark
Tushar Khot
Ashish Sabharwal
113
1,537
0
08 Sep 2018
WiC: the Word-in-Context Dataset for Evaluating Context-Sensitive Meaning Representations
Mohammad Taher Pilehvar
Jose Camacho-Collados
195
489
0
28 Aug 2018
Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge
Peter Clark
Isaac Cowhey
Oren Etzioni
Tushar Khot
Ashish Sabharwal
Carissa Schoenick
Oyvind Tafjord
ELM, RALM, LRM
158
2,610
0
14 Mar 2018