ResearchTrend.AI
© 2025 ResearchTrend.AI, All rights reserved.

arXiv: 2309.08632
Pretraining on the Test Set Is All You Need

13 September 2023
Rylan Schaeffer

Papers citing "Pretraining on the Test Set Is All You Need"

23 papers
Correlating and Predicting Human Evaluations of Language Models from Natural Language Processing Benchmarks
Rylan Schaeffer
Punit Singh Koura
Binh Tang
R. Subramanian
Aaditya K. Singh
...
Vedanuj Goswami
Sergey Edunov
Dieuwke Hupkes
Sanmi Koyejo
Sharan Narang
ALM
71
0
0
24 Feb 2025
Benchmark Inflation: Revealing LLM Performance Gaps Using Retro-Holdouts
Jacob Haimes
Cenny Wenner
Kunvar Thaman
Vassil Tashev
Clement Neo
Esben Kran
Jason Schreiber
34
5
0
11 Oct 2024
Unveiling the Spectrum of Data Contamination in Language Models: A Survey from Detection to Remediation
Chunyuan Deng
Yilun Zhao
Yuzhao Heng
Yitong Li
Jiannan Cao
Xiangru Tang
Arman Cohan
35
13
0
20 Jun 2024
AI Sandbagging: Language Models can Strategically Underperform on Evaluations
Teun van der Weij
Felix Hofstätter
Ollie Jaffe
Samuel F. Brown
Francis Rhys Ward
ELM
47
23
0
11 Jun 2024
Kotlin ML Pack: Technical Report
Sergey Titov
Mikhail Evtikhiev
Anton Shapkin
Oleg Smirnov
Sergei Boytsov
...
Dariia Karaeva
Maksim Sheptyakov
Mikhail Arkhipov
T. Bryksin
Egor Bogomolov
32
0
0
29 May 2024
The Mosaic Memory of Large Language Models
Igor Shilov
Matthieu Meeus
Yves-Alexandre de Montjoye
47
3
0
24 May 2024
EnviroExam: Benchmarking Environmental Science Knowledge of Large Language Models
Yu Huang
Liang Guo
Wanqian Guo
Zhe Tao
Yang Lv
Zhihao Sun
Dongfang Zhao
ELM
25
1
0
18 May 2024
Chameleon: Mixed-Modal Early-Fusion Foundation Models
Chameleon Team
MLLM
62
261
0
16 May 2024
Vibe-Eval: A hard evaluation suite for measuring progress of multimodal language models
Piotr Padlewski
Max Bain
Matthew Henderson
Zhongkai Zhu
Nishant Relan
...
Che Zheng
Cyprien de Masson d'Autume
Dani Yogatama
Mikel Artetxe
Yi Tay
VLM
84
26
0
03 May 2024
Balancing Speciality and Versatility: a Coarse to Fine Framework for Supervised Fine-tuning Large Language Model
Hengyuan Zhang
Yanru Wu
Dawei Li
Zacc Yang
Rui Zhao
Yong Jiang
Fei Tan
ALM
32
1
0
16 Apr 2024
Sample-Efficient Human Evaluation of Large Language Models via Maximum Discrepancy Competition
Kehua Feng
Keyan Ding
Kede Ma
Zhihua Wang
Qiang Zhang
Huajun Chen
39
10
0
10 Apr 2024
Automating Dataset Updates Towards Reliable and Timely Evaluation of Large Language Models
Jiahao Ying
Yixin Cao
Yushi Bai
Qianru Sun
Bo Wang
Wei Tang
Zhaojun Ding
Yizhe Yang
Xuanjing Huang
Shuicheng Yan
KELM
21
6
0
19 Feb 2024
On Catastrophic Inheritance of Large Foundation Models
Hao Chen
Bhiksha Raj
Xing Xie
Jindong Wang
AI4CE
56
12
0
02 Feb 2024
When Large Language Models Meet Vector Databases: A Survey
Zhi Jing
Yongye Su
Yikun Han
Bo Yuan
Haiyun Xu
Chunjiang Liu
Kehai Chen
Min Zhang
55
35
0
30 Jan 2024
PPM: Automated Generation of Diverse Programming Problems for Benchmarking Code Generation Models
Simin Chen
Xiaoning Feng
Xiao Han
Cong Liu
Wei Yang
44
3
0
28 Jan 2024
NPHardEval: Dynamic Benchmark on Reasoning Ability of Large Language Models via Complexity Classes
Lizhou Fan
Wenyue Hua
Lingyao Li
Haoyang Ling
Yongfeng Zhang
LRM
31
45
0
22 Dec 2023
Investigating Data Contamination in Modern Benchmarks for Large Language Models
Chunyuan Deng
Yilun Zhao
Xiangru Tang
Mark B. Gerstein
Arman Cohan
AAML
ELM
24
52
0
16 Nov 2023
Post Turing: Mapping the landscape of LLM Evaluation
Alexey Tikhonov
Ivan P. Yamshchikov
ELM
51
4
0
03 Nov 2023
Leveraging Word Guessing Games to Assess the Intelligence of Large Language Models
Tian Liang
Zhiwei He
Jen-tse Huang
Wenxuan Wang
Wenxiang Jiao
Rui Wang
Yujiu Yang
Zhaopeng Tu
Shuming Shi
Xing Wang
LLMAG
60
5
0
31 Oct 2023
DyVal: Dynamic Evaluation of Large Language Models for Reasoning Tasks
A. Maritan
Jiaao Chen
S. Dey
Luca Schenato
Diyi Yang
Xing Xie
ELM
LRM
27
42
0
29 Sep 2023
LawBench: Benchmarking Legal Knowledge of Large Language Models
Zhiwei Fei
Xiaoyu Shen
D. Zhu
Fengzhe Zhou
Zhuo Han
Songyang Zhang
Kai-xiang Chen
Zongwen Shen
Jidong Ge
ELM
AILaw
34
34
0
28 Sep 2023
The False Dawn: Reevaluating Google's Reinforcement Learning for Chip Macro Placement
I. Markov
26
5
0
16 Jun 2023
Scaling Laws for Neural Language Models
Jared Kaplan
Sam McCandlish
T. Henighan
Tom B. Brown
B. Chess
R. Child
Scott Gray
Alec Radford
Jeff Wu
Dario Amodei
264
4,489
0
23 Jan 2020