ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2310.10628
  4. Cited By
Data Contamination Through the Lens of Time

Data Contamination Through the Lens of Time

16 October 2023
Manley Roberts
Himanshu Thakur
Christine Herlihy
Colin White
Samuel Dooley
ArXivPDFHTML

Papers citing "Data Contamination Through the Lens of Time"

22 / 22 papers shown
Title
Can We Trust AI Benchmarks? An Interdisciplinary Review of Current Issues in AI Evaluation
Can We Trust AI Benchmarks? An Interdisciplinary Review of Current Issues in AI Evaluation
Maria Eriksson
Erasmo Purificato
Arman Noroozian
Joao Vinagre
Guillaume Chaslot
Emilia Gomez
David Fernandez-Llorca
ELM
185
2
0
10 Feb 2025
ForecastBench: A Dynamic Benchmark of AI Forecasting Capabilities
ForecastBench: A Dynamic Benchmark of AI Forecasting Capabilities
Ezra Karger
Houtan Bastani
Chen Yueh-Han
Zachary Jacobs
Danny Halawi
Fred Zhang
P. Tetlock
82
7
0
30 Sep 2024
Training on the Test Task Confounds Evaluation and Emergence
Training on the Test Task Confounds Evaluation and Emergence
Ricardo Dominguez-Olmedo
Florian E. Dorner
Moritz Hardt
ELM
80
7
1
10 Jul 2024
Stop Uploading Test Data in Plain Text: Practical Strategies for
  Mitigating Data Contamination by Evaluation Benchmarks
Stop Uploading Test Data in Plain Text: Practical Strategies for Mitigating Data Contamination by Evaluation Benchmarks
Alon Jacovi
Avi Caciularu
Omer Goldman
Yoav Goldberg
32
100
0
17 May 2023
Capabilities of GPT-4 on Medical Challenge Problems
Capabilities of GPT-4 on Medical Challenge Problems
Harsha Nori
Nicholas King
S. McKinney
Dean Carignan
Eric Horvitz
LM&MA
ELM
AI4MH
64
786
0
20 Mar 2023
Natural Language to Code Generation in Interactive Data Science
  Notebooks
Natural Language to Code Generation in Interactive Data Science Notebooks
Pengcheng Yin
Wen-Ding Li
Kefan Xiao
Abhishek Rao
Yeming Wen
...
Paige Bailey
Michele Catasta
Henryk Michalewski
Oleksandr Polozov
Charles Sutton
38
60
0
19 Dec 2022
Codex Hacks HackerRank: Memorization Issues and a Framework for Code
  Synthesis Evaluation
Codex Hacks HackerRank: Memorization Issues and a Framework for Code Synthesis Evaluation
Anjan Karmakar
Julian Aron Prenner
Marco DÁmbros
Romain Robbes
ELM
34
17
0
06 Dec 2022
Execution-based Evaluation for Data Science Code Generation Models
Execution-based Evaluation for Data Science Code Generation Models
Junjie Huang
Chenglong Wang
Jipeng Zhang
Cong Yan
Haotian Cui
J. Inala
Colin B. Clement
Nan Duan
Jianfeng Gao
ELM
63
36
0
17 Nov 2022
Quantifying Memorization Across Neural Language Models
Quantifying Memorization Across Neural Language Models
Nicholas Carlini
Daphne Ippolito
Matthew Jagielski
Katherine Lee
Florian Tramèr
Chiyuan Zhang
PILM
68
603
0
15 Feb 2022
Competition-Level Code Generation with AlphaCode
Competition-Level Code Generation with AlphaCode
Yujia Li
David Choi
Junyoung Chung
Nate Kushman
Julian Schrittwieser
...
Esme Sutherland Robson
Pushmeet Kohli
Nando de
Koray Kavukcuoglu
Oriol Vinyals
46
1,337
0
08 Feb 2022
AI and the Everything in the Whole Wide World Benchmark
AI and the Everything in the Whole Wide World Benchmark
Inioluwa Deborah Raji
Emily M. Bender
Amandalynne Paullada
Emily L. Denton
A. Hanna
44
301
0
26 Nov 2021
Program Synthesis with Large Language Models
Program Synthesis with Large Language Models
Jacob Austin
Augustus Odena
Maxwell Nye
Maarten Bosma
Henryk Michalewski
...
Ellen Jiang
Carrie J. Cai
Michael Terry
Quoc V. Le
Charles Sutton
ELM
AIMat
ReCod
ALM
83
1,846
0
16 Aug 2021
Deduplicating Training Data Makes Language Models Better
Deduplicating Training Data Makes Language Models Better
Katherine Lee
Daphne Ippolito
A. Nystrom
Chiyuan Zhang
Douglas Eck
Chris Callison-Burch
Nicholas Carlini
SyDa
329
611
0
14 Jul 2021
Measuring Coding Challenge Competence With APPS
Measuring Coding Challenge Competence With APPS
Dan Hendrycks
Steven Basart
Saurav Kadavath
Mantas Mazeika
Akul Arora
...
Collin Burns
Samir Puranik
Horace He
D. Song
Jacob Steinhardt
ELM
AIMat
ALM
224
657
0
20 May 2021
Documenting Large Webtext Corpora: A Case Study on the Colossal Clean
  Crawled Corpus
Documenting Large Webtext Corpora: A Case Study on the Colossal Clean Crawled Corpus
Jesse Dodge
Maarten Sap
Ana Marasović
William Agnew
Gabriel Ilharco
Dirk Groeneveld
Margaret Mitchell
Matt Gardner
AILaw
58
437
0
18 Apr 2021
Extracting Training Data from Large Language Models
Extracting Training Data from Large Language Models
Nicholas Carlini
Florian Tramèr
Eric Wallace
Matthew Jagielski
Ariel Herbert-Voss
...
Tom B. Brown
D. Song
Ulfar Erlingsson
Alina Oprea
Colin Raffel
MLAU
SILM
398
1,868
0
14 Dec 2020
JuICe: A Large Scale Distantly Supervised Dataset for Open Domain
  Context-based Code Generation
JuICe: A Large Scale Distantly Supervised Dataset for Open Domain Context-based Code Generation
R. Agashe
R. Campello
Arthur Zimek
45
83
0
05 Oct 2019
SPoC: Search-based Pseudocode to Code
SPoC: Search-based Pseudocode to Code
Sumith Kulal
Panupong Pasupat
Kartik Chandra
Mina Lee
Oded Padon
A. Aiken
Percy Liang
44
215
0
12 Jun 2019
Lessons from Natural Language Inference in the Clinical Domain
Lessons from Natural Language Inference in the Clinical Domain
Alexey Romanov
Chaitanya P. Shivade
LM&MA
37
268
0
21 Aug 2018
Hypothesis Only Baselines in Natural Language Inference
Hypothesis Only Baselines in Natural Language Inference
Adam Poliak
Jason Naradowsky
Aparajita Haldar
Rachel Rudinger
Benjamin Van Durme
221
579
0
02 May 2018
Annotation Artifacts in Natural Language Inference Data
Annotation Artifacts in Natural Language Inference Data
Suchin Gururangan
Swabha Swayamdipta
Omer Levy
Roy Schwartz
Samuel R. Bowman
Noah A. Smith
84
1,167
0
06 Mar 2018
A Broad-Coverage Challenge Corpus for Sentence Understanding through
  Inference
A Broad-Coverage Challenge Corpus for Sentence Understanding through Inference
Adina Williams
Nikita Nangia
Samuel R. Bowman
363
4,444
0
18 Apr 2017
1