Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2402.19472
Cited By
Lifelong Benchmarks: Efficient Model Evaluation in an Era of Rapid Progress
29 February 2024
Ameya Prabhu
Vishaal Udandarao
Philip H. S. Torr
Matthias Bethge
Adel Bibi
Samuel Albanie
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Lifelong Benchmarks: Efficient Model Evaluation in an Era of Rapid Progress"
13 / 13 papers shown
Title
Model Lakes
Koyena Pal
David Bau
Renée J. Miller
67
0
0
24 Feb 2025
3DArticCyclists: Generating Synthetic Articulated 8D Pose-Controllable Cyclist Data for Computer Vision Applications
Eduardo R. Corral-Soto
Yang Liu
Tongtong Cao
Y. Ren
Liu Bingbing
49
0
0
14 Oct 2024
Holistic Evaluation of Text-To-Image Models
Tony Lee
Michihiro Yasunaga
Chenlin Meng
Yifan Mai
Joon Sung Park
...
Jun-Yan Zhu
Fei-Fei Li
Jiajun Wu
Stefano Ermon
Percy Liang
147
126
0
07 Nov 2023
Online Continual Learning Without the Storage Constraint
Ameya Prabhu
Zhipeng Cai
P. Dokania
Philip H. S. Torr
V. Koltun
Ozan Sener
CLL
118
31
0
16 May 2023
Sparks of Artificial General Intelligence: Early experiments with GPT-4
Sébastien Bubeck
Varun Chandrasekaran
Ronen Eldan
J. Gehrke
Eric Horvitz
...
Scott M. Lundberg
Harsha Nori
Hamid Palangi
Marco Tulio Ribeiro
Yi Zhang
ELM
AI4MH
AI4CE
ALM
295
2,232
0
22 Mar 2023
Breaking Common Sense: WHOOPS! A Vision-and-Language Benchmark of Synthetic and Compositional Images
Nitzan Bitton-Guetta
Yonatan Bitton
Jack Hessel
Ludwig Schmidt
Yuval Elovici
Gabriel Stanovsky
Roy Schwartz
VLM
121
66
0
13 Mar 2023
RankMe: Assessing the downstream performance of pretrained self-supervised representations by their rank
Q. Garrido
Randall Balestriero
Laurent Najman
Yann LeCun
SSL
50
72
0
05 Oct 2022
Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned
Deep Ganguli
Liane Lovitt
John Kernion
Amanda Askell
Yuntao Bai
...
Nicholas Joseph
Sam McCandlish
C. Olah
Jared Kaplan
Jack Clark
225
444
0
23 Aug 2022
Analyzing Dynamic Adversarial Training Data in the Limit
Eric Wallace
Adina Williams
Robin Jia
Douwe Kiela
198
30
0
16 Oct 2021
Understanding Dataset Difficulty with
V
\mathcal{V}
V
-Usable Information
Kawin Ethayarajh
Yejin Choi
Swabha Swayamdipta
164
157
0
16 Oct 2021
DynaSent: A Dynamic Benchmark for Sentiment Analysis
Christopher Potts
Zhengxuan Wu
Atticus Geiger
Douwe Kiela
230
77
0
30 Dec 2020
Estimating Example Difficulty Using Variance of Gradients
Chirag Agarwal
Daniel D'souza
Sara Hooker
208
107
0
26 Aug 2020
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
Alex Jinpeng Wang
Amanpreet Singh
Julian Michael
Felix Hill
Omer Levy
Samuel R. Bowman
ELM
297
6,956
0
20 Apr 2018
1