ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2402.19472
  4. Cited By
Lifelong Benchmarks: Efficient Model Evaluation in an Era of Rapid
  Progress

Lifelong Benchmarks: Efficient Model Evaluation in an Era of Rapid Progress

29 February 2024
Ameya Prabhu
Vishaal Udandarao
Philip H. S. Torr
Matthias Bethge
Adel Bibi
Samuel Albanie
ArXivPDFHTML

Papers citing "Lifelong Benchmarks: Efficient Model Evaluation in an Era of Rapid Progress"

13 / 13 papers shown
Title
Model Lakes
Model Lakes
Koyena Pal
David Bau
Renée J. Miller
67
0
0
24 Feb 2025
3DArticCyclists: Generating Synthetic Articulated 8D Pose-Controllable Cyclist Data for Computer Vision Applications
3DArticCyclists: Generating Synthetic Articulated 8D Pose-Controllable Cyclist Data for Computer Vision Applications
Eduardo R. Corral-Soto
Yang Liu
Tongtong Cao
Y. Ren
Liu Bingbing
49
0
0
14 Oct 2024
Holistic Evaluation of Text-To-Image Models
Holistic Evaluation of Text-To-Image Models
Tony Lee
Michihiro Yasunaga
Chenlin Meng
Yifan Mai
Joon Sung Park
...
Jun-Yan Zhu
Fei-Fei Li
Jiajun Wu
Stefano Ermon
Percy Liang
147
126
0
07 Nov 2023
Online Continual Learning Without the Storage Constraint
Online Continual Learning Without the Storage Constraint
Ameya Prabhu
Zhipeng Cai
P. Dokania
Philip H. S. Torr
V. Koltun
Ozan Sener
CLL
118
31
0
16 May 2023
Sparks of Artificial General Intelligence: Early experiments with GPT-4
Sparks of Artificial General Intelligence: Early experiments with GPT-4
Sébastien Bubeck
Varun Chandrasekaran
Ronen Eldan
J. Gehrke
Eric Horvitz
...
Scott M. Lundberg
Harsha Nori
Hamid Palangi
Marco Tulio Ribeiro
Yi Zhang
ELM
AI4MH
AI4CE
ALM
295
2,232
0
22 Mar 2023
Breaking Common Sense: WHOOPS! A Vision-and-Language Benchmark of
  Synthetic and Compositional Images
Breaking Common Sense: WHOOPS! A Vision-and-Language Benchmark of Synthetic and Compositional Images
Nitzan Bitton-Guetta
Yonatan Bitton
Jack Hessel
Ludwig Schmidt
Yuval Elovici
Gabriel Stanovsky
Roy Schwartz
VLM
121
66
0
13 Mar 2023
RankMe: Assessing the downstream performance of pretrained
  self-supervised representations by their rank
RankMe: Assessing the downstream performance of pretrained self-supervised representations by their rank
Q. Garrido
Randall Balestriero
Laurent Najman
Yann LeCun
SSL
50
72
0
05 Oct 2022
Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors,
  and Lessons Learned
Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned
Deep Ganguli
Liane Lovitt
John Kernion
Amanda Askell
Yuntao Bai
...
Nicholas Joseph
Sam McCandlish
C. Olah
Jared Kaplan
Jack Clark
225
444
0
23 Aug 2022
Analyzing Dynamic Adversarial Training Data in the Limit
Analyzing Dynamic Adversarial Training Data in the Limit
Eric Wallace
Adina Williams
Robin Jia
Douwe Kiela
198
30
0
16 Oct 2021
Understanding Dataset Difficulty with $\mathcal{V}$-Usable Information
Understanding Dataset Difficulty with V\mathcal{V}V-Usable Information
Kawin Ethayarajh
Yejin Choi
Swabha Swayamdipta
164
157
0
16 Oct 2021
DynaSent: A Dynamic Benchmark for Sentiment Analysis
DynaSent: A Dynamic Benchmark for Sentiment Analysis
Christopher Potts
Zhengxuan Wu
Atticus Geiger
Douwe Kiela
230
77
0
30 Dec 2020
Estimating Example Difficulty Using Variance of Gradients
Estimating Example Difficulty Using Variance of Gradients
Chirag Agarwal
Daniel D'souza
Sara Hooker
208
107
0
26 Aug 2020
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language
  Understanding
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
Alex Jinpeng Wang
Amanpreet Singh
Julian Michael
Felix Hill
Omer Levy
Samuel R. Bowman
ELM
297
6,956
0
20 Apr 2018
1