arXiv:2409.09261
What Is Wrong with My Model? Identifying Systematic Problems with Semantic Data Slicing
14 September 2024
Chenyang Yang
Yining Hong
Grace A. Lewis
Tongshuang Wu
Christian Kästner
Papers citing
"What Is Wrong with My Model? Identifying Systematic Problems with Semantic Data Slicing"
21 / 21 papers shown
Title
LLMs as Workers in Human-Computational Algorithms? Replicating Crowdsourcing Pipelines with LLMs
Tongshuang Wu
Haiyi Zhu
Maya Albayrak
Alexis Axon
Amanda Bertsch
...
Ying-Jui Tseng
Patricia Vaidos
Zhijin Wu
Wei Wu
Chenyang Yang
124
34
0
10 Jan 2025
Red-Teaming for Generative AI: Silver Bullet or Security Theater?
Michael Feffer
Anusha Sinha
Wesley Hanwen Deng
Zachary Chase Lipton
Hoda Heidari
AAML
77
71
0
29 Jan 2024
The Rise and Potential of Large Language Model Based Agents: A Survey
Zhiheng Xi
Wenxiang Chen
Xin Guo
Wei He
Yiwen Ding
...
Wenjuan Qin
Yongyan Zheng
Xipeng Qiu
Xuanjing Huang
Tao Gui
LM&MA
LM&Ro
3DV
AI4CE
110
934
0
14 Sep 2023
Universal Self-Adaptive Prompting
Xingchen Wan
Ruoxi Sun
Hootan Nakhost
H. Dai
Julian Martin Eisenschlos
Sercan O. Arik
Tomas Pfister
LRM
83
11
0
24 May 2023
ChatGPT to Replace Crowdsourcing of Paraphrases for Intent Classification: Higher Diversity and Comparable Model Robustness
Ján Cegin
Jakub Simko
Peter Brusilovsky
65
46
0
22 May 2023
Zeno: An Interactive Framework for Behavioral Evaluation of Machine Learning
Ángel Alexander Cabrera
Erica Fu
Donald Bertucci
Kenneth Holstein
Ameet Talwalkar
Jason I. Hong
Adam Perer
71
48
0
09 Feb 2023
Demystifying Prompts in Language Models via Perplexity Estimation
Hila Gonen
Srini Iyer
Terra Blevins
Noah A. Smith
Luke Zettlemoyer
LRM
117
210
0
08 Dec 2022
Scaling Instruction-Finetuned Language Models
Hyung Won Chung
Le Hou
Shayne Longpre
Barret Zoph
Yi Tay
...
Jacob Devlin
Adam Roberts
Denny Zhou
Quoc V. Le
Jason W. Wei
ReLM
LRM
185
3,128
0
20 Oct 2022
Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models
Aarohi Srivastava
Abhinav Rastogi
Abhishek Rao
Abu Awal Md Shoeb
Abubakar Abid
...
Zhuoye Zhao
Zijian Wang
Zijie J. Wang
Zirui Wang
Ziyi Wu
ELM
183
1,750
0
09 Jun 2022
Domino: Discovering Systematic Errors with Cross-Modal Embeddings
Sabri Eyuboglu
M. Varma
Khaled Kamal Saab
Jean-Benoit Delbrouck
Christopher Lee-Messer
Jared A. Dunnmon
James Zou
Christopher Ré
71
148
0
24 Mar 2022
Ethical and social risks of harm from Language Models
Laura Weidinger
John F. J. Mellor
Maribeth Rauh
Conor Griffin
J. Uesato
...
Lisa Anne Hendricks
William S. Isaac
Sean Legassick
G. Irving
Iason Gabriel
PILM
108
1,036
0
08 Dec 2021
The Spotlight: A General Method for Discovering Systematic Errors in Deep Learning Models
G. d'Eon
Jason d'Eon
J. R. Wright
Kevin Leyton-Brown
77
75
0
01 Jul 2021
What Makes Good In-Context Examples for GPT-3?
Jiachang Liu
Dinghan Shen
Yizhe Zhang
Bill Dolan
Lawrence Carin
Weizhu Chen
AAML
RALM
385
1,379
0
17 Jan 2021
HateCheck: Functional Tests for Hate Speech Detection Models
Paul Röttger
B. Vidgen
Dong Nguyen
Zeerak Talat
Helen Z. Margetts
J. Pierrehumbert
85
271
0
31 Dec 2020
WILDS: A Benchmark of in-the-Wild Distribution Shifts
Pang Wei Koh
Shiori Sagawa
Henrik Marklund
Sang Michael Xie
Marvin Zhang
...
A. Kundaje
Emma Pierson
Sergey Levine
Chelsea Finn
Percy Liang
OOD
177
1,434
0
14 Dec 2020
Social Chemistry 101: Learning to Reason about Social and Moral Norms
Maxwell Forbes
Jena D. Hwang
Vered Shwartz
Maarten Sap
Yejin Choi
52
268
0
01 Nov 2020
Language Models are Few-Shot Learners
Tom B. Brown
Benjamin Mann
Nick Ryder
Melanie Subbiah
Jared Kaplan
...
Christopher Berner
Sam McCandlish
Alec Radford
Ilya Sutskever
Dario Amodei
BDL
780
42,055
0
28 May 2020
Beyond Accuracy: Behavioral Testing of NLP models with CheckList
Marco Tulio Ribeiro
Tongshuang Wu
Carlos Guestrin
Sameer Singh
ELM
208
1,104
0
08 May 2020
ColBERT: Efficient and Effective Passage Search via Contextualized Late Interaction over BERT
Omar Khattab
Matei A. Zaharia
136
1,370
0
27 Apr 2020
Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks
Nils Reimers
Iryna Gurevych
1.3K
12,226
0
27 Aug 2019
Deceiving Google's Perspective API Built for Detecting Toxic Comments
Hossein Hosseini
Sreeram Kannan
Baosen Zhang
Radha Poovendran
AAML
66
328
0
27 Feb 2017