2408.10474
Cited By
LeCov: Multi-level Testing Criteria for Large Language Models
20 August 2024
Xuan Xie
Jiayang Song
Yuheng Huang
Da Song
Fuyuan Zhang
Felix Juefei-Xu
Lei Ma
ELM
Papers citing
"LeCov: Multi-level Testing Criteria for Large Language Models"
26 / 26 papers shown
Title
To Believe or Not to Believe Your LLM
Yasin Abbasi-Yadkori
Ilja Kuzborskij
András György
Csaba Szepesvári
UQCV
140
60
0
04 Jun 2024
Online Safety Analysis for LLMs: a Benchmark, an Assessment, and a Path Forward
Xuan Xie
Jiayang Song
Zhehua Zhou
Yuheng Huang
Da Song
Lei Ma
OffRL
115
6
0
12 Apr 2024
Llama Guard: LLM-based Input-Output Safeguard for Human-AI Conversations
Hakan Inan
Kartikeya Upasani
Jianfeng Chi
Rashi Rungta
Krithika Iyer
...
Michael Tontchev
Qing Hu
Brian Fuller
Davide Testuggine
Madian Khabsa
AI4MH
159
447
0
07 Dec 2023
SafetyBench: Evaluating the Safety of Large Language Models
Zhexin Zhang
Leqi Lei
Lindong Wu
Rui Sun
Yongkang Huang
Chong Long
Xiao Liu
Xuanyu Lei
Jie Tang
Minlie Huang
LRM
LM&MA
ELM
85
106
0
13 Sep 2023
Siren's Song in the AI Ocean: A Survey on Hallucination in Large Language Models
Yue Zhang
Yafu Li
Leyang Cui
Deng Cai
Lemao Liu
...
Longyue Wang
Anh Tuan Luu
Wei Bi
Freda Shi
Shuming Shi
RALM
LRM
HILM
99
571
0
03 Sep 2023
DecodingTrust: A Comprehensive Assessment of Trustworthiness in GPT Models
Boxin Wang
Weixin Chen
Hengzhi Pei
Chulin Xie
Mintong Kang
...
Zinan Lin
Yu Cheng
Sanmi Koyejo
Dawn Song
Bo Li
95
416
0
20 Jun 2023
Direct Preference Optimization: Your Language Model is Secretly a Reward Model
Rafael Rafailov
Archit Sharma
E. Mitchell
Stefano Ermon
Christopher D. Manning
Chelsea Finn
ALM
385
3,981
0
29 May 2023
GPT-4 Technical Report
OpenAI
Josh Achiam
Steven Adler
Sandhini Agarwal
Lama Ahmad
...
Shengjia Zhao
Tianhao Zheng
Juntang Zhuang
William Zhuk
Barret Zoph
LLMAG
MLLM
1.4K
14,359
0
15 Mar 2023
A Holistic Approach to Undesired Content Detection in the Real World
Todor Markov
Chong Zhang
Sandhini Agarwal
Tyna Eloundou
Teddy Lee
Steven Adler
Angela Jiang
L. Weng
102
233
0
05 Aug 2022
Aries: Efficient Testing of Deep Neural Networks via Labeling-Free Accuracy Estimation
Qiang Hu
Yuejun Guo
Xiaofei Xie
Maxime Cordy
Lei Ma
Mike Papadakis
Yves Le Traon
AAML
50
20
0
22 Jul 2022
Software Testing for Machine Learning
D. Marijan
A. Gotlieb
AAML
46
28
0
30 Apr 2022
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
874
12,973
0
04 Mar 2022
Black-Box Testing of Deep Neural Networks Through Test Case Diversity
Zohreh Aghababaeyan
Manel Abdellatif
Lionel C. Briand
Ramesh S
M. Bagherzadeh
AAML
68
45
0
20 Dec 2021
The Pile: An 800GB Dataset of Diverse Text for Language Modeling
Leo Gao
Stella Biderman
Sid Black
Laurence Golding
Travis Hoppe
...
Horace He
Anish Thite
Noa Nabeshima
Shawn Presser
Connor Leahy
AIMat
450
2,096
0
31 Dec 2020
RealToxicityPrompts: Evaluating Neural Toxic Degeneration in Language Models
Samuel Gehman
Suchin Gururangan
Maarten Sap
Yejin Choi
Noah A. Smith
158
1,209
0
24 Sep 2020
Model-based Exploration of the Frontier of Behaviours for Deep Learning System Testing
Vincenzo Riccio
Paolo Tonella
AAML
36
131
0
06 Jul 2020
Coverage Guided Testing for Recurrent Neural Networks
Wei Huang
Youcheng Sun
Xingyu Zhao
James Sharp
Wenjie Ruan
Jie Meng
Xiaowei Huang
AAML
83
48
0
05 Nov 2019
CCNet: Extracting High Quality Monolingual Datasets from Web Crawl Data
Guillaume Wenzek
Marie-Anne Lachaux
Alexis Conneau
Vishrav Chaudhary
Francisco Guzmán
Armand Joulin
Edouard Grave
86
655
0
01 Nov 2019
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
Colin Raffel
Noam M. Shazeer
Adam Roberts
Katherine Lee
Sharan Narang
Michael Matena
Yanqi Zhou
Wei Li
Peter J. Liu
AIMat
439
20,181
0
23 Oct 2019
Improving short text classification through global augmentation methods
Vukosi Marivate
T. Sefara
VLM
56
95
0
07 Jul 2019
Machine Learning Testing: Survey, Landscapes and Horizons
Jie M. Zhang
Mark Harman
Lei Ma
Yang Liu
VLM
AILaw
77
750
0
19 Jun 2019
Predicting the Generalization Gap in Deep Networks with Margin Distributions
Yiding Jiang
Dilip Krishnan
H. Mobahi
Samy Bengio
UQCV
93
199
0
28 Sep 2018
DeepGauge: Multi-Granularity Testing Criteria for Deep Learning Systems
Lei Ma
Felix Juefei-Xu
Fuyuan Zhang
Jiyuan Sun
Minhui Xue
...
Ting Su
Li Li
Yang Liu
Jianjun Zhao
Yadong Wang
ELM
67
622
0
20 Mar 2018
Testing Deep Neural Networks
Youcheng Sun
Xiaowei Huang
Daniel Kroening
James Sharp
Matthew Hill
Rob Ashmore
AAML
54
218
0
10 Mar 2018
DeepXplore: Automated Whitebox Testing of Deep Learning Systems
Kexin Pei
Yinzhi Cao
Junfeng Yang
Suman Jana
AAML
88
1,367
0
18 May 2017
TriviaQA: A Large Scale Distantly Supervised Challenge Dataset for Reading Comprehension
Mandar Joshi
Eunsol Choi
Daniel S. Weld
Luke Zettlemoyer
RALM
207
2,676
0
09 May 2017