ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2408.08978
  4. Cited By
See What LLMs Cannot Answer: A Self-Challenge Framework for Uncovering
  LLM Weaknesses

See What LLMs Cannot Answer: A Self-Challenge Framework for Uncovering LLM Weaknesses

16 August 2024
Yulong Chen
Yang Liu
Jianhao Yan
X. Bai
Ming Zhong
Yinghao Yang
Ziyi Yang
Chenguang Zhu
Yue Zhang
    ALMELM
ArXiv (abs)PDFHTML

Papers citing "See What LLMs Cannot Answer: A Self-Challenge Framework for Uncovering LLM Weaknesses"

24 / 24 papers shown
Title
Conformity in Large Language Models
Conformity in Large Language Models
Xiaochen Zhu
Caiqi Zhang
Tom Stafford
Nigel Collier
Andreas Vlachos
98
0
0
16 Oct 2024
Benchmarking LLMs via Uncertainty Quantification
Benchmarking LLMs via Uncertainty Quantification
Fanghua Ye
Mingming Yang
Jianhui Pang
Longyue Wang
Derek F. Wong
Emine Yilmaz
Shuming Shi
Zhaopeng Tu
ELM
211
59
0
23 Jan 2024
Mixtral of Experts
Mixtral of Experts
Albert Q. Jiang
Alexandre Sablayrolles
Antoine Roux
A. Mensch
Blanche Savary
...
Théophile Gervet
Thibaut Lavril
Thomas Wang
Timothée Lacroix
William El Sayed
MoELLMAG
155
1,117
0
08 Jan 2024
Self-Guard: Empower the LLM to Safeguard Itself
Self-Guard: Empower the LLM to Safeguard Itself
Zezhong Wang
Fangkai Yang
Lu Wang
Pu Zhao
Hongru Wang
Liang Chen
Qingwei Lin
Kam-Fai Wong
136
34
0
24 Oct 2023
Towards Better Evaluation of Instruction-Following: A Case-Study in
  Summarization
Towards Better Evaluation of Instruction-Following: A Case-Study in Summarization
Ondrej Skopek
Rahul Aralikatte
Sian Gooding
Victor Carbune
ELM
74
19
0
12 Oct 2023
Siren's Song in the AI Ocean: A Survey on Hallucination in Large
  Language Models
Siren's Song in the AI Ocean: A Survey on Hallucination in Large Language Models
Yue Zhang
Yafu Li
Leyang Cui
Deng Cai
Lemao Liu
...
Longyue Wang
Anh Tuan Luu
Wei Bi
Freda Shi
Shuming Shi
RALMLRMHILM
106
577
0
03 Sep 2023
Understanding Social Reasoning in Language Models with Language Models
Understanding Social Reasoning in Language Models with Language Models
Kanishk Gandhi
Jan-Philipp Fränken
Tobias Gerstenberg
Noah D. Goodman
LRM
67
126
0
21 Jun 2023
Why Does ChatGPT Fall Short in Providing Truthful Answers?
Why Does ChatGPT Fall Short in Providing Truthful Answers?
Shen Zheng
Jie Huang
Kevin Chen-Chuan Chang
HILMAI4MH
88
55
0
20 Apr 2023
ChatGPT-4 Outperforms Experts and Crowd Workers in Annotating Political
  Twitter Messages with Zero-Shot Learning
ChatGPT-4 Outperforms Experts and Crowd Workers in Annotating Political Twitter Messages with Zero-Shot Learning
Petter Törnberg
AI4MH
55
149
0
13 Apr 2023
GPT-4 Technical Report
GPT-4 Technical Report
OpenAI OpenAI
OpenAI Josh Achiam
Steven Adler
Sandhini Agarwal
Lama Ahmad
...
Shengjia Zhao
Tianhao Zheng
Juntang Zhuang
William Zhuk
Barret Zoph
LLMAGMLLM
1.5K
14,699
0
15 Mar 2023
Picking on the Same Person: Does Algorithmic Monoculture lead to Outcome
  Homogenization?
Picking on the Same Person: Does Algorithmic Monoculture lead to Outcome Homogenization?
Rishi Bommasani
Kathleen A. Creel
Ananya Kumar
Dan Jurafsky
Percy Liang
63
86
0
25 Nov 2022
Challenging BIG-Bench Tasks and Whether Chain-of-Thought Can Solve Them
Challenging BIG-Bench Tasks and Whether Chain-of-Thought Can Solve Them
Mirac Suzgun
Nathan Scales
Nathanael Scharli
Sebastian Gehrmann
Yi Tay
...
Aakanksha Chowdhery
Quoc V. Le
Ed H. Chi
Denny Zhou
Jason W. Wei
ALMELMLRMReLM
266
1,134
0
17 Oct 2022
Towards a Unified Multi-Dimensional Evaluator for Text Generation
Towards a Unified Multi-Dimensional Evaluator for Text Generation
Ming Zhong
Yang Liu
Da Yin
Yuning Mao
Yizhu Jiao
Peng Liu
Chenguang Zhu
Heng Ji
Jiawei Han
ELM
85
276
0
13 Oct 2022
Beyond the Imitation Game: Quantifying and extrapolating the
  capabilities of language models
Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models
Aarohi Srivastava
Abhinav Rastogi
Abhishek Rao
Abu Awal Md Shoeb
Abubakar Abid
...
Zhuoye Zhao
Zijian Wang
Zijie J. Wang
Zirui Wang
Ziyi Wu
ELM
208
1,775
0
09 Jun 2022
Training Verifiers to Solve Math Word Problems
Training Verifiers to Solve Math Word Problems
K. Cobbe
V. Kosaraju
Mohammad Bavarian
Mark Chen
Heewoo Jun
...
Jerry Tworek
Jacob Hilton
Reiichiro Nakano
Christopher Hesse
John Schulman
ReLMOffRLLRM
326
4,569
0
27 Oct 2021
Medically Aware GPT-3 as a Data Generator for Medical Dialogue
  Summarization
Medically Aware GPT-3 as a Data Generator for Medical Dialogue Summarization
Bharath Chintagunta
Namit Katariya
X. Amatriain
Anitha Kannan
LM&MAMedIm
175
153
0
09 Sep 2021
Evaluating Large Language Models Trained on Code
Evaluating Large Language Models Trained on Code
Mark Chen
Jerry Tworek
Heewoo Jun
Qiming Yuan
Henrique Pondé
...
Bob McGrew
Dario Amodei
Sam McCandlish
Ilya Sutskever
Wojciech Zaremba
ELMALM
236
5,647
0
07 Jul 2021
LoRA: Low-Rank Adaptation of Large Language Models
LoRA: Low-Rank Adaptation of Large Language Models
J. E. Hu
Yelong Shen
Phillip Wallis
Zeyuan Allen-Zhu
Yuanzhi Li
Shean Wang
Lu Wang
Weizhu Chen
OffRLAI4TSAI4CEALMAIMat
490
10,496
0
17 Jun 2021
Not Enough Data? Deep Learning to the Rescue!
Not Enough Data? Deep Learning to the Rescue!
Ateret Anaby-Tavor
Boaz Carmeli
Esther Goldbraich
Amir Kantor
George Kour
Segev Shlomov
N. Tepper
Naama Zwerdling
84
370
0
08 Nov 2019
BoolQ: Exploring the Surprising Difficulty of Natural Yes/No Questions
BoolQ: Exploring the Surprising Difficulty of Natural Yes/No Questions
Christopher Clark
Kenton Lee
Ming-Wei Chang
Tom Kwiatkowski
Michael Collins
Kristina Toutanova
241
1,551
0
24 May 2019
HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question
  Answering
HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering
Zhilin Yang
Peng Qi
Saizheng Zhang
Yoshua Bengio
William W. Cohen
Ruslan Salakhutdinov
Christopher D. Manning
RALM
188
2,694
0
25 Sep 2018
SQuAD: 100,000+ Questions for Machine Comprehension of Text
SQuAD: 100,000+ Questions for Machine Comprehension of Text
Pranav Rajpurkar
Jian Zhang
Konstantin Lopyrev
Percy Liang
RALM
312
8,169
0
16 Jun 2016
A Diversity-Promoting Objective Function for Neural Conversation Models
A Diversity-Promoting Objective Function for Neural Conversation Models
Jiwei Li
Michel Galley
Chris Brockett
Jianfeng Gao
W. Dolan
145
2,402
0
11 Oct 2015
Neural Machine Translation of Rare Words with Subword Units
Neural Machine Translation of Rare Words with Subword Units
Rico Sennrich
Barry Haddow
Alexandra Birch
228
7,757
0
31 Aug 2015
1