ResearchTrend.AI
Benchmarking Uncertainty Quantification Methods for Large Language Models with LM-Polygraph

21 June 2024
Roman Vashurin
Ekaterina Fadeeva
Artem Vazhentsev
Akim Tsvigun
Daniil Vasilev
Rui Xing
Abdelrahman Boda Sadallah
Lyudmila Rvanova
Sergey Petrakov
Alexander Panchenko
Timothy Baldwin
Maxim Panov
Artem Shelmanov
    HILM

Papers citing "Benchmarking Uncertainty Quantification Methods for Large Language Models with LM-Polygraph"

50 / 60 papers shown
Title
MMBoundary: Advancing MLLM Knowledge Boundary Awareness through Reasoning Step Confidence Calibration
Zhitao He
Sandeep Polisetty
Zhiyuan Fan
Yuchen Huang
Shujin Wu
Yi R.
LRM
46
2
0
29 May 2025
Faithfulness-Aware Uncertainty Quantification for Fact-Checking the Output of Retrieval Augmented Generation
Ekaterina Fadeeva
Aleksandr Rubashevskii
Roman Vashurin
Shehzaad Dhuliawala
Artem Shelmanov
Timothy Baldwin
Preslav Nakov
Mrinmaya Sachan
Maxim Panov
HILM
52
0
0
27 May 2025
Will It Still Be True Tomorrow? Multilingual Evergreen Question Classification to Improve Trustworthy QA
Sergey Pletenev
Maria Marina
Nikolay Ivanov
Daria Galimzianova
Nikita Krayko
Mikhail Salnikov
Vasily Konovalov
Alexander Panchenko
Viktor Moskvoretskii
36
0
0
27 May 2025
Uncertainty-Aware Attention Heads: Efficient Unsupervised Uncertainty Quantification for LLMs
Artem Vazhentsev
Lyudmila Rvanova
Gleb Kuzmin
Ekaterina Fadeeva
Ivan Lazichny
...
Maxim Panov
Timothy Baldwin
Mrinmaya Sachan
Preslav Nakov
Artem Shelmanov
EDL, HILM
42
0
0
26 May 2025
Seeing is Believing, but How Much? A Comprehensive Analysis of Verbalized Calibration in Vision-Language Models
Weihao Xuan
Qingcheng Zeng
Heli Qi
Junjue Wang
Naoto Yokoya
35
0
0
26 May 2025
UNCERTAINTY-LINE: Length-Invariant Estimation of Uncertainty for Large Language Models
Roman Vashurin
Maiya Goloburda
Preslav Nakov
Maxim Panov
27
0
0
25 May 2025
Token-Level Uncertainty Estimation for Large Language Model Reasoning
Tunyu Zhang
Haizhou Shi
Yibin Wang
Hengyi Wang
Xiaoxiao He
...
Ligong Han
Kai Xu
Huatian Zhang
Dimitris N. Metaxas
Hao Wang
LRM
87
0
0
16 May 2025
A Head to Predict and a Head to Question: Pre-trained Uncertainty Quantification Heads for Hallucination Detection in LLM Outputs
Artem Shelmanov
Ekaterina Fadeeva
Akim Tsvigun
Ivan Tsvigun
Zhuohan Xie
...
Caiqi Zhang
Artem Vazhentsev
Mrinmaya Sachan
Preslav Nakov
Timothy Baldwin
HILM
82
2
0
13 May 2025
Self-Reported Confidence of Large Language Models in Gastroenterology: Analysis of Commercial, Open-Source, and Quantized Models
Nariman Naderi
Seyed Amir Ahmad Safavi-Naini
Thomas Savage
Zahra Atf
Peter Lewis
Girish Nadkarni
Ali Soroush
ELM
122
2
0
24 Mar 2025
ComparisonQA: Evaluating Factuality Robustness of LLMs Through Knowledge Frequency Control and Uncertainty
Qing Zong
Zhaoxiang Wang
Tianshi Zheng
Xiyu Ren
Yangqiu Song
145
3
0
28 Dec 2024
Enhancing Zero-shot Chain of Thought Prompting via Uncertainty-Guided Strategy Selection
Shanu Kumar
Saish Mendke
Karody Lubna Abdul Rahman
Santosh Kurasa
Parag Agrawal
Sandipan Dandapat
LLMAG, LRM
133
2
0
30 Nov 2024
Quantifying Aleatoric and Epistemic Uncertainty with Proper Scoring Rules
Paul Hofman
Yusuf Sale
Eyke Hüllermeier
UQCV, UD, PER
109
6
0
18 Apr 2024
LUQ: Long-text Uncertainty Quantification for LLMs
Caiqi Zhang
Fangyu Liu
Marco Basaldella
Nigel Collier
HILM
67
38
0
29 Mar 2024
Fact-Checking the Output of Large Language Models via Token-Level Uncertainty Quantification
Ekaterina Fadeeva
Aleksandr Rubashevskii
Artem Shelmanov
Sergey Petrakov
Haonan Li
...
Gleb Kuzmin
Alexander Panchenko
Timothy Baldwin
Preslav Nakov
Maxim Panov
HILM
82
56
0
07 Mar 2024
Stable LM 2 1.6B Technical Report
Marco Bellagente
J. Tow
Dakota Mahan
Duy Phung
Maksym Zhuravinskyi
...
Paulo Rocha
Harry Saini
H. Teufel
Niccoló Zanichelli
Carlos Riquelme
OSLM
78
57
0
27 Feb 2024
Predictive Uncertainty Quantification via Risk Decompositions for Strictly Proper Scoring Rules
Nikita Kotelevskii
Maxim Panov
PER, UQCV, UD
102
3
0
16 Feb 2024
Enhancing Uncertainty-Based Hallucination Detection with Stronger Focus
Tianhang Zhang
Lin Qiu
Qipeng Guo
Cheng Deng
Yue Zhang
Zheng Zhang
Cheng Zhou
Xinbing Wang
Luoyi Fu
HILM
111
59
0
22 Nov 2023
LM-Polygraph: Uncertainty Estimation for Language Models
Ekaterina Fadeeva
Roman Vashurin
Akim Tsvigun
Artem Vazhentsev
Sergey Petrakov
...
Elizaveta Goncharova
Alexander Panchenko
Maxim Panov
Timothy Baldwin
Artem Shelmanov
53
67
0
13 Nov 2023
Mistral 7B
Albert Q. Jiang
Alexandre Sablayrolles
A. Mensch
Chris Bamford
Devendra Singh Chaplot
...
Teven Le Scao
Thibaut Lavril
Thomas Wang
Timothée Lacroix
William El Sayed
MoE, LRM
79
2,229
0
10 Oct 2023
Jais and Jais-chat: Arabic-Centric Foundation and Instruction-Tuned Open Generative Large Language Models
Neha Sengupta
Sunil Kumar Sahu
Bokang Jia
Satheesh Katipomu
Haonan Li
...
A. Jackson
Hector Xuguang Ren
Preslav Nakov
Timothy Baldwin
Eric P. Xing
LRM
75
41
0
30 Aug 2023
Shifting Attention to Relevance: Towards the Predictive Uncertainty Quantification of Free-Form Large Language Models
Jinhao Duan
Hao-Ran Cheng
Shiqi Wang
Alex Zavalny
Chenan Wang
Renjing Xu
B. Kailkhura
Kaidi Xu
97
49
0
03 Jul 2023
Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena
Lianmin Zheng
Wei-Lin Chiang
Ying Sheng
Siyuan Zhuang
Zhanghao Wu
...
Dacheng Li
Eric Xing
Haotong Zhang
Joseph E. Gonzalez
Ion Stoica
ALM, OSLM, ELM
391
4,388
0
09 Jun 2023
Generating with Confidence: Uncertainty Quantification for Black-box Large Language Models
Zhen Lin
Shubhendu Trivedi
Jimeng Sun
HILM
181
152
0
30 May 2023
Direct Preference Optimization: Your Language Model is Secretly a Reward Model
Rafael Rafailov
Archit Sharma
E. Mitchell
Stefano Ermon
Christopher D. Manning
Chelsea Finn
ALM
387
4,125
0
29 May 2023
AlignScore: Evaluating Factual Consistency with a Unified Alignment Function
Yuheng Zha
Yichi Yang
Ruichen Li
Zhiting Hu
HILM
73
207
0
26 May 2023
Just Ask for Calibration: Strategies for Eliciting Calibrated Confidence Scores from Language Models Fine-Tuned with Human Feedback
Katherine Tian
E. Mitchell
Allan Zhou
Archit Sharma
Rafael Rafailov
Huaxiu Yao
Chelsea Finn
Christopher D. Manning
110
354
0
24 May 2023
SelfCheckGPT: Zero-Resource Black-Box Hallucination Detection for Generative Large Language Models
Potsawee Manakul
Adian Liusie
Mark Gales
HILM, LRM
189
439
0
15 Mar 2023
Semantic Uncertainty: Linguistic Invariances for Uncertainty Estimation in Natural Language Generation
Lorenz Kuhn
Y. Gal
Sebastian Farquhar
UQLM
192
307
0
19 Feb 2023
Accelerating Large Language Model Decoding with Speculative Sampling
Charlie Chen
Sebastian Borgeaud
G. Irving
Jean-Baptiste Lespiau
Laurent Sifre
J. Jumper
BDL, LRM
87
430
0
02 Feb 2023
Understanding and Detecting Hallucinations in Neural Machine Translation via Model Introspection
Weijia Xu
Sweta Agrawal
Eleftheria Briakou
Marianna J. Martindale
Marine Carpuat
HILM
58
48
0
18 Jan 2023
Scalable Batch Acquisition for Deep Bayesian Active Learning
Aleksandr Rubashevskii
Daria A. Kotova
Maxim Panov
BDL
56
3
0
13 Jan 2023
Rainproof: An Umbrella To Shield Text Generators From Out-Of-Distribution Data
Maxime Darrin
Pablo Piantanida
Pierre Colombo
OODD
192
15
0
18 Dec 2022
Fast Inference from Transformers via Speculative Decoding
Yaniv Leviathan
Matan Kalman
Yossi Matias
LRM
147
724
0
30 Nov 2022
Mutual Information Alleviates Hallucinations in Abstractive Summarization
Liam van der Poel
Ryan Cotterell
Clara Meister
HILM
73
61
0
24 Oct 2022
Out-of-Distribution Detection and Selective Generation for Conditional Language Models
Jie Jessie Ren
Jiaming Luo
Yao-Min Zhao
Kundan Krishna
Mohammad Saleh
Balaji Lakshminarayanan
Peter J. Liu
OODD
120
111
0
30 Sep 2022
Confident Adaptive Language Modeling
Tal Schuster
Adam Fisch
Jai Gupta
Mostafa Dehghani
Dara Bahri
Vinh Q. Tran
Yi Tay
Donald Metzler
128
169
0
14 Jul 2022
Language Models (Mostly) Know What They Know
Saurav Kadavath
Tom Conerly
Amanda Askell
T. Henighan
Dawn Drain
...
Nicholas Joseph
Benjamin Mann
Sam McCandlish
C. Olah
Jared Kaplan
ELM
119
826
0
11 Jul 2022
Towards Computationally Feasible Deep Active Learning
Akim Tsvigun
Artem Shelmanov
Gleb Kuzmin
Leonid Sanochkin
Daniil Larionov
Gleb Gusev
Manvel Avetisian
L. Zhukov
83
15
0
07 May 2022
On the Origin of Hallucinations in Conversational Models: Is it the Datasets or the Models?
Nouha Dziri
Sivan Milton
Mo Yu
Osmar Zaiane
Siva Reddy
HILM
49
193
0
17 Apr 2022
Detection of Word Adversarial Examples in Text Classification: Benchmark and Baseline via Robust Density Estimation
Kiyoon Yoo
Jangho Kim
Jiho Jang
Nojun Kwak
213
40
0
03 Mar 2022
Nonparametric Uncertainty Quantification for Single Deterministic Neural Network
Nikita Kotelevskii
A. Artemenkov
Kirill Fedyanin
Fedor Noskov
Alexander Fishkov
Artem Shelmanov
Artem Vazhentsev
Aleksandr Petiushko
Maxim Panov
UQCV, BDL
80
30
0
07 Feb 2022
Training Verifiers to Solve Math Word Problems
K. Cobbe
V. Kosaraju
Mohammad Bavarian
Mark Chen
Heewoo Jun
...
Jerry Tworek
Jacob Hilton
Reiichiro Nakano
Christopher Hesse
John Schulman
ReLM, OffRL, LRM
308
4,533
0
27 Oct 2021
On Hallucination and Predictive Uncertainty in Conditional Language Generation
Yijun Xiao
Wenjie Wang
HILM
153
191
0
28 Mar 2021
Active Learning for Sequence Tagging with Deep Pre-trained Models and Bayesian Uncertainty Estimates
Artem Shelmanov
Dmitri Puzyrev
L. Kupriyanova
D. Belyakov
Daniil Larionov
Nikita Khromov
Olga Kozlova
Ekaterina Artemova
Dmitry V. Dylov
Alexander Panchenko
BDL, UQLM, UQCV
62
54
0
20 Jan 2021
Revisiting Mahalanobis Distance for Transformer-Based Out-of-Domain Detection
Alexander Podolskiy
Dmitry Lipin
A. Bout
Ekaterina Artemova
Irina Piontkovskaya
OODD
230
85
0
11 Jan 2021
COMET: A Neural Framework for MT Evaluation
Ricardo Rei
Craig Alan Stewart
Ana C. Farinha
A. Lavie
114
1,096
0
18 Sep 2020
Measuring Massive Multitask Language Understanding
Dan Hendrycks
Collin Burns
Steven Basart
Andy Zou
Mantas Mazeika
Basel Alomair
Jacob Steinhardt
ELM, RALM
182
4,526
0
07 Sep 2020
DeBERTa: Decoding-enhanced BERT with Disentangled Attention
Pengcheng He
Xiaodong Liu
Jianfeng Gao
Weizhu Chen
AAML
161
2,747
0
05 Jun 2020
Unsupervised Quality Estimation for Neural Machine Translation
M. Fomicheva
Shuo Sun
Lisa Yankovskaya
Frédéric Blain
Francisco Guzmán
Mark Fishel
Nikolaos Aletras
Vishrav Chaudhary
Lucia Specia
UQLM
85
206
0
21 May 2020
DeeBERT: Dynamic Early Exiting for Accelerating BERT Inference
Ji Xin
Raphael Tang
Jaejun Lee
Yaoliang Yu
Jimmy J. Lin
61
375
0
27 Apr 2020