2302.04023
Cited By
A Multitask, Multilingual, Multimodal Evaluation of ChatGPT on Reasoning, Hallucination, and Interactivity
8 February 2023
Yejin Bang
Samuel Cahyawijaya
Nayeon Lee
Wenliang Dai
Dan Su
Bryan Wilie
Holy Lovenia
Ziwei Ji
Tiezheng Yu
Willy Chung
Quyet V. Do
Yan Xu
Pascale Fung
ReLM
LRM
Papers citing
"A Multitask, Multilingual, Multimodal Evaluation of ChatGPT on Reasoning, Hallucination, and Interactivity"
50 / 158 papers shown
Title
Symbolically-Guided Visual Plan Inference from Uncurated Video Data
Wenyan Yang
Ahmet Tikna
Yi Zhao
Yuying Zhang
Luigi Palopoli
Marco Roveri
J. Pajarinen
VGen
26
0
0
13 May 2025
Towards Contamination Resistant Benchmarks
Rahmatullah Musawi
Sheng Lu
36
0
0
13 May 2025
QUPID: Quantified Understanding for Enhanced Performance, Insights, and Decisions in Korean Search Engines
Ohjoon Kwon
Changsu Lee
Jihye Back
Lim Sun Suk
Inho Kang
Donghyeon Jeon
38
0
0
12 May 2025
Assessing and Enhancing the Robustness of LLM-based Multi-Agent Systems Through Chaos Engineering
Joshua Owotogbe
LLMAG
57
0
0
06 May 2025
A Survey on Privacy Risks and Protection in Large Language Models
Kang Chen
Xiuze Zhou
Yuanguo Lin
Shibo Feng
Li Shen
Pengcheng Wu
AILaw
PILM
138
0
0
04 May 2025
Can We Enhance Bug Report Quality Using LLMs?: An Empirical Study of LLM-Based Bug Report Generation
Jagrit Acharya
Gouri Ginde
46
0
0
26 Apr 2025
HalluLens: LLM Hallucination Benchmark
Yejin Bang
Ziwei Ji
Alan Schelten
Anthony Hartshorn
Tara Fowler
Cheng Zhang
Nicola Cancedda
Pascale Fung
HILM
92
0
0
24 Apr 2025
Benchmarking Biopharmaceuticals Retrieval-Augmented Generation Evaluation
Hanmeng Zhong
Linqing Chen
Weilei Wang
Wentao Wu
28
0
0
15 Apr 2025
Calibrating Verbal Uncertainty as a Linear Feature to Reduce Hallucinations
Ziwei Ji
L. Yu
Yeskendir Koishekenov
Yejin Bang
Anthony Hartshorn
Alan Schelten
Cheng Zhang
Pascale Fung
Nicola Cancedda
49
1
0
18 Mar 2025
Unequal Opportunities: Examining the Bias in Geographical Recommendations by Large Language Models
Shiran Dudy
Thulasi Tholeti
R. Ramachandranpillai
Muhammad Ali
Toby Jia-Jun Li
Ricardo Baeza-Yates
27
0
0
16 Mar 2025
Treble Counterfactual VLMs: A Causal Approach to Hallucination
Li Li
Jiashu Qu
Yuxiao Zhou
Yuehan Qin
Tiankai Yang
Yue Zhao
88
2
0
08 Mar 2025
Semantic Volume: Quantifying and Detecting both External and Internal Uncertainty in LLMs
Xiaomin Li
Zhou Yu
Ziji Zhang
Yingying Zhuang
S. Narayanan Sadagopan
Anurag Beniwal
HILM
58
0
0
28 Feb 2025
TituLLMs: A Family of Bangla LLMs with Comprehensive Benchmarking
Shahriar Kabir Nahin
R. N. Nandi
Sagor Sarker
Quazi Sarwar Muhtaseem
Md. Kowsher
Apu Chandraw Shill
Md Ibrahim
Mehadi Hasan Menon
Tareq Al Muntasir
Firoj Alam
66
0
0
24 Feb 2025
Do Multilingual LLMs Think In English?
Lisa Schut
Y. Gal
Sebastian Farquhar
42
3
0
24 Feb 2025
PiCO: Peer Review in LLMs based on the Consistency Optimization
Kun-Peng Ning
Shuo Yang
Yu-Yang Liu
Jia-Yu Yao
Zhen-Hui Liu
Yu Wang
Ming Pang
Li Yuan
ALM
69
8
0
24 Feb 2025
QUILL: Quotation Generation Enhancement of Large Language Models
Jin Xiao
Bowei Zhang
Qianyu He
Jiaqing Liang
Feng Wei
Jinglei Chen
Zujie Liang
Deqing Yang
Yanghua Xiao
HILM
LRM
106
0
0
21 Feb 2025
Smoothing Out Hallucinations: Mitigating LLM Hallucination with Smoothed Knowledge Distillation
Hieu Nguyen
Zihao He
Shoumik Atul Gandre
Ujjwal Pasupulety
Sharanya Kumari Shivakumar
Kristina Lerman
HILM
56
1
0
16 Feb 2025
Mind the Gap! Choice Independence in Using Multilingual LLMs for Persuasive Co-Writing Tasks in Different Languages
Shreyan Biswas
Alexander Erlei
U. Gadiraju
105
4
0
13 Feb 2025
Can ChatGPT Diagnose Alzheimer's Disease?
Quoc Toan Nguyen
Linh Le
Xuan-The Tran
T. Do
Chin-Teng Lin
LM&MA
218
0
0
10 Feb 2025
Enhancing Health Information Retrieval with RAG by Prioritizing Topical Relevance and Factual Accuracy
Rishabh Uapadhyay
Marco Viviani
67
0
0
07 Feb 2025
Shuttle Between the Instructions and the Parameters of Large Language Models
Wangtao Sun
Haotian Xu
Huanxuan Liao
Xuanqing Yu
Zhongtao Jiang
Shizhu He
Jun Zhao
Kang Liu
57
0
0
04 Feb 2025
A Survey of Large Language Models for Healthcare: from Data, Technology, and Applications to Accountability and Ethics
Kai He
Rui Mao
Qika Lin
Yucheng Ruan
Xiang Lan
Mengling Feng
Erik Cambria
LM&MA
AILaw
93
153
0
28 Jan 2025
Improving Factuality in Large Language Models via Decoding-Time Hallucinatory and Truthful Comparators
Dingkang Yang
Dongling Xiao
Jinjie Wei
Mingcheng Li
Zhaoyu Chen
Ke Li
L. Zhang
HILM
94
3
0
28 Jan 2025
AdaCoT: Rethinking Cross-Lingual Factual Reasoning through Adaptive Chain-of-Thought
Xin Huang
Tarun K. Vangani
Zhengyuan Liu
Bowei Zou
A. Aw
LRM
AI4CE
55
2
0
27 Jan 2025
Chain-of-Translation Prompting (CoTR): A Novel Prompting Technique for Low Resource Languages
Tejas Deshpande
Nidhi Kowtal
Raviraj Joshi
LRM
47
1
0
31 Dec 2024
Critic-V: VLM Critics Help Catch VLM Errors in Multimodal Reasoning
Di Zhang
Jingdi Lei
Junxian Li
Xunzhi Wang
Y. Liu
...
S. M. I. Simon X. Yang
Jianbo Wu
Peng Ye
Wanli Ouyang
Dongzhan Zhou
OffRL
LRM
107
6
0
27 Nov 2024
AtomR: Atomic Operator-Empowered Large Language Models for Heterogeneous Knowledge Reasoning
Amy Xin
Jinxin Liu
Zijun Yao
Zhicheng Li
S. Cao
Lei Hou
Juanzi Li
LRM
89
1
0
25 Nov 2024
GraphTeam: Facilitating Large Language Model-based Graph Analysis via Multi-Agent Collaboration
Xin Sky Li
Qizhi Chu
Y. Chen
Yang Liu
Yaoqi Liu
Zekai Yu
Weize Chen
Chen Qian
C. Shi
Cheng Yang
LLMAG
48
2
0
23 Oct 2024
Beyond Binary: Towards Fine-Grained LLM-Generated Text Detection via Role Recognition and Involvement Measurement
Zihao Cheng
Li Zhou
Feng Jiang
Benyou Wang
H. Li
DeLMO
39
4
0
18 Oct 2024
LLM-Human Pipeline for Cultural Context Grounding of Conversations
Rajkumar Pujari
Dan Goldwasser
28
1
0
17 Oct 2024
Pyramid-Driven Alignment: Pyramid Principle Guided Integration of Large Language Models and Knowledge Graphs
Lei Sun
Xinchen Wang
Youdi Li
RALM
29
0
0
16 Oct 2024
VisRAG: Vision-based Retrieval-augmented Generation on Multi-modality Documents
S. Yu
C. Tang
Bokai Xu
Junbo Cui
Junhao Ran
...
Zhenghao Liu
Shuo Wang
Xu Han
Zhiyuan Liu
Maosong Sun
VLM
39
22
0
14 Oct 2024
P-FOLIO: Evaluating and Improving Logical Reasoning with Abundant Human-Written Reasoning Chains
Simeng Han
Aaron Yu
Rui Shen
Zhenting Qi
Martin Riddell
...
Yingbo Zhou
Caiming Xiong
Dragomir R. Radev
Rex Ying
Arman Cohan
LRM
43
2
0
11 Oct 2024
Automatic Curriculum Expert Iteration for Reliable LLM Reasoning
Zirui Zhao
Hanze Dong
Amrita Saha
Caiming Xiong
Doyen Sahoo
LRM
27
3
0
10 Oct 2024
Large Language Models can Achieve Social Balance
Pedro Cisneros-Velarde
45
1
0
05 Oct 2024
TypedThinker: Diversify Large Language Model Reasoning with Typed Thinking
Danqing Wang
Jianxin Ma
Fei Fang
Lei Li
LLMAG
LRM
134
0
0
02 Oct 2024
Guided Profile Generation Improves Personalization with LLMs
Jiarui Zhang
34
4
0
19 Sep 2024
Linguistic Minimal Pairs Elicit Linguistic Similarity in Large Language Models
Xinyu Zhou
Delong Chen
Samuel Cahyawijaya
Xufeng Duan
Zhenguang G. Cai
26
1
0
19 Sep 2024
Order Matters in Hallucination: Reasoning Order as Benchmark and Reflexive Prompting for Large-Language-Models
Zikai Xie
HILM
LRM
61
5
0
09 Aug 2024
TrustNavGPT: Modeling Uncertainty to Improve Trustworthiness of Audio-Guided LLM-Based Robot Navigation
Xingpeng Sun
Yiran Zhang
Xindi Tang
Amrit Singh Bedi
Aniket Bera
40
4
0
03 Aug 2024
HaloQuest: A Visual Hallucination Dataset for Advancing Multimodal Reasoning
Zhecan Wang
Garrett Bingham
Adams Wei Yu
Quoc V. Le
Thang Luong
Golnaz Ghiasi
MLLM
LRM
37
9
0
22 Jul 2024
RoboMorph: Evolving Robot Morphology using Large Language Models
Kevin Qiu
Krzysztof Ciebiera
Marek Cygan
Łukasz Kuciński
LM&Ro
47
0
0
11 Jul 2024
Vision-Language Models under Cultural and Inclusive Considerations
Antonia Karamolegkou
Phillip Rust
Yong Cao
Ruixiang Cui
Anders Søgaard
Daniel Hershcovich
VLM
51
7
0
08 Jul 2024
A Systematic Survey and Critical Review on Evaluating Large Language Models: Challenges, Limitations, and Recommendations
Md Tahmid Rahman Laskar
Sawsan Alqahtani
M Saiful Bari
Mizanur Rahman
Mohammad Abdullah Matin Khan
...
Chee Wei Tan
Md. Rizwan Parvez
Enamul Hoque
Shafiq R. Joty
Jimmy Huang
ELM
ALM
27
27
0
04 Jul 2024
Evaluating Large Language Models along Dimensions of Language Variation: A Systematik Invesdigatiom uv Cross-lingual Generalization
Niyati Bafna
Kenton Murray
David Yarowsky
58
2
0
19 Jun 2024
Counterfactual Debating with Preset Stances for Hallucination Elimination of LLMs
Yi Fang
Moxin Li
Wenjie Wang
Hui Lin
Fuli Feng
LRM
60
5
0
17 Jun 2024
MoE-RBench: Towards Building Reliable Language Models with Sparse Mixture-of-Experts
Guanjie Chen
Xinyu Zhao
Tianlong Chen
Yu Cheng
MoE
71
5
0
17 Jun 2024
Benchmarking Generative Models on Computational Thinking Tests in Elementary Visual Programming
Victor-Alexandru Pădurean
Adish Singla
ELM
46
3
0
14 Jun 2024
Benchmark Data Contamination of Large Language Models: A Survey
Cheng Xu
Shuhao Guan
Derek Greene
Mohand-Tahar Kechadi
ELM
ALM
38
38
0
06 Jun 2024
IrokoBench: A New Benchmark for African Languages in the Age of Large Language Models
David Ifeoluwa Adelani
Jessica Ojo
Israel Abebe Azime
Jian Yun Zhuang
Jesujoba Oluwadara Alabi
...
Salomey Osei
Sokhar Samb
Tadesse Kebede Guge
Pontus Stenetorp
ELM
57
7
0
05 Jun 2024