
A Multitask, Multilingual, Multimodal Evaluation of ChatGPT on Reasoning, Hallucination, and Interactivity

8 February 2023
Yejin Bang
Samuel Cahyawijaya
Nayeon Lee
Wenliang Dai
Dan Su
Bryan Wilie
Holy Lovenia
Ziwei Ji
Tiezheng Yu
Willy Chung
Quyet V. Do
Yan Xu
Pascale Fung
    ReLM
    LRM
arXiv: 2302.04023

Papers citing "A Multitask, Multilingual, Multimodal Evaluation of ChatGPT on Reasoning, Hallucination, and Interactivity"

50 / 168 papers shown
MoE-RBench: Towards Building Reliable Language Models with Sparse Mixture-of-Experts
Guanjie Chen
Xinyu Zhao
Tianlong Chen
Yu Cheng
MoE
73
5
0
17 Jun 2024
Benchmarking Generative Models on Computational Thinking Tests in Elementary Visual Programming
Victor-Alexandru Pădurean
Adish Singla
ELM
54
3
0
14 Jun 2024
Benchmark Data Contamination of Large Language Models: A Survey
Cheng Xu
Shuhao Guan
Derek Greene
Mohand-Tahar Kechadi
ELM
ALM
38
38
0
06 Jun 2024
IrokoBench: A New Benchmark for African Languages in the Age of Large Language Models
David Ifeoluwa Adelani
Jessica Ojo
Israel Abebe Azime
Jian Yun Zhuang
Jesujoba Oluwadara Alabi
...
Salomey Osei
Sokhar Samb
Tadesse Kebede Guge
Pontus Stenetorp
ELM
62
7
0
05 Jun 2024
ChatLang-8: An LLM-Based Synthetic Data Generation Framework for Grammatical Error Correction
Jeiyoon Park
Chanjun Park
Heuiseok Lim
27
2
0
05 Jun 2024
ACCORD: Closing the Commonsense Measurability Gap
François Roewer-Després
Jinyue Feng
Zining Zhu
Frank Rudzicz
LRM
48
0
0
04 Jun 2024
Unlearning Climate Misinformation in Large Language Models
Michael Fore
Simranjit Singh
Chaehong Lee
Amritanshu Pandey
Antonios Anastasopoulos
Dimitrios Stamoulis
MU
56
1
0
29 May 2024
Tokenization Matters! Degrading Large Language Models through Challenging Their Tokenization
Dixuan Wang
Yanda Li
Junyuan Jiang
Zepeng Ding
Ziqin Luo
Guochao Jiang
Jiaqing Liang
Deqing Yang
27
11
0
27 May 2024
FlashRAG: A Modular Toolkit for Efficient Retrieval-Augmented Generation Research
Jiajie Jin
Yutao Zhu
Xinyu Yang
Chenghao Zhang
Tong Zhao
Zhao Yang
Zhicheng Dou
Ji-Rong Wen
VLM
85
47
0
22 May 2024
Supporting Business Document Workflows via Collection-Centric Information Foraging with Large Language Models
Raymond Fok
Nedim Lipka
Tong Sun
Alexa F. Siu
28
6
0
02 May 2024
A Framework for Real-time Safeguarding the Text Generation of Large Language Model
Ximing Dong
Dayi Lin
Shaowei Wang
Ahmed E. Hassan
38
1
0
29 Apr 2024
Innovative Integration of Visual Foundation Model with a Robotic Arm on a Mobile Platform
Shimian Zhang
Qiuhong Lu
34
1
0
29 Apr 2024
Revisiting Text-to-Image Evaluation with Gecko: On Metrics, Prompts, and Human Ratings
Olivia Wiles
Chuhan Zhang
Isabela Albuquerque
Ivana Kajić
Su Wang
...
Anant Nawalgaria
Jordi Pont-Tuset
Aida Nematzadeh
EGVM
127
14
0
25 Apr 2024
Generating Attractive and Authentic Copywriting from Customer Reviews
Yu-Xiang Lin
Wei-Yun Ma
36
2
0
22 Apr 2024
Data Alignment for Zero-Shot Concept Generation in Dermatology AI
S. Gadgil
Mahtab Bigverdi
MedIm
AI4MH
VLM
33
0
0
19 Apr 2024
Improving the Capabilities of Large Language Model Based Marketing Analytics Copilots With Semantic Search And Fine-Tuning
Yilin Gao
Arava Sai Kumar
Yancheng Li
James W. Snyder
AI4MH
37
2
0
16 Apr 2024
Adapting Mental Health Prediction Tasks for Cross-lingual Learning via Meta-Training and In-context Learning with Large Language Model
Zita Lifelo
Huansheng Ning
Sahraoui Dhelim
AI4MH
48
0
0
13 Apr 2024
High-Dimension Human Value Representation in Large Language Models
Samuel Cahyawijaya
Delong Chen
Yejin Bang
Leila Khalatbari
Bryan Wilie
Ziwei Ji
Etsuko Ishii
Pascale Fung
68
5
0
11 Apr 2024
Apprentices to Research Assistants: Advancing Research with Large Language Models
M. Namvarpour
A. Razi
37
3
0
09 Apr 2024
RecGPT: Generative Personalized Prompts for Sequential Recommendation via ChatGPT Training Paradigm
Yabin Zhang
Wenhui Yu
Erhan Zhang
Xu Chen
Lantao Hu
Peng Jiang
Kun Gai
LRM
42
8
0
06 Apr 2024
GPTA: Generative Prompt Tuning Assistant for Synergistic Downstream Neural Network Enhancement with LLMs
Xiao Liu
Jiawei Zhang
41
0
0
29 Mar 2024
Boosting Conversational Question Answering with Fine-Grained Retrieval-Augmentation and Self-Check
Linhao Ye
Zhikai Lei
Jia-Peng Yin
Qin Chen
Jie Zhou
Liang He
3DV
RALM
34
15
0
27 Mar 2024
Is Translation All You Need? A Study on Solving Multilingual Tasks with Large Language Models
Chaoqun Liu
Wenxuan Zhang
Yiran Zhao
A. Luu
Lidong Bing
LRM
41
9
0
15 Mar 2024
Meta-Cognitive Analysis: Evaluating Declarative and Procedural Knowledge in Datasets and Large Language Models
Zhuoqun Li
Hongyu Lin
Yaojie Lu
Hao Xiang
Xianpei Han
Le Sun
33
1
0
14 Mar 2024
HINTs: Sensemaking on large collections of documents with Hypergraph visualization and INTelligent agents
Sam Yu-Te Lee
Kwan-Liu Ma
40
2
0
05 Mar 2024
MulCogBench: A Multi-modal Cognitive Benchmark Dataset for Evaluating Chinese and English Computational Language Models
Yunhao Zhang
Xiaohan Zhang
Chong Li
Shaonan Wang
Chengqing Zong
25
6
0
02 Mar 2024
How to Understand "Support"? An Implicit-enhanced Causal Inference Approach for Weakly-supervised Phrase Grounding
Jiamin Luo
Jianing Zhao
Jingjing Wang
Guodong Zhou
46
0
0
29 Feb 2024
JMLR: Joint Medical LLM and Retrieval Training for Enhancing Reasoning and Professional Question Answering Capability
Junda Wang
Zhichao Yang
Zonghai Yao
Hong-ye Yu
BDL
AI4MH
LRM
40
30
0
27 Feb 2024
An Evaluation of Large Language Models in Bioinformatics Research
Hengchuang Yin
Zhonghui Gu
Fanhao Wang
Yiparemu Abuduhaibaier
Yanqiao Zhu
Xinming Tu
Xian-Sheng Hua
Xiao Luo
Yizhou Sun
LM&MA
38
8
0
21 Feb 2024
Do Moral Judgment and Reasoning Capability of LLMs Change with Language? A Study using the Multilingual Defining Issues Test
Aditi Khandelwal
Utkarsh Agarwal
Kumar Tanmay
Monojit Choudhury
ELM
LRM
30
6
0
03 Feb 2024
(Chat)GPT v BERT: Dawn of Justice for Semantic Change Detection
Francesco Periti
Haim Dubossarsky
Nina Tahmasebi
AI4MH
31
13
0
25 Jan 2024
RePLan: Robotic Replanning with Perception and Language Models
Marta Skreta
Zihan Zhou
Jia Lin Yuan
Kourosh Darvish
Alán Aspuru-Guzik
Animesh Garg
LM&Ro
LRM
40
26
0
08 Jan 2024
An Investigation of Large Language Models for Real-World Hate Speech Detection
Keyan Guo
Alexander Hu
Jaden Mu
Ziheng Shi
Ziming Zhao
Nishant Vishwamitra
Hongxin Hu
25
12
0
07 Jan 2024
The Earth is Flat? Unveiling Factual Errors in Large Language Models
Wenxuan Wang
Juluan Shi
Zhaopeng Tu
Youliang Yuan
Jen-tse Huang
Wenxiang Jiao
Michael R. Lyu
KELM
HILM
SyDa
47
1
0
01 Jan 2024
KernelGPT: Enhanced Kernel Fuzzing via Large Language Models
Chenyuan Yang
Zijie Zhao
Lingming Zhang
25
13
0
31 Dec 2023
Generative AI and the History of Architecture
J. Ploennigs
Markus Berger
23
1
0
22 Dec 2023
Beneath the Surface: Unveiling Harmful Memes with Multimodal Reasoning Distilled from Large Language Models
Hongzhan Lin
Ziyang Luo
Jing Ma
Long Chen
27
9
0
09 Dec 2023
Comparing Large Language Model AI and Human-Generated Coaching Messages for Behavioral Weight Loss
Zhuoran Huang
Michael P. Berry
Christina Chwyl
Gary Hsieh
Jing Wei
Evan M. Forman
AI4MH
6
2
0
07 Dec 2023
Intelligent Virtual Assistants with LLM-based Process Automation
Yanchu Guan
Dong Wang
Zhixuan Chu
Shiyu Wang
Feiyue Ni
Ruihua Song
Longfei Li
Jinjie Gu
Chenyi Zhuang
25
20
0
04 Dec 2023
A Survey of the Evolution of Language Model-Based Dialogue Systems
Hongru Wang
Lingzhi Wang
Yiming Du
Liang Chen
Jing Zhou
Yufei Wang
Kam-Fai Wong
LRM
59
20
0
28 Nov 2023
Complementary Advantages of ChatGPTs and Human Readers in Reasoning: Evidence from English Text Reading Comprehension
Tongquan Zhou
Yao Zhang
Siyi Cao
Yulu Li
Tao Wang
AI4MH
LRM
35
2
0
17 Nov 2023
How Vocabulary Sharing Facilitates Multilingualism in LLaMA?
Fei Yuan
Shuai Yuan
Zhiyong Wu
Lei Li
31
10
0
15 Nov 2023
MEGAVERSE: Benchmarking Large Language Models Across Languages, Modalities, Models and Tasks
Sanchit Ahuja
Divyanshu Aggarwal
Varun Gumma
Ishaan Watts
Ashutosh Sathe
...
Rishav Hada
Prachi Jain
Maxamed Axmed
Kalika Bali
Sunayana Sitaram
ELM
39
39
0
13 Nov 2023
Multi-label and Multi-target Sampling of Machine Annotation for Computational Stance Detection
Zhengyuan Liu
Hai Leong Chieu
Nancy F. Chen
21
1
0
08 Nov 2023
Knowing What LLMs DO NOT Know: A Simple Yet Effective Self-Detection Method
Yukun Zhao
Lingyong Yan
Weiwei Sun
Guoliang Xing
Chong Meng
Shuaiqiang Wang
Zhicong Cheng
Zhaochun Ren
Dawei Yin
27
35
0
27 Oct 2023
Disentangling Extraction and Reasoning in Multi-hop Spatial Reasoning
Roshanak Mirzaee
Parisa Kordjamshidi
LRM
27
7
0
25 Oct 2023
Language Models Hallucinate, but May Excel at Fact Verification
Jian-Yu Guan
Jesse Dodge
David Wadden
Minlie Huang
Hao Peng
LRM
HILM
28
28
0
23 Oct 2023
The Skipped Beat: A Study of Sociopragmatic Understanding in LLMs for 64 Languages
Chiyu Zhang
Khai Duy Doan
Qisheng Liao
Muhammad Abdul-Mageed
36
6
0
23 Oct 2023
Evaluation Metrics in the Era of GPT-4: Reliably Evaluating Large Language Models on Sequence to Sequence Tasks
Andrea Sottana
Bin Liang
Kai Zou
Zheng Yuan
ALM
ELM
LM&MA
35
54
0
20 Oct 2023
Mind the instructions: a holistic evaluation of consistency and interactions in prompt-based learning
Lucas Weber
Elia Bruni
Dieuwke Hupkes
30
24
0
20 Oct 2023