Large Language Models Fail on Trivial Alterations to Theory-of-Mind Tasks

16 February 2023

Papers citing "Large Language Models Fail on Trivial Alterations to Theory-of-Mind Tasks"

BeliefNest: A Joint Action Simulator for Embodied Agents with Theory of Mind Rikunari Sagara Koichiro Terao Naoto Iwahashi LM&Ro 2 0 0 18 May 2025
The Influence of Human-inspired Agentic Sophistication in LLM-driven Strategic Reasoners Vince Trencsenyi Agnieszka Mensfelt Kostas Stathis LRM 26 0 0 14 May 2025
R^3-VQA: "Read the Room" by Video Social Reasoning Lixing Niu Jiapeng Li Xingping Yu Shu Wang Ruining Feng Bo Wu Ping Wei Yansen Wang Lifeng Fan 51 0 0 07 May 2025
Do Large Language Models know who did what to whom? Joseph M. Denning Xiaohan Bryor Snefjella Idan A. Blank 62 1 0 23 Apr 2025
Sensitivity Meets Sparsity: The Impact of Extremely Sparse Parameter Patterns on Theory-of-Mind of Large Language Models Yuheng Wu Wentao Guo Zirui Liu Heng Ji Zhaozhuo Xu Denghui Zhang 33 0 0 05 Apr 2025
Do Theory of Mind Benchmarks Need Explicit Human-like Reasoning in Language Models? Yi-Long Lu Chunhui Zhang Jiajun Song Lifeng Fan Wei Wang OffRL 53 0 0 02 Apr 2025
Re-evaluating Theory of Mind evaluation in large language models Jennifer Hu Felix Sosa T. Ullman 45 0 0 28 Feb 2025
Social Genome: Grounded Social Reasoning Abilities of Multimodal Models Leena Mathur Marian Qian Paul Pu Liang Louis-Philippe Morency LRM 166 1 0 21 Feb 2025
Why human-AI relationships need socioaffective alignment Hannah Rose Kirk Iason Gabriel Chris Summerfield Bertie Vidgen Scott A. Hale 46 6 0 04 Feb 2025
Mind Your Theory: Theory of Mind Goes Deeper Than Reasoning Eitan Wagner Nitay Alon J. Barnby Omri Abend LRM 85 2 0 18 Dec 2024
Codenames as a Benchmark for Large Language Models Matthew Stephenson Matthew Sidji Benoît Ronval LLMAG LRM ELM 108 1 0 16 Dec 2024
Take Caution in Using LLMs as Human Surrogates: Scylla Ex Machina Yuan Gao Dokyun Lee Gordon Burtch Sina Fazelpour LRM 56 7 0 25 Oct 2024
Enhancing Logical Reasoning in Large Language Models through Graph-based Synthetic Data Jiaming Zhou Abbas Ghaddar Ge Zhang Liheng Ma Yaochen Hu Soumyasundar Pal Mark J. Coates Bin Wang Yingxue Zhang Jianye Hao ReLM LRM 39 4 0 19 Sep 2024
Instigating Cooperation among LLM Agents Using Adaptive Information Modulation Qiliang Chen Sepehr Ilami Nunzio Lore Babak Heydari 31 2 0 16 Sep 2024
CHARTOM: A Visual Theory-of-Mind Benchmark for Multimodal Large Language Models S. Bharti Shiyun Cheng Jihyun Rho Martina Rao Mu Cai Yong Jae Lee Martina Rau Xiaojin Zhu 42 1 0 26 Aug 2024
MuMA-ToM: Multi-modal Multi-Agent Theory of Mind Haojun Shi Suyu Ye Xinyu Fang Chuanyang Jin Leyla Isik Yen-Ling Kuo Tianmin Shu LLMAG 75 7 0 22 Aug 2024
Large Language Models Assume People are More Rational than We Really are Ryan Liu Jiayi Geng Joshua C. Peterson Ilia Sucholutsky Thomas L. Griffiths 76 17 0 24 Jun 2024
A Nurse is Blue and Elephant is Rugby: Cross Domain Alignment in Large Language Models Reveal Human-like Patterns Asaf Yehudai Taelin Karidi Gabriel Stanovsky Ariel Goldstein Omri Abend 47 1 0 23 May 2024
A social path to human-like artificial intelligence Edgar A. Duénez-Guzmán Suzanne Sadedin Jane X. Wang Kevin R. McKee Joel Z Leibo GNN 31 28 0 22 May 2024
GOMA: Proactive Embodied Cooperative Communication via Goal-Oriented Mental Alignment Lance Ying Kunal Jha Shivam Aarya Joshua B. Tenenbaum Antonio Torralba Tianmin Shu 42 14 0 17 Mar 2024
Language Models Represent Beliefs of Self and Others Wentao Zhu Zhining Zhang Yizhou Wang MILM LRM 50 8 0 28 Feb 2024
Are LLMs Effective Negotiators? Systematic Evaluation of the Multifaceted Capabilities of LLMs in Negotiation Dialogues Deuksin Kwon Emily Weiss Tara Kulshrestha Kushal Chawla Gale M. Lucas Jonathan Gratch 51 7 0 21 Feb 2024
EmoBench: Evaluating the Emotional Intelligence of Large Language Models Sahand Sabour Siyang Liu Zheyuan Zhang June M. Liu Jinfeng Zhou Alvionna S. Sunaryo Juanzi Li Tatia M.C. Lee Rada Mihalcea Minlie Huang 32 12 0 19 Feb 2024
The Tyranny of Possibilities in the Design of Task-Oriented LLM Systems: A Scoping Survey Dhruv Dhamani Mary Lou Maher 30 1 0 29 Dec 2023
Think Twice: Perspective-Taking Improves Large Language Models' Theory-of-Mind Capabilities Alex Wilf Sihyun Shawn Lee Paul Pu Liang Louis-Philippe Morency LRM 29 33 0 16 Nov 2023
Deep Natural Language Feature Learning for Interpretable Prediction Felipe Urrutia Cristian Buc Valentin Barriere 26 1 0 09 Nov 2023
A Graph-to-Text Approach to Knowledge-Grounded Response Generation in Human-Robot Interaction Nicholas Walker Stefan Ultes Pierre Lison LM&Ro 58 1 0 03 Nov 2023
Towards A Holistic Landscape of Situated Theory of Mind in Large Language Models Ziqiao Ma Jacob Sansom Run Peng Joyce Chai 47 16 0 30 Oct 2023
HI-TOM: A Benchmark for Evaluating Higher-Order Theory of Mind Reasoning in Large Language Models Yinghui He Yufan Wu Yilin Jia Rada Mihalcea Yulong Chen Naihao Deng LRM LLMAG 38 21 0 25 Oct 2023
Using Artificial Populations to Study Psychological Phenomena in Neural Models Jesse Roberts Kyle Moore Drew Wilenzick Doug Fisher 19 6 0 15 Aug 2023
Personality Traits in Large Language Models Gregory Serapio-García Mustafa Safdari Clément Crepy Luning Sun Stephen Fitz P. Romero Marwa Abdulhai Aleksandra Faust Maja J. Matarić LM&MA LLMAG 58 119 0 01 Jul 2023
Turning large language models into cognitive models Marcel Binz Eric Schulz 32 53 0 06 Jun 2023
Playing repeated games with Large Language Models Elif Akata Lion Schulz Julian Coda-Forno Seong Joon Oh Matthias Bethge Eric Schulz 423 122 0 26 May 2023
Comparing Machines and Children: Using Developmental Psychology Experiments to Assess the Strengths and Weaknesses of LaMDA Responses Eliza Kosoy Emily Rose Reagan Leslie Y. Lai Alison Gopnik Danielle Krettek Cobb 24 9 0 18 May 2023
Event knowledge in large language models: the gap between the impossible and the unlikely Carina Kauf Anna A. Ivanova Giulia Rambelli Emmanuele Chersoni Jingyuan Selena She Zawad Chowdhury Evelina Fedorenko Alessandro Lenci 37 67 0 02 Dec 2022
Neural Theory-of-Mind? On the Limits of Social Intelligence in Large LMs Maarten Sap Ronan Le Bras Daniel Fried Yejin Choi 27 207 0 24 Oct 2022
Using Large Language Models to Simulate Multiple Humans and Replicate Human Subject Studies Gati Aher Rosa I. Arriaga Adam Tauman Kalai 59 349 0 18 Aug 2022
Baby Intuitions Benchmark (BIB): Discerning the goals, preferences, and actions of others Kanishk Gandhi Gala Stojnic Brenden M. Lake M. Dillon 48 47 0 23 Feb 2021