SOTOPIA: Interactive Evaluation for Social Intelligence in Language Agents

18 October 2023
Xuhui Zhou
Hao Zhu
Leena Mathur
Ruohong Zhang
Haofei Yu
Zhengyang Qi
Louis-Philippe Morency
Yonatan Bisk
Daniel Fried
Graham Neubig
Maarten Sap
    LLMAG

Papers citing "SOTOPIA: Interactive Evaluation for Social Intelligence in Language Agents"

43 / 93 papers shown
Title
Mitigating Hallucination in Fictional Character Role-Play
Nafis Sadeq
Zhouhang Xie
Byungkyu Kang
Prarit Lamba
Xiang Gao
Julian McAuley
HILM
48
7
0
25 Jun 2024
BigCodeBench: Benchmarking Code Generation with Diverse Function Calls and Complex Instructions
Terry Yue Zhuo
Minh Chien Vu
Jenny Chim
Han Hu
Wenhao Yu
...
David Lo
Daniel Fried
Xiaoning Du
H. D. Vries
Leandro von Werra
77
134
0
22 Jun 2024
Autonomous Agents for Collaborative Task under Information Asymmetry
Wei Liu
Chenxi Wang
Yifei Wang
Zihao Xie
Rennai Qiu
Yufan Dang
Zhuoyun Du
Weize Chen
Cheng Yang
Chen Qian
LLMAG
44
4
0
21 Jun 2024
How Many Parameters Does it Take to Change a Light Bulb? Evaluating Performance in Self-Play of Conversational Games as a Function of Model Characteristics
Nidhir Bhavsar
Jonathan Jordan
Sherzod Hakimov
David Schlangen
26
0
0
20 Jun 2024
InterIntent: Investigating Social Intelligence of LLMs via Intention Understanding in an Interactive Game Context
Ziyi Liu
Abhishek Anand
Pei Zhou
Jen-tse Huang
Jieyu Zhao
83
6
0
18 Jun 2024
Dialogue Action Tokens: Steering Language Models in Goal-Directed Dialogue with a Multi-Turn Planner
Kenneth Li
Yiming Wang
Fernanda Viégas
Martin Wattenberg
38
6
0
17 Jun 2024
The BiGGen Bench: A Principled Benchmark for Fine-grained Evaluation of Language Models with Language Models
Seungone Kim
Juyoung Suk
Ji Yong Cho
Shayne Longpre
Chaeeun Kim
...
Sean Welleck
Graham Neubig
Moontae Lee
Kyungjae Lee
Minjoon Seo
ELM
ALM
LM&MA
105
31
0
09 Jun 2024
AgentGym: Evolving Large Language Model-based Agents across Diverse Environments
Zhiheng Xi
Yiwen Ding
Wenxiang Chen
Boyang Hong
Honglin Guo
...
Qi Zhang
Xipeng Qiu
Xuanjing Huang
Zuxuan Wu
Yu-Gang Jiang
LLMAG
LM&Ro
38
29
0
06 Jun 2024
clembench-2024: A Challenging, Dynamic, Complementary, Multilingual Benchmark and Underlying Flexible Framework for LLMs as Multi-Action Agents
Anne Beyer
Kranti Chalamalasetti
Sherzod Hakimov
Brielen Madureira
P. Sadler
David Schlangen
LLMAG
40
4
0
31 May 2024
Generative Students: Using LLM-Simulated Student Profiles to Support Question Item Evaluation
Xinyi Lu
Xu Wang
AI4Ed
35
23
0
19 May 2024
Towards Generalizable Agents in Text-Based Educational Environments: A Study of Integrating RL with LLMs
Bahar Radmehr
Adish Singla
Tanja Käser
LLMAG
AI4CE
40
6
0
29 Apr 2024
From Persona to Personalization: A Survey on Role-Playing Language Agents
Jiangjie Chen
Xintao Wang
Rui Xu
Siyu Yuan
Yikai Zhang
...
Caiyu Hu
Siye Wu
Scott Ren
Ziquan Fu
Yanghua Xiao
62
79
0
28 Apr 2024
A Survey on Self-Evolution of Large Language Models
Zhengwei Tao
Ting-En Lin
Xiancai Chen
Hangyu Li
Yuchuan Wu
Yongbin Li
Zhi Jin
Fei Huang
Dacheng Tao
Jingren Zhou
LRM
LM&Ro
57
22
0
22 Apr 2024
Direct Preference Optimization of Video Large Multimodal Models from Language Model Reward
Ruohong Zhang
Liangke Gui
Zhiqing Sun
Yihao Feng
Keyang Xu
...
Di Fu
Chunyuan Li
Alexander G. Hauptmann
Yonatan Bisk
Yiming Yang
MLLM
56
60
0
01 Apr 2024
Academically intelligent LLMs are not necessarily socially intelligent
Ruoxi Xu
Hongyu Lin
Xianpei Han
Le Sun
Yingfei Sun
ELM
37
6
0
11 Mar 2024
Social Intelligence Data Infrastructure: Structuring the Present and Navigating the Future
Minzhi Li
Weiyan Shi
Caleb Ziems
Diyi Yang
41
9
0
28 Feb 2024
Unveiling the Truth and Facilitating Change: Towards Agent-based Large-scale Social Movement Simulation
Xinyi Mou
Zhongyu Wei
Xuanjing Huang
LLMAG
21
30
0
26 Feb 2024
Q-Probe: A Lightweight Approach to Reward Maximization for Language Models
Kenneth Li
Samy Jelassi
Hugh Zhang
Sham Kakade
Martin Wattenberg
David Brandfonbrener
35
9
0
22 Feb 2024
Data-driven Discovery with Large Generative Models
Bodhisattwa Prasad Majumder
Harshit Surana
Dhruv Agarwal
Sanchaita Hazra
Ashish Sabharwal
Peter Clark
43
9
0
21 Feb 2024
IMBUE: Improving Interpersonal Effectiveness through Simulation and Just-in-time Feedback with Human-Language Model Interaction
Inna Wanyin Lin
Ashish Sharma
Christopher Rytting
Adam S. Miner
Jina Suh
Tim Althoff
35
11
0
19 Feb 2024
EmoBench: Evaluating the Emotional Intelligence of Large Language Models
Sahand Sabour
Siyang Liu
Zheyuan Zhang
June M. Liu
Jinfeng Zhou
Alvionna S. Sunaryo
Juanzi Li
Tatia M.C. Lee
Rada Mihalcea
Minlie Huang
32
12
0
19 Feb 2024
Network Formation and Dynamics Among Multi-LLMs
Marios Papachristou
Yuan Yuan
50
11
0
16 Feb 2024
Symmetry-Breaking Augmentations for Ad Hoc Teamwork
Ravi Hammond
Dustin Craggs
Mingyu Guo
Jakob Foerster
Ian Reid
29
1
0
15 Feb 2024
OpenToM: A Comprehensive Benchmark for Evaluating Theory-of-Mind Reasoning Capabilities of Large Language Models
Hainiu Xu
Runcong Zhao
Lixing Zhu
Bin Liang
Yulan He
84
20
0
08 Feb 2024
How Well Can LLMs Negotiate? NegotiationArena Platform and Analysis
Federico Bianchi
P. Chia
Mert Yuksekgonul
Jacopo Tagliabue
Daniel Jurafsky
James Zou
LLMAG
40
31
0
08 Feb 2024
TimeArena: Shaping Efficient Multitasking Language Agents in a Time-Aware Simulation
Yikai Zhang
Siyu Yuan
Caiyu Hu
Kyle Richardson
Yanghua Xiao
Jiangjie Chen
AI4CE
LLMAG
32
13
0
08 Feb 2024
Self-Alignment of Large Language Models via Monopolylogue-based Social Scene Simulation
Xianghe Pang
Shuo Tang
Rui Ye
Yuxin Xiong
Bolun Zhang
Yanfeng Wang
Siheng Chen
119
28
0
08 Feb 2024
Can Large Language Model Agents Simulate Human Trust Behaviors?
Chengxing Xie
Canyu Chen
Feiran Jia
Ziyu Ye
Kai Shu
Adel Bibi
Ziniu Hu
Philip Torr
Guohao Li
Ge Li
LM&Ro
LLMAG
84
53
0
07 Feb 2024
Large Language Model based Multi-Agents: A Survey of Progress and Challenges
Taicheng Guo
Xiuying Chen
Yaqi Wang
Ruidi Chang
Shichao Pei
Nitesh V. Chawla
Olaf Wiest
Xiangliang Zhang
LLMAG
LM&Ro
AI4CE
LRM
45
252
0
21 Jan 2024
Large Language Models Empowered Agent-based Modeling and Simulation: A Survey and Perspectives
Chen Gao
Xiaochong Lan
Nian Li
Yuan Yuan
Jingtao Ding
Zhilun Zhou
Fengli Xu
Yong Li
LLMAG
AI4CE
LM&Ro
44
106
0
19 Dec 2023
Urban Generative Intelligence (UGI): A Foundational Platform for Agents in Embodied City Environment
Fengli Xu
Jun Zhang
Chen Gao
J. Feng
Yong Li
AI4CE
LLMAG
26
29
0
19 Dec 2023
Generative agent-based modeling with actions grounded in physical, social, or digital space using Concordia
A. Vezhnevets
J. Agapiou
Avia Aharon
Ron Ziv
Jayd Matyas
Edgar A. Duénez-Guzmán
William A. Cunningham
Simon Osindero
Danny Karmon
Joel Z Leibo
LLMAG
LM&Ro
AI4CE
40
41
0
06 Dec 2023
Negotiating with LLMs: Prompt Hacks, Skill Gaps, and Reasoning Deficits
Johannes Schneider
Steffi Haag
Leona Chandra Kruse
16
14
0
26 Nov 2023
Simulating Opinion Dynamics with Networks of LLM-based Agents
Yun-Shiuan Chuang
Agam Goyal
Nikunj Harlalka
Siddharth Suresh
Robert Hawkins
Sijia Yang
Dhavan Shah
Junjie Hu
Timothy T. Rogers
AI4CE
19
57
0
16 Nov 2023
Can LLMs Keep a Secret? Testing Privacy Implications of Language Models via Contextual Integrity Theory
Niloofar Mireshghallah
Hyunwoo J. Kim
Xuhui Zhou
Yulia Tsvetkov
Maarten Sap
Reza Shokri
Yejin Choi
PILM
38
75
0
27 Oct 2023
Affective Visual Dialog: A Large-Scale Benchmark for Emotional Reasoning Based on Visually Grounded Conversations
Kilichbek Haydarov
Xiaoqian Shen
Avinash Madasu
Mahmoud Salem
Jia Li
Gamaleldin F. Elsayed
Mohamed Elhoseiny
39
4
0
30 Aug 2023
PersonaLLM: Investigating the Ability of Large Language Models to Express Personality Traits
Hang Jiang
Xiajie Zhang
Xubo Cao
Cynthia Breazeal
Deb Roy
Jad Kabbara
54
74
0
04 May 2023
Generative Agents: Interactive Simulacra of Human Behavior
J. Park
Joseph C. O'Brien
Carrie J. Cai
Meredith Ringel Morris
Percy Liang
Michael S. Bernstein
LM&Ro
AI4CE
232
1,754
0
07 Apr 2023
Grounding Language Models to Images for Multimodal Inputs and Outputs
Jing Yu Koh
Ruslan Salakhutdinov
Daniel Fried
MLLM
31
119
0
31 Jan 2023
ReAct: Synergizing Reasoning and Acting in Language Models
Shunyu Yao
Jeffrey Zhao
Dian Yu
Nan Du
Izhak Shafran
Karthik Narasimhan
Yuan Cao
LLMAG
ReLM
LRM
273
2,510
0
06 Oct 2022
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
Jason W. Wei
Xuezhi Wang
Dale Schuurmans
Maarten Bosma
Brian Ichter
F. Xia
Ed H. Chi
Quoc Le
Denny Zhou
LM&Ro
LRM
AI4CE
ReLM
398
8,559
0
28 Jan 2022
TEACh: Task-driven Embodied Agents that Chat
Aishwarya Padmakumar
Jesse Thomason
Ayush Shrivastava
P. Lange
Anjali Narayan-Chen
Spandana Gella
Robinson Piramuthu
Gokhan Tur
Dilek Z. Hakkani-Tür
LM&Ro
169
180
0
01 Oct 2021
"Other-Play" for Zero-Shot Coordination
Hengyuan Hu
Adam Lerer
A. Peysakhovich
Jakob N. Foerster
VLM
OffRL
136
218
0
06 Mar 2020