AgentSims: An Open-Source Sandbox for Large Language Model Evaluation

8 August 2023
Jiaju Lin
Haoran Zhao
Aochi Zhang
Yiting Wu
Huqiuyue Ping
Qin Chen
    ELM
    LLMAG

Papers citing "AgentSims: An Open-Source Sandbox for Large Language Model Evaluation"

44 / 44 papers shown
CAMEO: Collection of Multilingual Emotional Speech Corpora
Iwona Christop
Maciej Czajka
19
0
0
16 May 2025
Toward Generalizable Evaluation in the LLM Era: A Survey Beyond Benchmarks
Yixin Cao
Shibo Hong
Xuzhao Li
Jiahao Ying
Yubo Ma
...
Juanzi Li
Aixin Sun
Xuanjing Huang
Tat-Seng Chua
Tianwei Zhang
ALM
ELM
86
2
0
26 Apr 2025
RAGEN: Understanding Self-Evolution in LLM Agents via Multi-Turn Reinforcement Learning
Z. Wang
Kaidi Wang
Q. Wang
Pingyue Zhang
Linjie Li
...
Jiajun Wu
L. Fei-Fei
Lijuan Wang
Yejin Choi
Manling Li
86
4
0
24 Apr 2025
Interpretable Locomotion Prediction in Construction Using a Memory-Driven LLM Agent With Chain-of-Thought Reasoning
Ehsan Ahmadi
Chao Wang
42
0
0
21 Apr 2025
SandboxEval: Towards Securing Test Environment for Untrusted Code
Rafiqul Rabin
Jesse Hostetler
Sean McGregor
Brett Weir
Nick Judd
ELM
41
0
0
27 Mar 2025
Synthesizing Post-Training Data for LLMs through Multi-Agent Simulation
Shuo Tang
Xianghe Pang
Zexi Liu
Bohan Tang
Rui Ye
Xiaowen Dong
Yunhong Wang
Yanfeng Wang
S. Chen
SyDa
LLMAG
132
4
0
21 Feb 2025
MLGym: A New Framework and Benchmark for Advancing AI Research Agents
Deepak Nathani
Lovish Madaan
Nicholas Roberts
Nikolay Bashlykov
Ajay Menon
...
Tatiana Shavrina
Jakob Foerster
Yoram Bachrach
William Yang Wang
Roberta Raileanu
LLMAG
88
7
0
21 Feb 2025
MADP: Multi-Agent Deductive Planning for Enhanced Cognitive-Behavioral Mental Health Question Answer
Qi Chen
Dexi Liu
31
0
0
28 Jan 2025
A Survey on LLM-based Multi-Agent System: Recent Advances and New Frontiers in Application
Shuaihang Chen
Yuanxing Liu
Wei Han
Weinan Zhang
Ting Liu
LLMAG
AI4CE
48
1
0
08 Jan 2025
Quantization Meets Reasoning: Exploring LLM Low-Bit Quantization Degradation for Mathematical Reasoning
Zhen Li
Yupeng Su
Runming Yang
C. Xie
Zehua Wang
Zhongwei Xie
Ngai Wong
Hongxia Yang
MQ
LRM
51
3
0
06 Jan 2025
EscapeBench: Pushing Language Models to Think Outside the Box
Cheng Qian
Peixuan Han
Qinyu Luo
Bingxiang He
Xiusi Chen
...
Jiarui Yao
Xiaocheng Yang
Denghui Zhang
Yunzhu Li
Heng Ji
LLMAG
LRM
88
3
0
18 Dec 2024
TrendSim: Simulating Trending Topics in Social Media Under Poisoning Attacks with LLM-based Multi-agent System
Zeyu Zhang
Jianxun Lian
Chen Ma
Yaning Qu
Ye Luo
...
X. Chen
Yankai Lin
Le Wu
Xing Xie
Ji-Rong Wen
LLMAG
AAML
70
3
0
14 Dec 2024
From a Tiny Slip to a Giant Leap: An LLM-Based Simulation for Fake News Evolution
Yuhan Liu
Zirui Song
Xiaoqing Zhang
Xiuying Chen
Rui Yan
24
10
0
24 Oct 2024
Prompt Infection: LLM-to-LLM Prompt Injection within Multi-Agent Systems
Donghyun Lee
Mo Tiwari
LLMAG
36
9
0
09 Oct 2024
MLLM as Retriever: Interactively Learning Multimodal Retrieval for Embodied Agents
Junpeng Yue
Xinru Xu
Börje F. Karlsson
Zongqing Lu
39
0
0
04 Oct 2024
Beyond Prompts: Dynamic Conversational Benchmarking of Large Language Models
David Castillo-Bolado
Joseph Davidson
Finlay Gray
Marek Rosa
34
3
0
30 Sep 2024
The Emerged Security and Privacy of LLM Agent: A Survey with Case Studies
Feng He
Tianqing Zhu
Dayong Ye
Bo Liu
Wanlei Zhou
Philip S. Yu
PILM
LLMAG
ELM
68
24
0
28 Jul 2024
CRAB: Cross-environment Agent Benchmark for Multimodal Language Model Agents
Tianqi Xu
Linyao Chen
Dai-Jie Wu
Yanjun Chen
Zecheng Zhang
...
Shilong Liu
Bochen Qian
Philip H. S. Torr
Guohao Li
Bernard Ghanem
57
14
0
01 Jul 2024
AgentDojo: A Dynamic Environment to Evaluate Attacks and Defenses for LLM Agents
Edoardo Debenedetti
Jie Zhang
Mislav Balunović
Luca Beurer-Kellner
Marc Fischer
Florian Tramèr
LLMAG
AAML
56
26
1
19 Jun 2024
AI Agents Under Threat: A Survey of Key Security Challenges and Future Pathways
Zehang Deng
Yongjian Guo
Changzhou Han
Wanlun Ma
Junwu Xiong
Sheng Wen
Yang Xiang
44
23
0
04 Jun 2024
Learning to Discuss Strategically: A Case Study on One Night Ultimate Werewolf
Xuanfa Jin
Ziyan Wang
Yali Du
Meng Fang
Haifeng Zhang
Jun Wang
OffRL
LLMAG
54
6
0
30 May 2024
From Persona to Personalization: A Survey on Role-Playing Language Agents
Jiangjie Chen
Xintao Wang
Rui Xu
Siyu Yuan
Yikai Zhang
...
Caiyu Hu
Siye Wu
Scott Ren
Ziquan Fu
Yanghua Xiao
62
77
0
28 Apr 2024
Cooperate or Collapse: Emergence of Sustainable Cooperation in a Society of LLM Agents
Giorgio Piatti
Zhijing Jin
Max Kleiman-Weiner
Bernhard Schölkopf
Mrinmaya Sachan
Rada Mihalcea
LLMAG
56
15
0
25 Apr 2024
AgentCoord: Visually Exploring Coordination Strategy for LLM-based Multi-Agent Collaboration
Bo Pan
Jiaying Lu
Ke Wang
Li Zheng
Zhen Wen
Yingchaojie Feng
Minfeng Zhu
Wei Chen
LLMAG
37
10
0
18 Apr 2024
Evolving Agents: Interactive Simulation of Dynamic and Diverse Human Personalities
Jiale Li
Jiayang Li
Jiahao Chen
Yifan Li
Shijie Wang
Hugo Zhou
Minjun Ye
Yunsheng Su
AI4CE
42
4
0
03 Apr 2024
A Survey on Large Language Model-Based Game Agents
Sihao Hu
Tiansheng Huang
Gaowen Liu
Ramana Rao Kompella
Fatih Ilhan
Selim Furkan Tekin
Yichang Xu
Zachary Yahn
Ling Liu
LLMAG
LM&Ro
AI4CE
LM&MA
71
51
0
02 Apr 2024
In-Memory Learning: A Declarative Learning Framework for Large Language Models
Bo Wang
Tianxiang Sun
Hang Yan
Siyin Wang
Qingyuan Cheng
Xipeng Qiu
LLMAG
37
1
0
05 Mar 2024
Exploring the Potential of Large Language Models for Improving Digital Forensic Investigation Efficiency
Akila Wickramasekara
F. Breitinger
Mark Scanlon
52
8
0
29 Feb 2024
AgentScope: A Flexible yet Robust Multi-Agent Platform
Dawei Gao
Zitao Li
Xuchen Pan
Weirui Kuang
Zhijian Ma
...
Chen Cheng
Hongzhu Shi
Yaliang Li
Bolin Ding
Jingren Zhou
LLMAG
32
28
0
21 Feb 2024
Shall We Team Up: Exploring Spontaneous Cooperation of Competing LLM Agents
Zengqing Wu
Run Peng
Shuyuan Zheng
Qianying Liu
Xu Han
Brian Inhyuk Kwon
Makoto Onizuka
Shaojie Tang
Chuan Xiao
44
10
0
19 Feb 2024
AgentLens: Visual Analysis for Agent Behaviors in LLM-based Autonomous Systems
Jiaying Lu
Bo Pan
Jieyi Chen
Yingchaojie Feng
Jingyuan Hu
Yuchen Peng
Wei Chen
42
13
0
14 Feb 2024
Large Language Models as Minecraft Agents
Chris Madge
Massimo Poesio
LLMAG
37
6
0
13 Feb 2024
Enhance Reasoning for Large Language Models in the Game Werewolf
Shuang Wu
Liwen Zhu
Tao Yang
Shiwei Xu
Qiang Fu
Yang Wei
Haobo Fu
LRM
LLMAG
82
18
0
04 Feb 2024
Computational Experiments Meet Large Language Model Based Agents: A Survey and Perspective
Qun Ma
Xiao Xue
Deyu Zhou
Xiangning Yu
Donghua Liu
...
Yifan Shen
Peilin Ji
Juanjuan Li
Gang Wang
Wanpeng Ma
AI4CE
LM&Ro
LLMAG
21
7
0
01 Feb 2024
LARP: Language-Agent Role Play for Open-World Games
Ming Yan
Ruihao Li
Hao Zhang
Hao Wang
Zhilan Yang
Ji Yan
LLMAG
LM&Ro
AI4CE
27
16
0
24 Dec 2023
Large Language Models Empowered Agent-based Modeling and Simulation: A Survey and Perspectives
Chen Gao
Xiaochong Lan
Nian Li
Yuan Yuan
Jingtao Ding
Zhilun Zhou
Fengli Xu
Yong Li
LLMAG
AI4CE
LM&Ro
41
105
0
19 Dec 2023
Igniting Language Intelligence: The Hitchhiker's Guide From Chain-of-Thought Reasoning to Language Agents
Zhuosheng Zhang
Yao Yao
Aston Zhang
Xiangru Tang
Xinbei Ma
...
Yiming Wang
Mark B. Gerstein
Rui Wang
Gongshen Liu
Hai Zhao
LLMAG
LM&Ro
LRM
42
53
0
20 Nov 2023
StrategyLLM: Large Language Models as Strategy Generators, Executors, Optimizers, and Evaluators for Problem Solving
Chang Gao
Haiyun Jiang
Deng Cai
Shuming Shi
Wai Lam
LRM
31
3
0
15 Nov 2023
On Generative Agents in Recommendation
An Zhang
Yuxin Chen
Leheng Sheng
Xiang Wang
Tat-Seng Chua
38
43
0
16 Oct 2023
AgentCF: Collaborative Learning with Autonomous Language Agents for Recommender Systems
Junjie Zhang
Yupeng Hou
Ruobing Xie
Wenqi Sun
Julian McAuley
Wayne Xin Zhao
Leyu Lin
Ji-Rong Wen
LLMAG
22
67
0
13 Oct 2023
MetaAgents: Simulating Interactions of Human Behaviors for LLM-based Task-oriented Coordination via Collaborative Generative Agents
Yuan Li
Yixuan Zhang
Lichao Sun
LLMAG
LM&Ro
28
100
0
10 Oct 2023
An In-depth Survey of Large Language Model-based Artificial Intelligence Agents
Pengyu Zhao
Zijian Jin
Ning Cheng
LLMAG
41
20
0
23 Sep 2023
A Survey on Large Language Model based Autonomous Agents
Lei Wang
Chengbang Ma
Xueyang Feng
Zeyu Zhang
Hao-ran Yang
...
Xu Chen
Yankai Lin
Wayne Xin Zhao
Zhewei Wei
Ji-Rong Wen
LLMAG
AI4CE
LM&Ro
41
1,118
0
22 Aug 2023
Generative Agents: Interactive Simulacra of Human Behavior
J. Park
Joseph C. O'Brien
Carrie J. Cai
Meredith Ringel Morris
Percy Liang
Michael S. Bernstein
LM&Ro
AI4CE
232
1,742
0
07 Apr 2023