Identifying the Risks of LM Agents with an LM-Emulated Sandbox

25 September 2023
Yangjun Ruan
Honghua Dong
Andrew Wang
Silviu Pitis
Yongchao Zhou
Jimmy Ba
Yann Dubois
Chris J. Maddison
Tatsunori Hashimoto
    LLMAG
    ELM
arXiv: 2309.15817

Papers citing "Identifying the Risks of LM Agents with an LM-Emulated Sandbox"

29 / 79 papers shown
Title
AndroidWorld: A Dynamic Benchmarking Environment for Autonomous Agents
Christopher Rawles
Sarah Clinckemaillie
Yifan Chang
Jonathan Waltz
Gabrielle Lau
...
Daniel Toyama
Robert Berry
Divya Tyamagundlu
Timothy Lillicrap
Oriana Riva
LLMAG
69
44
0
23 May 2024
WorkBench: a Benchmark Dataset for Agents in a Realistic Workplace Setting
Olly Styles
Sam Miller
Patricio Cerda-Mardini
T. Guha
Victor Sanchez
Bertie Vidgen
LLMAG
33
3
0
01 May 2024
Exploring the Privacy Protection Capabilities of Chinese Large Language Models
Yuqi Yang
Xiaowen Huang
Jitao Sang
ELM
PILM
AILaw
43
1
0
27 Mar 2024
m&m's: A Benchmark to Evaluate Tool-Use for multi-step multi-modal Tasks
Zixian Ma
Weikai Huang
Jieyu Zhang
Tanmay Gupta
Ranjay Krishna
55
18
0
17 Mar 2024
InjecAgent: Benchmarking Indirect Prompt Injections in Tool-Integrated Large Language Model Agents
Qiusi Zhan
Zhixiang Liang
Zifan Ying
Daniel Kang
LLMAG
46
73
0
05 Mar 2024
SoFA: Shielded On-the-fly Alignment via Priority Rule Following
Xinyu Lu
Bowen Yu
Yaojie Lu
Hongyu Lin
Haiyang Yu
Le Sun
Xianpei Han
Yongbin Li
60
13
0
27 Feb 2024
Soft Self-Consistency Improves Language Model Agents
Han Wang
Archiki Prasad
Elias Stengel-Eskin
Mohit Bansal
LLMAG
24
7
0
20 Feb 2024
Learning From Failure: Integrating Negative Examples when Fine-tuning Large Language Models as Agents
Renxi Wang
Haonan Li
Xudong Han
Yixuan Zhang
Timothy Baldwin
LLMAG
27
22
0
18 Feb 2024
ToolSword: Unveiling Safety Issues of Large Language Models in Tool Learning Across Three Stages
Junjie Ye
Sixian Li
Guanyu Li
Caishuang Huang
Songyang Gao
Yilong Wu
Qi Zhang
Tao Gui
Xuanjing Huang
LLMAG
35
16
0
16 Feb 2024
A Trembling House of Cards? Mapping Adversarial Attacks against Language Agents
Lingbo Mo
Zeyi Liao
Boyuan Zheng
Yu-Chuan Su
Chaowei Xiao
Huan Sun
AAML
LLMAG
49
15
0
15 Feb 2024
Agent Smith: A Single Image Can Jailbreak One Million Multimodal LLM Agents Exponentially Fast
Xiangming Gu
Xiaosen Zheng
Tianyu Pang
Chao Du
Qian Liu
Ye Wang
Jing Jiang
Min-Bin Lin
LLMAG
LM&Ro
37
48
0
13 Feb 2024
Towards Unified Alignment Between Agents, Humans, and Environment
Zonghan Yang
An Liu
Zijun Liu
Kai Liu
Fangzhou Xiong
...
Zhenhe Zhang
Fuwen Luo
Zhicheng Guo
Peng Li
Yang Liu
32
4
0
12 Feb 2024
Feedback Loops With Language Models Drive In-Context Reward Hacking
Alexander Pan
Erik Jones
Meena Jagadeesan
Jacob Steinhardt
KELM
53
26
0
09 Feb 2024
Prioritizing Safeguarding Over Autonomy: Risks of LLM Agents for Science
Xiangru Tang
Qiao Jin
Kunlun Zhu
Tongxin Yuan
Yichi Zhang
...
Jian Tang
Zhuosheng Zhang
Arman Cohan
Zhiyong Lu
Mark B. Gerstein
LLMAG
ELM
19
40
0
06 Feb 2024
TrustAgent: Towards Safe and Trustworthy LLM-based Agents through Agent Constitution
Wenyue Hua
Xianjun Yang
Zelong Li
Cheng Wei
Yongfeng Zhang
LLMAG
35
12
0
02 Feb 2024
ReGAL: Refactoring Programs to Discover Generalizable Abstractions
Elias Stengel-Eskin
Archiki Prasad
Mohit Bansal
25
13
0
29 Jan 2024
Visibility into AI Agents
Alan Chan
Carson Ezell
Max Kaufmann
K. Wei
Lewis Hammond
...
Nitarshan Rajkumar
David M. Krueger
Noam Kolt
Lennart Heim
Markus Anderljung
20
32
0
23 Jan 2024
R-Judge: Benchmarking Safety Risk Awareness for LLM Agents
Tongxin Yuan
Zhiwei He
Lingzhong Dong
Yiming Wang
Ruijie Zhao
...
Binglin Zhou
Fangqi Li
Zhuosheng Zhang
Rui Wang
Gongshen Liu
ELM
34
61
0
18 Jan 2024
AI capabilities can be significantly improved without expensive retraining
Tom Davidson
Jean-Stanislas Denain
Pablo Villalobos
Guillem Bas
OffRL
VLM
24
26
0
12 Dec 2023
An LLM Compiler for Parallel Function Calling
Sehoon Kim
Suhong Moon
Ryan Tabrizi
Nicholas Lee
Michael W. Mahoney
Kurt Keutzer
A. Gholami
LRM
16
59
0
07 Dec 2023
LLM as OS, Agents as Apps: Envisioning AIOS, Agents and the AIOS-Agent Ecosystem
Yingqiang Ge
Yujie Ren
Wenyue Hua
Shuyuan Xu
Juntao Tan
Yongfeng Zhang
LLMAG
23
27
0
06 Dec 2023
Igniting Language Intelligence: The Hitchhiker's Guide From Chain-of-Thought Reasoning to Language Agents
Zhuosheng Zhang
Yao Yao
Aston Zhang
Xiangru Tang
Xinbei Ma
...
Yiming Wang
Mark B. Gerstein
Rui Wang
Gongshen Liu
Hai Zhao
LLMAG
LM&Ro
LRM
36
53
0
20 Nov 2023
Mind's Eye: Grounded Language Model Reasoning through Simulation
Ruibo Liu
Jason W. Wei
S. Gu
Te-Yen Wu
Soroush Vosoughi
Claire Cui
Denny Zhou
Andrew M. Dai
ReLM
LRM
116
79
0
11 Oct 2022
ReAct: Synergizing Reasoning and Acting in Language Models
Shunyu Yao
Jeffrey Zhao
Dian Yu
Nan Du
Izhak Shafran
Karthik Narasimhan
Yuan Cao
LLMAG
ReLM
LRM
246
2,494
0
06 Oct 2022
Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned
Deep Ganguli
Liane Lovitt
John Kernion
Amanda Askell
Yuntao Bai
...
Nicholas Joseph
Sam McCandlish
C. Olah
Jared Kaplan
Jack Clark
225
446
0
23 Aug 2022
CodeRL: Mastering Code Generation through Pretrained Models and Deep Reinforcement Learning
Hung Le
Yue Wang
Akhilesh Deepak Gotmare
Silvio Savarese
S. Hoi
SyDa
ALM
129
240
0
05 Jul 2022
Large Language Models are Zero-Shot Reasoners
Takeshi Kojima
S. Gu
Machel Reid
Yutaka Matsuo
Yusuke Iwasawa
ReLM
LRM
328
4,106
0
24 May 2022
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
313
11,953
0
04 Mar 2022
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
Jason W. Wei
Xuezhi Wang
Dale Schuurmans
Maarten Bosma
Brian Ichter
F. Xia
Ed H. Chi
Quoc Le
Denny Zhou
LM&Ro
LRM
AI4CE
ReLM
373
8,495
0
28 Jan 2022