Llama Guard: LLM-based Input-Output Safeguard for Human-AI Conversations
arXiv:2312.06674 · 7 December 2023
Hakan Inan, Kartikeya Upasani, Jianfeng Chi, Rashi Rungta, Krithika Iyer, Yuning Mao, Michael Tontchev, Qing Hu, Brian Fuller, Davide Testuggine, Madian Khabsa
AI4MH

Papers citing "Llama Guard: LLM-based Input-Output Safeguard for Human-AI Conversations"

50 / 289 papers shown

GNOME: Generating Negotiations through Open-Domain Mapping of Exchanges
Darshan Deshpande, Shambhavi Sinha, Anirudh Ravi Kumar, Debaditya Pal, Jonathan May
AI4CE · 16 Jun 2024

Understanding Jailbreak Success: A Study of Latent Space Dynamics in Large Language Models
Sarah Ball, Frauke Kreuter, Nina Rimsky
13 Jun 2024

JailbreakEval: An Integrated Toolkit for Evaluating Jailbreak Attempts Against Large Language Models
Delong Ran, Jinyuan Liu, Yichen Gong, Jingyi Zheng, Xinlei He, Tianshuo Cong, Anyu Wang
ELM · 13 Jun 2024

GuardAgent: Safeguard LLM Agents by a Guard Agent via Knowledge-Enabled Reasoning
Zhen Xiang, Linzhi Zheng, Yanjie Li, Junyuan Hong, Qinbin Li, ..., Zidi Xiong, Chulin Xie, Carl Yang, Dawn Song, Bo Li
LLMAG · 13 Jun 2024

Unveiling the Safety of GPT-4o: An Empirical Study using Jailbreak Attacks
Zonghao Ying, Aishan Liu, Xianglong Liu, Dacheng Tao
10 Jun 2024

Safety Alignment Should Be Made More Than Just a Few Tokens Deep
Xiangyu Qi, Ashwinee Panda, Kaifeng Lyu, Xiao Ma, Subhrajit Roy, Ahmad Beirami, Prateek Mittal, Peter Henderson
10 Jun 2024

SelfDefend: LLMs Can Defend Themselves against Jailbreaking in a Practical Manner
Xunguang Wang, Daoyuan Wu, Zhenlan Ji, Zongjie Li, Pingchuan Ma, Shuai Wang, Yingjiu Li, Yang Liu, Ning Liu, Juergen Rahmel
AAML · 08 Jun 2024

GenAI Arena: An Open Evaluation Platform for Generative Models
Dongfu Jiang, Max W.F. Ku, Tianle Li, Yuansheng Ni, Shizhuo Sun, Rongqi Fan, Wenhu Chen
EGVM · 06 Jun 2024

Jailbreak Vision Language Models via Bi-Modal Adversarial Prompt
Zonghao Ying, Aishan Liu, Tianyuan Zhang, Zhengmin Yu, Siyuan Liang, Xianglong Liu, Dacheng Tao
AAML · 06 Jun 2024

Ranking Manipulation for Conversational Search Engines
Samuel Pfrommer, Yatong Bai, Tanmay Gautam, Somayeh Sojoudi
SILM · 05 Jun 2024

Self-Control of LLM Behaviors by Compressing Suffix Gradient into Prefix Controller
Min Cai, Yuchen Zhang, Shichang Zhang, Fan Yin, Difan Zou, Yisong Yue, Ziniu Hu
04 Jun 2024

Safeguarding Large Language Models: A Survey
Yi Dong, Ronghui Mu, Yanghao Zhang, Siqi Sun, Tianle Zhang, ..., Yi Qi, Jinwei Hu, Jie Meng, Saddek Bensalem, Xiaowei Huang
OffRL · KELM · AILaw · 03 Jun 2024

The Life Cycle of Large Language Models: A Review of Biases in Education
Jinsook Lee, Yann Hicke, Renzhe Yu, Christopher A. Brooks, René F. Kizilcec
AI4Ed · 03 Jun 2024

Improved Few-Shot Jailbreaking Can Circumvent Aligned Language Models and Their Defenses
Xiaosen Zheng, Tianyu Pang, Chao Du, Qian Liu, Jing Jiang, Min-Bin Lin
AAML · 03 Jun 2024

Jailbreaking Large Language Models Against Moderation Guardrails via Cipher Characters
Haibo Jin, Andy Zhou, Joe D. Menke, Haohan Wang
30 May 2024

Would I Lie To You? Inference Time Alignment of Language Models using Direct Preference Heads
Avelina Asada Hadji-Kyriacou, Ognjen Arandjelović
30 May 2024

AI Risk Management Should Incorporate Both Safety and Security
Xiangyu Qi, Yangsibo Huang, Yi Zeng, Edoardo Debenedetti, Jonas Geiping, ..., Chaowei Xiao, Bo-wen Li, Dawn Song, Peter Henderson, Prateek Mittal
AAML · 29 May 2024

Toxicity Detection for Free
Zhanhao Hu, Julien Piet, Geng Zhao, Jiantao Jiao, David A. Wagner
29 May 2024

Aligning to Thousands of Preferences via System Message Generalization
Seongyun Lee, Sue Hyun Park, Seungone Kim, Minjoon Seo
ALM · 28 May 2024

Learning diverse attacks on large language models for robust red-teaming and safety tuning
Seanie Lee, Minsu Kim, Lynn Cherif, David Dobre, Juho Lee, ..., Kenji Kawaguchi, Gauthier Gidel, Yoshua Bengio, Nikolay Malkin, Moksh Jain
AAML · 28 May 2024

Cross-Modal Safety Alignment: Is textual unlearning all you need?
Trishna Chakraborty, Erfan Shayegani, Zikui Cai, Nael B. Abu-Ghazaleh, M. Salman Asif, Yue Dong, A. Roy-Chowdhury, Chengyu Song
27 May 2024

Navigating the Safety Landscape: Measuring Risks in Finetuning Large Language Models
Sheng-Hsuan Peng, Pin-Yu Chen, Matthew Hull, Duen Horng Chau
27 May 2024

Visual-RolePlay: Universal Jailbreak Attack on MultiModal Large Language Models via Role-playing Image Character
Siyuan Ma, Weidi Luo, Yu Wang, Xiaogeng Liu
25 May 2024

No Two Devils Alike: Unveiling Distinct Mechanisms of Fine-tuning Attacks
Chak Tou Leong, Yi Cheng, Kaishuai Xu, Jian Wang, Hanlin Wang, Wenjie Li
AAML · 25 May 2024

Extracting Prompts by Inverting LLM Outputs
Collin Zhang, John X. Morris, Vitaly Shmatikov
23 May 2024

WordGame: Efficient & Effective LLM Jailbreak via Simultaneous Obfuscation in Query and Response
Tianrong Zhang, Bochuan Cao, Yuanpu Cao, Lu Lin, Prasenjit Mitra, Jinghui Chen
AAML · 22 May 2024

Tiny Refinements Elicit Resilience: Toward Efficient Prefix-Model Against LLM Red-Teaming
Jiaxu Liu, Xiangyu Yin, Sihao Wu, Jianhong Wang, Meng Fang, Xinping Yi, Xiaowei Huang
21 May 2024

Aligning Transformers with Continuous Feedback via Energy Rank Alignment
Shriram Chennakesavalu, Frank Hu, Sebastian Ibarraran, Grant M. Rotskoff
21 May 2024

MBIAS: Mitigating Bias in Large Language Models While Retaining Context
Shaina Raza, Ananya Raval, Veronica Chatrath
18 May 2024

Realistic Evaluation of Toxicity in Large Language Models
Tinh Son Luong, Thanh-Thien Le, L. Van, Thien Huu Nguyen
LM&MA · 17 May 2024

Agent Design Pattern Catalogue: A Collection of Architectural Patterns for Foundation Model based Agents
Yue Liu, Sin Kit Lo, Qinghua Lu, Liming Zhu, Dehai Zhao, Xiwei Xu, Stefan Harrer, Jon Whittle
LLMAG · AI4CE · 16 May 2024

Risks and Opportunities of Open-Source Generative AI
Francisco Eiras, Aleksander Petrov, Bertie Vidgen, Christian Schroeder, Fabio Pizzati, ..., Matthew Jackson, Phillip H. S. Torr, Trevor Darrell, Y. Lee, Jakob N. Foerster
14 May 2024

PARDEN, Can You Repeat That? Defending against Jailbreaks via Repetition
Ziyang Zhang, Qizhen Zhang, Jakob N. Foerster
AAML · 13 May 2024

BiasKG: Adversarial Knowledge Graphs to Induce Bias in Large Language Models
Chunyan Luo, Ahmad Ghawanmeh, Xiaodan Zhu, Faiza Khan Khattak
KELM · 08 May 2024

When LLMs Meet Cybersecurity: A Systematic Literature Review
Jie Zhang, Haoyu Bu, Hui Wen, Yu Chen, Lun Li, Hongsong Zhu
06 May 2024

UnsafeBench: Benchmarking Image Safety Classifiers on Real-World and AI-Generated Images
Y. Qu, Xinyue Shen, Yixin Wu, Michael Backes, Savvas Zannettou, Yang Zhang
EGVM · 06 May 2024

Video Diffusion Models: A Survey
Andrew Melnik, Michal Ljubljanac, Cong Lu, Qi Yan, Weiming Ren, Helge J. Ritter
VGen · 06 May 2024

A Framework for Real-time Safeguarding the Text Generation of Large Language Model
Ximing Dong, Dayi Lin, Shaowei Wang, Ahmed E. Hassan
29 Apr 2024

Near to Mid-term Risks and Opportunities of Open-Source Generative AI
Francisco Eiras, Aleksandar Petrov, Bertie Vidgen, Christian Schroeder de Witt, Fabio Pizzati, ..., Paul Röttger, Philip H. S. Torr, Trevor Darrell, Y. Lee, Jakob N. Foerster
25 Apr 2024

RTP-LX: Can LLMs Evaluate Toxicity in Multilingual Scenarios?
Adrian de Wynter, Ishaan Watts, Nektar Ege Altıntoprak, Tua Wongsangaroonsri, Minghui Zhang, ..., Anna Vickers, Stéphanie Visser, Herdyan Widarmanto, A. Zaikin, Si-Qing Chen
LM&MA · 22 Apr 2024

LLMs for Cyber Security: New Opportunities
D. Divakaran, Sai Teja Peddinti
17 Apr 2024

Uncertainty-Based Abstention in LLMs Improves Safety and Reduces Hallucinations
Christian Tomani, Kamalika Chaudhuri, Ivan Evtimov, Daniel Cremers, Mark Ibrahim
16 Apr 2024

Private Attribute Inference from Images with Vision-Language Models
Batuhan Tömekçe, Mark Vero, Robin Staab, Martin Vechev
VLM · PILM · 16 Apr 2024

Online Safety Analysis for LLMs: a Benchmark, an Assessment, and a Path Forward
Xuan Xie, Jiayang Song, Zhehua Zhou, Yuheng Huang, Da Song, Lei Ma
OffRL · 12 Apr 2024

Latent Guard: a Safety Framework for Text-to-image Generation
Runtao Liu, Ashkan Khakzar, Jindong Gu, Qifeng Chen, Philip H. S. Torr, Fabio Pizzati
11 Apr 2024

AEGIS: Online Adaptive AI Content Safety Moderation with Ensemble of LLM Experts
Shaona Ghosh, Prasoon Varshney, Erick Galinkin, Christopher Parisien
ELM · 09 Apr 2024

ALERT: A Comprehensive Benchmark for Assessing Large Language Models' Safety through Red Teaming
Simone Tedeschi, Felix Friedrich, P. Schramowski, Kristian Kersting, Roberto Navigli, Huu Nguyen, Bo Li
ELM · 06 Apr 2024

Taxonomy and Analysis of Sensitive User Queries in Generative AI Search
Hwiyeol Jo, Taiwoo Park, Nayoung Choi, Changbong Kim, Ohjoon Kwon, ..., Kyoungho Shin, Sun Suk Lim, Kyungmi Kim, Jihye Lee, Sun Kim
05 Apr 2024

CantTalkAboutThis: Aligning Language Models to Stay on Topic in Dialogues
Makesh Narsimhan Sreedhar, Traian Rebedea, Shaona Ghosh, Jiaqi Zeng, Christopher Parisien
ALM · 04 Apr 2024

JailbreakBench: An Open Robustness Benchmark for Jailbreaking Large Language Models
Patrick Chao, Edoardo Debenedetti, Alexander Robey, Maksym Andriushchenko, Francesco Croce, ..., Nicolas Flammarion, George J. Pappas, F. Tramèr, Hamed Hassani, Eric Wong
ALM · ELM · AAML · 28 Mar 2024