NeMo Guardrails: A Toolkit for Controllable and Safe LLM Applications with Programmable Rails
Traian Rebedea, R. Dinu, Makesh Narsimhan Sreedhar, Christopher Parisien, Jonathan Cohen · KELM · 16 October 2023 · arXiv: 2310.10501

Papers citing "NeMo Guardrails: A Toolkit for Controllable and Safe LLM Applications with Programmable Rails"

Showing 50 of 101 citing papers.
Step-by-step Instructions and a Simple Tabular Output Format Improve the Dependency Parsing Accuracy of LLMs
Hiroshi Matsuda, Chunpeng Ma, Masayuki Asahara · 11 Jun 2025

JavelinGuard: Low-Cost Transformer Architectures for LLM Security
Yash Datta, Sharath Rajasekar · 09 Jun 2025

Why LLM Safety Guardrails Collapse After Fine-tuning: A Similarity Analysis Between Alignment and Fine-tuning Datasets
Lei Hsiung, Tianyu Pang, Yung-Chen Tang, Linyue Song, Tsung-Yi Ho, Pin-Yu Chen, Yaoqing Yang · 05 Jun 2025

RedDebate: Safer Responses through Multi-Agent Red Teaming Debates
Ali Asad, Stephen Obadinma, Radin Shayanfar, Xiaodan Zhu · AAML, LLMAG · 04 Jun 2025

CoDial: Interpretable Task-Oriented Dialogue Systems Through Dialogue Flow Alignment
Radin Shayanfar, Chu Fei Luo, R. Bhambhoria, Samuel Dahan, Xiaodan Zhu · 02 Jun 2025

ACCESS DENIED INC: The First Benchmark Environment for Sensitivity Awareness
Dren Fazlija, Arkadij Orlov, Sandipan Sikdar · 01 Jun 2025

Let Them Down Easy! Contextual Effects of LLM Guardrails on User Perceptions and Preferences
Mingqian Zheng, Wenjia Hu, Patrick Zhao, Motahhare Eslami, Jena D. Hwang, Faeze Brahman, Carolyn Rosé, Maarten Sap · 30 May 2025

Safety Through Reasoning: An Empirical Study of Reasoning Guardrail Models
Makesh Narsimhan Sreedhar, Traian Rebedea, Christopher Parisien · LRM · 26 May 2025

Relative Bias: A Comparative Framework for Quantifying Bias in LLMs
Alireza Arbabi, Florian Kerschbaum · 22 May 2025

Guarded Query Routing for Large Language Models
Richard Šléher, William Brach, Tibor Sloboda, Kristián Košťál, Lukas Galke · RALM · 20 May 2025

sudoLLM: On Multi-role Alignment of Language Models
Soumadeep Saha, Akshay Chaturvedi, Joy Mahapatra, Utpal Garain · 20 May 2025

The Hitchhiker's Guide to Production-ready Trustworthy Foundation Model powered Software (FMware)
Kirill Vasilevski, Benjamin Rombaut, Gopi Krishnan Rajbahadur, G. Oliva, Keheliya Gallaba, ..., Haoxiang Zhang, Boyuan Chen, Kishanthan Thangarajah, Ahmed E. Hassan, Zhen Ming Jiang · 15 May 2025

Safeguard-by-Development: A Privacy-Enhanced Development Paradigm for Multi-Agent Collaboration Systems
Jian Cui, Zichuan Li, Luyi Xing, Xiaojing Liao · 07 May 2025

Adversarial Attacks on LLM-as-a-Judge Systems: Insights from Prompt Injections
Narek Maloyan, Dmitry Namiot · SILM, AAML, ELM · 25 Apr 2025

Steering the CensorShip: Uncovering Representation Vectors for LLM "Thought" Control
Hannah Cyberey, David Evans · LLMSV · 23 Apr 2025

DualBreach: Efficient Dual-Jailbreaking via Target-Driven Initialization and Multi-Target Optimization
Xinzhe Huang, Kedong Xiu, T. Zheng, Churui Zeng, Wangze Ni, Zhan Qin, K. Ren, Chong Chen · AAML · 21 Apr 2025

MrGuard: A Multilingual Reasoning Guardrail for Universal LLM Safety
Yahan Yang, Soham Dan, Shuo Li, Dan Roth, Insup Lee · LRM · 21 Apr 2025

The Structural Safety Generalization Problem
Julius Broomfield, Tom Gibbs, Ethan Kosak-Hine, George Ingebretsen, Tia Nasir, Jason Zhang, Reihaneh Iranmanesh, Sara Pieri, Reihaneh Rabbany, Kellin Pelrine · AAML · 13 Apr 2025

X-Guard: Multilingual Guard Agent for Content Moderation
Bibek Upadhayay, Vahid Behzadan · 11 Apr 2025

Large Language Models are Unreliable for Cyber Threat Intelligence
Emanuele Mezzi, Fabio Massacci, Katja Tuma · 29 Mar 2025

Don't Forget It! Conditional Sparse Autoencoder Clamping Works for Unlearning
Matthew Khoriaty, Andrii Shportko, Gustavo Mercier, Zach Wood-Doughty · MU · 14 Mar 2025

Adversarial Prompt Evaluation: Systematic Benchmarking of Guardrails Against Prompt Input Attacks on LLMs
Giulio Zizzo, Giandomenico Cornacchia, Kieran Fraser, Muhammad Zaid Hameed, Ambrish Rawat, Beat Buesser, Mark Purcell, Pin-Yu Chen, P. Sattigeri, Kush R. Varshney · AAML · 24 Feb 2025

Prompt Inject Detection with Generative Explanation as an Investigative Tool
Jonathan Pan, Swee Liang Wong, Yidi Yuan, Xin Wei Chia · SILM · 16 Feb 2025

FLAME: Flexible LLM-Assisted Moderation Engine
Ivan Bakulin, Ilia Kopanichuk, Iaroslav Bespalov, Nikita Radchenko, V. Shaposhnikov, Dmitry V. Dylov, Ivan Oseledets · 13 Feb 2025

Bridging the Safety Gap: A Guardrail Pipeline for Trustworthy LLM Inferences
Shanshan Han, Salman Avestimehr, Chaoyang He · 12 Feb 2025

GuardReasoner: Towards Reasoning-based LLM Safeguards
Yue Liu, Hongcheng Gao, Shengfang Zhai, Jun Xia, Tianyi Wu, Zhiwei Xue, Yuxiao Chen, Kenji Kawaguchi, Jiaheng Zhang, Bryan Hooi · AI4TS, LRM · 30 Jan 2025

From Cool Demos to Production-Ready FMware: Core Challenges and a Technology Roadmap
Gopi Krishnan Rajbahadur, G. Oliva, Dayi Lin, Ahmed E. Hassan · 28 Jan 2025

Beyond Benchmarks: On The False Promise of AI Regulation
Gabriel Stanovsky, Renana Keydar, Gadi Perl, Eliya Habba · 28 Jan 2025

Refining Input Guardrails: Enhancing LLM-as-a-Judge Efficiency Through Chain-of-Thought Fine-Tuning and Alignment
Melissa Kazemi Rad, Huy Nghiem, Andy Luo, Sahil Wadhwa, Mohammad Sorower, Stephen Rawls · AAML · 22 Jan 2025

Position: A taxonomy for reporting and describing AI security incidents
L. Bieringer, Kevin Paeth, Andreas Wespi, Kathrin Grosse, Alexandre Alahi · 19 Dec 2024

SafeWatch: An Efficient Safety-Policy Following Video Guardrail Model with Transparent Explanations
Zhiwen Chen, Francesco Pinto, Minzhou Pan, Bo Li · 09 Dec 2024

Improved Large Language Model Jailbreak Detection via Pretrained Embeddings
Erick Galinkin, Martin Sablotny · 02 Dec 2024

Lightweight Safety Guardrails Using Fine-tuned BERT Embeddings
Aaron Zheng, Mansi Rana, Andreas Stolcke · 21 Nov 2024

A Flexible Large Language Models Guardrail Development Methodology Applied to Off-Topic Prompt Detection
Gabriel Chua, Shing Yee Chan, Shaun Khoo · 20 Nov 2024

AI Ethics by Design: Implementing Customizable Guardrails for Responsible AI Development
Kristina Šekrst, Jeremy McHugh, Jonathan Rodriguez Cefalu · 05 Nov 2024

Keep on Swimming: Real Attackers Only Need Partial Knowledge of a Multi-Model System
Julian Collado, Kevin Stangl · AAML · 30 Oct 2024

Benchmarking LLM Guardrails in Handling Multilingual Toxicity
Yahan Yang, Soham Dan, Dan Roth, Insup Lee · 29 Oct 2024

Stealthy Jailbreak Attacks on Large Language Models via Benign Data Mirroring
Honglin Mu, Han He, Yuxin Zhou, Yunlong Feng, Yang Xu, ..., Zeming Liu, Xudong Han, Qi Shi, Qingfu Zhu, Wanxiang Che · AAML · 28 Oct 2024

Insights and Current Gaps in Open-Source LLM Vulnerability Scanners: A Comparative Analysis
Jonathan Brokman, Omer Hofman, Oren Rachmil, Inderjeet Singh, Vikas Pahuja, Rathina Sabapathy Aishvariya Priya, Amit Giloni, Roman Vainshtein, Hisashi Kojima · 21 Oct 2024

Beyond the Comfort Zone: Emerging Solutions to Overcome Challenges in Integrating LLMs into Software Products
Nadia Nahar, Christian Kästner, Jenna L. Butler, Chris Parnin, Thomas Zimmermann, Christian Bird · 15 Oct 2024

On Calibration of LLM-based Guard Models for Reliable Content Moderation
Hongfu Liu, Hengguan Huang, Hao Wang, Xiangming Gu, Ye Wang · 14 Oct 2024

Survival of the Safest: Towards Secure Prompt Optimization through Interleaved Multi-Objective Evolution
Ankita Sinha, Wendi Cui, Kamalika Das, Jiaxin Zhang · AAML · 12 Oct 2024

Bridging Today and the Future of Humanity: AI Safety in 2024 and Beyond
Shanshan Han · 09 Oct 2024

TaeBench: Improving Quality of Toxic Adversarial Examples
Xuan Zhu, Dmitriy Bespalov, Liwen You, Ninad Kulkarni, Yanjun Qi · AAML · 08 Oct 2024

Position: LLM Unlearning Benchmarks are Weak Measures of Progress
Pratiksha Thaker, Shengyuan Hu, Neil Kale, Yash Maurya, Zhiwei Steven Wu, Virginia Smith · MU · 03 Oct 2024

SciSafeEval: A Comprehensive Benchmark for Safety Alignment of Large Language Models in Scientific Tasks
Tianhao Li, Jingyu Lu, Chuangxin Chu, Tianyu Zeng, Yujia Zheng, ..., Xuejing Yuan, Xingkai Wang, Keyan Ding, Huajun Chen, Qiang Zhang · ELM · 02 Oct 2024

AI Delegates with a Dual Focus: Ensuring Privacy and Strategic Self-Disclosure
Xi Chen, Zhiyang Zhang, Fangkai Yang, Xiaoting Qin, Chao Du, ..., Hangxin Liu, Qingwei Lin, Saravan Rajmohan, Dongmei Zhang, Qi Zhang · 26 Sep 2024

Enhancing Guardrails for Safe and Secure Healthcare AI
Ananya Gangavarapu · 25 Sep 2024

Attack Atlas: A Practitioner's Perspective on Challenges and Pitfalls in Red Teaming GenAI
Ambrish Rawat, Stefan Schoepf, Giulio Zizzo, Giandomenico Cornacchia, Muhammad Zaid Hameed, ..., Elizabeth M. Daly, Mark Purcell, P. Sattigeri, Pin-Yu Chen, Kush R. Varshney · AAML · 23 Sep 2024

LLM-Agent-UMF: LLM-based Agent Unified Modeling Framework for Seamless Integration of Multi Active/Passive Core-Agents
Amine B. Hassouna, Hana Chaari, Ines Belhaj · LLMAG · 17 Sep 2024