2310.10501
Cited By
NeMo Guardrails: A Toolkit for Controllable and Safe LLM Applications with Programmable Rails
16 October 2023
Traian Rebedea
R. Dinu
Makesh Narsimhan Sreedhar
Christopher Parisien
Jonathan Cohen
KELM
Papers citing "NeMo Guardrails: A Toolkit for Controllable and Safe LLM Applications with Programmable Rails"
50 / 101 papers shown
Title
No Size Fits All: The Perils and Pitfalls of Leveraging LLMs Vary with Company Size
Ashok Urlana
Charaka Vinayak Kumar
B. Garlapati
Ajeet Kumar Singh
Rahul Mishra
91
1
0
21 Jul 2024
Operationalizing a Threat Model for Red-Teaming Large Language Models (LLMs)
Apurv Verma
Satyapriya Krishna
Sebastian Gehrmann
Madhavan Seshadri
Anu Pradhan
Tom Ault
Leslie Barrett
David Rabinowitz
John Doucette
Nhathai Phan
129
15
0
20 Jul 2024
LLMs as Function Approximators: Terminology, Taxonomy, and Questions for Evaluation
David Schlangen
76
1
0
18 Jul 2024
Social and Ethical Risks Posed by General-Purpose LLMs for Settling Newcomers in Canada
I. Nejadgholi
Maryam Molamohammadi
Samir Bakhtawar
107
0
0
15 Jul 2024
Grounding and Evaluation for Large Language Models: Practical Challenges and Lessons Learned (Survey)
K. Kenthapadi
M. Sameki
Ankur Taly
HILM
ELM
AILaw
83
15
0
10 Jul 2024
ChatGPT Doesn't Trust Chargers Fans: Guardrail Sensitivity in Context
Victoria R. Li
Yida Chen
Naomi Saphra
89
5
0
09 Jul 2024
When in Doubt, Cascade: Towards Building Efficient and Capable Guardrails
Manish Nagireddy
Inkit Padhi
Soumya Ghosh
P. Sattigeri
77
1
0
08 Jul 2024
$R^2$-Guard: Robust Reasoning Enabled LLM Guardrail via Knowledge-Enhanced Logical Reasoning
Mintong Kang
Yue Liu
LRM
125
16
0
08 Jul 2024
AI Safety in Generative AI Large Language Models: A Survey
Jaymari Chua
Yun Yvonna Li
Shiyi Yang
Chen Wang
Lina Yao
LM&MA
100
19
0
06 Jul 2024
Self-Evaluation as a Defense Against Adversarial Attacks on LLMs
Hannah Brown
Leon Lin
Kenji Kawaguchi
Michael Shieh
AAML
147
8
0
03 Jul 2024
LoRA-Guard: Parameter-Efficient Guardrail Adaptation for Content Moderation of Large Language Models
Hayder Elesedy
Pedro M. Esperança
Silviu Vlad Oprea
Mete Ozay
KELM
94
4
0
03 Jul 2024
Building Understandable Messaging for Policy and Evidence Review (BUMPER) with AI
Katherine A. Rosenfeld
Maike Sonnewald
Sonia J. Jindal
Kevin A. McCarthy
Joshua L. Proctor
56
0
0
27 Jun 2024
SafeAligner: Safety Alignment against Jailbreak Attacks via Response Disparity Guidance
Caishuang Huang
Wanxu Zhao
Rui Zheng
Huijie Lv
Shihan Dou
...
Junjie Ye
Yuming Yang
Tao Gui
Qi Zhang
Xuanjing Huang
LLMSV
AAML
121
9
0
26 Jun 2024
Unsupervised Extraction of Dialogue Policies from Conversations
Makesh Narsimhan Sreedhar
Traian Rebedea
Christopher Parisien
OffRL
101
3
0
21 Jun 2024
Current state of LLM Risks and AI Guardrails
Suriya Ganesh Ayyamperumal
Limin Ge
100
30
0
16 Jun 2024
garak: A Framework for Security Probing Large Language Models
Leon Derczynski
Erick Galinkin
Jeffrey Martin
Subho Majumdar
Nanna Inie
AAML
ELM
95
20
0
16 Jun 2024
TorchOpera: A Compound AI System for LLM Safety
Shanshan Han
Yuhang Yao
Zijian Hu
Dimitris Stripelis
Zhaozhuo Xu
Chaoyang He
LLMAG
126
0
0
16 Jun 2024
GuardAgent: Safeguard LLM Agents by a Guard Agent via Knowledge-Enabled Reasoning
Zhen Xiang
Linzhi Zheng
Yanjie Li
Junyuan Hong
Qinbin Li
...
Zidi Xiong
Chulin Xie
Carl Yang
Dawn Song
Bo Li
LLMAG
76
31
0
13 Jun 2024
HelpSteer2: Open-source dataset for training top-performing reward models
Zhilin Wang
Yi Dong
Olivier Delalleau
Jiaqi Zeng
Gerald Shen
Daniel Egert
Jimmy J. Zhang
Makesh Narsimhan Sreedhar
Oleksii Kuchaiev
AI4TS
124
108
0
12 Jun 2024
AI Agents Under Threat: A Survey of Key Security Challenges and Future Pathways
Zehang Deng
Yongjian Guo
Changzhou Han
Wanlun Ma
Junwu Xiong
Sheng Wen
Yang Xiang
157
49
0
04 Jun 2024
Safeguarding Large Language Models: A Survey
Yi Dong
Ronghui Mu
Yanghao Zhang
Siqi Sun
Tianle Zhang
...
Yi Qi
Jinwei Hu
Jie Meng
Saddek Bensalem
Xiaowei Huang
OffRL
KELM
AILaw
99
26
0
03 Jun 2024
Agent Design Pattern Catalogue: A Collection of Architectural Patterns for Foundation Model based Agents
Yue Liu
Sin Kit Lo
Qinghua Lu
Liming Zhu
Dehai Zhao
Xiwei Xu
Stefan Harrer
Jon Whittle
LLMAG
AI4CE
100
16
0
16 May 2024
ContextQ: Generated Questions to Support Meaningful Parent-Child Dialogue While Co-Reading
Griffin Dietz Smith
Siddhartha Prasad
Matt J. Davidson
Leah Findlater
R. Benjamin Shapiro
85
8
0
06 May 2024
"Ask Me Anything": How Comcast Uses LLMs to Assist Agents in Real Time
Scott Rome
Tianwen Chen
Raphael Tang
Luwei Zhou
Ferhan Ture
35
3
0
01 May 2024
Who Validates the Validators? Aligning LLM-Assisted Evaluation of LLM Outputs with Human Preferences
Shreya Shankar
J.D. Zamfirescu-Pereira
Bjorn Hartmann
Aditya G. Parameswaran
Ian Arawjo
ALM
93
97
0
18 Apr 2024
Introducing L2M3, A Multilingual Medical Large Language Model to Advance Health Equity in Low-Resource Regions
Agasthya Gangavarapu
LM&MA
73
8
0
11 Apr 2024
Rethinking How to Evaluate Language Model Jailbreak
Hongyu Cai
Arjun Arunasalam
Leo Y. Lin
Antonio Bianchi
Z. Berkay Celik
ALM
65
8
0
09 Apr 2024
Multicalibration for Confidence Scoring in LLMs
Gianluca Detommaso
Martín Bertrán
Riccardo Fogliato
Aaron Roth
112
19
0
06 Apr 2024
CantTalkAboutThis: Aligning Language Models to Stay on Topic in Dialogues
Makesh Narsimhan Sreedhar
Traian Rebedea
Shaona Ghosh
Jiaqi Zeng
Christopher Parisien
ALM
103
6
0
04 Apr 2024
MELTing point: Mobile Evaluation of Language Transformers
Stefanos Laskaridis
Kleomenis Katevas
Lorenzo Minto
Hamed Haddadi
95
24
0
19 Mar 2024
RigorLLM: Resilient Guardrails for Large Language Models against Undesired Content
Zhuowen Yuan
Zidi Xiong
Yi Zeng
Ning Yu
Ruoxi Jia
Basel Alomair
Yue Liu
AAML
KELM
130
45
0
19 Mar 2024
Detectors for Safe and Reliable LLMs: Implementations, Uses, and Limitations
Swapnaja Achintalwar
Adriana Alvarado Garcia
Ateret Anaby-Tavor
Ioana Baldini
Sara E. Berger
...
Aashka Trivedi
Kush R. Varshney
Dennis L. Wei
Shalisha Witherspoon
Marcel Zalmanovici
94
11
0
09 Mar 2024
Breaking Down the Defenses: A Comparative Survey of Attacks on Large Language Models
Arijit Ghosh Chowdhury
Md. Mofijul Islam
Vaibhav Kumar
F. H. Shezan
Vaibhav Kumar
Vinija Jain
Aman Chadha
AAML
PILM
92
34
0
03 Mar 2024
LLMs with Industrial Lens: Deciphering the Challenges and Prospects -- A Survey
Ashok Urlana
Charaka Vinayak Kumar
Ajeet Kumar Singh
B. Garlapati
S. Chalamala
Rahul Mishra
122
8
0
22 Feb 2024
Attacks, Defenses and Evaluations for LLM Conversation Safety: A Survey
Zhichen Dong
Zhanhui Zhou
Chao Yang
Jing Shao
Yu Qiao
ELM
132
68
0
14 Feb 2024
Prompt Perturbation in Retrieval-Augmented Generation based Large Language Models
Zhibo Hu
Chen Wang
Yanfeng Shu
Helen Paik
Liming Zhu
SILM
RALM
77
10
0
11 Feb 2024
HarmBench: A Standardized Evaluation Framework for Automated Red Teaming and Robust Refusal
Mantas Mazeika
Long Phan
Xuwang Yin
Andy Zou
Zifan Wang
...
Nathaniel Li
Steven Basart
Bo Li
David A. Forsyth
Dan Hendrycks
AAML
112
419
0
06 Feb 2024
Nevermind: Instruction Override and Moderation in Large Language Models
Edward Kim
ALM
26
1
0
05 Feb 2024
Building Guardrails for Large Language Models
Yizhen Dong
Ronghui Mu
Gao Jin
Yi Qi
Jinwei Hu
Xingyu Zhao
Jie Meng
Wenjie Ruan
Xiaowei Huang
OffRL
134
32
0
02 Feb 2024
Robust Prompt Optimization for Defending Language Models Against Jailbreaking Attacks
Andy Zhou
Bo Li
Haohan Wang
AAML
130
88
0
30 Jan 2024
Hallucination is Inevitable: An Innate Limitation of Large Language Models
Ziwei Xu
Sanjay Jain
Mohan S. Kankanhalli
HILM
LRM
172
259
0
22 Jan 2024
ML-On-Rails: Safeguarding Machine Learning Models in Software Systems -- A Case Study
Hala Abdelkader
Mohamed Abdelrazek
Scott Barnett
Jean-Guy Schneider
Priya Rani
Rajesh Vasa
83
4
0
12 Jan 2024
Malla: Demystifying Real-world Large Language Model Integrated Malicious Services
Zilong Lin
Jian Cui
Xiaojing Liao
Wenyuan Xu
64
23
0
06 Jan 2024
Building Trustworthy NeuroSymbolic AI Systems: Consistency, Reliability, Explainability, and Safety
Manas Gaur
Amit P. Sheth
67
17
0
05 Dec 2023
HelpSteer: Multi-attribute Helpfulness Dataset for SteerLM
Zhilin Wang
Yi Dong
Jiaqi Zeng
Virginia Adams
Makesh Narsimhan Sreedhar
...
Olivier Delalleau
Jane Polak Scowcroft
Neel Kant
Aidan Swope
Oleksii Kuchaiev
3DV
70
77
0
16 Nov 2023
Can Knowledge Graphs Reduce Hallucinations in LLMs? : A Survey
Garima Agrawal
Tharindu Kumarage
Zeyad Alghamdi
Huanmin Liu
106
97
0
14 Nov 2023
Self-Guard: Empower the LLM to Safeguard Itself
Zezhong Wang
Fangkai Yang
Lu Wang
Pu Zhao
Hongru Wang
Liang Chen
Qingwei Lin
Kam-Fai Wong
166
35
0
24 Oct 2023
Co(ve)rtex: ML Models as storage channels and their (mis-)applications
Md Abdullah Al Mamun
Quazi Mishkatul Alam
Erfan Shayegani
Pedram Zaree
Ihsen Alouani
Nael B. Abu-Ghazaleh
72
0
0
17 Jul 2023
Right to be Forgotten in the Era of Large Language Models: Implications, Challenges, and Solutions
Dawen Zhang
Pamela Finckenberg-Broman
Thong Hoang
Shidong Pan
Zhenchang Xing
Mark Staples
Xiwei Xu
AILaw
MU
130
55
0
08 Jul 2023
CONSCENDI: A Contrastive and Scenario-Guided Distillation Approach to Guardrail Models for Virtual Assistants
A. Sun
Varun Nair
Elliot Schumacher
Anitha Kannan
80
3
0
27 Apr 2023