Frictional Agent Alignment Framework: Slow Down and Don't Break Things

26 May 2025
Abhijnan Nath, Carine Graff, Andrei Bachinin, Nikhil Krishnaswamy
arXiv:2505.19428 (abs / PDF / HTML)

Papers citing "Frictional Agent Alignment Framework: Slow Down and Don't Break Things"

47 papers shown

Rethinking Random Masking in Self Distillation on ViT
Jihyeon Seong, Hyunkyung Han · 12 Jun 2025

TRACE: Real-Time Multimodal Common Ground Tracking in Situated Collaborative Dialogues
Hannah VanderHoeven, Brady Bhalla, Ibrahim Khebour, Austin Youngren, Videep Venkatesha, ..., Yifan Zhu, Kenneth Lai, Changsoo Jung, James Pustejovsky, Nikhil Krishnaswamy · 12 Mar 2025

Any Other Thoughts, Hedgehog? Linking Deliberation Chains in Collaborative Dialogues
Abhijnan Nath, Videep Venkatesha, Mariah Bradford, Avyakta Chelle, Austin Youngren, Carlos Mabrey, Nathaniel Blanchard, Nikhil Krishnaswamy · 25 Oct 2024

Simultaneous Reward Distillation and Preference Learning: Get You a Language Model Who Can Do Both
Abhijnan Nath, Changsoo Jung, Ethan Seefried, Nikhil Krishnaswamy · 11 Oct 2024

Modulating Language Model Experiences through Frictions
Katherine M. Collins, Valerie Chen, Ilia Sucholutsky, Hannah Rose Kirk, Malak Sadek, Holli Sargeant, Ameet Talwalkar, Adrian Weller, Umang Bhatt · 24 Jun 2024 · KELM

Toward Optimal LLM Alignments Using Two-Player Games
Rui Zheng, Hongyi Guo, Zhihan Liu, Xiaoying Zhang, Yuanshun Yao, ..., Tao Gui, Qi Zhang, Xuanjing Huang, Hang Li, Yang Liu · 16 Jun 2024

Regularizing Hidden States Enables Learning Generalizable Reward Model for LLMs
Rui Yang, Ruomeng Ding, Yong Lin, Huan Zhang, Tong Zhang · 14 Jun 2024

Chain of Preference Optimization: Improving Chain-of-Thought Reasoning in LLMs
Xuan Zhang, Chao Du, Tianyu Pang, Qian Liu, Wei Gao, Min Lin · 13 Jun 2024 · LRM, AI4CE

Robust Preference Optimization through Reward Model Distillation
Adam Fisch, Jacob Eisenstein, Vicky Zayats, Alekh Agarwal, Ahmad Beirami, Chirag Nagpal, Peter Shaw, Jonathan Berant · 29 May 2024

SimPO: Simple Preference Optimization with a Reference-Free Reward
Yu Meng, Mengzhou Xia, Danqi Chen · 23 May 2024

Imitation Learning: A Survey of Learning Methods, Environments and Metrics
Nathan Gavenski, Odinaldo Rodrigues, Michael Luck · 30 Apr 2024

Direct Nash Optimization: Teaching Language Models to Self-Improve with General Preferences
Corby Rosset, Ching-An Cheng, Arindam Mitra, Michael Santacroce, Ahmed Hassan Awadallah, Tengyang Xie · 04 Apr 2024

Common Ground Tracking in Multimodal Dialogue
Ibrahim Khebour, Kenneth Lai, Mariah Bradford, Yifan Zhu, R. Brutti, ..., Jingxuan Tu, Benjamin Ibarra, Nathaniel Blanchard, Nikhil Krishnaswamy, James Pustejovsky · 26 Mar 2024

RewardBench: Evaluating Reward Models for Language Modeling
Nathan Lambert, Valentina Pyatkin, Jacob Morrison, Lester James V. Miranda, Bill Yuchen Lin, ..., Sachin Kumar, Tom Zick, Yejin Choi, Noah A. Smith, Hanna Hajishirzi · 20 Mar 2024 · ALM

Human Alignment of Large Language Models through Online Preference Optimisation
Daniele Calandriello, Daniel Guo, Rémi Munos, Mark Rowland, Yunhao Tang, ..., Michal Valko, Tianqi Liu, Rishabh Joshi, Zeyu Zheng, Bilal Piot · 13 Mar 2024

ORPO: Monolithic Preference Optimization without Reference Model
Jiwoo Hong, Noah Lee, James Thorne · 12 Mar 2024 · OSLM

MT-Bench-101: A Fine-Grained Benchmark for Evaluating Large Language Models in Multi-Turn Dialogues
Ge Bai, Jie Liu, Xingyuan Bu, Yancheng He, Jiaheng Liu, ..., Zhuoran Lin, Wenbo Su, Tiezheng Ge, Bo Zheng, Wanli Ouyang · 22 Feb 2024 · ELM, LM&MA

Smaug: Fixing Failure Modes of Preference Optimisation with DPO-Positive
Arka Pal, Deep Karkhanis, Samuel Dooley, Manley Roberts, Siddartha Naidu, Colin White · 20 Feb 2024 · OSLM

Exploring a Behavioral Model of "Positive Friction" in Human-AI Interaction
Zeya Chen, Ruth Schmidt · 15 Feb 2024

KTO: Model Alignment as Prospect Theoretic Optimization
Kawin Ethayarajh, Winnie Xu, Niklas Muennighoff, Dan Jurafsky, Douwe Kiela · 02 Feb 2024

Some things are more CRINGE than others: Iterative Preference Optimization with the Pairwise Cringe Loss
Jing Xu, Andrew Lee, Sainbayar Sukhbaatar, Jason Weston · 27 Dec 2023

LLaVA-Plus: Learning to Use Tools for Creating Multimodal Agents
Shilong Liu, Hao Cheng, Haotian Liu, Hao Zhang, Feng Li, ..., Hang Su, Jun Zhu, Lei Zhang, Jianfeng Gao, Chun-yue Li · 09 Nov 2023 · MLLM, VLM

A General Theoretical Paradigm to Understand Learning from Human Preferences
M. G. Azar, Mark Rowland, Bilal Piot, Daniel Guo, Daniele Calandriello, Michal Valko, Rémi Munos · 18 Oct 2023

Direct Preference Optimization: Your Language Model is Secretly a Reward Model
Rafael Rafailov, Archit Sharma, E. Mitchell, Stefano Ermon, Christopher D. Manning, Chelsea Finn · 29 May 2023 · ALM

How Good is Automatic Segmentation as a Multimodal Discourse Annotation Aid?
Corbin Terpstra, Ibrahim Khebour, Mariah Bradford, B. Wisniewski, Nikhil Krishnaswamy, Nathaniel Blanchard · 27 May 2023

QLoRA: Efficient Finetuning of Quantized LLMs
Tim Dettmers, Artidoro Pagnoni, Ari Holtzman, Luke Zettlemoyer · 23 May 2023 · ALM

CAMEL: Communicative Agents for "Mind" Exploration of Large Language Model Society
Ge Li, Hasan Hammoud, Hani Itani, Dmitrii Khizbullin, Guohao Li · 31 Mar 2023 · SyDa, ALM

Large Language Models Fail on Trivial Alterations to Theory-of-Mind Tasks
T. Ullman · 16 Feb 2023 · LRM

Aligning Language Models with Preferences through f-divergence Minimization
Dongyoung Go, Tomasz Korbak, Germán Kruszewski, Jos Rozen, Nahyeon Ryu, Marc Dymetman · 16 Feb 2023

Robust Speech Recognition via Large-Scale Weak Supervision
Alec Radford, Jong Wook Kim, Tao Xu, Greg Brockman, C. McLeavey, Ilya Sutskever · 06 Dec 2022 · OffRL

Neural Theory-of-Mind? On the Limits of Social Intelligence in Large LMs
Maarten Sap, Ronan Le Bras, Daniel Fried, Yejin Choi · 24 Oct 2022

Controllable Dialogue Simulation with In-Context Learning
Zekun Li, Wenhu Chen, Shiyang Li, Hong Wang, Jingu Qian, Xi Yan · 09 Oct 2022

On Reinforcement Learning and Distribution Matching for Fine-Tuning Language Models with no Catastrophic Forgetting
Tomasz Korbak, Hady ElSahar, Germán Kruszewski, Marc Dymetman · 01 Jun 2022 · CLL

OPT: Open Pre-trained Transformer Language Models
Susan Zhang, Stephen Roller, Naman Goyal, Mikel Artetxe, Moya Chen, ..., Daniel Simig, Punit Singh Koura, Anjali Sridhar, Tianlu Wang, Luke Zettlemoyer · 02 May 2022 · VLM, OSLM, AI4CE

Reframing Human-AI Collaboration for Generating Free-Text Explanations
Sarah Wiegreffe, Jack Hessel, Swabha Swayamdipta, Mark O. Riedl, Yejin Choi · 16 Dec 2021

DeliData: A dataset for deliberation in multi-party problem solving
Georgi Karadzhov, Tom Stafford, Andreas Vlachos · 11 Aug 2021

MultiWOZ 2.4: A Multi-Domain Task-Oriented Dialogue Dataset with Essential Annotation Corrections to Improve State Tracking Evaluation
Fanghua Ye, Jarana Manotumruksa, Emine Yilmaz · 01 Apr 2021

Measuring Association Between Labels and Free-Text Rationales
Sarah Wiegreffe, Ana Marasović, Noah A. Smith · 24 Oct 2020

Learning to summarize from human feedback
Nisan Stiennon, Long Ouyang, Jeff Wu, Daniel M. Ziegler, Ryan J. Lowe, Chelsea Voss, Alec Radford, Dario Amodei, Paul Christiano · 02 Sep 2020 · ALM

MultiWOZ 2.2: A Dialogue Dataset with Additional Annotation Corrections and State Tracking Baselines
Xiaoxue Zang, Abhinav Rastogi, Srinivas Sunkara, Raghav Gupta, Jianguo Zhang, Jindong Chen · 10 Jul 2020

Sparse Text Generation
Pedro Henrique Martins, Zita Marinho, André F. T. Martins · 06 Apr 2020 · MoE

Advantage-Weighted Regression: Simple and Scalable Off-Policy Reinforcement Learning
Xue Bin Peng, Aviral Kumar, Grace Zhang, Sergey Levine · 01 Oct 2019 · OffRL

Towards Scalable Multi-domain Conversational Agents: The Schema-Guided Dialogue Dataset
Abhinav Rastogi, Xiaoxue Zang, Srinivas Sunkara, Raghav Gupta, Pranav Khaitan · 12 Sep 2019

The Curious Case of Neural Text Degeneration
Ari Holtzman, Jan Buys, Li Du, Maxwell Forbes, Yejin Choi · 22 Apr 2019

Decoupled Weight Decay Regularization
I. Loshchilov, Frank Hutter · 14 Nov 2017 · OffRL

Proximal Policy Optimization Algorithms
John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, Oleg Klimov · 20 Jul 2017 · OffRL

Evidence and plausibility in neighborhood structures
J. Benthem, D. Fernández-Duque, Eric Pacuit · 04 Jul 2013