Training language models to follow instructions with human feedback

4 March 2022
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
Pamela Mishkin
Chong Zhang
Sandhini Agarwal
Katarina Slama
Alex Ray
John Schulman
Jacob Hilton
Fraser Kelton
Luke E. Miller
Maddie Simens
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM, ALM
arXiv: 2203.02155

Papers citing "Training language models to follow instructions with human feedback"

50 / 6,398 papers shown
Reasoning Through Execution: Unifying Process and Outcome Rewards for Code Generation
Zhuohao Yu
Weizheng Gu
Yidong Wang
Xingru Jiang
Zhengran Zeng
Jindong Wang
Wei Ye
Shikun Zhang
LRM
200
5
0
19 Dec 2024
Next Patch Prediction for Autoregressive Visual Generation
Yatian Pang
Peng Jin
Shuo Yang
Bin Lin
Bin Zhu
...
Liuhan Chen
Francis E. H. Tay
Ser-Nam Lim
Harry Yang
Li Yuan
255
10
0
19 Dec 2024
SATA: A Paradigm for LLM Jailbreak via Simple Assistive Task Linkage
Xiaoning Dong
Wenbo Hu
Wei Xu
Tianxing He
216
0
0
19 Dec 2024
Language verY Rare for All
Ibrahim Merad
Amos Wolf
Ziad Mazzawi
Yannick Léo
117
0
0
18 Dec 2024
Energy-Based Preference Model Offers Better Offline Alignment than the Bradley-Terry Preference Model
Yuzhong Hong
Hanshan Zhang
Junwei Bao
Hongfei Jiang
Yang Song
OffRL
122
4
0
18 Dec 2024
Mix-LN: Unleashing the Power of Deeper Layers by Combining Pre-LN and Post-LN
Pengxiang Li
Lu Yin
Shiwei Liu
116
8
0
18 Dec 2024
PsyDT: Using LLMs to Construct the Digital Twin of Psychological Counselor with Personalized Counseling Style for Psychological Counseling
Haojie Xie
Yirong Chen
Xiaofen Xing
Jingkai Lin
Xiangmin Xu
OffRL
144
5
0
18 Dec 2024
Gradual Vigilance and Interval Communication: Enhancing Value Alignment in Multi-Agent Debates
Rui Zou
Mengqi Wei
Jintian Feng
Qian Wan
Jianwen Sun
Sannyuya Liu
94
0
0
18 Dec 2024
RoboMIND: Benchmark on Multi-embodiment Intelligence Normative Data for Robot Manipulation
Kun Wu
Chengkai Hou
Jiaming Liu
Zhengping Che
Xiaozhu Ju
...
Zhenyu Wang
Pengju An
Siyuan Qian
Shanghang Zhang
Jian Tang
LM&Ro
242
24
0
18 Dec 2024
An Automated Explainable Educational Assessment System Built on LLMs
Jiazheng Li
Artem Bobrov
David West
Cesare Aloisi
Yulan He
178
3
0
17 Dec 2024
Unveiling the Secret Recipe: A Guide For Supervised Fine-Tuning Small LLMs
Aldo Pareja
Nikhil Shivakumar Nayak
Hao Wang
Krishnateja Killamsetty
Shivchander Sudalairaj
...
Guangxuan Xu
Kai Xu
Ligong Han
Luke Inglis
Akash Srivastava
205
7
0
17 Dec 2024
LLMs are Also Effective Embedding Models: An In-depth Overview
Chongyang Tao
Tao Shen
Shen Gao
Junshuo Zhang
Zhen Li
Zhengwei Tao
Shuai Ma
145
11
0
17 Dec 2024
Understanding Emotional Body Expressions via Large Language Models
Haifeng Lu
Jiuyi Chen
Feng Liang
Mingkui Tan
Runhao Zeng
Xiping Hu
122
0
0
17 Dec 2024
NLSR: Neuron-Level Safety Realignment of Large Language Models Against Harmful Fine-Tuning
Xin Yi
Shunfan Zheng
Linlin Wang
Gerard de Melo
Xiaoling Wang
Liang He
146
13
0
17 Dec 2024
ClarityEthic: Explainable Moral Judgment Utilizing Contrastive Ethical Insights from Large Language Models
Yuxi Sun
Wei Gao
Jing Ma
Hongzhan Lin
Ziyang Luo
Wenxuan Zhang
ELM
180
0
0
17 Dec 2024
Does VLM Classification Benefit from LLM Description Semantics?
Pingchuan Ma
Lennart Rietdorf
Dmytro Kotovenko
Vincent Tao Hu
Bjorn Ommer
VLM
162
1
0
16 Dec 2024
GeoX: Geometric Problem Solving Through Unified Formalized Vision-Language Pre-training
Renqiu Xia
Mingxing Li
Hancheng Ye
Wenjie Wu
Hongbin Zhou
...
Zeang Sheng
Botian Shi
Tao Chen
Junchi Yan
Bo Zhang
207
10
0
16 Dec 2024
IDEA-Bench: How Far are Generative Models from Professional Designing?
C. Liang
Lianghua Huang
Jingwu Fang
Huanzhang Dou
Wei Wang
Zhi-Fan Wu
Yupeng Shi
Junge Zhang
Xin Zhao
Yu Liu
3DV
155
1
0
16 Dec 2024
Context Filtering with Reward Modeling in Question Answering
Sangryul Kim
James Thorne
166
0
0
16 Dec 2024
Why Does ChatGPT "Delve" So Much? Exploring the Sources of Lexical Overrepresentation in Large Language Models
Tom S. Juzek
Zina B. Ward
134
2
0
16 Dec 2024
LLMs Can Simulate Standardized Patients via Agent Coevolution
Zhuoyun Du
Lujie Zheng
Renjun Hu
Yuyang Xu
Xiaochen Li
Ying Sun
Wei Chen
Jian Wu
Haolei Cai
Haohao Ying
LM&MA
117
5
0
16 Dec 2024
UAlign: Leveraging Uncertainty Estimations for Factuality Alignment on Large Language Models
Boyang Xue
Fei Mi
Qi Zhu
Hongru Wang
Rui Wang
Sheng Wang
Erxin Yu
Xuming Hu
Kam-Fai Wong
HILM
248
2
0
16 Dec 2024
SPaR: Self-Play with Tree-Search Refinement to Improve Instruction-Following in Large Language Models
Jiale Cheng
Xiao-Chang Liu
C. Wang
Xiaotao Gu
Yaojie Lu
Dan Zhang
Yuxiao Dong
J. Tang
Hongning Wang
Minlie Huang
LRM
189
4
0
16 Dec 2024
Safe Reinforcement Learning using Finite-Horizon Gradient-based Estimation
Juntao Dai
Yaodong Yang
Qian Zheng
Gang Pan
OffRL
143
2
0
15 Dec 2024
SpearBot: Leveraging Large Language Models in a Generative-Critique Framework for Spear-Phishing Email Generation
Qinglin Qi
Yun Luo
Yijia Xu
Wenbo Guo
Yong Fang
AAML
134
2
0
15 Dec 2024
LAW: Legal Agentic Workflows for Custody and Fund Services Contracts
William Watson
Nicole Cho
Nishan Srishankar
Zhen Zeng
Lucas Cecchi
Daniel Scott
S. Siddagangappa
Rachneet Kaur
T. Balch
Manuela Veloso
AILaw
118
0
0
15 Dec 2024
Dual Traits in Probabilistic Reasoning of Large Language Models
Shenxiong Li
Huaxia Rui
122
0
0
15 Dec 2024
CoopetitiveV: Leveraging LLM-powered Coopetitive Multi-Agent Prompting for High-quality Verilog Generation
Zhendong Mi
Renming Zheng
Haowen Zhong
Yue Sun
Shaoyi Huang
Sayan Moitra
Ken Kutzer
Zhaozhuo Xu
143
5
0
15 Dec 2024
Empowering LLMs to Understand and Generate Complex Vector Graphics
Ximing Xing
Juncheng Hu
Guotao Liang
Jing Zhang
Dong Xu
Qian Yu
198
12
0
15 Dec 2024
SceneLLM: Implicit Language Reasoning in LLM for Dynamic Scene Graph Generation
Hang Zhang
Zhuoling Li
Jun Liu
LRM
186
1
0
15 Dec 2024
SusGen-GPT: A Data-Centric LLM for Financial NLP and Sustainability Report Generation
Qilong Wu
Xiaoneng Xiang
Hejia Huang
Xuan Wang
Yeo Wei Jie
Ranjan Satapathy
Ricardo Shirota Filho
Bharadwaj Veeravalli
147
3
0
14 Dec 2024
TrendSim: Simulating Trending Topics in Social Media Under Poisoning Attacks with LLM-based Multi-agent System
Zeyu Zhang
Jianxun Lian
Chen Ma
Yaning Qu
Ye Luo
...
X. Chen
Yankai Lin
Le Wu
Xing Xie
Ji-Rong Wen
LLMAG, AAML
143
5
0
14 Dec 2024
A recent evaluation on the performance of LLMs on radiation oncology physics using questions of randomly shuffled options
Peilong Wang
J. Holmes
Ziqiang Liu
Dequan Chen
Tianming Liu
Jiajian Shen
Wen Liu
LRM, ELM, LM&MA
185
0
0
14 Dec 2024
Hybrid Preference Optimization for Alignment: Provably Faster Convergence Rates by Combining Offline Preferences with Online Exploration
Avinandan Bose
Zhihan Xiong
Aadirupa Saha
S. Du
Maryam Fazel
123
1
0
13 Dec 2024
MPPO: Multi Pair-wise Preference Optimization for LLMs with Arbitrary Negative Samples
Shuo Xie
Fangzhi Zhu
Jiahui Wang
Lulu Wen
Wei Dai
Xiaowei Chen
Junxiong Zhu
Kai Zhou
Bo Zheng
91
0
0
13 Dec 2024
Text2Cypher: Bridging Natural Language and Graph Databases
Makbule Gulcin Ozsoy
Leila Messallem
Jon Besga
Gianandrea Minneci
127
7
0
13 Dec 2024
InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions
Pan Zhang
Xiaoyi Dong
Yuhang Cao
Yuhang Zang
Rui Qian
...
Xinsong Zhang
Kai Chen
Yu Qiao
Dahua Lin
Jiaqi Wang
KELM
203
16
0
12 Dec 2024
MoSLD: An Extremely Parameter-Efficient Mixture-of-Shared LoRAs for Multi-Task Learning
Lulu Zhao
Weihao Zeng
Xiaofeng Shi
Hua Zhou
MoMe, MoE
125
2
0
12 Dec 2024
Test-Time Alignment via Hypothesis Reweighting
Yoonho Lee
Jonathan Williams
Henrik Marklund
Archit Sharma
E. Mitchell
Anikait Singh
Chelsea Finn
148
5
0
11 Dec 2024
ChatDyn: Language-Driven Multi-Actor Dynamics Generation in Street Scenes
Yuxi Wei
Jingbo Wang
Yuwen Du
Dingju Wang
Liang Pan
Chenxin Xu
Yao Feng
Bo Dai
Siheng Chen
AI4CE
156
1
0
11 Dec 2024
SenCLIP: Enhancing zero-shot land-use mapping for Sentinel-2 with ground-level prompting
Pallavi Jain
Dino Ienco
R. Interdonato
Tristan Berchoux
Diego Marcos
VLM
137
3
0
11 Dec 2024
Coverage-based Fairness in Multi-document Summarization
Haoyuan Li
Yusen Zhang
Rui Zhang
Snigdha Chaturvedi
185
0
0
11 Dec 2024
Observing Micromotives and Macrobehavior of Large Language Models
Yuyang Cheng
Xingwei Qu
Tomas Goldsack
Chenghua Lin
Chung-Chi Chen
156
0
0
10 Dec 2024
Active Inference for Self-Organizing Multi-LLM Systems: A Bayesian Thermodynamic Approach to Adaptation
Rithvik Prakki
LLMAG, AI4CE
548
0
0
10 Dec 2024
Comateformer: Combined Attention Transformer for Semantic Sentence Matching
Bo Li
Di Liang
Zixin Zhang
104
2
0
10 Dec 2024
On Evaluating the Durability of Safeguards for Open-Weight LLMs
Xiangyu Qi
Boyi Wei
Nicholas Carlini
Yangsibo Huang
Tinghao Xie
Luxi He
Matthew Jagielski
Milad Nasr
Prateek Mittal
Peter Henderson
AAML
137
23
0
10 Dec 2024
PrisonBreak: Jailbreaking Large Language Models with Fewer Than Twenty-Five Targeted Bit-flips
Zachary Coalson
Jeonghyun Woo
Shiyang Chen
Yu Sun
Lishan Yang
Prashant J. Nair
Bo Fang
Sanghyun Hong
AAML
142
3
0
10 Dec 2024
Efficient Diversity-Preserving Diffusion Alignment via Gradient-Informed GFlowNets
Zhen Liu
Tim Z. Xiao
Weiyang Liu
Yoshua Bengio
Dinghuai Zhang
283
6
0
10 Dec 2024
MVReward: Better Aligning and Evaluating Multi-View Diffusion Models with Human Preferences
Weitao Wang
Haoran Xu
Yuxiao Yang
Zhifang Liu
Jun Meng
Haoqian Wang
EGVM
128
3
0
09 Dec 2024
Bridging Conversational and Collaborative Signals for Conversational Recommendation
Ahmad Bin Rabiah
Nafis Sadeq
Julian McAuley
201
0
0
09 Dec 2024