Leveraging Domain Knowledge for Efficient Reward Modelling in RLHF: A Case-Study in E-Commerce Opinion Summarization

23 February 2024
Swaroop Nath, Tejpalsingh Siledar, Sankara Sri Raghava Ravindra Muddu, Rupasai Rangaraju, H. Khadilkar, Pushpak Bhattacharyya, Suman Banerjee, Amey Patil, Sudhanshu Singh, M. Chelliah, Nikesh Garera
arXiv: 2402.15473

Papers citing "Leveraging Domain Knowledge for Efficient Reward Modelling in RLHF: A Case-Study in E-Commerce Opinion Summarization"

All 13 citing papers shown.
Zephyr: Direct Distillation of LM Alignment
Lewis Tunstall, E. Beeching, Nathan Lambert, Nazneen Rajani, Kashif Rasul, ..., Nathan Habib, Nathan Sarrazin, Omar Sanseviero, Alexander M. Rush, Thomas Wolf
ALM · 25 Oct 2023

Direct Preference Optimization: Your Language Model is Secretly a Reward Model
Rafael Rafailov, Archit Sharma, E. Mitchell, Stefano Ermon, Christopher D. Manning, Chelsea Finn
ALM · 29 May 2023

Aligning Large Language Models through Synthetic Feedback
Sungdong Kim, Sanghwan Bae, Jamin Shin, Soyoung Kang, Donghyun Kwak, Kang Min Yoo, Minjoon Seo
ALM, SyDa · 23 May 2023

Training language models to follow instructions with human feedback
Long Ouyang, Jeff Wu, Xu Jiang, Diogo Almeida, Carroll L. Wainwright, ..., Amanda Askell, Peter Welinder, Paul Christiano, Jan Leike, Ryan J. Lowe
OSLM, ALM · 04 Mar 2022

A General Language Assistant as a Laboratory for Alignment
Amanda Askell, Yuntao Bai, Anna Chen, Dawn Drain, Deep Ganguli, ..., Tom B. Brown, Jack Clark, Sam McCandlish, C. Olah, Jared Kaplan
ALM · 01 Dec 2021

Can Machines Learn Morality? The Delphi Experiment
Liwei Jiang, Jena D. Hwang, Chandra Bhagavatula, Ronan Le Bras, Jenny T Liang, ..., Yulia Tsvetkov, Oren Etzioni, Maarten Sap, Regina A. Rini, Yejin Choi
FaML · 14 Oct 2021

Aspect-Controllable Opinion Summarization
Reinald Kim Amplayo, Stefanos Angelidis, Mirella Lapata
07 Sep 2021

Self-Supervised Multimodal Opinion Summarization
Jinbae Im, Moonki Kim, Hoyeop Lee, Hyunsouk Cho, Sehee Chung
27 May 2021

Unsupervised Opinion Summarization with Noising and Denoising
Reinald Kim Amplayo, Mirella Lapata
21 Apr 2020

Unsupervised Opinion Summarization as Copycat-Review Generation
Arthur Brazinskas, Mirella Lapata, Ivan Titov
06 Nov 2019

Fine-Tuning Language Models from Human Preferences
Daniel M. Ziegler, Nisan Stiennon, Jeff Wu, Tom B. Brown, Alec Radford, Dario Amodei, Paul Christiano, G. Irving
ALM · 18 Sep 2019

Proximal Policy Optimization Algorithms
John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, Oleg Klimov
OffRL · 20 Jul 2017

A Neural Attention Model for Abstractive Sentence Summarization
Alexander M. Rush, S. Chopra, Jason Weston
CVBM · 02 Sep 2015