ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2311.09585
  4. Cited By
LifeTox: Unveiling Implicit Toxicity in Life Advice

LifeTox: Unveiling Implicit Toxicity in Life Advice

16 November 2023
Minbeom Kim
Jahyun Koo
Hwanhee Lee
Joonsuk Park
Hwaran Lee
Kyomin Jung
ArXivPDFHTML

Papers citing "LifeTox: Unveiling Implicit Toxicity in Life Advice"

7 / 7 papers shown
Title
Drift: Decoding-time Personalized Alignments with Implicit User Preferences
Drift: Decoding-time Personalized Alignments with Implicit User Preferences
Minbeom Kim
Kang-il Lee
Seongho Joo
Hwaran Lee
Thibaut Thonet
Kyomin Jung
AI4TS
121
1
0
20 Feb 2025
SWITCH: Studying with Teacher for Knowledge Distillation of Large Language Models
SWITCH: Studying with Teacher for Knowledge Distillation of Large Language Models
Jahyun Koo
Yerin Hwang
Yongil Kim
Taegwan Kang
Hyunkyung Bae
Kyomin Jung
60
0
0
25 Oct 2024
AdvisorQA: Towards Helpful and Harmless Advice-seeking Question Answering with Collective Intelligence
AdvisorQA: Towards Helpful and Harmless Advice-seeking Question Answering with Collective Intelligence
Minbeom Kim
Hwanhee Lee
Joonsuk Park
Hwaran Lee
Kyomin Jung
40
1
0
18 Apr 2024
Can LLMs Recognize Toxicity? Definition-Based Toxicity Metric
Can LLMs Recognize Toxicity? Definition-Based Toxicity Metric
Hyukhun Koh
Dohyung Kim
Minwoo Lee
Kyomin Jung
29
3
0
10 Feb 2024
Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors,
  and Lessons Learned
Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned
Deep Ganguli
Liane Lovitt
John Kernion
Amanda Askell
Yuntao Bai
...
Nicholas Joseph
Sam McCandlish
C. Olah
Jared Kaplan
Jack Clark
231
446
0
23 Aug 2022
Training language models to follow instructions with human feedback
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
333
12,003
0
04 Mar 2022
Latent Hatred: A Benchmark for Understanding Implicit Hate Speech
Latent Hatred: A Benchmark for Understanding Implicit Hate Speech
Mai Elsherief
Caleb Ziems
D. Muchlinski
Vaishnavi Anupindi
Jordyn Seybolt
M. D. Choudhury
Diyi Yang
106
237
0
11 Sep 2021
1