ResearchTrend.AI
Shielded Representations: Protecting Sensitive Attributes Through Iterative Gradient-Based Projection

17 May 2023
Shadi Iskander
Kira Radinsky
Yonatan Belinkov

Papers citing "Shielded Representations: Protecting Sensitive Attributes Through Iterative Gradient-Based Projection"

20 / 20 papers shown
Multimodal Large Language Models for Enhanced Traffic Safety: A Comprehensive Review and Future Trends
M. Tami
Mohammed Elhenawy
Huthaifa I. Ashqar
41
0
0
21 Apr 2025
BiasEdit: Debiasing Stereotyped Language Models via Model Editing
Xin Xu
Wei Xu
N. Zhang
Julian McAuley
KELM
44
0
0
11 Mar 2025
Focus On This, Not That! Steering LLMs With Adaptive Feature Specification
Tom A. Lamb
Adam Davies
Alasdair Paren
Philip Torr
Francesco Pinto
49
0
0
30 Oct 2024
Steering Knowledge Selection Behaviours in LLMs via SAE-Based Representation Engineering
Yu Zhao
Alessio Devoto
Giwon Hong
Xiaotang Du
Aryo Pradipta Gema
Hongru Wang
Xuanli He
Kam-Fai Wong
Pasquale Minervini
KELM
LLMSV
39
16
0
21 Oct 2024
AGR: Age Group fairness Reward for Bias Mitigation in LLMs
Shuirong Cao
Ruoxi Cheng
Zhiqiang Wang
34
4
0
06 Sep 2024
The Quest for the Right Mediator: A History, Survey, and Theoretical Grounding of Causal Interpretability
Aaron Mueller
Jannik Brinkmann
Millicent Li
Samuel Marks
Koyena Pal
...
Arnab Sen Sharma
Jiuding Sun
Eric Todd
David Bau
Yonatan Belinkov
CML
52
18
0
02 Aug 2024
Deconstructing The Ethics of Large Language Models from Long-standing Issues to New-emerging Dilemmas
Chengyuan Deng
Yiqun Duan
Xin Jin
Heng Chang
Yijun Tian
...
Kuofeng Gao
Sihong He
Jun Zhuang
Lu Cheng
Haohan Wang
AILaw
43
16
0
08 Jun 2024
The Life Cycle of Large Language Models: A Review of Biases in Education
Jinsook Lee
Yann Hicke
Renzhe Yu
Christopher A. Brooks
René F. Kizilcec
AI4Ed
42
1
0
03 Jun 2024
Spectral Editing of Activations for Large Language Model Alignment
Yifu Qiu
Zheng Zhao
Yftah Ziser
Anna Korhonen
E. Ponti
Shay B. Cohen
KELM
LLMSV
28
15
0
15 May 2024
Sparse Feature Circuits: Discovering and Editing Interpretable Causal Graphs in Language Models
Samuel Marks
Can Rager
Eric J. Michaud
Yonatan Belinkov
David Bau
Aaron Mueller
46
115
0
28 Mar 2024
Leveraging Prototypical Representations for Mitigating Social Bias without Demographic Information
Shadi Iskander
Kira Radinsky
Yonatan Belinkov
47
4
0
14 Mar 2024
On the Scaling Laws of Geographical Representation in Language Models
Nathan Godey
Eric Villemonte de la Clergerie
Benoît Sagot
49
6
0
29 Feb 2024
MultiContrievers: Analysis of Dense Retrieval Representations
Seraphina Goldfarb-Tarrant
Pedro Rodriguez
Jane Dwivedi-Yu
Patrick Lewis
28
1
0
24 Feb 2024
Tackling Bias in Pre-trained Language Models: Current Trends and Under-represented Societies
Vithya Yogarajan
Gillian Dobbie
Te Taka Keegan
R. Neuwirth
ALM
43
11
0
03 Dec 2023
Bias and Fairness in Large Language Models: A Survey
Isabel O. Gallegos
Ryan A. Rossi
Joe Barrow
Md Mehrab Tanjim
Sungchul Kim
Franck Dernoncourt
Tong Yu
Ruiyi Zhang
Nesreen Ahmed
AILaw
26
490
0
02 Sep 2023
A Survey on Fairness in Large Language Models
Yingji Li
Mengnan Du
Rui Song
Xin Wang
Ying Wang
ALM
52
59
0
20 Aug 2023
Linear Adversarial Concept Erasure
Shauli Ravfogel
Michael Twiton
Yoav Goldberg
Ryan Cotterell
KELM
81
57
0
28 Jan 2022
Debiasing Methods in Natural Language Understanding Make Bias More Accessible
Michael J. Mendelson
Yonatan Belinkov
42
23
0
09 Sep 2021
Probing Classifiers: Promises, Shortcomings, and Advances
Yonatan Belinkov
226
405
0
24 Feb 2021
Double-Hard Debias: Tailoring Word Embeddings for Gender Bias Mitigation
Tianlu Wang
Xi Lin
Nazneen Rajani
Bryan McCann
Vicente Ordonez
Caiming Xiong
CVBM
157
54
0
03 May 2020