ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2410.10871
  4. Cited By
Applying Refusal-Vector Ablation to Llama 3.1 70B Agents

Applying Refusal-Vector Ablation to Llama 3.1 70B Agents

8 October 2024
Simon Lermen
Mateusz Dziemian
Govind Pimpale
    LLMAG
ArXiv (abs)PDFHTML

Papers citing "Applying Refusal-Vector Ablation to Llama 3.1 70B Agents"

4 / 4 papers shown
Title
Deceptive Automated Interpretability: Language Models Coordinating to Fool Oversight Systems
Deceptive Automated Interpretability: Language Models Coordinating to Fool Oversight Systems
Simon Lermen
Mateusz Dziemian
Natalia Pérez-Campanero Antolín
109
0
0
10 Apr 2025
Multi-Agent Security Tax: Trading Off Security and Collaboration Capabilities in Multi-Agent Systems
Multi-Agent Security Tax: Trading Off Security and Collaboration Capabilities in Multi-Agent Systems
Pierre Peigne-Lefebvre
Mikolaj Kniejski
Filip Sondej
Matthieu David
J. Hoelscher-Obermaier
Christian Schroeder de Witt
Esben Kran
120
7
0
26 Feb 2025
Steering Language Model Refusal with Sparse Autoencoders
Kyle O'Brien
David Majercak
Xavier Fernandes
Richard Edgar
Blake Bullwinkel
Jingya Chen
Harsha Nori
Dean Carignan
Eric Horvitz
Forough Poursabzi-Sangde
LLMSV
160
18
0
18 Nov 2024
Teams of LLM Agents can Exploit Zero-Day Vulnerabilities
Teams of LLM Agents can Exploit Zero-Day Vulnerabilities
Richard Fang
Antony Kellermann
Akul Gupta
Qiusi Zhan
Richard Fang
R. Bindu
Daniel Kang
LLMAG
103
36
0
02 Jun 2024
1