Confidence-Calibrated Adversarial Training: Generalizing to Unseen Attacks

14 October 2019
David Stutz
Matthias Hein
Bernt Schiele
arXiv:1910.06259
Abstract

Adversarial training yields robust models against a specific threat model, e.g., $L_\infty$ adversarial examples. Typically, robustness does not generalize to previously unseen threat models, e.g., other $L_p$ norms or larger perturbations. Our confidence-calibrated adversarial training (CCAT) tackles this problem by biasing the model towards low-confidence predictions on adversarial examples. By allowing the model to reject examples with low confidence, robustness generalizes beyond the threat model employed during training. CCAT, trained only on $L_\infty$ adversarial examples, increases robustness against larger $L_\infty$, $L_2$, $L_1$ and $L_0$ attacks, adversarial frames, distal adversarial examples and corrupted examples, and yields better clean accuracy compared to adversarial training. For thorough evaluation, we developed novel white- and black-box attacks that directly attack CCAT by maximizing confidence. For each threat model, we use 7 attacks with up to 50 restarts and 5000 iterations and report worst-case robust test error, extended to our confidence-thresholded setting, across all attacks.
