BehaviorBox: Automated Discovery of Fine-Grained Performance Differences Between Language Models

Papers citing "BehaviorBox: Automated Discovery of Fine-Grained Performance Differences Between Language Models"