Evaluating the Instruction-following Abilities of Language Models using Knowledge Tasks

Main: 9 pages · Appendix: 34 pages · Bibliography: 3 pages · 45 figures · 11 tables
Abstract

LLM evaluation benchmarks have traditionally separated the testing of knowledge/reasoning capabilities from instruction following. In this work, we study the interaction between knowledge and instruction following, and observe that LLMs struggle to follow simple answer-modifying instructions and are also distracted by instructions that should have no bearing on the original knowledge-task answer. We leverage existing multiple-choice knowledge benchmarks and apply a set of simple instructions that manipulate text (e.g., change case), modify numeric quantities (e.g., increase value, change formatting), operate on lists (e.g., sort answer candidates), or act as distractors (e.g., change the case of numeric answers).
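The instruction categories mentioned above can be sketched as simple transformations of a gold answer. The function names and behaviors below are illustrative assumptions for clarity, not the authors' actual benchmark code:

```python
# Illustrative sketch of the four instruction categories described in the
# abstract (hypothetical helpers; not the paper's implementation).

def change_case(answer: str) -> str:
    """Text manipulation: flip the case of every letter in the answer."""
    return answer.swapcase()

def increase_value(answer: str, delta: int = 1) -> str:
    """Numeric manipulation: increase a numeric answer by a fixed amount."""
    return str(int(answer) + delta)

def sort_candidates(candidates: list[str]) -> list[str]:
    """List operation: return the answer candidates in sorted order."""
    return sorted(candidates)

def distractor_change_case(numeric_answer: str) -> str:
    """Distractor: 'change case' has no effect on a numeric answer,
    so the model should leave the original answer unchanged."""
    return numeric_answer
```

A model that follows instructions correctly would, for example, turn the answer "Paris" into "pARIS" under the change-case instruction, while leaving a numeric answer like "1945" untouched under the distractor variant.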
