2406.19999
The SIFo Benchmark: Investigating the Sequential Instruction Following Ability of Large Language Models
28 June 2024
Xinyi Chen
Baohao Liao
Jirui Qi
Panagiotis Eustratiadis
Christof Monz
Arianna Bisazza
Maarten de Rijke
ALM
ELM
LRM
Papers citing
"The SIFo Benchmark: Investigating the Sequential Instruction Following Ability of Large Language Models"
11 / 11 papers shown
Title
LookAlike: Consistent Distractor Generation in Math MCQs
Nisarg Parikh
Nigel Fernandez
Alexander Scarlatos
Simon Woodhead
Andrew S. Lan
53
0
0
03 May 2025
Prefill-Based Jailbreak: A Novel Approach of Bypassing LLM Safety Boundary
Yakai Li
Jiekang Hu
Weiduan Sang
Luping Ma
Jing Xie
Weijuan Zhang
Aimin Yu
Shijie Zhao
Qingjia Huang
Qihang Zhou
AAML
52
0
0
28 Apr 2025
L0-Reasoning Bench: Evaluating Procedural Correctness in Language Models via Simple Program Execution
Simeng Sun
Cheng-Ping Hsieh
Faisal Ladhak
Erik Arakelyan
Santiago Akle Serano
Boris Ginsburg
ReLM
ELM
LRM
139
0
0
28 Mar 2025
RefuteBench 2.0 -- Agentic Benchmark for Dynamic Evaluation of LLM Responses to Refutation Instruction
Jianhao Yan
Yun Luo
Yue Zhang
LLMAG
62
1
0
25 Feb 2025
MMMT-IF: A Challenging Multimodal Multi-Turn Instruction Following Benchmark
Elliot L. Epstein
Kaisheng Yao
Jing Li
Xinyi Bai
Hamid Palangi
LRM
47
0
0
26 Sep 2024
Fine-tuning Large Language Models with Sequential Instructions
Hanxu Hu
Simon Yu
Pinzhen Chen
E. Ponti
ALM
LRM
78
15
0
12 Mar 2024
Can Large Language Models Understand Real-World Complex Instructions?
Qi He
Jie Zeng
Wenhao Huang
Lina Chen
Jin Xiao
...
Shisong Chen
Yikai Zhang
Zhouhong Gu
Jiaqing Liang
Yanghua Xiao
ALM
LRM
ELM
98
52
0
17 Sep 2023
Controlled Text Generation with Natural Language Instructions
Wangchunshu Zhou
Yuchen Eleanor Jiang
Ethan Gotlieb Wilcox
Ryan Cotterell
Mrinmaya Sachan
160
84
0
27 Apr 2023
Large Language Models are Diverse Role-Players for Summarization Evaluation
Ning Wu
Ming Gong
Linjun Shou
Shining Liang
Daxin Jiang
59
44
0
27 Mar 2023
Multitask Prompted Training Enables Zero-Shot Task Generalization
Victor Sanh
Albert Webson
Colin Raffel
Stephen H. Bach
Lintang Sutawika
...
T. Bers
Stella Biderman
Leo Gao
Thomas Wolf
Alexander M. Rush
LRM
213
1,657
0
15 Oct 2021
Did Aristotle Use a Laptop? A Question Answering Benchmark with Implicit Reasoning Strategies
Mor Geva
Daniel Khashabi
Elad Segal
Tushar Khot
Dan Roth
Jonathan Berant
RALM
250
677
0
06 Jan 2021