1903.00161
DROP: A Reading Comprehension Benchmark Requiring Discrete Reasoning Over Paragraphs
1 March 2019
Dheeru Dua
Yizhong Wang
Pradeep Dasigi
Gabriel Stanovsky
Sameer Singh
Matt Gardner
AIMat
Papers citing
"DROP: A Reading Comprehension Benchmark Requiring Discrete Reasoning Over Paragraphs"
50 / 376 papers shown
Title
Context-Informed Grounding Supervision
Hyunji Lee
Seunghyun Yoon
Yunjae Won
Hanseok Oh
Geewook Kim
Trung H. Bui
Franck Dernoncourt
Elias Stengel-Eskin
Mohit Bansal
Minjoon Seo
LRM
43
0
0
18 Jun 2025
Model Merging for Knowledge Editing
Zichuan Fu
Xian Wu
Guojing Li
Yingying Zhang
Yefeng Zheng
Tianshi Ming
Y. X. R. Wang
Wanyu Wang
Xiangyu Zhao
KELM
MoMe
CLL
27
0
0
14 Jun 2025
DIVE into MoE: Diversity-Enhanced Reconstruction of Large Language Models from Dense into Mixture-of-Experts
Yuchen Feng
Bowen Shen
Naibin Gu
Jiaxuan Zhao
Peng Fu
Zheng Lin
Weiping Wang
MoMe
MoE
61
0
0
11 Jun 2025
dots.llm1 Technical Report
Bi Huo
Bin Tu
Cheng Qin
Da Zheng
Debing Zhang
...
Yuqiu Ji
Ze Wen
Zhenhai Liu
Zichao Li
Zilong Liao
MoE
61
0
0
06 Jun 2025
MesaNet: Sequence Modeling by Locally Optimal Test-Time Training
J. Oswald
Nino Scherrer
Seijin Kobayashi
Luca Versari
Songlin Yang
...
Guillaume Lajoie
Charlotte Frenkel
Razvan Pascanu
Blaise Agüera y Arcas
João Sacramento
106
1
0
05 Jun 2025
MiMo-VL Technical Report
Xiaomi LLM-Core Team
Zihao Yue
Zhenru Lin
Yifan Song
Weikun Wang
...
Di Zhang
Chong Ma
Chang Liu
Can Cai
Bingquan Xia
OffRL
MoE
VLM
LRM
93
0
0
04 Jun 2025
EvaLearn: Quantifying the Learning Capability and Efficiency of LLMs via Sequential Problem Solving
Shihan Dou
Ming Zhang
Chenhao Huang
Jiayi Chen
F. Chen
...
Wei Chengzhi
Lin Yan
Qi Zhang
Xuanjing Huang
ELM
92
0
0
03 Jun 2025
ESGenius: Benchmarking LLMs on Environmental, Social, and Governance (ESG) and Sustainability Knowledge
Chaoyue He
Xin Zhou
Y. Wu
Xinjia Yu
Yan Zhang
...
Shengfei Lyu
Hong Xu
X. Wang
Wei Liu
Chunyan Miao
ELM
60
0
0
02 Jun 2025
Contra4: Evaluating Contrastive Cross-Modal Reasoning in Audio, Video, Image, and 3D
Artemis Panagopoulou
Le Xue
Honglu Zhou
Silvio Savarese
Ran Xu
Caiming Xiong
Chris Callison-Burch
Mark Yatskar
Juan Carlos Niebles
59
0
0
02 Jun 2025
PBEBench: A Multi-Step Programming by Examples Reasoning Benchmark inspired by Historical Linguistics
Atharva Naik
Darsh Agrawal
Manav Kapadnis
Yuwei An
Yash Mathur
Carolyn Rose
David R. Mortensen
LRM
ELM
65
0
0
29 May 2025
Stratified Selective Sampling for Instruction Tuning with Dedicated Scoring Strategy
Paramita Mirza
Lucas Weber
Fabian Küch
51
0
0
28 May 2025
What Has Been Lost with Synthetic Evaluation?
Alexander Gill
Abhilasha Ravichander
Ana Marasović
ELM
36
0
0
28 May 2025
STEER-BENCH: A Benchmark for Evaluating the Steerability of Large Language Models
Kai Chen
Zihao He
Taiwei Shi
Kristina Lerman
ALM
LLMSV
104
0
0
27 May 2025
Agentic Predictor: Performance Prediction for Agentic Workflows via Multi-View Encoding
Patara Trirat
Wonyong Jeong
Sung Ju Hwang
94
0
0
26 May 2025
Large Language Models' Reasoning Stalls: An Investigation into the Capabilities of Frontier Models
Lachlan McGinness
Peter Baumgartner
ReLM
LRM
ELM
80
1
0
26 May 2025
KerZOO: Kernel Function Informed Zeroth-Order Optimization for Accurate and Accelerated LLM Fine-Tuning
Zhendong Mi
Qitao Tan
Xiaodong Yu
Zining Zhu
Geng Yuan
Shaoyi Huang
206
0
0
24 May 2025
Teaching with Lies: Curriculum DPO on Synthetic Negatives for Hallucination Detection
Shrey Pandit
Ashwin Vinod
Liu Leqi
Ying Ding
HILM
77
0
0
23 May 2025
Hunyuan-TurboS: Advancing Large Language Models through Mamba-Transformer Synergy and Adaptive Chain-of-Thought
Tencent Hunyuan Team
Ao Liu
Botong Zhou
Can Xu
Chayse Zhou
...
Bingxin Qu
Bolin Ni
Boyu Wu
Chen Li
Cheng-peng Jiang
MoE
LRM
AI4CE
163
0
0
21 May 2025
Social Bias in Popular Question-Answering Benchmarks
Angelie Kraft
Judith Simon
Sonja Schimmler
120
0
0
21 May 2025
FlashThink: An Early Exit Method For Efficient Reasoning
Guochao Jiang
Guofeng Quan
Zepeng Ding
Ziqin Luo
Dixuan Wang
Zheng Hu
ReLM
LRM
74
2
0
20 May 2025
InfiFPO: Implicit Model Fusion via Preference Optimization in Large Language Models
Yanggan Gu
Zhaoyi Yan
Yuanyi Wang
Yiming Zhang
Qi Zhou
Leilei Gan
Hongxia Yang
74
0
0
20 May 2025
Shadow-FT: Tuning Instruct via Base
Taiqiang Wu
Runming Yang
Jiayi Li
Pengfei Hu
Ngai Wong
Yujiu Yang
248
0
0
19 May 2025
A Systematic Analysis of Base Model Choice for Reward Modeling
Kian Ahrabian
Pegah Jandaghi
Negar Mokhberian
Sai Praneeth Karimireddy
Jay Pujara
136
0
0
16 May 2025
AttentionInfluence: Adopting Attention Head Influence for Weak-to-Strong Pretraining Data Selection
Kai Hua
Steven Wu
Ge Zhang
Ke Shen
LRM
85
0
0
12 May 2025
MiMo: Unlocking the Reasoning Potential of Language Model -- From Pretraining to Posttraining
Xiaomi LLM-Core Team
Bingquan Xia
Bo Shen
Cici
Dawei Zhu
...
Yun Wang
Yue Yu
Zhenru Lin
Zhichao Song
Zihao Yue
MoE
ReLM
LRM
AI4CE
176
7
0
12 May 2025
HalluLens: LLM Hallucination Benchmark
Yejin Bang
Ziwei Ji
Alan Schelten
Anthony Hartshorn
Tara Fowler
Cheng Zhang
Nicola Cancedda
Pascale Fung
HILM
132
5
0
24 Apr 2025
A Self-Improving Coding Agent
Maxime Robeyns
Martin Szummer
Laurence Aitchison
LLMAG
146
1
0
21 Apr 2025
ViQA-COVID: COVID-19 Machine Reading Comprehension Dataset for Vietnamese
H. Phung
Ngoc C. Lê
Van-Chien Nguyen
Hang Thi Nguyen
Thuy Phuong Thi Nguyen
224
2
0
21 Apr 2025
aiXamine: Simplified LLM Safety and Security
Fatih Deniz
Dorde Popovic
Yazan Boshmaf
Euisuh Jeong
M. Ahmad
Sanjay Chawla
Issa M. Khalil
ELM
346
0
0
21 Apr 2025
D-GEN: Automatic Distractor Generation and Evaluation for Reliable Assessment of Generative Model
Grace Byun
Jinho D. Choi
EGVM
87
0
0
18 Apr 2025
LLM-as-a-Judge: Reassessing the Performance of LLMs in Extractive QA
Xanh Ho
Jiahao Huang
Florian Boudin
Akiko Aizawa
ELM
147
0
0
16 Apr 2025
DebFlow: Automating Agent Creation via Agent Debate
Jinwei Su
Yinghui Xia
Ronghua Shi
Jianhui Wang
Jianuo Huang
Yansen Wang
Tianyu Shi
Yang Jingsong
Lewei He
94
1
0
31 Mar 2025
MSPLoRA: A Multi-Scale Pyramid Low-Rank Adaptation for Efficient Model Fine-Tuning
Jiancheng Zhao
Xingda Yu
Zhen Yang
MoE
91
3
0
27 Mar 2025
The Amazon Nova Family of Models: Technical Report and Model Card
Amazon AGI
Aaron Langford
A. Shah
Abhanshu Gupta
Abhimanyu Bhatter
...
Benjamin Biggs
Benjamin Ott
Bhanu Vinzamuri
Bharath Venkatesh
Bhavana Ganesh
30
21
0
17 Mar 2025
SuperBPE: Space Travel for Language Models
Alisa Liu
J. Hayase
Valentin Hofmann
Sewoong Oh
Noah A. Smith
Yejin Choi
157
10
0
17 Mar 2025
A Survey on Federated Fine-tuning of Large Language Models
Yebo Wu
Chunlin Tian
Jingguang Li
He Sun
Kahou Tam
Zhanting Zhou
Haicheng Liao
Zhijiang Guo
Li Li
Chengzhong Xu
FedML
158
5
0
15 Mar 2025
CURIE: Evaluating LLMs On Multitask Scientific Long Context Understanding and Reasoning
Hao Cui
Zahra Shamsi
Gowoon Cheon
Xuejian Ma
Shutong Li
...
Eun-Ah Kim
M. Brenner
Viren Jain
Sameera Ponda
Subhashini Venugopalan
ELM
LRM
146
4
0
14 Mar 2025
Accurate INT8 Training Through Dynamic Block-Level Fallback
Pengle Zhang
Jia Wei
Jintao Zhang
Jun-Jie Zhu
Jianfei Chen
MQ
173
9
0
11 Mar 2025
Development and Enhancement of Text-to-Image Diffusion Models
Rajdeep Roshan Sahu
VLM
162
44
0
07 Mar 2025
PhantomWiki: On-Demand Datasets for Reasoning and Retrieval Evaluation
Albert Gong
Kamilė Stankevičiūtė
Chao-gang Wan
Anmol Kabra
Raphael Thesmar
Johann Lee
Julius Klenke
Carla P. Gomes
Kilian Q. Weinberger
LRM
RALM
119
0
0
27 Feb 2025
olmOCR: Unlocking Trillions of Tokens in PDFs with Vision Language Models
Jake Poznanski
Jon Borchardt
Jason Dunkelberger
Regan Huff
Daniel Lin
Aman Rangapur
Christopher Wilhelm
Kyle Lo
Luca Soldaini
174
7
0
25 Feb 2025
Faster, Cheaper, Better: Multi-Objective Hyperparameter Optimization for LLM and RAG Systems
Matthew Barker
Andrew Bell
Evan Thomas
James Carr
Thomas Andrews
Umang Bhatt
167
2
0
25 Feb 2025
Unveiling Downstream Performance Scaling of LLMs: A Clustering-Based Perspective
Chengyin Xu
Kaiyuan Chen
Xiao Li
Ke Shen
Chenggang Li
OffRL
191
2
0
24 Feb 2025
Reasoning with Latent Thoughts: On the Power of Looped Transformers
Nikunj Saunshi
Nishanth Dikkala
Zhiyuan Li
Sanjiv Kumar
Sashank J. Reddi
OffRL
LRM
AI4CE
148
22
0
24 Feb 2025
Correlating and Predicting Human Evaluations of Language Models from Natural Language Processing Benchmarks
Rylan Schaeffer
Punit Singh Koura
Binh Tang
R. Subramanian
Aaditya K. Singh
...
Vedanuj Goswami
Sergey Edunov
Dieuwke Hupkes
Sanmi Koyejo
Sharan Narang
ALM
156
1
0
24 Feb 2025
SPEX: Scaling Feature Interaction Explanations for LLMs
Justin Singh Kang
Landon Butler
Abhineet Agarwal
Yigit Efe Erginbas
Ramtin Pedarsani
Kannan Ramchandran
Bin Yu
VLM
LRM
172
2
0
20 Feb 2025
MoM: Linear Sequence Modeling with Mixture-of-Memories
Jusen Du
Weigao Sun
Disen Lan
Jiaxi Hu
Yu Cheng
KELM
162
5
0
19 Feb 2025
LongReD: Mitigating Short-Text Degradation of Long-Context Large Language Models via Restoration Distillation
Zican Dong
Junyi Li
Jinhao Jiang
Mingyu Xu
Wayne Xin Zhao
Bin Wang
Xin Wu
VLM
378
5
0
11 Feb 2025
Aligning Black-box Language Models with Human Judgments
Gerrit J. J. van den Burg
Gen Suzuki
Wei Liu
Murat Sensoy
ALM
146
0
0
07 Feb 2025
Improving Natural Language Understanding for LLMs via Large-Scale Instruction Synthesis
Lin Yuan
Jun Xu
Honghao Gui
Mengshu Sun
Qing Cui
Lei Liang
Jun Zhou
AI4CE
457
0
0
06 Feb 2025