Dynamic Chunking and Selection for Reading Comprehension of Ultra-Long Context in Large Language Models

1 June 2025

Boheng Sheng

Main: 9 Pages

13 Figures

Bibliography: 3 Pages

16 Tables

Appendix: 8 Pages

Abstract

Large language models (LLMs) often struggle to accurately read and comprehend extremely long texts. Current methods for improvement typically rely on splitting long contexts into fixed-length chunks. However, fixed truncation risks separating semantically relevant content, leading to ambiguity and compromising accurate understanding. To overcome this limitation, we propose a straightforward approach for dynamically separating and selecting chunks of long context, facilitating a more streamlined input for LLMs. In particular, we compute semantic similarities between adjacent sentences, using lower similarities to adaptively divide long contexts into variable-length chunks. We further train a question-aware classifier to select sensitive chunks that are critical for answering specific questions. Experimental results on both single-hop and multi-hop question-answering benchmarks show that the proposed approach consistently outperforms strong baselines. Notably, it maintains robustness across a wide range of input lengths, handling sequences of up to 256k tokens. Our datasets and code are available at the following link: this https URL
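
The chunking step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the bag-of-words `embed` function stands in for whatever sentence encoder the paper uses, and the fixed `threshold` of 0.2 and the names `embed`, `cosine`, and `dynamic_chunks` are assumptions made for illustration only. The sketch shows only the idea of cutting the context wherever the similarity between adjacent sentences is low, which yields variable-length chunks. The question-aware classifier is not covered.

```python
import math
import re
from collections import Counter


def embed(sentence):
    """Toy stand-in for a sentence encoder (assumption): bag-of-words counts."""
    return Counter(re.findall(r"[a-z0-9]+", sentence.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def dynamic_chunks(sentences, threshold=0.2):
    """Start a new chunk wherever adjacent-sentence similarity falls below
    `threshold` (a hypothetical cut-off, not a value from the paper)."""
    if not sentences:
        return []
    vectors = [embed(s) for s in sentences]
    chunks = [[sentences[0]]]
    for prev_vec, cur_vec, sentence in zip(vectors, vectors[1:], sentences[1:]):
        if cosine(prev_vec, cur_vec) < threshold:
            chunks.append([sentence])    # low similarity: chunk boundary
        else:
            chunks[-1].append(sentence)  # similar enough: extend current chunk
    return chunks


sentences = [
    "The cat sat on the mat.",
    "The cat then left the mat.",
    "Stock prices fell sharply today.",
    "Investors sold stock as prices fell.",
]
chunks = dynamic_chunks(sentences)
print([len(chunk) for chunk in chunks])  # [2, 2]
```

In this toy input the only boundary falls between the second and third sentences, where the two sentences share no words. A real system would replace `embed` with a learned sentence encoder and would likely choose boundaries relative to the distribution of similarities rather than a fixed constant.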


@article{sheng2025_2506.00773,
  title={Dynamic Chunking and Selection for Reading Comprehension of Ultra-Long Context in Large Language Models},
  author={Boheng Sheng and Jiacheng Yao and Meicong Zhang and Guoxiu He},
  journal={arXiv preprint arXiv:2506.00773},
  year={2025}
}
