Pre³: Enabling Deterministic Pushdown Automata for Faster Structured LLM Generation

Many LLM applications demand efficient structured generation, particularly for LR(1) grammars, to produce outputs in specified formats (e.g., JSON). Existing methods primarily parse LR(1) grammars into a pushdown automaton (PDA), which incurs runtime execution overhead for context-dependent token processing and is especially inefficient under large inference batches. To address these issues, we propose Pre³, which exploits deterministic pushdown automata (DPDA) to optimize constrained LLM decoding efficiency. First, by precomputing prefix-conditioned edges during preprocessing, Pre³ enables ahead-of-time edge analysis and thus makes parallel transition processing possible. Second, by leveraging the prefix-conditioned edges, Pre³ introduces a novel approach that transforms LR(1) transition graphs into DPDA, eliminating the need for runtime path exploration and achieving edge transitions with minimal overhead. Pre³ can be seamlessly integrated into standard LLM inference frameworks, reducing time per output token (TPOT) by up to 40% and increasing throughput by up to 36% in our experiments. Our code is available at this https URL.
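To illustrate the core idea behind determinism in constrained decoding, the sketch below implements a toy DPDA for a simple LR(1) language (balanced brackets). This is a hypothetical example, not the paper's implementation: the class name, methods, and grammar are illustrative. It shows how, once every (stack-top, token) pair has exactly one transition, the set of legal next tokens can be read off the current stack top with no runtime path exploration, which is what enables a token mask to be computed cheaply at each decoding step.

```python
# Hypothetical sketch (not the paper's code): a deterministic pushdown
# automaton for the balanced-bracket language, a minimal LR(1) grammar.
# Determinism means the allowed next tokens follow directly from the
# current stack top; no search over alternative paths is needed.

class ToyDPDA:
    def __init__(self):
        self.stack = ["$"]  # bottom-of-stack marker

    def allowed_tokens(self):
        # This set would be turned into a logit mask during LLM decoding.
        if self.stack[-1] == "$":
            return {"["}       # nothing open: must open a bracket
        return {"[", "]"}      # may nest deeper or close the current one

    def step(self, tok):
        # Exactly one transition applies per (stack-top, token) pair.
        if tok not in self.allowed_tokens():
            raise ValueError(f"token {tok!r} rejected")
        if tok == "[":
            self.stack.append("[")
        else:  # tok == "]"
            self.stack.pop()

    def accepts(self):
        # Accept when only the bottom marker remains.
        return self.stack == ["$"]

dpda = ToyDPDA()
for t in "[[]]":
    dpda.step(t)
print(dpda.accepts())  # True: "[[]]" is balanced
```

Because `allowed_tokens` is a constant-time lookup rather than a graph exploration, the same check can run in parallel across a large inference batch, which is the efficiency gain the abstract describes.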
View on arXiv

@article{chen2025_2506.03887,
  title={Pre$^3$: Enabling Deterministic Pushdown Automata for Faster Structured LLM Generation},
  author={Junyi Chen and Shihao Bai and Zaijun Wang and Siyu Wu and Chuheng Du and Hailong Yang and Ruihao Gong and Shengzhong Liu and Fan Wu and Guihai Chen},
  journal={arXiv preprint arXiv:2506.03887},
  year={2025}
}