
Predictive Pipelined Decoding: A Compute-Latency Trade-off for Exact LLM Decoding
Papers citing "Predictive Pipelined Decoding: A Compute-Latency Trade-off for Exact LLM Decoding"
Title |
---|
A Simple Hash-Based Early Exiting Approach For Language Understanding and Generation. Tianxiang Sun, Xiangyang Liu, Wei-wei Zhu, Zhichao Geng, Lingling Wu, Yilong He, Yuan Ni, Guotong Xie, Xuanjing Huang, Xipeng Qiu |