
From Sparse Dependence to Sparse Attention: Unveiling How Chain-of-Thought Enhances Transformer Sample Efficiency
Papers citing "From Sparse Dependence to Sparse Attention: Unveiling How Chain-of-Thought Enhances Transformer Sample Efficiency"
50 / 58 papers shown
Title |
---|
Qwen2 Technical Report. An Yang, Baosong Yang, Binyuan Hui, Jian Xu, Bowen Yu, ... Yuqiong Liu, Zeyu Cui, Zhenru Zhang, Zhifang Guo, Zhi-Wei Fan |