arXiv:2104.09864
RoFormer: Enhanced Transformer with Rotary Position Embedding
20 April 2021
Jianlin Su
Yu Lu
Shengfeng Pan
Ahmed Murtadha
Bo Wen
Yunfeng Liu
Papers citing "RoFormer: Enhanced Transformer with Rotary Position Embedding"
50 / 250 papers shown
Title
Absolute Coordinates Make Motion Generation Easy
Zichong Meng
Zeyu Han
Xiaogang Peng
Yiming Xie
Huaizu Jiang
170
0
0
26 May 2025
FP4 All the Way: Fully Quantized Training of LLMs
Brian Chmiel
Maxim Fishman
Ron Banner
Daniel Soudry
MQ
80
0
0
25 May 2025
LLaDA 1.5: Variance-Reduced Preference Optimization for Large Language Diffusion Models
Fengqi Zhu
Rongzhen Wang
Shen Nie
Xiaolu Zhang
Chunwei Wu
...
Jun Zhou
Jianfei Chen
Yankai Lin
Ji-Rong Wen
Chongxuan Li
175
2
0
25 May 2025
Beyond Masked and Unmasked: Discrete Diffusion Models via Partial Masking
Chen-Hao Chao
Wei-Fang Sun
Hanwen Liang
Chun-Yi Lee
Rahul G. Krishnan
DiffM
388
0
0
24 May 2025
Learning Generalized and Flexible Trajectory Models from Omni-Semantic Supervision
Yuanshao Zhu
James Jianqiao Yu
Xiangyu Zhao
Xiao Han
Qidong Liu
Xuetao Wei
Yuxuan Liang
77
0
0
23 May 2025
CosyVoice 3: Towards In-the-wild Speech Generation via Scaling-up and Post-training
Zhihao Du
Changfeng Gao
Yuxuan Wang
Fan Yu
Tianyu Zhao
...
Mengzhe Chen
Yafeng Chen
Shiliang Zhang
Wen Wang
Jieping Ye
AuLLM
121
1
0
23 May 2025
Select2Reason: Efficient Instruction-Tuning Data Selection for Long-CoT Reasoning
Cehao Yang
Xueyuan Lin
Chengjin Xu
Xuhui Jiang
Xiaojun Wu
Honghao Liu
Hui Xiong
Jian Guo
LRM
86
0
0
22 May 2025
SELF: Self-Extend the Context Length With Logistic Growth Function
Phat Thanh Dang
Saahil Thoppay
Wang Yang
Qifan Wang
Vipin Chaudhary
Xiaotian Han
85
0
0
22 May 2025
LLM-Based Emulation of the Radio Resource Control Layer: Towards AI-Native RAN Protocols
Ziming Liu
Bryan Liu
Alvaro Valcarce
Xiaoli Chu
240
1
0
22 May 2025
LLM as Effective Streaming Processor: Bridging Streaming-Batch Mismatches with Group Position Encoding
Junlong Tong
Jinlan Fu
Zixuan Lin
Yingqi Fan
Anhao Zhao
Hui Su
Xiaoyu Shen
86
0
0
22 May 2025
dKV-Cache: The Cache for Diffusion Language Models
Xinyin Ma
Runpeng Yu
Gongfan Fang
Xinchao Wang
DiffM
99
3
0
21 May 2025
Scale-invariant Attention
Ben Anson
Xi Wang
Laurence Aitchison
LRM
85
0
0
20 May 2025
TCSinger 2: Customizable Multilingual Zero-shot Singing Voice Synthesis
Yu Zhang
Wenxiang Guo
Changhao Pan
Dongyu Yao
Zhiyuan Zhu
Ziyue Jiang
Yuhan Wang
Tao Jin
Zhou Zhao
VLM
90
0
0
20 May 2025
Systematic Generalization in Language Models Scales with Information Entropy
Sondre Wold
Lucas Georges Gabriel Charpentier
Étienne Simon
201
0
0
19 May 2025
Panda: A pretrained forecast model for universal representation of chaotic dynamics
Jeffrey Lai
Anthony Bao
William Gilpin
AI4TS
AI4CE
93
0
0
19 May 2025
A3 : an Analytical Low-Rank Approximation Framework for Attention
Jeffrey T. H. Wong
Cheng Zhang
Xinye Cao
Pedro Gimenes
George A. Constantinides
Wayne Luk
Yiren Zhao
OffRL
MQ
89
1
0
19 May 2025
PSC: Extending Context Window of Large Language Models via Phase Shift Calibration
Wenqiao Zhu
Chao Xu
Lulu Wang
Jun Wu
91
1
0
18 May 2025
ComplexFormer: Disruptively Advancing Transformer Inference Ability via Head-Specific Complex Vector Attention
Jintian Shao
Hongyi Huang
Hongyi Huang
Beiwen Zhang
ZhiYu Wu
You Shan
MingKai Zheng
140
0
0
15 May 2025
xGen-small Technical Report
Erik Nijkamp
Bo Pang
Egor Pakhomov
Akash Gokul
Jin Qu
Silvio Savarese
Yingbo Zhou
Caiming Xiong
LLMAG
142
0
0
10 May 2025
Insertion Language Models: Sequence Generation with Arbitrary-Position Insertions
Dhruvesh Patel
Aishwarya Sahoo
Avinash Amballa
Tahira Naseem
Tim G. J. Rudner
Andrew McCallum
KELM
109
0
0
09 May 2025
HunyuanCustom: A Multimodal-Driven Architecture for Customized Video Generation
Teng Hu
Zhentao Yu
Zhengguang Zhou
Sen Liang
Yuan Zhou
Qin Lin
Qinglin Lu
DiffM
VGen
158
6
0
07 May 2025
Bielik v3 Small: Technical Report
Krzysztof Ociepa
Łukasz Flis
Remigiusz Kinas
Krzysztof Wróbel
Adrian Gwoździej
81
0
0
05 May 2025
Bielik 11B v2 Technical Report
Krzysztof Ociepa
Łukasz Flis
Krzysztof Wróbel
Adrian Gwoździej
Remigiusz Kinas
77
0
0
05 May 2025
Ming-Lite-Uni: Advancements in Unified Architecture for Natural Multimodal Interaction
Biao Gong
Cheng Zou
Dandan Zheng
Hu Yu
Jingdong Chen
...
Qingpei Guo
Rui Liu
Weilong Chai
Xinyu Xiao
Ziyuan Huang
MLLM
172
2
0
05 May 2025
FreqKV: Frequency Domain Key-Value Compression for Efficient Context Window Extension
Jushi Kai
Boyi Zeng
Yansen Wang
Haoli Bai
Ziwei He
Bo Jiang
Zhouhan Lin
98
0
0
01 May 2025
Mixture of Sparse Attention: Content-Based Learnable Sparse Attention via Expert-Choice Routing
Piotr Piekos
Róbert Csordás
Jürgen Schmidhuber
MoE
VLM
237
2
0
01 May 2025
Solving Copyright Infringement on Short Video Platforms: Novel Datasets and an Audio Restoration Deep Learning Pipeline
Minwoo Oh
Minsu Park
Eunil Park
192
0
0
30 Apr 2025
Versatile Framework for Song Generation with Prompt-based Control
Yanzhe Zhang
Wenxiang Guo
Changhao Pan
Zehan Zhu
Ruiqi Li
...
Rongjie Huang
Ruiyuan Zhang
Zhiqing Hong
Ziyue Jiang
Zhou Zhao
171
2
0
27 Apr 2025
DyMU: Dynamic Merging and Virtual Unmerging for Efficient VLMs
Zehao Wang
Senthil Purushwalkam
Caiming Xiong
Siyang Song
Chenhui Xu
Ran Xu
155
2
0
23 Apr 2025
How Well Can General Vision-Language Models Learn Medicine By Watching Public Educational Videos?
Rahul Thapa
Andrew Li
Qingyang Wu
Bryan He
Yuki Sahashi
...
Angela Zhang
Ben Athiwaratkun
Shuaiwen Leon Song
David Ouyang
James Zou
LM&MA
167
0
0
19 Apr 2025
Meta-rater: A Multi-dimensional Data Selection Method for Pre-training Language Models
Xinlin Zhuang
Jiahui Peng
Ren Ma
Yucheng Wang
Tianyi Bai
Xingjian Wei
Jiantao Qiu
Chi Zhang
Ying Qian
Conghui He
124
0
0
19 Apr 2025
Perception Encoder: The best visual embeddings are not at the output of the network
Daniel Bolya
Po-Yao (Bernie) Huang
Peize Sun
Jang Hyun Cho
Andrea Madotto
...
Shiyu Dong
Nikhila Ravi
Daniel Li
Piotr Dollár
Christoph Feichtenhofer
ObjD
VOS
307
8
0
17 Apr 2025
Dense Backpropagation Improves Training for Sparse Mixture-of-Experts
Ashwinee Panda
Vatsal Baherwani
Zain Sarwar
Benjamin Thérien
Supriyo Chakraborty
Tom Goldstein
MoE
110
0
0
16 Apr 2025
Neural Encoding and Decoding at Scale
Yizi Zhang
Yanchen Wang
Mehdi Azabou
Alexandre Andre
Zixuan Wang
Hanrui Lyu
International Brain Laboratory
Eva L. Dyer
Liam Paninski
Cole Hurwitz
AI4CE
116
1
0
11 Apr 2025
RETROcode: Leveraging a Code Database for Improved Natural Language to Code Generation
Nathanael Beau
Benoît Crabbé
84
0
0
08 Apr 2025
Sequential-NIAH: A Needle-In-A-Haystack Benchmark for Extracting Sequential Needles from Long Contexts
Yifei Yu
Qian Zhang
Lingfeng Qiao
Di Yin
Fang Li
Jie Wang
Zheyu Chen
Suncong Zheng
Xiaolong Liang
Xingwu Sun
83
0
0
07 Apr 2025
Leveraging State Space Models in Long Range Genomics
Matvei Popov
Aymen Kallala
Anirudha Ramesh
Narimane Hennouni
Shivesh Khaitan
Rick Gentry
Alain-Sam Cohen
Mamba
115
0
0
07 Apr 2025
CADCrafter: Generating Computer-Aided Design Models from Unconstrained Images
Cheng Chen
Jiacheng Wei
Tianrun Chen
Chi Zhang
Xiaofeng Yang
...
Bingchen Yang
Chuan-Sheng Foo
Guosheng Lin
Qixing Huang
Fayao Liu
86
4
0
07 Apr 2025
Spline-based Transformers
Prashanth Chandran
Agon Serifi
Markus Gross
Moritz Bächer
150
0
0
03 Apr 2025
UniViTAR: Unified Vision Transformer with Native Resolution
Limeng Qiao
Yiyang Gan
Bairui Wang
Jie Qin
Shuang Xu
Siqi Yang
Lin Ma
157
0
0
02 Apr 2025
Urban Computing in the Era of Large Language Models
Zhonghang Li
Lianghao Xia
Xubin Ren
J. Tang
Tianyi Chen
Yong-mei Xu
Chenyu Huang
203
0
0
02 Apr 2025
TRA: Better Length Generalisation with Threshold Relative Attention
Mattia Opper
Roland Fernandez
P. Smolensky
Jianfeng Gao
111
0
0
29 Mar 2025
IgCraft: A versatile sequence generation framework for antibody discovery and engineering
Matthew Greenig
Haowen Zhao
Vladimir Radenkovic
Aubin Ramon
Pietro Sormanni
126
2
0
25 Mar 2025
Long-Context Autoregressive Video Modeling with Next-Frame Prediction
Yuchao Gu
Weijia Mao
Mike Zheng Shou
VGen
163
10
0
25 Mar 2025
Adaptive Machine Learning for Resource-Constrained Environments
Sebastián A. Cajas Ordóñez
Jaydeep Samanta
Andrés L. Suárez-Cetrulo
Ricardo Simón Carbajo
173
0
0
24 Mar 2025
CoMP: Continual Multimodal Pre-training for Vision Foundation Models
Yuxiao Chen
L. Meng
Wujian Peng
Zuxuan Wu
Yu-Gang Jiang
VLM
195
1
0
24 Mar 2025
UniHDSA: A Unified Relation Prediction Approach for Hierarchical Document Structure Analysis
Jiawei Wang
Kai Hu
Qiang Huo
101
0
0
20 Mar 2025
MotionStreamer: Streaming Motion Generation via Diffusion-based Autoregressive Model in Causal Latent Space
Lixing Xiao
Shunlin Lu
Huaijin Pi
Ke Fan
Liang Pan
Yueer Zhou
Ziyong Feng
Xiaowei Zhou
Sida Peng
Jingbo Wang
DiffM
VGen
97
7
0
19 Mar 2025
PLM: Efficient Peripheral Language Models Hardware-Co-Designed for Ubiquitous Computing
Cheng Deng
Luoyang Sun
Jiwen Jiang
Yongcheng Zeng
Xinjian Wu
...
Haoyang Li
Lei Chen
Lionel M. Ni
Jun Wang
Jun Wang
409
0
0
15 Mar 2025
Direction-Aware Diagonal Autoregressive Image Generation
Yijia Xu
Jianzhong Ju
Jian Luan
J. Cui
157
0
0
14 Mar 2025