ResearchTrend.AI
RoFormer: Enhanced Transformer with Rotary Position Embedding

20 April 2021
Jianlin Su
Yu Lu
Shengfeng Pan
Ahmed Murtadha
Bo Wen
Yunfeng Liu

Papers citing "RoFormer: Enhanced Transformer with Rotary Position Embedding"

50 / 250 papers shown
Pyramidal Flow Matching for Efficient Video Generative Modeling
Yang Jin
Zhicheng Sun
Ningyuan Li
Kun Xu
K. Xu
...
Nan Zhuang
Quzhe Huang
Yang Song
Yadong Mu
Zhouchen Lin
VGen
140
84
0
08 Oct 2024
Think While You Generate: Discrete Diffusion with Planned Denoising
Sulin Liu
Juno Nam
Andrew Campbell
Hannes Stärk
Yilun Xu
Tommi Jaakkola
Rafael Gómez-Bombarelli
DiffM
132
11
0
08 Oct 2024
Differential Transformer
Tianzhu Ye
Li Dong
Yuqing Xia
Yutao Sun
Yi Zhu
Gao Huang
Furu Wei
470
0
0
07 Oct 2024
No Need to Talk: Asynchronous Mixture of Language Models
Anastasiia Filippova
Angelos Katharopoulos
David Grangier
Ronan Collobert
MoE
77
0
0
04 Oct 2024
SageAttention: Accurate 8-Bit Attention for Plug-and-play Inference Acceleration
Jintao Zhang
Jia Wei
Pengle Zhang
Jun-Jie Zhu
Jun Zhu
Jianfei Chen
VLM
MQ
142
38
0
03 Oct 2024
How to Train Long-Context Language Models (Effectively)
Tianyu Gao
Alexander Wettig
Howard Yen
Danqi Chen
RALM
163
47
0
03 Oct 2024
Training Language Models on Synthetic Edit Sequences Improves Code Synthesis
Ulyana Piterbarg
Lerrel Pinto
Rob Fergus
SyDa
82
2
0
03 Oct 2024
Immunogenicity Prediction with Dual Attention Enables Vaccine Target Selection
Song Li
Yang Tan
Song Ke
Liang Hong
Bingxin Zhou
62
4
0
03 Oct 2024
Characterizing and Efficiently Accelerating Multimodal Generation Model Inference
Yejin Lee
Anna Y. Sun
Basil Hosmer
Bilge Acun
Can Balioglu
...
Ram Pasunuru
Scott Yih
Sravya Popuri
Xing Liu
Carole-Jean Wu
118
2
0
30 Sep 2024
Enhancing elusive clues in knowledge learning by contrasting attention of language models
Jian Gao
Xiao Zhang
Ji Wu
Miao Li
103
0
0
26 Sep 2024
Zero-shot forecasting of chaotic systems
Yuanzhao Zhang
William Gilpin
AI4TS
238
8
0
24 Sep 2024
Time-MoE: Billion-Scale Time Series Foundation Models with Mixture of Experts
Xiaoming Shi
Shiyu Wang
Yuqi Nie
Dianqi Li
Zhou Ye
Qingsong Wen
Ming Jin
AI4TS
112
54
0
24 Sep 2024
PixWizard: Versatile Image-to-Image Visual Assistant with Open-Language Instructions
Weifeng Lin
Xinyu Wei
Renrui Zhang
Le Zhuo
Shitian Zhao
...
Junlin Xie
Yu Qiao
Peng Gao
Hongsheng Li
MLLM
DiffM
158
15
0
23 Sep 2024
Towards LifeSpan Cognitive Systems
Yu Wang
Chi Han
Tongtong Wu
Xiaoxin He
Wangchunshu Zhou
...
Zexue He
Wei Wang
Gholamreza Haffari
Heng Ji
Julian McAuley
KELM
CLL
459
2
0
20 Sep 2024
AudioComposer: Towards Fine-grained Audio Generation with Natural Language Descriptions
Yun Wang
Hangting Chen
Dongchao Yang
Zhiyong Wu
Xixin Wu
DiffM
95
2
0
19 Sep 2024
SOAP: Improving and Stabilizing Shampoo using Adam
Nikhil Vyas
Depen Morwani
Rosie Zhao
Itai Shapira
David Brandfonbrener
Lucas Janson
Sham Kakade
146
38
0
17 Sep 2024
MSDNet: Multi-Scale Decoder for Few-Shot Semantic Segmentation via Transformer-Guided Prototyping
Amirreza Fateh
Mohammad Reza Mohammadi
Mohammad Reza Jahed Motlagh
ViT
239
6
0
17 Sep 2024
Flash STU: Fast Spectral Transform Units
Y. Isabel Liu
Windsor Nguyen
Yagiz Devre
Evan Dogariu
Anirudha Majumdar
Elad Hazan
AI4TS
126
1
0
16 Sep 2024
Retro-li: Small-Scale Retrieval Augmented Generation Supporting Noisy Similarity Searches and Domain Shift Generalization
Gentiana Rashiti
G. Karunaratne
Mrinmaya Sachan
Abu Sebastian
Abbas Rahimi
RALM
216
0
0
12 Sep 2024
Open-MAGVIT2: An Open-Source Project Toward Democratizing Auto-regressive Visual Generation
Zhuoyan Luo
Fengyuan Shi
Yixiao Ge
Yujiu Yang
Limin Wang
Ying Shan
VLM
129
59
0
06 Sep 2024
Masked Mixers for Language Generation and Retrieval
Benjamin L. Badger
148
0
0
02 Sep 2024
Training Ultra Long Context Language Model with Fully Pipelined Distributed Transformer
Jinghan Yao
Sam Ade Jacobs
Masahiro Tanaka
Olatunji Ruwase
Hari Subramoni
D. Panda
89
2
0
30 Aug 2024
Hierarchical Spatio-Temporal State-Space Modeling for fMRI Analysis
Yuxiang Wei
A. Abrol
Reihaneh Hassanzadeh
Mamba
80
3
0
23 Aug 2024
MoDeGPT: Modular Decomposition for Large Language Model Compression
Chi-Heng Lin
Shangqian Gao
James Seale Smith
Abhishek Patel
Shikhar Tuli
Yilin Shen
Hongxia Jin
Yen-Chang Hsu
142
13
0
19 Aug 2024
CogVideoX: Text-to-Video Diffusion Models with An Expert Transformer
Zhuoyi Yang
Jiayan Teng
Wendi Zheng
Ming Ding
Shiyu Huang
...
Weihan Wang
Yean Cheng
Xiaotao Gu
Yuxiao Dong
Jie Tang
DiffM
VGen
237
558
0
12 Aug 2024
Lumina-mGPT: Illuminate Flexible Photorealistic Text-to-Image Generation with Multimodal Generative Pretraining
Dongyang Liu
Shitian Zhao
Le Zhuo
Weifeng Lin
Ping Luo
Xinyue Li
Qi Qin
Yu Qiao
Hongsheng Li
Peng Gao
MLLM
149
58
0
05 Aug 2024
A Bayesian Flow Network Framework for Chemistry Tasks
Nianze Tao
Minori Abe
BDL
84
4
0
28 Jul 2024
u-$\mu$P: The Unit-Scaled Maximal Update Parametrization
Charlie Blake
C. Eichenberg
Josef Dean
Lukas Balles
Luke Y. Prince
Bjorn Deiseroth
Andres Felipe Cruz Salinas
Carlo Luschi
Samuel Weinbach
Douglas Orr
105
10
0
24 Jul 2024
dMel: Speech Tokenization made Simple
Richard He Bai
Tatiana Likhomanenko
Ruixiang Zhang
Zijin Gu
Zakaria Aldeneh
Navdeep Jaitly
100
6
0
22 Jul 2024
ChatQA 2: Bridging the Gap to Proprietary LLMs in Long Context and RAG Capabilities
Peng Xu
Ming-Yu Liu
Xianchao Wu
Zihan Liu
Mohammad Shoeybi
Bryan Catanzaro
RALM
126
21
0
19 Jul 2024
How Well Can a Long Sequence Model Model Long Sequences? Comparing Architechtural Inductive Biases on Long-Context Abilities
Jerry Huang
106
7
0
11 Jul 2024
Video In-context Learning: Autoregressive Transformers are Zero-Shot Video Imitators
Wentao Zhang
Junliang Guo
Tianyu He
Li Zhao
Linli Xu
Jiang Bian
100
4
0
10 Jul 2024
Deconstructing What Makes a Good Optimizer for Language Models
Rosie Zhao
Depen Morwani
David Brandfonbrener
Nikhil Vyas
Sham Kakade
106
25
0
10 Jul 2024
A Bounding Box is Worth One Token: Interleaving Layout and Text in a Large Language Model for Document Understanding
Jinghui Lu
Haiyang Yu
Yanjie Wang
Yongjie Ye
Jingqun Tang
...
Qi Liu
Hao Feng
Han Wang
Hao Liu
Can Huang
130
23
0
02 Jul 2024
Eliminating Position Bias of Language Models: A Mechanistic Approach
Ziqi Wang
Hanlin Zhang
Xiner Li
Kuan-Hao Huang
Chi Han
Shuiwang Ji
Sham Kakade
Hao Peng
Heng Ji
125
19
0
01 Jul 2024
Resolving Discrepancies in Compute-Optimal Scaling of Language Models
Tomer Porian
Mitchell Wortsman
J. Jitsev
Ludwig Schmidt
Y. Carmon
127
26
0
27 Jun 2024
DeciMamba: Exploring the Length Extrapolation Potential of Mamba
Assaf Ben-Kish
Itamar Zimerman
Shady Abu Hussein
Nadav Cohen
Amir Globerson
Lior Wolf
Raja Giryes
Mamba
178
19
0
20 Jun 2024
Samba: Simple Hybrid State Space Models for Efficient Unlimited Context Language Modeling
Liliang Ren
Yang Liu
Yadong Lu
Yelong Shen
Chen Liang
Weizhu Chen
Mamba
169
69
0
11 Jun 2024
Evaluating Zero-Shot Long-Context LLM Compression
Chenyu Wang
Yihan Wang
Kai Li
88
0
0
10 Jun 2024
Your Absorbing Discrete Diffusion Secretly Models the Conditional Distributions of Clean Data
Jingyang Ou
Shen Nie
Kaiwen Xue
Fengqi Zhu
Jiacheng Sun
Zhenguo Li
Chongxuan Li
DiffM
127
54
0
06 Jun 2024
ReflectionCoder: Learning from Reflection Sequence for Enhanced One-off Code Generation
Houxing Ren
Mingjie Zhan
Zhongyuan Wu
Aojun Zhou
Junting Pan
Hongsheng Li
SyDa
101
7
0
27 May 2024
Glauber Generative Model: Discrete Diffusion Models via Binary Classification
Harshit Varma
Dheeraj M. Nagaraj
Karthikeyan Shanmugam
VLM
149
3
0
27 May 2024
CacheBlend: Fast Large Language Model Serving for RAG with Cached Knowledge Fusion
Jiayi Yao
Hanchen Li
Yuhan Liu
Siddhant Ray
Yihua Cheng
Qizheng Zhang
Kuntai Du
Shan Lu
Junchen Jiang
100
24
0
26 May 2024
Positional encoding is not the same as context: A study on positional encoding for sequential recommendation
Alejo López-Ávila
Jinhua Du
Abbas Shimary
Ze Li
83
3
0
16 May 2024
Chameleon: Mixed-Modal Early-Fusion Foundation Models
Chameleon Team
MLLM
193
333
0
16 May 2024
Memory Mosaics
Jianyu Zhang
Niklas Nolte
Ranajoy Sadhukhan
Beidi Chen
Léon Bottou
VLM
116
4
0
10 May 2024
GP-MoLFormer: A Foundation Model For Molecular Generation
Jerret Ross
Brian M. Belgodere
Samuel C. Hoffman
Vijil Chenthamarakshan
Youssef Mroueh
Payel Das
81
6
0
04 Apr 2024
NovelQA: Benchmarking Question Answering on Documents Exceeding 200K Tokens
Cunxiang Wang
Ruoxi Ning
Boqi Pan
Tonghui Wu
Qipeng Guo
...
Guangsheng Bao
Xiangkun Hu
Zheng Zhang
Qian Wang
Yue Zhang
RALM
203
11
0
18 Mar 2024
MambaTalk: Efficient Holistic Gesture Synthesis with Selective State Space Models
Zunnan Xu
Yukang Lin
Haonan Han
Sicheng Yang
Ronghui Li
Yachao Zhang
Xiu Li
Mamba
119
26
0
14 Mar 2024
SPAFormer: Sequential 3D Part Assembly with Transformers
Boshen Xu
Sipeng Zheng
Qin Jin
93
2
0
09 Mar 2024