1706.03762
Attention Is All You Need
12 June 2017
Ashish Vaswani
Noam M. Shazeer
Niki Parmar
Jakob Uszkoreit
Llion Jones
Aidan Gomez
Lukasz Kaiser
Illia Polosukhin
3DV
Papers citing "Attention Is All You Need"
50 / 2,193 papers shown
Title
A Comprehensive Survey of Mixture-of-Experts: Algorithms, Theory, and Applications
Siyuan Mu
Sen Lin
MoE
469
5
0
10 Mar 2025
KwaiChat: A Large-Scale Video-Driven Multilingual Mixed-Type Dialogue Corpus
Xiaoming Shi
Zeming Liu
Chenkai Zhang
Yiming Lei
Haitao Leng
...
Qingjie Liu
Wanxiang Che
Shaoguo Liu
Size Li
Yanjie Wang
132
1
0
10 Mar 2025
FaceID-6M: A Large-Scale, Open-Source FaceID Customization Dataset
Shuhe Wang
Xiaoya Li
Jiwei Li
G. Wang
Xiaofei Sun
...
Han Qiu
Mo Yu
Shengjie Shen
Tianwei Zhang
Eduard H. Hovy
VLM
122
1
0
10 Mar 2025
Temporal Triplane Transformers as Occupancy World Models
Haoran Xu
Peixi Peng
Guang Tan
Yiqian Chang
Yisen Zhao
Yonghong Tian
158
1
0
10 Mar 2025
Green Prompting
Marta Adamska
Daria Smirnova
Hamid Nasiri
Zhengxin Yu
Peter Garraghan
515
1
0
09 Mar 2025
Semantic Wave Functions: Exploring Meaning in Large Language Models Through Quantum Formalism
Timo Aukusti Laine
70
1
0
09 Mar 2025
Beyond Decoder-only: Large Language Models Can be Good Encoders for Machine Translation
Yingfeng Luo
Tong Zheng
Yongyu Mu
Yangqiu Song
Qinghong Zhang
...
Ziqiang Xu
Peinan Feng
Xiaoqian Liu
Tong Xiao
Jingbo Zhu
AI4CE
495
3
0
09 Mar 2025
Heterogeneous bimodal attention fusion for speech emotion recognition
Jiachen Luo
Huy Phan
Lin Wang
Joshua Reiss
115
0
0
09 Mar 2025
Enhancing Layer Attention Efficiency through Pruning Redundant Retrievals
Hanze Li
Xiande Huang
126
0
0
09 Mar 2025
Future-Aware Interaction Network For Motion Forecasting
Shijie Li
Xun Xu
S. Yeo
Xulei Yang
Mamba
474
0
0
09 Mar 2025
RouterEval: A Comprehensive Benchmark for Routing LLMs to Explore Model-level Scaling Up in LLMs
Zhongzhan Huang
Guoming Ling
Vincent S. Liang
Yupei Lin
Yandong Chen
Shanshan Zhong
Hefeng Wu
LRM
187
7
0
08 Mar 2025
Single Domain Generalization with Adversarial Memory
Hao Yan
Marzi Heidari
Yuhong Guo
384
0
0
08 Mar 2025
Bimodal Connection Attention Fusion for Speech Emotion Recognition
Jiachen Luo
Huy Phan
Lin Wang
Joshua D. Reiss
118
0
0
08 Mar 2025
A Survey of Large Language Model Empowered Agents for Recommendation and Search: Towards Next-Generation Information Retrieval
Yu Zhang
Shutong Qiao
Jiaqi Zhang
Tzu-Heng Lin
Chen Gao
Yongqian Li
LM&Ro
LM&MA
227
3
0
07 Mar 2025
Unity RL Playground: A Versatile Reinforcement Learning Framework for Mobile Robots
Linqi Ye
Rankun Li
Xiaowen Hu
Jiayi Li
Boyang Xing
Yan Peng
Bin Liang
113
0
0
07 Mar 2025
EuroBERT: Scaling Multilingual Encoders for European Languages
Nicolas Boizard
Hippolyte Gisserot-Boukhlef
Duarte M. Alves
André F. T. Martins
Ayoub Hammal
...
Maxime Peyrard
Nuno M. Guerreiro
Patrick Fernandes
Ricardo Rei
Pierre Colombo
502
3
0
07 Mar 2025
EDM: Efficient Deep Feature Matching
Xi Li
Tong Rao
Cihui Pan
77
0
0
07 Mar 2025
Linear-MoE: Linear Sequence Modeling Meets Mixture-of-Experts
Weigao Sun
Disen Lan
Tong Zhu
Xiaoye Qu
Yu Cheng
MoE
220
4
0
07 Mar 2025
Generative Trajectory Stitching through Diffusion Composition
Yunhao Luo
Utkarsh Aashu Mishra
Yilun Du
Danfei Xu
449
6
0
07 Mar 2025
Training and Inference Efficiency of Encoder-Decoder Speech Models
Piotr Żelasko
Kunal Dhawan
Daniel Galvez
Krishna Puvvada
Ankita Pasad
Nithin Rao Koluguri
Ke Hu
Vitaly Lavrukhin
Jagadeesh Balam
Boris Ginsburg
96
1
0
07 Mar 2025
Narrating the Video: Boosting Text-Video Retrieval via Comprehensive Utilization of Frame-Level Captions
Chan Hur
Jeong-hun Hong
Dong-hun Lee
Dabin Kang
Semin Myeong
Sang-hyo Park
Hyeyoung Park
167
1
0
07 Mar 2025
KIEval: Evaluation Metric for Document Key Information Extraction
Minsoo Khang
Sang Chul Jung
Sungrae Park
Teakgyu Hong
94
0
0
07 Mar 2025
The Society of HiveMind: Multi-Agent Optimization of Foundation Model Swarms to Unlock the Potential of Collective Intelligence
Noah Mamie
Susie Xi Rao
LLMAG
AI4CE
121
1
0
07 Mar 2025
InfoSEM: A Deep Generative Model with Informative Priors for Gene Regulatory Network Inference
Tianyu Cui
Song-Jun Xu
Artem Moskalev
Shuwei Li
Tommaso Mansi
Mangal Prakash
Rui Liao
BDL
171
2
0
06 Mar 2025
TS-RAG: Retrieval-Augmented Generation based Time Series Foundation Models are Stronger Zero-Shot Forecaster
Kanghui Ning
Zijie Pan
Yu Liu
Yushan Jiang
Junxuan Zhang
Kashif Rasul
Anderson Schneider
Lintao Ma
Yuriy Nevmyvaka
Dongjin Song
AI4TS
VLM
200
3
0
06 Mar 2025
HybridNorm: Towards Stable and Efficient Transformer Training via Hybrid Normalization
Zhijian Zhuo
Yutao Zeng
Ya Wang
Sijun Zhang
Jian Yang
Xiaoqing Li
Xun Zhou
Jinwen Ma
79
0
0
06 Mar 2025
Learning to Reduce Search Space for Generalizable Neural Routing Solver
Changliang Zhou
Xi Lin
Zhenkun Wang
Qingfu Zhang
266
2
0
05 Mar 2025
Not-Just-Scaling Laws: Towards a Better Understanding of the Downstream Impact of Language Model Design Decisions
Emmy Liu
Amanda Bertsch
Lintang Sutawika
Lindia Tjuatja
Patrick Fernandes
...
Siyang Song
Carolin (Haas) Lawrence
Aditi Raghunathan
Kiril Gashteovski
Graham Neubig
249
2
0
05 Mar 2025
Predicting Space Tourism Demand Using Explainable AI
Tan-Hanh Pham
Jingchen Bi
Rodrigo Mesa-Arangom
Kim-Doang Nguyen
100
0
0
05 Mar 2025
Can Frontier LLMs Replace Annotators in Biomedical Text Mining? Analyzing Challenges and Exploring Solutions
Yichong Zhao
Susumu Goto
91
0
0
05 Mar 2025
All-atom Diffusion Transformers: Unified generative modelling of molecules and materials
Chaitanya K. Joshi
Xiang Fu
Yi-Lun Liao
Vahe Gharakhanyan
Benjamin Kurt Miller
Anuroop Sriram
Zachary W. Ulissi
DiffM
179
9
0
05 Mar 2025
The Box is in the Pen: Evaluating Commonsense Reasoning in Neural Machine Translation
Jie He
Tao Wang
Deyi Xiong
Qun Liu
ELM
LRM
169
32
0
05 Mar 2025
Mocap-2-to-3: Lifting 2D Diffusion-Based Pretrained Models for 3D Motion Capture
Zhumei Wang
Zechen Hu
Ruoxi Guo
Huaijin Pi
Ziyong Feng
Sida Peng
Xiaowei Zhou
165
0
0
05 Mar 2025
STAA-SNN: Spatial-Temporal Attention Aggregator for Spiking Neural Networks
Tianqing Zhang
Kairong Yu
Xian Zhong
Hongwei Wang
Qi Xu
Qiang Zhang
121
1
0
04 Mar 2025
PennyLang: Pioneering LLM-Based Quantum Code Generation with a Novel PennyLane-Centric Dataset
Haider Asif
Abdul Basit
Nouhaila Innan
Muhammad Kashif
Alberto Marchisio
Muhammad Shafique
119
3
0
04 Mar 2025
Monocular visual simultaneous localization and mapping: (r)evolution from geometry to deep learning-based pipelines
Olaya Álvarez-Tunón
Yury Brodskiy
Erdal Kayacan
240
6
0
04 Mar 2025
Boltzmann Attention Sampling for Image Analysis with Small Objects
Theodore Zhao
Sid Kiblawi
Naoto Usuyama
Ho Hin Lee
Sam Preston
Hoifung Poon
Mu-Hsin Wei
MedIm
184
0
0
04 Mar 2025
CMMLoc: Advancing Text-to-PointCloud Localization with Cauchy-Mixture-Model Based Framework
Yanlong Xu
Haoxuan Qu
Qingbin Liu
Wenxiao Zhang
Xun Yang
401
0
0
04 Mar 2025
Creating Sorted Grid Layouts with Gradient-based Optimization
Kai Uwe Barthel
Florian Barthel
Peter Eisert
Nico Hezel
Konstantin Schall
178
1
0
04 Mar 2025
FourierNAT: A Fourier-Mixing-Based Non-Autoregressive Transformer for Parallel Sequence Generation
Andrew Kiruluta
Eric Lundy
Andreas Lemos
AI4TS
97
0
0
04 Mar 2025
Enabling AI Scientists to Recognize Innovation: A Domain-Agnostic Algorithm for Assessing Novelty
Yao Wang
Mingxuan Cui
Arthur Jiang
122
0
0
03 Mar 2025
Neural ODE Transformers: Analyzing Internal Dynamics and Adaptive Fine-tuning
Anh Tong
Thanh Nguyen-Tang
Dongeun Lee
Duc Nguyen
Toan M. Tran
David Hall
Cheongwoong Kang
Jaesik Choi
133
1
0
03 Mar 2025
AC-Lite: A Lightweight Image Captioning Model for Low-Resource Assamese Language
Pankaj Choudhury
Yogesh Aggarwal
Prabhanjan Jadhav
Prithwijit Guha
Sukumar Nandi
180
0
0
03 Mar 2025
Liger: Linearizing Large Language Models to Gated Recurrent Structures
Disen Lan
Weigao Sun
Jiaxi Hu
Jusen Du
Yu Cheng
128
1
0
03 Mar 2025
CNN-based Robust Sound Source Localization with SRP-PHAT for the Extreme Edge
Jun Yin
Marian Verhelst
176
1
0
03 Mar 2025
Attention Condensation via Sparsity Induced Regularized Training
Eli Sason
Darya Frolova
Boris Nazarov
Felix Goldberg
487
0
0
03 Mar 2025
MedUnifier: Unifying Vision-and-Language Pre-training on Medical Data with Vision Generation Task using Discrete Visual Representations
Ziyang Zhang
Yang Yu
Yucheng Chen
Xulei Yang
S. Yeo
MedIm
140
2
0
02 Mar 2025
MIRROR: Multi-Modal Pathological Self-Supervised Representation Learning via Modality Alignment and Retention
Tianyi Wang
Jianan Fan
Dingxin Zhang
Dongnan Liu
Yong-quan Xia
Heng Huang
Weidong Cai
134
0
0
01 Mar 2025
Remasking Discrete Diffusion Models with Inference-Time Scaling
Guanghan Wang
Yair Schiff
Subham Sekhar Sahoo
Volodymyr Kuleshov
DiffM
140
16
0
01 Mar 2025
Jointly Understand Your Command and Intention: Reciprocal Co-Evolution between Scene-Aware 3D Human Motion Synthesis and Analysis
Xuehao Gao
Yang Yang
Shaoyi Du
Guo-Jun Qi
Junwei Han
119
1
0
01 Mar 2025