arXiv:1706.03762
Attention Is All You Need
12 June 2017
Ashish Vaswani
Noam M. Shazeer
Niki Parmar
Jakob Uszkoreit
Llion Jones
Aidan Gomez
Lukasz Kaiser
Illia Polosukhin
3DV
Papers citing "Attention Is All You Need"
50 / 17,351 papers shown
Bridge the Gap between Past and Future: Siamese Model Optimization for Context-Aware Document Ranking
Songhao Wu
Quan Tu
Mingjie Zhong
Hong Liu
Jia Xu
Jinjie Gu
Rui Yan
7
0
0
20 May 2025
Simplicity is Key: An Unsupervised Pretraining Approach for Sparse Radio Channels
Jonathan Ott
Maximilian Stahlke
Tobias Feigl
Bjoern M. Eskofier
Christopher Mutschler
2
0
0
19 May 2025
Power Allocation for Delay Optimization in Device-to-Device Networks: A Graph Reinforcement Learning Approach
Hao Fang
Kai Huang
Hao Ye
Chongtao Guo
Le Liang
Xiao Li
Shi Jin
9
0
0
19 May 2025
Any-to-Any Learning in Computational Pathology via Triplet Multimodal Pretraining
Qichen Sun
Zhengrui Guo
Rui Peng
Hao Chen
Jinzhuo Wang
MedIm
11
0
0
19 May 2025
Dual-Agent Reinforcement Learning for Automated Feature Generation
Wanfu Gao
Zengyao Man
Hanlin Pan
Kunpeng Liu
9
0
0
19 May 2025
R1dacted: Investigating Local Censorship in DeepSeek's R1 Language Model
Ali Naseh
Harsh Chaudhari
Jaechul Roh
Mingshi Wu
Alina Oprea
Amir Houmansadr
AAML
ELM
12
0
0
19 May 2025
An Empirical Study of Many-to-Many Summarization with Large Language Models
Jiaan Wang
Fandong Meng
Zengkui Sun
Yunlong Liang
Yuxuan Cao
Jiarong Xu
Haoxiang Shi
Jie Zhou
17
0
0
19 May 2025
SPKLIP: Aligning Spike Video Streams with Natural Language
Yongchang Gao
Meiling Jin
Zhaofei Yu
Tiejun Huang
Guozhang Chen
CLIP
VLM
2
0
0
19 May 2025
CMLFormer: A Dual Decoder Transformer with Switching Point Learning for Code-Mixed Language Modeling
Aditeya Baral
Allen George Ajith
Roshan Nayak
Mrityunjay Abhijeet Bhanja
7
0
0
19 May 2025
Distribution Prompting: Understanding the Expressivity of Language Models Through the Next-Token Distributions They Can Produce
Haojin Wang
Zining Zhu
Freda Shi
12
0
0
18 May 2025
SEPT: Standard-Definition Map Enhanced Scene Perception and Topology Reasoning for Autonomous Driving
Muleilan Pei
Jiayao Shan
Peiliang Li
Jieqi Shi
Jing Huo
Yang Gao
Shaojie Shen
12
0
0
18 May 2025
PSC: Extending Context Window of Large Language Models via Phase Shift Calibration
Wenqiao Zhu
Chao Xu
Lulu Wang
Jun Wu
2
1
0
18 May 2025
GlobalGeoTree: A Multi-Granular Vision-Language Dataset for Global Tree Species Classification
Yang Mu
Zhitong Xiong
Yi Wang
Muhammad Shahzad
Franz Essl
Mark van Kleunen
Xiao Xiang Zhu
VLM
2
0
0
18 May 2025
Bridging Quantized Artificial Neural Networks and Neuromorphic Hardware
Zhenhui Chen
Haoran Xu
De Ma
2
0
0
18 May 2025
GMSA: Enhancing Context Compression via Group Merging and Layer Semantic Alignment
Jiwei Tang
Zhicheng Zhang
Shunlong Wu
Jingheng Ye
Lichen Bai
...
Tingwei Lu
Jiaqi Chen
Lin Hai
Hai-Tao Zheng
Hong-Gee Kim
7
0
0
18 May 2025
CompBench: Benchmarking Complex Instruction-guided Image Editing
Bohan Jia
Wenxuan Huang
Yuntian Tang
Junbo Qiao
Jincheng Liao
...
Lin Chen
Fei Zhao
Zihan Wang
Yuan Xie
Shaohui Lin
CoGe
12
0
0
18 May 2025
Model alignment using inter-modal bridges
Ali Gholamzadeh
Noor Sajid
2
0
0
18 May 2025
A Survey of Attacks on Large Language Models
Wenrui Xu
Keshab K. Parhi
AAML
ELM
14
0
0
18 May 2025
Reasoning by Superposition: A Theoretical Perspective on Chain of Continuous Thought
Hanlin Zhu
Shibo Hao
Zhiting Hu
Jiantao Jiao
Stuart Russell
Yuandong Tian
OffRL
LRM
9
0
0
18 May 2025
STAR: Stage-Wise Attention-Guided Token Reduction for Efficient Large Vision-Language Models Inference
Yichen Guo
Hanze Li
Zonghao Zhang
Jinhao You
Kai Tang
Xiande Huang
VLM
9
0
0
18 May 2025
Spotlight Your Instructions: Instruction-following with Dynamic Attention Steering
Praveen Venkateswaran
Danish Contractor
LLMSV
LRM
21
0
0
17 May 2025
GeoMaNO: Geometric Mamba Neural Operator for Partial Differential Equations
Xi Han
Jingwei Zhang
Dimitris Samaras
Fei Hou
Hong Qin
AI4CE
2
0
0
17 May 2025
Learning to Dissipate Energy in Oscillatory State-Space Models
Jared Boyer
T. Konstantin Rusch
Daniela Rus
9
0
0
17 May 2025
S-Crescendo: A Nested Transformer Weaving Framework for Scalable Nonlinear System in S-Domain Representation
Junlang Huang
Hao Chen
Li Luo
Yong Cai
Lexin Zhang
Tianhao Ma
Yitian Zhang
Zhong Guan
2
0
0
17 May 2025
SAINT: Attention-Based Modeling of Sub-Action Dependencies in Multi-Action Policies
Matthew Landers
Taylor W. Killian
Thomas Hartvigsen
Afsaneh Doryab
7
0
0
17 May 2025
MedVKAN: Efficient Feature Extraction with Mamba and KAN for Medical Image Segmentation
Hancan Zhu
Jinhao Chen
Guanghua He
Mamba
26
0
0
17 May 2025
Induction Head Toxicity Mechanistically Explains Repetition Curse in Large Language Models
Shuxun Wang
Qingyu Yin
Chak Tou Leong
Qiang Zhang
Linyi Yang
2
0
0
17 May 2025
Exploring the Potential of SSL Models for Sound Event Detection
Hanfang Cui
Longfei Song
Li Li
Dongxing Xu
Yanhua Long
7
0
0
17 May 2025
Diffmv: A Unified Diffusion Framework for Healthcare Predictions with Random Missing Views and View Laziness
Chuang Zhao
Hui Tang
Hongke Zhao
Xiaomeng Li
DiffM
MedIm
7
0
0
17 May 2025
Fast RoPE Attention: Combining the Polynomial Method and Fast Fourier Transform
Josh Alman
Zhao Song
15
0
0
17 May 2025
STEP: A Unified Spiking Transformer Evaluation Platform for Fair and Reproducible Benchmarking
Sicheng Shen
Dongcheng Zhao
Linghao Feng
Zeyang Yue
Jindong Li
Tenglong Li
Guobin Shen
Yi Zeng
24
0
0
16 May 2025
$\mathcal{A}$LLM4ADD: Unlocking the Capabilities of Audio Large Language Models for Audio Deepfake Detection
Hao Gu
Jiangyan Yi
Chenglong Wang
Jianhua Tao
Zheng Lian
Jiayi He
Yong Ren
Yujie Chen
Zhengqi Wen
12
0
0
16 May 2025
NoPE: The Counting Power of Transformers with No Positional Encodings
Chris Köcher
Alexander Kozachinskiy
Anthony Widjaja Lin
Marco Sälzer
Georg Zetzsche
12
0
0
16 May 2025
Probing Subphonemes in Morphology Models
Gal Astrach
Yuval Pinter
17
0
0
16 May 2025
Bi-directional Recurrence Improves Transformer in Partially Observable Markov Decision Processes
Ashok Arora
Neetesh Kumar
27
0
0
16 May 2025
Attention on the Sphere
Boris Bonev
Max Rietmann
Andrea Paris
Alberto Carpentieri
Thorsten Kurth
32
0
0
16 May 2025
Unveiling the Potential of Vision-Language-Action Models with Open-Ended Multimodal Instructions
Wei Zhao
Gongsheng Li
Zhefei Gong
Pengxiang Ding
Han Zhao
Donglin Wang
LM&Ro
22
0
0
16 May 2025
What Can We Learn From MIMO Graph Convolutions?
Andreas Roth
Thomas Liebig
17
0
0
16 May 2025
Concept Drift Guided LayerNorm Tuning for Efficient Multimodal Metaphor Identification
Wenhao Qian
Zhenzhen Hu
Zijie Song
Jia Li
17
0
0
16 May 2025
GeoMM: On Geodesic Perspective for Multi-modal Learning
Shibin Mei
Hang Wang
Bingbing Ni
22
0
0
16 May 2025
Questioning Representational Optimism in Deep Learning: The Fractured Entangled Representation Hypothesis
Akarsh Kumar
Jeff Clune
Joel Lehman
Kenneth O. Stanley
OOD
21
0
0
16 May 2025
MegaScale-MoE: Large-Scale Communication-Efficient Training of Mixture-of-Experts Models in Production
C. Jin
Ziheng Jiang
Zhihao Bai
Zheng Zhong
Jing Liu
...
Yanghua Peng
Xuanzhe Liu
Xin Jin
Xin Liu
MoE
12
0
0
16 May 2025
Accurate KV Cache Quantization with Outlier Tokens Tracing
Yi Su
Yuechi Zhou
Quantong Qiu
Juntao Li
Qingrong Xia
Ping Li
Xinyu Duan
Zhefeng Wang
Min Zhang
MQ
22
0
0
16 May 2025
LipDiffuser: Lip-to-Speech Generation with Conditional Diffusion Models
Danilo de Oliveira
Julius Richter
Tal Peer
Timo Germann
DiffM
22
0
0
16 May 2025
Modeling cognitive processes of natural reading with transformer-based Language Models
Bruno Bianchi
Fermín Travi
Juan E. Kamienkowski
17
0
0
16 May 2025
DDAE++: Enhancing Diffusion Models Towards Unified Generative and Discriminative Learning
Weilai Xiang
Hongyu Yang
Di Huang
Yunhong Wang
24
0
0
16 May 2025
Distilled Circuits: A Mechanistic Study of Internal Restructuring in Knowledge Distillation
Reilly Haskins
Benjamin Adams
14
0
0
16 May 2025
A Set-Sequence Model for Time Series
Elliot L. Epstein
Apaar Sadhwani
Kay Giesecke
AI4TS
BDL
17
0
0
16 May 2025
Parkour in the Wild: Learning a General and Extensible Agile Locomotion Policy Using Multi-expert Distillation and RL Fine-tuning
Nikita Rudin
Junzhe He
Joshua Aurand
Marco Hutter
17
0
0
16 May 2025
Token-Level Uncertainty Estimation for Large Language Model Reasoning
Tunyu Zhang
Haizhou Shi
Yibin Wang
Hengyi Wang
Xiaoxiao He
...
Ligong Han
Kai Xu
Huatian Zhang
Dimitris N. Metaxas
Hao Wang
LRM
9
0
0
16 May 2025