1706.03762
v7 (latest)
Attention Is All You Need
12 June 2017
Ashish Vaswani
Noam M. Shazeer
Niki Parmar
Jakob Uszkoreit
Llion Jones
Aidan Gomez
Lukasz Kaiser
Illia Polosukhin
3DV
Papers citing
"Attention Is All You Need"
50 / 27,180 papers shown
Title
SCFormer: Structured Channel-wise Transformer with Cumulative Historical State for Multivariate Time Series Forecasting
Shiwei Guo
Zheyu Chen
Yupeng Ma
Yunfei Han
Yi Wang
AI4TS
418
0
0
05 May 2025
EMORL: Ensemble Multi-Objective Reinforcement Learning for Efficient and Flexible LLM Fine-Tuning
Lingxiao Kong
Cong Yang
Susanne Neufang
Oya Beyan
Zeyd Boukhers
OffRL
106
0
0
05 May 2025
Bielik v3 Small: Technical Report
Krzysztof Ociepa
Łukasz Flis
Remigiusz Kinas
Krzysztof Wróbel
Adrian Gwoździej
104
0
0
05 May 2025
Bielik 11B v2 Technical Report
Krzysztof Ociepa
Łukasz Flis
Krzysztof Wróbel
Adrian Gwoździej
Remigiusz Kinas
114
0
0
05 May 2025
Direct Retrieval-augmented Optimization: Synergizing Knowledge Selection and Language Models
Zhengliang Shi
Lingyong Yan
Weiwei Sun
Yue Feng
Pengjie Ren
Xinyu Ma
Shuaiqiang Wang
D. Yin
Maarten de Rijke
Zhaochun Ren
RALM
76
1
0
05 May 2025
LLaMA-Omni2: LLM-based Real-time Spoken Chatbot with Autoregressive Streaming Speech Synthesis
Qingkai Fang
Yan Zhou
Shoutao Guo
Shaolei Zhang
Yang Feng
AuLLM
99
4
0
05 May 2025
Towards Safer Pretraining: Analyzing and Filtering Harmful Content in Webscale datasets for Responsible LLMs
Sai Krishna Mendu
Harish Yenala
Aditi Gulati
Shanu Kumar
Parag Agrawal
122
1
0
04 May 2025
DNAZEN: Enhanced Gene Sequence Representations via Mixed Granularities of Coding Units
Lei Mao
Yuanhe Tian
Yan Song
30
0
0
04 May 2025
Leveraging LLM Agents and Digital Twins for Fault Handling in Process Plants
Milapji Singh Gill
Javal Vyas
Artan Markaj
Felix Gehlhoff
Mehmet Mercangöz
61
0
0
04 May 2025
Restoring Calibration for Aligned Large Language Models: A Calibration-Aware Fine-Tuning Approach
Jiancong Xiao
Bojian Hou
Zhanliang Wang
Ruochen Jin
Q. Long
Weijie Su
Li Shen
98
2
0
04 May 2025
An Empirical Study of Qwen3 Quantization
Xingyu Zheng
Yuye Li
Haoran Chu
Yue Feng
Xudong Ma
Jie Luo
Jinyang Guo
Haotong Qin
Michele Magno
Xianglong Liu
MQ
82
6
0
04 May 2025
Deep Representation Learning for Electronic Design Automation
Pratik Shrestha
Saran Phatharodom
Alec Aversa
David Blankenship
Zhengfeng Wu
Ioannis Savidis
134
0
0
04 May 2025
Hierarchical Compact Clustering Attention (COCA) for Unsupervised Object-Centric Learning
Can Küçüksözen
Yücel Yemez
OCL
169
0
0
04 May 2025
A Comprehensive Analysis of Adversarial Attacks against Spam Filters
Esra Hotoğlu
Sevil Sen
Burcu Can
AAML
62
0
0
04 May 2025
Learning Local Causal World Models with State Space Models and Attention
Francesco Petri
Luigi Asprino
Aldo Gangemi
CML
60
0
0
04 May 2025
CASA: CNN Autoencoder-based Score Attention for Efficient Multivariate Long-term Time-series Forecasting
Minhyuk Lee
HyeKyung Yoon
MyungJoo Kang
AI4TS
153
0
0
04 May 2025
Interpretable Emergent Language Using Inter-Agent Transformers
Mannan Bhardwaj
AI4CE
360
1
0
04 May 2025
Wide & Deep Learning for Node Classification
Yancheng Chen
Wenguo Yang
Zhipeng Jiang
GNN
110
0
0
04 May 2025
MC3D-AD: A Unified Geometry-aware Reconstruction Model for Multi-category 3D Anomaly Detection
Jiayi Cheng
C. Gao
Jie Zhou
J. Wen
Tao Dai
Jiadong Wang
73
0
0
04 May 2025
Always Skip Attention
Yiping Ji
Hemanth Saratchandran
Peyman Moghaddam
Simon Lucey
453
3
0
04 May 2025
Neural Orchestration for Multi-Agent Systems: A Deep Learning Framework for Optimal Agent Selection in Multi-Domain Task Environments
Kushagra Agrawal
Nisharg Nargund
AI4CE
31
0
0
03 May 2025
Co$^{3}$Gesture: Towards Coherent Concurrent Co-speech 3D Gesture Generation with Interactive Diffusion
Xingqun Qi
Yatian Wang
Hengyuan Zhang
J. Pan
Wei Xue
Shanghang Zhang
Wenhan Luo
Qifeng Liu
Yike Guo
SLR
123
0
0
03 May 2025
MISE: Meta-knowledge Inheritance for Social Media-Based Stressor Estimation
Xin Wang
Ling Feng
Huijun Zhang
Lei Cao
Kaisheng Zeng
Qi Li
Yang Ding
Yi Dai
David A. Clifton
114
0
0
03 May 2025
Intra-Layer Recurrence in Transformers for Language Modeling
Anthony Nguyen
Wenjun Lin
59
0
0
03 May 2025
Multimodal Graph Representation Learning for Robust Surgical Workflow Recognition with Adversarial Feature Disentanglement
Long Bai
Boyi Ma
Ruohan Wang
Guankun Wang
Beilei Cui
...
Mobarakol Islam
Zhe Min
Jiewen Lai
Nassir Navab
Hongliang Ren
130
0
0
03 May 2025
Vision and Intention Boost Large Language Model in Long-Term Action Anticipation
Congqi Cao
Lanshu Hu
Yating Yu
Y. Zhang
VLM
437
0
0
03 May 2025
Multi-Scale Graph Learning for Anti-Sparse Downscaling
Yingda Fan
Runlong Yu
Janet R. Barclay
A. Appling
Yiming Sun
Yiqun Xie
Xiaowei Jia
AI4CE
75
0
0
03 May 2025
Easz: An Agile Transformer-based Image Compression Framework for Resource-constrained IoTs
Yu Mao
Jingzong Li
Jun Wang
Hong Xu
Tei-Wei Kuo
Nan Guan
Chun Jason Xue
82
0
0
03 May 2025
Positional Attention for Efficient BERT-Based Named Entity Recognition
Mo Sun
Siheng Xiong
Yuankai Cai
Bowen Zuo
29
0
0
03 May 2025
OODTE: A Differential Testing Engine for the ONNX Optimizer
Nikolaos Louloudakis
Ajitha Rajan
86
0
0
03 May 2025
ReLI: A Language-Agnostic Approach to Human-Robot Interaction
Linus Nwankwo
Bjoern Ellensohn
Ozan Özdenizci
Elmar Rueckert
LM&Ro
235
0
0
03 May 2025
Knowledge-Augmented Language Models Interpreting Structured Chest X-Ray Findings
Alexander Davis
Rafael Souza
Jia-Hao Lim
402
0
0
03 May 2025
High-Fidelity Pseudo-label Generation by Large Language Models for Training Robust Radiology Report Classifiers
Brian Wong
Kaito Tanaka
76
0
0
03 May 2025
Learning Multi-frame and Monocular Prior for Estimating Geometry in Dynamic Scenes
S. Park
Jinwoo Shin
121
1
0
03 May 2025
2DXformer: Dual Transformers for Wind Power Forecasting with Dual Exogenous Variables
Yuhui Zhang
Jiahai Jiang
Yule Yan
Liang Yang
Ping Zhang
AI4TS
58
0
0
02 May 2025
Enhancing User Sequence Modeling through Barlow Twins-based Self-Supervised Learning
Yuhan Liu
Lin Ning
Neo Wu
Karan Singhal
Philip Mansfield
D. Berlowitz
Sushant Prakash
Bradley Green
SSL
117
0
0
02 May 2025
Multimodal Transformers are Hierarchical Modal-wise Heterogeneous Graphs
Yijie Jin
Junjie Peng
Xuanchao Lin
Haochen Yuan
Lan Wang
Cangzhi Zheng
67
0
0
02 May 2025
3D Human Pose Estimation via Spatial Graph Order Attention and Temporal Body Aware Transformer
Kamel Aouaidjia
Aofan Li
Wenhao Zhang
Chongsheng Zhang
ViT
47
0
0
02 May 2025
GeloVec: Higher Dimensional Geometric Smoothing for Coherent Visual Feature Extraction in Image Segmentation
Boris Kriuk
Matey Yordanov
95
0
0
02 May 2025
PREMISE: Matching-based Prediction for Accurate Review Recommendation
Wei Han
Hui Chen
Soujanya Poria
88
0
0
02 May 2025
Multi-agents based User Values Mining for Recommendation
Lawrence Yunliang Chen
Wei Yuan
Tong Chen
Xiangyu Zhao
Nguyen Quoc Viet Hung
Hongzhi Yin
OffRL
125
0
0
02 May 2025
A Domain Adaptation of Large Language Models for Classifying Mechanical Assembly Components
Fatemeh Elhambakhsh
Daniele Grandi
Hyunwoong Ko
AI4CE
54
0
0
02 May 2025
CDFormer: Cross-Domain Few-Shot Object Detection Transformer Against Feature Confusion
Boyuan Meng
Xinming Zhang
Peilin Li
Zhe Wu
Yiming Li
Wenkai Zhao
B. Yu
Hui-Liang Shen
ViT
349
0
0
02 May 2025
AI agents may be worth the hype but not the resources (yet): An initial exploration of machine translation quality and costs in three language pairs in the legal and news domains
Vicent Briva-Iglesias
Gokhan Dogru
LLMAG
ELM
70
0
0
02 May 2025
LLM Security: Vulnerabilities, Attacks, Defenses, and Countermeasures
Francisco Aguilera-Martínez
Fernando Berzal
PILM
119
0
0
02 May 2025
A Self-Supervised Transformer for Unusable Shared Bike Detection
Yin Huang
Yongqi Dong
Youhua Tang
Alvaro García Hernandez
81
0
0
02 May 2025
A Character-based Diffusion Embedding Algorithm for Enhancing the Generation Quality of Generative Linguistic Steganographic Texts
Yingquan Chen
Qianmu Li
Xiaocong Wu
Huifeng Li
Qing Chang
DiffM
112
0
0
02 May 2025
Enhancing SPARQL Query Rewriting for Complex Ontology Alignments
Anicet Lepetit Ondo
Laurence Capus
Mamadou Bousso
21
0
0
02 May 2025
How Transformers Learn Regular Language Recognition: A Theoretical Study on Training Dynamics and Implicit Bias
Ruiquan Huang
Yingbin Liang
Jing Yang
120
0
0
02 May 2025
How Effective are Large Time Series Models in Hydrology? A Study on Water Level Forecasting in Everglades
Rahuul Rangaraj
Jimeng Shi
Azam Shirali
Rajendra Paudel
Yanzhao Wu
Giri Narasimhan
107
1
0
02 May 2025