1706.03762
Cited By
Attention Is All You Need
12 June 2017
Ashish Vaswani
Noam M. Shazeer
Niki Parmar
Jakob Uszkoreit
Llion Jones
Aidan Gomez
Lukasz Kaiser
Illia Polosukhin
3DV
Papers citing "Attention Is All You Need"
50 / 18,577 papers shown
Title
Detecting Concept Drift in Neural Networks Using Chi-squared Goodness of Fit Testing
Jacob Glenn Ayers
Buvaneswari A. Ramanan
Manzoor A. Khan
32
0
0
07 May 2025
Red Teaming the Mind of the Machine: A Systematic Evaluation of Prompt Injection and Jailbreak Vulnerabilities in LLMs
Chetan Pathade
AAML
SILM
59
1
0
07 May 2025
DiffPattern-Flex: Efficient Layout Pattern Generation via Discrete Diffusion
Zixiao Wang
Wenqian Zhao
Yunheng Shen
Yang Bai
Guojin Chen
Farzan Farnia
Bei Yu
33
0
0
07 May 2025
Physics-inspired Energy Transition Neural Network for Sequence Learning
Zhou Wu
Junyi An
Baile Xu
Furao Shen
Jian Zhao
PINN
27
0
0
06 May 2025
Latent Adaptive Planner for Dynamic Manipulation
Donghun Noh
Deqian Kong
Minglu Zhao
Andrew Lizarraga
Jianwen Xie
Ying Nian Wu
Dennis W. Hong
202
0
0
06 May 2025
Action Spotting and Precise Event Detection in Sports: Datasets, Methods, and Challenges
Hao Xu
Arbind Agrahari Baniya
Sam Well
Mohamed Reda Bouadjenek
Richard Dazeley
S. Aryal
AI4TS
29
0
0
06 May 2025
DyGEnc: Encoding a Sequence of Textual Scene Graphs to Reason and Answer Questions in Dynamic Scenes
S. Linok
Vadim Semenov
Anastasia Trunova
Oleg Bulichev
Dmitry A. Yudin
54
0
0
06 May 2025
Rethinking Boundary Detection in Deep Learning-Based Medical Image Segmentation
Yi-Mou Lin
Dong-Ming Zhang
X. B. Fang
Yufan Chen
K.-T. Cheng
Hao Chen
33
0
0
06 May 2025
Sentence Embeddings as an intermediate target in end-to-end summarisation
Maciej Zembrzuski
Saad Mahamood
47
0
0
06 May 2025
Mamba-Diffusion Model with Learnable Wavelet for Controllable Symbolic Music Generation
Jincheng Zhang
Gyorgy Fazekas
C. Saitis
53
0
0
06 May 2025
Assessing and Enhancing the Robustness of LLM-based Multi-Agent Systems Through Chaos Engineering
Joshua Owotogbe
LLMAG
64
0
0
06 May 2025
Improving Failure Prediction in Aircraft Fastener Assembly Using Synthetic Data in Imbalanced Datasets
G. J. G. Lahr
Ricardo V. Godoy
Thiago H. Segreto
Jose O. Savazzi
Arash Ajoudani
Thiago Boaventura
G. Caurin
AI4CE
26
0
0
06 May 2025
Null Counterfactual Factor Interactions for Goal-Conditioned Reinforcement Learning
Caleb Chuck
Fan Feng
Carl Qi
Chang Shi
Siddhant Agarwal
Amy Zhang
S. Niekum
47
0
0
06 May 2025
Recall with Reasoning: Chain-of-Thought Distillation for Mamba's Long-Context Memory and Extrapolation
Junyu Ma
Tianqing Fang
Zizhuo Zhang
Hongming Zhang
Haitao Mi
Dong Yu
ReLM
RALM
LRM
216
0
0
06 May 2025
Mitigating Image Captioning Hallucinations in Vision-Language Models
Fei Zhao
Chenyi Zhang
Runlin Zhang
Tianyang Wang
Xi Li
VLM
44
0
0
06 May 2025
Geospatial Mechanistic Interpretability of Large Language Models
Stef De Sabbata
Stefano Mizzaro
Kevin Roitero
AI4CE
37
0
0
06 May 2025
Faster MoE LLM Inference for Extremely Large Models
Haoqi Yang
Luohe Shi
Qiwei Li
Zuchao Li
Ping Wang
Bo Du
Mengjia Shen
Hai Zhao
MoE
68
0
0
06 May 2025
Rainbow Delay Compensation: A Multi-Agent Reinforcement Learning Framework for Mitigating Delayed Observation
Songchen Fu
Siang Chen
Shaojing Zhao
Letian Bai
Ta Li
Yonghong Yan
32
0
0
06 May 2025
CaRaFFusion: Improving 2D Semantic Segmentation with Camera-Radar Point Cloud Fusion and Zero-Shot Image Inpainting
Huawei Sun
Bora Kunter Sahin
Georg Stettinger
Maximilian Bernhard
Matthias Schubert
Robert Wille
49
0
0
06 May 2025
Robust Understanding of Human-Robot Social Interactions through Multimodal Distillation
Tongfei Bian
Mathieu Chollet
T. Guha
31
0
0
06 May 2025
Transformers for Learning on Noisy and Task-Level Manifolds: Approximation and Generalization Insights
Zhaiming Shen
Alex Havrilla
Rongjie Lai
A. Cloninger
Wenjing Liao
39
0
0
06 May 2025
seq-JEPA: Autoregressive Predictive Learning of Invariant-Equivariant World Models
Hafez Ghaemi
Eilif Muller
Shahab Bakhtiari
54
0
0
06 May 2025
PAHA: Parts-Aware Audio-Driven Human Animation with Diffusion Model
Y.B. Wang
S.Z. Zhou
J.F. Wu
T. Hu
J.N. Zhang
Zerui Li
Y. Liu
DiffM
VGen
71
0
0
06 May 2025
Enhancing Target-unspecific Tasks through a Features Matrix
Fangming Cui
Yonggang Zhang
Xuan Wang
Xinmei Tian
Jun Yu
AAML
50
0
0
06 May 2025
Prediction-powered estimators for finite population statistics in highly imbalanced textual data: Public hate crime estimation
Hannes Waldetoft
Jakob Torgander
Måns Magnusson
34
0
0
05 May 2025
MSFNet-CPD: Multi-Scale Cross-Modal Fusion Network for Crop Pest Detection
Jiaqi Zhang
Zhuodong Liu
Kejian Yu
43
0
0
05 May 2025
SCFormer: Structured Channel-wise Transformer with Cumulative Historical State for Multivariate Time Series Forecasting
Shiwei Guo
Zheyu Chen
Yupeng Ma
Yunfei Han
Yi Wang
AI4TS
220
0
0
05 May 2025
Advancing Constrained Monotonic Neural Networks: Achieving Universal Approximation Beyond Bounded Activations
Davide Sartor
Alberto Sinigaglia
Gian Antonio Susto
39
0
0
05 May 2025
A Survey on Progress in LLM Alignment from the Perspective of Reward Design
Miaomiao Ji
Yanqiu Wu
Zhibin Wu
Shoujin Wang
Jian Yang
Mark Dras
Usman Naseem
41
1
0
05 May 2025
Database-Agnostic Gait Enrollment using SetTransformers
Nicoleta Basoc
Adrian Cosma
Andy Catruna
Emilian Radoi
SLR
36
0
0
05 May 2025
Large Language Model Partitioning for Low-Latency Inference at the Edge
Dimitrios Kafetzis
Ramin Khalili
Iordanis Koutsopoulos
29
0
0
05 May 2025
LLaMA-Omni2: LLM-based Real-time Spoken Chatbot with Autoregressive Streaming Speech Synthesis
Qingkai Fang
Yan Zhou
Shoutao Guo
Shaolei Zhang
Yang Feng
AuLLM
58
2
0
05 May 2025
Data Augmentation With Back translation for Low Resource languages: A case of English and Luganda
Richard Kimera
DongNyeong Heo
Daniela N. Rim
Heeyoul Choi
194
0
0
05 May 2025
LLM4FTS: Enhancing Large Language Models for Financial Time Series Prediction
Zian Liu
Renjun Jia
AI4TS
AIFin
60
0
0
05 May 2025
Bielik 11B v2 Technical Report
Krzysztof Ociepa
Łukasz Flis
Krzysztof Wróbel
Adrian Gwoździej
Remigiusz Kinas
34
0
0
05 May 2025
DELTA: Dense Depth from Events and LiDAR using Transformer's Attention
Vincent Brebion
Julien Moreau
Franck Davoine
45
0
0
05 May 2025
DPNet: Dynamic Pooling Network for Tiny Object Detection
Luqi Gong
Haotian Chen
Yushen Chen
Tianliang Yao
Chao Li
Shuai Zhao
Guangjie Han
ObjD
215
0
0
05 May 2025
RetroInfer: A Vector-Storage Approach for Scalable Long-Context LLM Inference
Yushen Chen
Jiawei Zhang
Baotong Lu
Qianxi Zhang
Chengruidong Zhang
...
Chen Chen
Mingxing Zhang
Yuqing Yang
Fan Yang
Mao Yang
38
0
0
05 May 2025
Sharpness-Aware Minimization with Z-Score Gradient Filtering for Neural Networks
Juyoung Yun
40
0
0
05 May 2025
Bielik v3 Small: Technical Report
Krzysztof Ociepa
Łukasz Flis
Remigiusz Kinas
Krzysztof Wróbel
Adrian Gwoździej
29
0
0
05 May 2025
A Theoretical Analysis of Compositional Generalization in Neural Networks: A Necessary and Sufficient Condition
Yuanpeng Li
CoGe
224
0
0
05 May 2025
EMORL: Ensemble Multi-Objective Reinforcement Learning for Efficient and Flexible LLM Fine-Tuning
Lingxiao Kong
Cong Yang
Susanne Neufang
Oya Beyan
Zeyd Boukhers
OffRL
39
0
0
05 May 2025
Leveraging LLM Agents and Digital Twins for Fault Handling in Process Plants
Milapji Singh Gill
Javal Vyas
Artan Markaj
Felix Gehlhoff
Mehmet Mercangöz
41
0
0
04 May 2025
Towards Safer Pretraining: Analyzing and Filtering Harmful Content in Webscale datasets for Responsible LLMs
Sai Krishna Mendu
Harish Yenala
Aditi Gulati
Shanu Kumar
Parag Agrawal
36
0
0
04 May 2025
Always Skip Attention
Yiping Ji
Hemanth Saratchandran
Peyman Moghaddam
Simon Lucey
232
0
0
04 May 2025
Hierarchical Compact Clustering Attention (COCA) for Unsupervised Object-Centric Learning
Can Küçüksözen
Yücel Yemez
OCL
55
0
0
04 May 2025
Learning Local Causal World Models with State Space Models and Attention
Francesco Petri
Luigi Asprino
Aldo Gangemi
CML
40
0
0
04 May 2025
Deep Representation Learning for Electronic Design Automation
Pratik Shrestha
Saran Phatharodom
Alec Aversa
David Blankenship
Zhengfeng Wu
Ioannis Savidis
22
0
0
04 May 2025
Wide & Deep Learning for Node Classification
Yancheng Chen
Wenguo Yang
Zhipeng Jiang
GNN
38
0
0
04 May 2025
An Empirical Study of Qwen3 Quantization
Xingyu Zheng
Yuye Li
Haoran Chu
Yue Feng
Xudong Ma
Jie Luo
Jinyang Guo
Haotong Qin
Michele Magno
Xianglong Liu
MQ
33
1
0
04 May 2025