ResearchTrend.AI

© 2025 ResearchTrend.AI, All rights reserved.

Attention Is All You Need
arXiv: 1706.03762

12 June 2017
Ashish Vaswani
Noam M. Shazeer
Niki Parmar
Jakob Uszkoreit
Llion Jones
Aidan Gomez
Lukasz Kaiser
Illia Polosukhin
3DV

Papers citing "Attention Is All You Need"

50 / 27,176 papers shown
Compact Recurrent Transformer with Persistent Memory
Edison Mucllari
Z. Daniels
David C. Zhang
Qiang Ye
CLLVLM
121
0
0
02 May 2025
Nesterov Method for Asynchronous Pipeline Parallel Optimization
Thalaiyasingam Ajanthan
Sameera Ramasinghe
Yan Zuo
Gil Avraham
Alexander Long
81
0
0
02 May 2025
SpectrumFM: A Foundation Model for Intelligent Spectrum Management
F. Zhou
Chunyu Liu
Hao Zhang
Wei Wu
Qihui Wu
Derrick Wing Kwan Ng
Tony Q. S. Quek
Chan-Byoung Chae
62
0
0
02 May 2025
RWKVQuant: Quantizing the RWKV Family with Proxy Guided Hybrid of Scalar and Vector Quantization
Chen Xu
Yuxuan Yue
Zukang Xu
Xing Hu
Jiangyong Yu
Zhixuan Chen
Sifan Zhou
Zhihang Yuan
Dawei Yang
MQ
62
0
0
02 May 2025
MoEQuant: Enhancing Quantization for Mixture-of-Experts Large Language Models via Expert-Balanced Sampling and Affinity Guidance
Xing Hu
Zhixuan Chen
Dawei Yang
Zukang Xu
Chen Xu
Zhihang Yuan
Sifan Zhou
Jiangyong Yu
MoEMQ
110
2
0
02 May 2025
Global Collinearity-aware Polygonizer for Polygonal Building Mapping in Remote Sensing
Fahong Zhang
Yilei Shi
Xiao Xiang Zhu
70
1
0
02 May 2025
Don't be lazy: CompleteP enables compute-efficient deep transformers
Nolan Dey
Bin Claire Zhang
Lorenzo Noci
Mufan Li
Blake Bordelon
Shane Bergsma
Cengiz Pehlevan
Boris Hanin
Joel Hestness
114
2
0
02 May 2025
On the effectiveness of Large Language Models in the mechanical design domain
Daniele Grandi
Fabian Riquelme
34
0
0
02 May 2025
Multimodal and Multiview Deep Fusion for Autonomous Marine Navigation
Dimitrios Dagdilelis
Panagiotis Grigoriadis
R. Galeazzi
3DPC
433
0
0
02 May 2025
Distilling Two-Timed Flow Models by Separately Matching Initial and Terminal Velocities
Pramook Khungurn
Pratch Piyawongwisal
Sira Sriswadi
Supasorn Suwajanakorn
141
0
0
02 May 2025
A Transformer-based Neural Architecture Search Method
Shang Wang
Huanrong Tang
Jianquan Ouyang
74
0
0
02 May 2025
Beyond Attention: Toward Machines with Intrinsic Higher Mental States
Ahsan Adeel
OffRLLRM
76
0
0
02 May 2025
Enhancing ML Model Interpretability: Leveraging Fine-Tuned Large Language Models for Better Understanding of AI
Jonas Bokstaller
Julia Altheimer
Julian Dormehl
Alina Buss
Jasper Wiltfang
Johannes Schneider
Maximilian Röglinger
67
0
0
02 May 2025
Artificial Intelligence in Government: Why People Feel They Lose Control
Alexander Wuttke
Adrian Rauchfleisch
Andreas Jungherr
85
1
0
02 May 2025
Multi-Hierarchical Fine-Grained Feature Mapping Driven by Feature Contribution for Molecular Odor Prediction
Hong Xin Xie
Jian De Sun
Fan Fu Xue
Zi Fei Han
S. Feng
Qi Chen
119
0
0
01 May 2025
Protocol-agnostic and Data-free Backdoor Attacks on Pre-trained Models in RF Fingerprinting
Tianya Zhao
Ningning Wang
Junqing Zhang
Xuyu Wang
AAML
70
0
0
01 May 2025
JointDiT: Enhancing RGB-Depth Joint Modeling with Diffusion Transformers
Kwon Byung-Ki
Qi Dai
Lee Hyoseok
Chong Luo
Tae-Hyun Oh
166
0
0
01 May 2025
Interpretable Spatial-Temporal Fusion Transformers: Multi-Output Prediction for Parametric Dynamical Systems with Time-Varying Inputs
Shuwen Sun
Lihong Feng
P. Benner
81
0
0
01 May 2025
Temporal Attention Evolutional Graph Convolutional Network for Multivariate Time Series Forecasting
Xinlong Zhao
Lingling Zhang
Tianbo Zou
Yan Zhang
AI4TS
153
0
0
01 May 2025
Gateformer: Advancing Multivariate Time Series Forecasting through Temporal and Variate-Wise Attention with Gated Representations
Yu-Hsiang Lan
Anton Alyakin
AI4TS
87
0
0
01 May 2025
Reasoning Capabilities and Invariability of Large Language Models
Alessandro Raganato
Rafael Peñaloza
Marco Viviani
G. Pasi
ReLMLRM
128
0
0
01 May 2025
On the Importance of Gaussianizing Representations
Daniel Eftekhari
Vardan Papyan
77
0
0
01 May 2025
Scalable Meta-Learning via Mixed-Mode Differentiation
Iurii Kemaev
Dan A. Calian
Luisa M. Zintgraf
Gregory Farquhar
H. V. Hasselt
108
1
0
01 May 2025
Unlocking the Potential of Linear Networks for Irregular Multivariate Time Series Forecasting
Chengsen Wang
Q. Qi
Jiangming Wang
Haifeng Sun
Zirui Zhuang
J. Liao
AI4TS
76
0
0
01 May 2025
D-Tracker: Modeling Interest Diffusion in Social Activity Tensor Data Streams
Shingo Higashiguchi
Yasuko Matsubara
Koki Kawabata
Taichi Murayama
Yasushi Sakurai
AI4TS
84
0
0
01 May 2025
Efficient Recommendation with Millions of Items by Dynamic Pruning of Sub-Item Embeddings
Aleksandr V. Petrov
Craig MacDonald
Nicola Tonellotto
57
0
0
01 May 2025
Visual Trajectory Prediction of Vessels for Inland Navigation
Alexander Puzicha
Konstantin Wüstefeld
Kathrin Wilms
Frank Weichert
70
0
0
01 May 2025
FreqKV: Frequency Domain Key-Value Compression for Efficient Context Window Extension
Jushi Kai
Boyi Zeng
Yansen Wang
Haoli Bai
Ziwei He
Bo Jiang
Zhouhan Lin
130
0
0
01 May 2025
CognitionNet: A Collaborative Neural Network for Play Style Discovery in Online Skill Gaming Platform
Rukma Talwadker
Surajit Chakrabarty
Aditya Pareek
Tridib Mukherjee
Deepak Saini
104
6
0
01 May 2025
Dual Filter: A Mathematical Framework for Inference using Transformer-like Architectures
Heng-Sheng Chang
P. Mehta
83
0
0
01 May 2025
Mixture of Sparse Attention: Content-Based Learnable Sparse Attention via Expert-Choice Routing
Piotr Piekos
Róbert Csordás
Jürgen Schmidhuber
MoEVLM
268
2
0
01 May 2025
Towards Explainable Temporal User Profiling with LLMs
Milad Sabouri
M. Mansoury
Kun-hsien Lin
B. Mobasher
93
1
0
01 May 2025
CSE-SFP: Enabling Unsupervised Sentence Representation Learning via a Single Forward Pass
Bowen Zhang
Zixin Song
Chunping Li
48
1
0
01 May 2025
Rethinking Time Encoding via Learnable Transformation Functions
Xi Chen
Yateng Tang
Jiarong Xu
Jiawei Zhang
Siwei Zhang
Sijia Peng
Xuehao Zheng
Yun Xiong
AI4TS
216
0
0
01 May 2025
SA-GAT-SR: Self-Adaptable Graph Attention Networks with Symbolic Regression for high-fidelity material property prediction
Junchi Liu
Ying Tang
Sergei Tretiak
Wenhui Duan
Liujiang Zhou
179
0
0
01 May 2025
Self-Ablating Transformers: More Interpretability, Less Sparsity
Jeremias Ferrao
Luhan Mikaelson
Keenan Pepper
Natalia Perez-Campanero Antolin
MILM
77
0
0
01 May 2025
Fast and Low-Cost Genomic Foundation Models via Outlier Removal
Haozheng Luo
Chenghao Qiu
Maojiang Su
Zhihan Zhou
Zoe Mehta
Guo Ye
Jerry Yao-Chieh Hu
Han Liu
AAML
107
1
0
01 May 2025
InstructAttribute: Fine-grained Object Attributes editing with Instruction
Xingxi Yin
Jingfeng Zhang
Zhi Li
You Li
Yanzhe Zhang
Yin Zhang
DiffM
453
1
0
01 May 2025
Steering Large Language Models with Register Analysis for Arbitrary Style Transfer
Xinchen Yang
Marine Carpuat
LLMSV
563
0
0
01 May 2025
TeLoGraF: Temporal Logic Planning via Graph-encoded Flow Matching
Yue Meng
Chuchu Fan
97
0
0
01 May 2025
Spill The Beans: Exploiting CPU Cache Side-Channels to Leak Tokens from Large Language Models
Andrew J. Adiletta
B. Sunar
418
0
0
01 May 2025
Vision Mamba in Remote Sensing: A Comprehensive Survey of Techniques, Applications and Outlook
Muyi Bao
Shuchang Lyu
Zhaoyang Xu
Huiyu Zhou
Jinchang Ren
Shiming Xiang
Xuelong Li
Guangliang Cheng
Mamba
263
0
0
01 May 2025
Pushing the Limits of Low-Bit Optimizers: A Focus on EMA Dynamics
Cong Xu
Wenbin Liang
Mo Yu
Anan Liu
Kai Zhang
Lizhuang Ma
Jiangming Wang
Jun Wang
Weinan Zhang
Wei Zhang
MQ
80
0
0
01 May 2025
FineScope : Precision Pruning for Domain-Specialized Large Language Models Using SAE-Guided Self-Data Cultivation
Chaitali Bhattacharyya
Yeseong Kim
111
0
0
01 May 2025
Learning to Estimate Package Delivery Time in Mixed Imbalanced Delivery and Pickup Logistics Services
Jinhui Yi
Huan Yan
Haotian Wang
Jian Yuan
Yongbin Li
AI4TS
162
1
0
01 May 2025
iMacSR: Intermediate Multi-Access Supervision and Regularization in Training Autonomous Driving Models
Wei-Bin Kou
Guangxu Zhu
Yichen Jin
Shuai Wang
Ming Tang
Yik-Chung Wu
70
0
0
01 May 2025
AI2-Active Safety: AI-enabled Interaction-aware Active Safety Analysis with Vehicle Dynamics
Keshu Wu
Zehan Li
Sixu Li
Xinyue Ye
Dominique Lord
Yang Zhou
123
1
0
01 May 2025
Enhancing Tropical Cyclone Path Forecasting with an Improved Transformer Network
Nguyen Van Thanh
Nguyen Dang Huynh
Nguyen Ngoc Tan
Nguyen Thai Minh
Nguyen Nam Hoang
53
0
0
01 May 2025
The Coral Protocol: Open Infrastructure Connecting The Internet of Agents
Roman J. Georgio
Caelum Forder
Suman Deb
Peter Carroll
Önder Gürcan
140
0
0
30 Apr 2025
Generative Machine Learning in Adaptive Control of Dynamic Manufacturing Processes: A Review
Suk Ki Lee
Hyunwoong Ko
AI4CE
101
0
0
30 Apr 2025