ResearchTrend.AI
Attention Is All You Need

12 June 2017
Ashish Vaswani
Noam M. Shazeer
Niki Parmar
Jakob Uszkoreit
Llion Jones
Aidan Gomez
Lukasz Kaiser
Illia Polosukhin
    3DV
arXiv:1706.03762

Papers citing "Attention Is All You Need"

50 / 27,143 papers shown
Blockbuster, Part 1: Block-level AI Operator Fusion
Ofer Dekel
53
0
0
29 Apr 2025
TT-LoRA MoE: Unifying Parameter-Efficient Fine-Tuning and Sparse Mixture-of-Experts
Pradip Kunwar
Minh Vu
Maanak Gupta
Mahmoud Abdelsalam
Manish Bhattarai
MoE, MoMe
445
0
0
29 Apr 2025
Revisiting the MIMIC-IV Benchmark: Experiments Using Language Models for Electronic Health Records
Jesus Lovon
Thouria Ben-Haddi
Jules Di Scala
José G. Moreno
L. Tamine
140
3
0
29 Apr 2025
Softpick: No Attention Sink, No Massive Activations with Rectified Softmax
Zayd Muhammad Kawakibi Zuhri
Erland Hilman Fuadi
Alham Fikri Aji
54
0
0
29 Apr 2025
What's Wrong with Your Synthetic Tabular Data? Using Explainable AI to Evaluate Generative Models
Jan Kapar
Niklas Koenen
Martin Jullum
114
0
0
29 Apr 2025
PRISM-DP: Spatial Pose-based Observations for Diffusion-Policies via Segmentation, Mesh Generation, and Pose Tracking
Xiatao Sun
Yinxing Chen
Daniel Rakita
VGen
151
0
0
29 Apr 2025
On the Potential of Large Language Models to Solve Semantics-Aware Process Mining Tasks
Adrian Rebmann
Fabian David Schmidt
Goran Glavaš
Han van der Aa
LRM
61
0
0
29 Apr 2025
MemeBLIP2: A novel lightweight multimodal system to detect harmful memes
Jiaqi Liu
Ran Tong
Aowei Shen
Shuzheng Li
Changlin Yang
Lisha Xu
VLM
139
1
0
29 Apr 2025
CMT: A Cascade MAR with Topology Predictor for Multimodal Conditional CAD Generation
Jianyu Wu
Yizhou Wang
Xiangyu Yue
Xinzhu Ma
Jinpei Guo
Dongzhan Zhou
Wanli Ouyang
Shixiang Tang
150
0
0
29 Apr 2025
Confidence-based Intent Prediction for Teleoperation in Bimanual Robotic Suturing
Zhaoyang Jacopo Hu
Haozheng Xu
Sion Kim
Yanan Li
Ferdinando Rodriguez y Baena
Etienne Burdet
80
0
0
29 Apr 2025
AlignDiT: Multimodal Aligned Diffusion Transformer for Synchronized Speech Generation
J. Choi
Ji-Hoon Kim
Kim Sung-Bin
Tae-Hyun Oh
Joon Son Chung
DiffM
138
0
0
29 Apr 2025
WenyanGPT: A Large Language Model for Classical Chinese Tasks
Xinyu Yao
Mengdi Wang
Bo Chen
Xiaobing Zhao
115
0
0
29 Apr 2025
Model Connectomes: A Generational Approach to Data-Efficient Language Models
Klemen Kotar
Greta Tuckute
168
0
0
29 Apr 2025
How to Coordinate UAVs and UGVs for Efficient Mission Planning? Optimizing Energy-Constrained Cooperative Routing with a DRL Framework
Md Safwan Mondal
S. Ramasamy
Luca Russo
James D. Humann
James M. Dotterweich
Pranav A. Bhounsule
95
0
0
29 Apr 2025
LLM Enhancer: Merged Approach using Vector Embedding for Reducing Large Language Model Hallucinations with External Knowledge
Naheed Rayhan
Md. Ashrafuzzaman
49
0
0
29 Apr 2025
Large Language Models are Qualified Benchmark Builders: Rebuilding Pre-Training Datasets for Advancing Code Intelligence Tasks
Kang Yang
Xinjun Mao
Shangwen Wang
Yanjie Wang
Tanghaoran Zhang
Bo Lin
Yihao Qin
Zhang Zhang
Yao Lu
Kamal Al-Sabahi
ALM
283
1
0
28 Apr 2025
Enhancing Surgical Documentation through Multimodal Visual-Temporal Transformers and Generative AI
Hugo Georgenthum
Cristian Cosentino
Fabrizio Marozzo
Pietro Liò
MedIm
443
0
0
28 Apr 2025
ClearVision: Leveraging CycleGAN and SigLIP-2 for Robust All-Weather Classification in Traffic Camera Imagery
Anush Lakshman Sivaraman
Kojo Adu-Gyamfi
Ibne Farabi Shihab
Anuj Sharma
54
1
0
28 Apr 2025
UNet with Axial Transformer : A Neural Weather Model for Precipitation Nowcasting
Maitreya Sonawane
Sumit Mamtani
164
0
0
28 Apr 2025
Exploiting Inter-Sample Correlation and Intra-Sample Redundancy for Partially Relevant Video Retrieval
Junlong Ren
Gangjian Zhang
Yitao Hu
Jian Shu
Haoran Wang
100
0
0
28 Apr 2025
AGATE: Stealthy Black-box Watermarking for Multimodal Model Copyright Protection
Jianbo Gao
Keke Gai
Jing Yu
Liehuang Zhu
Qi Wu
AAML
96
0
0
28 Apr 2025
TurboQuant: Online Vector Quantization with Near-optimal Distortion Rate
A. Zandieh
Majid Daliri
Majid Hadian
Vahab Mirrokni
MQ
127
0
0
28 Apr 2025
Towards Ball Spin and Trajectory Analysis in Table Tennis Broadcast Videos via Physically Grounded Synthetic-to-Real Transfer
Daniel Kienzle
Robin Schon
Rainer Lienhart
Shin'ichi Satoh
122
0
0
28 Apr 2025
FineQ: Software-Hardware Co-Design for Low-Bit Fine-Grained Mixed-Precision Quantization of LLMs
Xilong Xie
Liang Wang
Limin Xiao
Meng Han
Lin Sun
S. Zheng
Xiangrong Xu
MQ
84
0
0
28 Apr 2025
CodeBC: A More Secure Large Language Model for Smart Contract Code Generation in Blockchain
LingXiang Wang
Hainan Zhang
Qinnan Zhang
Ziwei Wang
Hongwei Zheng
Jin Dong
Zhiming Zheng
145
0
0
28 Apr 2025
A Comparative Study on Positional Encoding for Time-frequency Domain Dual-path Transformer-based Source Separation Models
Kohei Saijo
Tetsuji Ogawa
85
1
0
28 Apr 2025
Magnifier: A Multi-grained Neural Network-based Architecture for Burned Area Delineation
Daniele Rege Cambrin
Luca Colomba
Paolo Garza
93
0
0
28 Apr 2025
EarthMapper: Visual Autoregressive Models for Controllable Bidirectional Satellite-Map Translation
Zhe Dong
Yuzhe Sun
Tianzhu Liu
Wangmeng Zuo
Yanfeng Gu
90
0
0
28 Apr 2025
Knowledge Distillation of Domain-adapted LLMs for Question-Answering in Telecom
Rishika Sen
Sujoy Roychowdhury
Sumit Soman
H. G. Ranjani
Srikhetra Mohanty
126
0
0
28 Apr 2025
Llama-3.1-FoundationAI-SecurityLLM-Base-8B Technical Report
Paul Kassianik
Baturay Saglam
Alexander Chen
Blaine Nelson
Anu Vellore
...
Hyrum Anderson
Kojin Oshiba
Omar Santos
Yaron Singer
Amin Karbasi
PILM
89
2
0
28 Apr 2025
Foundation Model-Driven Framework for Human-Object Interaction Prediction with Segmentation Mask Integration
Juhan Park
Kyungjae Lee
Hyung Jin Chang
Jungchan Cho
VLM
113
0
0
28 Apr 2025
DISCO: learning to DISCover an evolution Operator for multi-physics-agnostic prediction
Rudy Morel
Jiequn Han
Edouard Oyallon
AI4CE
94
1
0
28 Apr 2025
A Transformer-Based Approach for Diagnosing Fault Cases in Optical Fiber Amplifiers
Dominic Schneider
Lutz Rapp
Christoph Ament
84
0
0
28 Apr 2025
GMAR: Gradient-Driven Multi-Head Attention Rollout for Vision Transformer Interpretability
Sehyeong Jo
Gangjae Jang
Haesol Park
143
0
0
28 Apr 2025
Learning Hierarchical Interaction for Accurate Molecular Property Prediction
Huiyang Hong
Xinkai Wu
Hongyu Sun
Chaoyang Xie
Qi Wang
Yongqian Li
187
0
0
28 Apr 2025
Shopformer: Transformer-Based Framework for Detecting Shoplifting via Human Pose
Narges Rashvand
Ghazal Alinezhad Noghre
Armin Danesh Pazho
B. R. Ardabili
Hamed Tabkhi
ViT
102
0
0
28 Apr 2025
A Transformer-based Multimodal Fusion Model for Efficient Crowd Counting Using Visual and Wireless Signals
Zhe Cui
Yuli Li
Le-Nam Tran
54
0
0
28 Apr 2025
Accelerating Mixture-of-Experts Training with Adaptive Expert Replication
Athinagoras Skiadopoulos
Mark Zhao
Swapnil Gandhi
Thomas Norrie
Shrijeet Mukherjee
Christos Kozyrakis
MoE
141
0
0
28 Apr 2025
Breast Cancer Detection from Multi-View Screening Mammograms with Visual Prompt Tuning
Han Chen
Anne L. Martel
103
0
0
28 Apr 2025
LZ Penalty: An information-theoretic repetition penalty for autoregressive language models
Antonio A. Ginart
Naveen Kodali
Jason D. Lee
Caiming Xiong
Siyang Song
John Emmons
33
0
0
28 Apr 2025
SAMBLE: Shape-Specific Point Cloud Sampling for an Optimal Trade-Off Between Local Detail and Global Uniformity
Chengzhi Wu
Yuxin Wan
Hao Fu
Julius Pfrommer
Zeyun Zhong
Junwei Zheng
Jiaming Zhang
Jürgen Beyerer
3DPC
107
0
0
28 Apr 2025
GPA-RAM: Grasp-Pretraining Augmented Robotic Attention Mamba for Spatial Task Learning
Juyi Sheng
Yangjun Liu
Sheng Xu
Zhixin Yang
Mengyuan Liu
134
0
0
28 Apr 2025
Geometry-Informed Neural Operator Transformer
Qibang Liu
Vincient Zhong
Hadi Meidani
Diab Abueidda
S. Koric
Philippe Geubelle
AI4CE
91
1
0
28 Apr 2025
TeleSparse: Practical Privacy-Preserving Verification of Deep Neural Networks
Mohammad Maheri
Hamed Haddadi
Alex Davidson
113
0
0
27 Apr 2025
RadioFormer: A Multiple-Granularity Radio Map Estimation Transformer with 1‱ Spatial Sampling
Zheng Fang
Kangjun Liu
Ke Chen
Qingyu Liu
Junxuan Zhang
Lingyang Song
Yaowei Wang
70
0
0
27 Apr 2025
CARL: Camera-Agnostic Representation Learning for Spectral Image Analysis
Alexander Baumann
Leonardo Ayala
Siyang Song
Jan Sellner
Alexander Studier-Fischer
Berkin Özdemir
Lena Maier-Hein
Slobodan Ilic
108
0
0
27 Apr 2025
WuNeng: Hybrid State with Attention
Liu Xiao
Li Zhiyuan
Lin Yueyu
427
0
0
27 Apr 2025
Learning to Drive from a World Model
Mitchell Goff
Greg Hogan
George Hotz
Armand du Parc Locmaria
Kacper Raczy
Harald Schäfer
Adeeb Shihadeh
Weixing Zhang
Yassine Yousfi
80
2
0
27 Apr 2025
LRFusionPR: A Polar BEV-Based LiDAR-Radar Fusion Network for Place Recognition
Zhangshuo Qi
Luqi Cheng
Zijie Zhou
Guangming Xiong
113
0
0
27 Apr 2025
Adaptive Dual-domain Learning for Underwater Image Enhancement
Lingtao Peng
Liheng Bian
57
0
0
27 Apr 2025