1706.03762
Cited By
Attention Is All You Need
12 June 2017
Ashish Vaswani
Noam M. Shazeer
Niki Parmar
Jakob Uszkoreit
Llion Jones
Aidan Gomez
Lukasz Kaiser
Illia Polosukhin
3DV
Papers citing "Attention Is All You Need"
50 / 18,509 papers shown
Matrix-Free Preconditioning in Online Learning
Ashok Cutkosky
Tamás Sarlós
ODL
30
16
0
29 May 2019
Path-Augmented Graph Transformer Network
Benson Chen
Regina Barzilay
Tommi Jaakkola
ViT
GNN
22
59
0
29 May 2019
What Makes Training Multi-Modal Classification Networks Hard?
Weiyao Wang
Du Tran
Matt Feiszli
34
442
0
29 May 2019
Defending Against Neural Fake News
Rowan Zellers
Ari Holtzman
Hannah Rashkin
Yonatan Bisk
Ali Farhadi
Franziska Roesner
Yejin Choi
AAML
55
1,000
0
29 May 2019
Mixed Precision Training With 8-bit Floating Point
Naveen Mellempudi
Sudarshan Srinivasan
Dipankar Das
Bharat Kaul
MQ
18
69
0
29 May 2019
Attention Based Pruning for Shift Networks
G. B. Hacene
Carlos Lassance
Vincent Gripon
Matthieu Courbariaux
Yoshua Bengio
41
25
0
29 May 2019
DDP-GCN: Multi-Graph Convolutional Network for Spatiotemporal Traffic Forecasting
Kyungeun Lee
Wonjong Rhee
AI4TS
GNN
21
108
0
29 May 2019
Ensuring Readability and Data-fidelity using Head-modifier Templates in Deep Type Description Generation
Jiangjie Chen
Ao Wang
Haiyun Jiang
Suo Feng
Chenguang Li
Yanghua Xiao
29
3
0
29 May 2019
Adaptive Deep Kernel Learning
Prudencio Tossou
Basile Dura
François Laviolette
M. Marchand
Alexandre Lacoste
27
29
0
28 May 2019
Unsupervised Learning from Video with Deep Neural Embeddings
Chengxu Zhuang
Tianwei She
A. Andonian
Max Sobol Mark
Daniel L. K. Yamins
SSL
17
56
0
28 May 2019
BreizhCrops: A Time Series Dataset for Crop Type Mapping
M. Rußwurm
Charlotte Pelletier
Maximilian Zollner
Sébastien Lefèvre
Marco Körner
AI4TS
29
72
0
28 May 2019
Interpreting and improving natural-language processing (in machines) with natural language-processing (in the brain)
Mariya Toneva
Leila Wehbe
MILM
AI4CE
42
220
0
28 May 2019
Stochastic Gradient Methods with Layer-wise Adaptive Moments for Training of Deep Networks
Boris Ginsburg
P. Castonguay
Oleksii Hrinchuk
Oleksii Kuchaiev
Vitaly Lavrukhin
Ryan Leary
Jason Chun Lok Li
Huyen Nguyen
Yang Zhang
Jonathan M. Cohen
ODL
25
13
0
27 May 2019
CIF: Continuous Integrate-and-Fire for End-to-End Speech Recognition
Linhao Dong
Bo Xu
27
125
0
27 May 2019
Quantization-Based Regularization for Autoencoders
Hanwei Wu
M. Flierl
DRL
16
2
0
27 May 2019
Levenshtein Transformer
Jiatao Gu
Changhan Wang
Jake Zhao
49
359
0
27 May 2019
Let's Agree to Agree: Neural Networks Share Classification Order on Real Datasets
Guy Hacohen
Leshem Choshen
D. Weinshall
AI4TS
OOD
16
56
0
26 May 2019
TIGS: An Inference Algorithm for Text Infilling with Gradient Search
Dayiheng Liu
Jie Fu
Pengfei Liu
Jiancheng Lv
DiffM
18
27
0
26 May 2019
Hashing based Answer Selection
Dong Xu
Wu-Jun Li
19
6
0
26 May 2019
Stochastic Shared Embeddings: Data-driven Regularization of Embedding Layers
Liwei Wu
Shuqing Li
Cho-Jui Hsieh
James Sharpnack
21
31
0
25 May 2019
ESA: Entity Summarization with Attention
Dongjun Wei
Yaxin Liu
Fuqing Zhu
Liangjun Zang
Wei Zhou
Jizhong Han
Songlin Hu
17
13
0
25 May 2019
Soft Contextual Data Augmentation for Neural Machine Translation
Jinhua Zhu
Fei Gao
Lijun Wu
Yingce Xia
Tao Qin
Wen-gang Zhou
Xueqi Cheng
Tie-Yan Liu
27
125
0
25 May 2019
Discrete Flows: Invertible Generative Models of Discrete Data
Dustin Tran
Keyon Vafa
Kumar Krishna Agrawal
Laurent Dinh
Ben Poole
DRL
24
114
0
24 May 2019
An Explicitly Relational Neural Network Architecture
Murray Shanahan
Kyriacos Nikiforou
Antonia Creswell
Christos Kaplanis
David Barrett
M. Garnelo
NAI
3DV
GAN
25
68
0
24 May 2019
Fast Flow Reconstruction via Robust Invertible nxn Convolution
Thanh-Dat Truong
Khoa Luu
C. Duong
Ngan Le
M. Tran
19
7
0
24 May 2019
Personalizing Dialogue Agents via Meta-Learning
Zhaojiang Lin
Andrea Madotto
Chien-Sheng Wu
Pascale Fung
58
180
0
24 May 2019
CDSA: Cross-Dimensional Self-Attention for Multivariate, Geo-tagged Time Series Imputation
Jiawei Ma
Zheng Shou
Alireza Zareian
Hassan Mansour
A. Vetro
Shih-Fu Chang
AI4TS
25
61
0
23 May 2019
Interpreting Adversarially Trained Convolutional Neural Networks
Tianyuan Zhang
Zhanxing Zhu
AAML
GAN
FAtt
28
158
0
23 May 2019
Spatial Group-wise Enhance: Improving Semantic Feature Learning in Convolutional Networks
Xiang Li
Xiaolin Hu
Jian Yang
24
193
0
23 May 2019
Theme-aware generation model for chinese lyrics
Jie Wang
Xinyan Zhao
23
10
0
23 May 2019
Look Again at the Syntax: Relational Graph Convolutional Network for Gendered Ambiguous Pronoun Resolution
Yinchuan Xu
Junlin Yang
GNN
18
21
0
21 May 2019
Lightweight Network Architecture for Real-Time Action Recognition
Alexander Kozlov
Vadim Andronov
Y. Gritsenko
ViT
25
33
0
21 May 2019
GAPNet: Graph Attention based Point Neural Network for Exploiting Local Feature of Point Cloud
Can Chen
L. Z. Fragonara
Antonios Tsourdos
3DPC
19
86
0
21 May 2019
CNNs found to jump around more skillfully than RNNs: Compositional generalization in seq2seq convolutional networks
Roberto Dessì
Marco Baroni
UQCV
16
44
0
21 May 2019
KGAT: Knowledge Graph Attention Network for Recommendation
Xiang Wang
Xiangnan He
Yixin Cao
Meng Liu
Tat-Seng Chua
OffRL
32
1,793
0
20 May 2019
Multimodal Transformer with Multi-View Visual Representation for Image Captioning
Jun-chen Yu
Jing Li
Zhou Yu
Qingming Huang
ViT
27
377
0
20 May 2019
Predicting Annotation Difficulty to Improve Task Routing and Model Performance for Biomedical Information Extraction
Yinfei Yang
Oshin Agarwal
Chris Tar
Byron C. Wallace
A. Nenkova
29
14
0
19 May 2019
Story Ending Prediction by Transferable BERT
Zhongyang Li
Xiao Ding
Ting Liu
34
52
0
17 May 2019
Adaptation of Deep Bidirectional Multilingual Transformers for Russian Language
Yuri Kuratov
M. Arkhipov
11
274
0
17 May 2019
Gmail Smart Compose: Real-Time Assisted Writing
Mengzhao Chen
Benjamin Lee
G. Bansal
Yuan Cao
Shuyuan Zhang
...
Yinan Wang
Andrew M. Dai
Zhehuai Chen
Timothy Sohn
Yonghui Wu
18
203
0
17 May 2019
Exact-K Recommendation via Maximal Clique Optimization
Yu Gong
Yu Zhu
Lu Duan
Qingwen Liu
Ziyu Guan
Fei Sun
Wenwu Ou
Kenny Q. Zhu
OffRL
CML
18
59
0
17 May 2019
HIBERT: Document Level Pre-training of Hierarchical Bidirectional Transformers for Document Summarization
Xingxing Zhang
Furu Wei
M. Zhou
37
377
0
16 May 2019
What do you learn from context? Probing for sentence structure in contextualized word representations
Ian Tenney
Patrick Xia
Berlin Chen
Alex Jinpeng Wang
Adam Poliak
...
Najoung Kim
Benjamin Van Durme
Samuel R. Bowman
Dipanjan Das
Ellie Pavlick
91
848
0
15 May 2019
A Surprisingly Robust Trick for Winograd Schema Challenge
Vid Kocijan
Ana-Maria Cretu
Oana-Maria Camburu
Yordan Yordanov
Thomas Lukasiewicz
23
101
0
15 May 2019
Aligning Visual Regions and Textual Concepts for Semantic-Grounded Image Representations
Fenglin Liu
Yuanxin Liu
Xuancheng Ren
Xiaodong He
Xu Sun
VLM
34
81
0
15 May 2019
BERT Rediscovers the Classical NLP Pipeline
Ian Tenney
Dipanjan Das
Ellie Pavlick
MILM
SSeg
50
1,439
0
15 May 2019
Sparse Sequence-to-Sequence Models
Ben Peters
Vlad Niculae
André F. T. Martins
TPM
27
209
0
14 May 2019
Sense Vocabulary Compression through the Semantic Knowledge of WordNet for Neural Word Sense Disambiguation
Loïc Vial
Benjamin Lecouteux
D. Schwab
16
90
0
14 May 2019
Style Transformer: Unpaired Text Style Transfer without Disentangled Latent Representation
Ning Dai
Jianze Liang
Xipeng Qiu
Xuanjing Huang
DRL
16
202
0
14 May 2019
TauRieL: Targeting Traveling Salesman Problem with a deep reinforcement learning inspired architecture
Gorker Alp Malazgirt
O. Unsal
A. Cristal
19
5
0
14 May 2019