1706.03762
Cited By
Attention Is All You Need
12 June 2017
Ashish Vaswani
Noam M. Shazeer
Niki Parmar
Jakob Uszkoreit
Llion Jones
Aidan Gomez
Lukasz Kaiser
Illia Polosukhin
3DV
Papers citing
"Attention Is All You Need"
50 / 18,000 papers shown
Title
Hello, It's GPT-2 -- How Can I Help You? Towards the Use of Pretrained Language Models for Task-Oriented Dialogue Systems
Paweł Budzianowski
Ivan Vulić
22
308
0
12 Jul 2019
R-Transformer: Recurrent Neural Network Enhanced Transformer
Z. Wang
Yao Ma
Zitao Liu
Jiliang Tang
ViT
16
105
0
12 Jul 2019
Faster Neural Network Training with Data Echoing
Dami Choi
Alexandre Passos
Christopher J. Shallue
George E. Dahl
23
48
0
12 Jul 2019
Privileged Features Distillation at Taobao Recommendations
Chen Xu
Quan Li
Junfeng Ge
Jinyang Gao
Xiaoyong Yang
Changhua Pei
Fei Sun
Jian Wu
Hanxiao Sun
Wenwu Ou
15
67
0
11 Jul 2019
Object Detection in Video with Spatial-temporal Context Aggregation
Hao Luo
Lichao Huang
Han Shen
Yuan Li
Chang Huang
Xinggang Wang
18
14
0
11 Jul 2019
LakhNES: Improving multi-instrumental music generation with cross-domain pre-training
Chris Donahue
H. H. Mao
Yiting Li
G. Cottrell
Julian McAuley
35
117
0
10 Jul 2019
Sparse Networks from Scratch: Faster Training without Losing Performance
Tim Dettmers
Luke Zettlemoyer
20
334
0
10 Jul 2019
BAM! Born-Again Multi-Task Networks for Natural Language Understanding
Kevin Clark
Minh-Thang Luong
Urvashi Khandelwal
Christopher D. Manning
Quoc V. Le
21
228
0
10 Jul 2019
Improving the Performance of the LSTM and HMM Model via Hybridization
Larkin Liu
Yu-Chung Lin
Joshua Reid
22
9
0
09 Jul 2019
ReCoSa: Detecting the Relevant Contexts with Self-Attention for Multi-turn Dialogue Generation
Hainan Zhang
Yanyan Lan
Liang Pang
J. Guo
Xueqi Cheng
24
114
0
09 Jul 2019
XFake: Explainable Fake News Detector with Visualizations
Fan Yang
Shiva K. Pentyala
Sina Mohseni
Mengnan Du
Hao Yuan
Rhema Linder
Eric D. Ragan
Shuiwang Ji
Xia Hu
13
118
0
08 Jul 2019
Generating Sentences from Disentangled Syntactic and Semantic Spaces
Yu Bao
Hao Zhou
Shujian Huang
Lei Li
Lili Mou
Olga Vechtomova
Xinyu Dai
Jiajun Chen
DRL
21
107
0
06 Jul 2019
Exploiting Out-of-Domain Parallel Data through Multilingual Transfer Learning for Low-Resource Neural Machine Translation
Aizhan Imankulova
Raj Dabre
Atsushi Fujita
K. Imamura
33
32
0
06 Jul 2019
Graph Representation Learning via Hard and Channel-Wise Attention Networks
Hongyang Gao
Shuiwang Ji
GNN
25
57
0
05 Jul 2019
Multi-lingual Intent Detection and Slot Filling in a Joint BERT-based Model
Giuseppe Castellucci
Valentina Bellomaria
Andrea Favalli
Raniero Romagnoli
VLM
19
73
0
05 Jul 2019
Head-Driven Phrase Structure Grammar Parsing on Penn Treebank
Junru Zhou
Zhao Hai
42
144
0
05 Jul 2019
Social-BiGAT: Multimodal Trajectory Forecasting using Bicycle-GAN and Graph Attention Networks
V. Kosaraju
Amir Sadeghian
Roberto Martín-Martín
Ian Reid
S. Hamid Rezatofighi
Silvio Savarese
22
593
0
04 Jul 2019
Learning Blended, Precise Semantic Program Embeddings
Ke Wang
Z. Su
NAI
30
25
0
03 Jul 2019
On the Weaknesses of Reinforcement Learning for Neural Machine Translation
Leshem Choshen
Lior Fox
Zohar Aizenbud
Omri Abend
17
104
0
03 Jul 2019
Brno Mobile OCR Dataset
Martin Kiss
Michal Hradiš
O. Kodym
11
18
0
02 Jul 2019
Weak Supervision Enhanced Generative Network for Question Generation
Yutong Wang
Jiyuan Zheng
Qijiong Liu
Zhou Zhao
Jun Xiao
Yueting Zhuang
OOD
14
6
0
01 Jul 2019
Few-Shot Representation Learning for Out-Of-Vocabulary Words
Ziniu Hu
Ting-Li Chen
Kai-Wei Chang
Yizhou Sun
24
76
0
01 Jul 2019
BERTphone: Phonetically-Aware Encoder Representations for Utterance-Level Speaker and Language Recognition
Shaoshi Ling
Julian Salazar
Yuzong Liu
Katrin Kirchhoff
SSL
30
28
0
30 Jun 2019
Observing Dialogue in Therapy: Categorizing and Forecasting Behavioral Codes
Jie Cao
Michael J. Tanana
Zac E. Imel
E. Poitras
David C. Atkins
Vivek Srikumar
OffRL
31
55
0
30 Jun 2019
Deep Gamblers: Learning to Abstain with Portfolio Theory
Liu Ziyin
Zhikang T. Wang
Paul Pu Liang
Ruslan Salakhutdinov
Louis-Philippe Morency
Masahito Ueda
23
110
0
29 Jun 2019
Frame attention networks for facial expression recognition in videos
Debin Meng
Xiaojiang Peng
Kai Wang
Yu Qiao
CVBM
19
169
0
29 Jun 2019
Empirical Evaluation of Sequence-to-Sequence Models for Word Discovery in Low-resource Settings
Marcely Zanon Boito
Aline Villavicencio
Laurent Besacier
17
8
0
29 Jun 2019
Open-Ended Long-Form Video Question Answering via Hierarchical Convolutional Self-Attention Networks
Zhu Zhang
Zhou Zhao
Zhijie Lin
Jingkuan Song
Xiaofei He
BDL
27
14
0
28 Jun 2019
Lost in Translation: Loss and Decay of Linguistic Richness in Machine Translation
Eva Vanmassenhove
D. Shterionov
Andy Way
22
91
0
28 Jun 2019
Relating Simple Sentence Representations in Deep Neural Networks and the Brain
Sharmistha Jat
Hao Tang
Partha P. Talukdar
Tom Michael Mitchell
22
21
0
27 Jun 2019
The Impact of Preprocessing on Arabic-English Statistical and Neural Machine Translation
Mai Oudah
Amjad Almahairi
Nizar Habash
30
34
0
27 Jun 2019
Selection via Proxy: Efficient Data Selection for Deep Learning
Cody Coleman
Christopher Yeh
Stephen Mussmann
Baharan Mirzasoleiman
Peter Bailis
Percy Liang
J. Leskovec
Matei A. Zaharia
26
329
0
26 Jun 2019
Sharing Attention Weights for Fast Transformer
Tong Xiao
Yinqiao Li
Jingbo Zhu
Zhengtao Yu
Tongran Liu
17
50
0
26 Jun 2019
Universal Litmus Patterns: Revealing Backdoor Attacks in CNNs
Soheil Kolouri
Aniruddha Saha
Hamed Pirsiavash
Heiko Hoffmann
AAML
25
231
0
26 Jun 2019
Program Synthesis and Semantic Parsing with Learned Code Idioms
Richard Shin
Miltiadis Allamanis
Marc Brockschmidt
Oleksandr Polozov
24
87
0
26 Jun 2019
Deep Modular Co-Attention Networks for Visual Question Answering
Zhou Yu
Jun Yu
Yuhao Cui
Dacheng Tao
Q. Tian
36
797
0
25 Jun 2019
Saliency-driven Word Alignment Interpretation for Neural Machine Translation
Shuoyang Ding
Hainan Xu
Philipp Koehn
22
55
0
25 Jun 2019
Training an Interactive Helper
Mark P. Woodward
Chelsea Finn
Karol Hausman
27
1
0
24 Jun 2019
Self Multi-Head Attention for Speaker Recognition
Miquel India
Pooyan Safari
Javier Hernando
19
110
0
24 Jun 2019
Adversarial Multimodal Network for Movie Question Answering
Zhaoquan Yuan
Siyuan Sun
Lixin Duan
Xiao Wu
Changsheng Xu
24
3
0
24 Jun 2019
Classification and Clustering of Arguments with Contextualized Word Embeddings
Nils Reimers
Benjamin Schiller
Tilman Beck
Johannes Daxenberger
Christian Stab
Iryna Gurevych
16
165
0
24 Jun 2019
Embedding Projection for Targeted Cross-Lingual Sentiment: Model Comparisons and a Real-World Study
Jeremy Barnes
Roman Klinger
27
15
0
24 Jun 2019
EQuANt (Enhanced Question Answer Network)
François-Xavier Aubet
D. Danks
Yuchen Zhu
21
3
0
24 Jun 2019
Evaluating the Supervised and Zero-shot Performance of Multi-lingual Translation Models
Chris Hokamp
John Glover
D. Ghalandari
21
14
0
24 Jun 2019
Sequence Generation: From Both Sides to the Middle
Long Zhou
Jiajun Zhang
Chengqing Zong
Heng Yu
36
22
0
23 Jun 2019
Learning Belief Representations for Imitation Learning in POMDPs
Tanmay Gangwani
Joel Lehman
Qiang Liu
Jian Peng
24
36
0
22 Jun 2019
Retrieving Sequential Information for Non-Autoregressive Neural Machine Translation
Chenze Shao
Yang Feng
Jinchao Zhang
Fandong Meng
Xilin Chen
Jie Zhou
24
42
0
22 Jun 2019
Graph Star Net for Generalized Multi-Task Learning
H. Lu
Seth H. Huang
Tian Ye
Xiuyan Guo
GNN
33
46
0
21 Jun 2019
Improving Zero-shot Translation with Language-Independent Constraints
Ngoc-Quan Pham
Jan Niehues
Thanh-Le Ha
A. Waibel
31
60
0
20 Jun 2019
Evaluating Protein Transfer Learning with TAPE
Roshan Rao
Nicholas Bhattacharya
Neil Thomas
Yan Duan
Xi Chen
John F. Canny
Pieter Abbeel
Yun S. Song
SSL
56
782
0
19 Jun 2019