1906.01787
Cited By
Learning Deep Transformer Models for Machine Translation
5 June 2019
Qiang Wang
Bei Li
Tong Xiao
Jingbo Zhu
Changliang Li
Derek F. Wong
Lidia S. Chao
Papers citing "Learning Deep Transformer Models for Machine Translation"
50 / 344 papers shown
Title
AuthNet: Neural Network with Integrated Authentication Logic
Yuling Cai
Fan Xiang
Guozhu Meng
Yinzhi Cao
Kai Chen
AAML
103
0
0
24 May 2024
PILA: A Historical-Linguistic Dataset of Proto-Italic and Latin
Stephen Lawrence Bothwell
Brian DuSell
David Chiang
Brian Krostenko
63
1
0
25 Apr 2024
SpaceByte: Towards Deleting Tokenization from Large Language Modeling
Kevin Slagle
74
4
0
22 Apr 2024
MixLoRA: Enhancing Large Language Models Fine-Tuning with LoRA-based Mixture of Experts
Dengchun Li
Yingzi Ma
Naizheng Wang
Zhengmao Ye
Zhiyuan Cheng
...
Yan Zhang
Lei Duan
Jie Zuo
Cal Yang
Mingjie Tang
MoE
128
59
0
22 Apr 2024
Transformer-based Stagewise Decomposition for Large-Scale Multistage Stochastic Optimization
Chanyeon Kim
Jongwoon Park
Hyun-sool Bae
Woo Chang Kim
91
3
0
03 Apr 2024
DiJiang: Efficient Large Language Models through Compact Kernelization
Hanting Chen
Zhicheng Liu
Xutao Wang
Yuchuan Tian
Yunhe Wang
VLM
92
5
0
29 Mar 2024
COVID-CT-H-UNet: a novel COVID-19 CT segmentation network based on attention mechanism and Bi-category Hybrid loss
Anay Panja
Somenath Kuiry
Alaka Das
M. Nasipuri
N. Das
42
1
0
16 Mar 2024
Read between the lines -- Functionality Extraction From READMEs
Praveen Venkateswaran
Srikanth G. Tamilselvam
Dinesh Garg
27
0
0
15 Mar 2024
Spatiotemporal Pooling on Appropriate Topological Maps Represented as Two-Dimensional Images for EEG Classification
Takuto Fukushima
Ryusuke Miyamoto
51
1
0
07 Mar 2024
Mastering Memory Tasks with World Models
Mohammad Reza Samsami
Artem Zholus
Janarthanan Rajendran
Sarath Chandar
CLL
OffRL
102
28
0
07 Mar 2024
AutoAttacker: A Large Language Model Guided System to Implement Automatic Cyber-attacks
Jiacen Xu
Jack W. Stokes
Geoff McDonald
Xuesong Bai
David Marshall
Siyue Wang
Adith Swaminathan
Zhou Li
106
59
0
02 Mar 2024
Why Transformers Need Adam: A Hessian Perspective
Yushun Zhang
Congliang Chen
Tian Ding
Ziniu Li
Ruoyu Sun
Zhimin Luo
126
57
0
26 Feb 2024
Bridging Associative Memory and Probabilistic Modeling
Rylan Schaeffer
Nika Zahedi
Mikail Khona
Dhruv Pai
Sang T. Truong
...
Sarthak Chandra
Andres Carranza
Ila Rani Fiete
Andrey Gromov
Oluwasanmi Koyejo
DiffM
115
4
0
15 Feb 2024
OrderBkd: Textual backdoor attack through repositioning
Irina Alekseevskaia
Konstantin Arkhipenko
75
3
0
12 Feb 2024
NLP for Knowledge Discovery and Information Extraction from Energetics Corpora
Francis G. VanGessel
Efrem Perry
Salil Mohan
Oliver M. Barham
Mark Cavolowsky
111
0
0
10 Feb 2024
Low-rank Attention Side-Tuning for Parameter-Efficient Fine-Tuning
Ningyuan Tang
Minghao Fu
Ke Zhu
Jianxin Wu
104
10
0
06 Feb 2024
MoE-LLaVA: Mixture of Experts for Large Vision-Language Models
Bin Lin
Zhenyu Tang
Yang Ye
Jiaxi Cui
Bin Zhu
...
Jinfa Huang
Junwu Zhang
Yatian Pang
Munan Ning
Li-ming Yuan
VLM
MLLM
MoE
144
180
0
29 Jan 2024
Separable Physics-Informed Neural Networks for the solution of elasticity problems
V. A. Es'kin
Danil V. Davydov
Julia V. Guréva
Alexey O. Malkhanov
Mikhail E. Smorkalov
PINN
AI4CE
81
3
0
24 Jan 2024
MVSFormer++: Revealing the Devil in Transformer's Details for Multi-View Stereo
Chenjie Cao
Xinlin Ren
Yanwei Fu
92
29
0
22 Jan 2024
Reconstructing the Invisible: Video Frame Restoration through Siamese Masked Conditional Variational Autoencoder
Yongchen Zhou
Richard Jiang
44
0
0
18 Jan 2024
Code Simulation Challenges for Large Language Models
Emanuele La Malfa
Christoph Weinhuber
Orazio Torre
Fangru Lin
Samuele Marro
Anthony Cohn
Nigel Shadbolt
Michael Wooldridge
LLMAG
LRM
67
8
0
17 Jan 2024
SiT: Exploring Flow and Diffusion-based Generative Models with Scalable Interpolant Transformers
Nanye Ma
Mark Goldstein
M. S. Albergo
Nicholas M. Boffi
Eric Vanden-Eijnden
Saining Xie
DiffM
150
214
0
16 Jan 2024
Setting the Record Straight on Transformer Oversmoothing
G. Dovonon
M. Bronstein
Matt J. Kusner
88
6
0
09 Jan 2024
An Empirical Study of Scaling Law for OCR
Miao Rang
Zhenni Bi
Chuanjian Liu
Yunhe Wang
Kai Han
93
6
0
29 Dec 2023
Early and Accurate Detection of Tomato Leaf Diseases Using TomFormer
Asim Khan
Umair Nawaz
K. Lochan
Lakmal D. Seneviratne
Irfan Hussain
MedIm
44
5
0
26 Dec 2023
Transformer-Based Multi-Object Smoothing with Decoupled Data Association and Smoothing
Juliano Pinto
Georg Hess
Yuxuan Xia
H. Wymeersch
Lennart Svensson
VOT
64
4
0
22 Dec 2023
Cached Transformers: Improving Transformers with Differentiable Memory Cache
Zhaoyang Zhang
Wenqi Shao
Yixiao Ge
Xiaogang Wang
Liang Feng
Ping Luo
58
3
0
20 Dec 2023
Why "classic" Transformers are shallow and how to make them go deep
Yueyao Yu
Yin Zhang
ViT
104
0
0
11 Dec 2023
Introducing Rhetorical Parallelism Detection: A New Task with Datasets, Metrics, and Baselines
Stephen Lawrence Bothwell
Justin DeBenedetto
Theresa Crnkovich
Hildegund Müller
David Chiang
ObjD
79
2
0
30 Nov 2023
INarIG: Iterative Non-autoregressive Instruct Generation Model For Word-Level Auto Completion
Hengchao Shang
Zongyao Li
Daimeng Wei
Jiaxin Guo
Minghan Wang
Xiaoyu Chen
Lizhi Lei
Hao Yang
101
0
0
30 Nov 2023
Bitformer: An efficient Transformer with bitwise operation-based attention for Big Data Analytics at low-cost low-precision devices
Gaoxiang Duan
Junkai Zhang
Xiaoying Zheng
Yongxin Zhu
61
2
0
22 Nov 2023
Trustworthy Large Models in Vision: A Survey
Ziyan Guo
Li Xu
Jun Liu
MU
126
0
0
16 Nov 2023
Character-Level Bangla Text-to-IPA Transcription Using Transformer Architecture with Sequence Alignment
Jakir Hasan
Shrestha Datta
Ameya Debnath
21
0
0
07 Nov 2023
High-resolution power equipment recognition based on improved self-attention
Siyi Zhang
Cheng Liu
Xiang Li
Xin Zhai
Zhen Wei
Sizhe Li
Xun Ma
21
0
0
06 Nov 2023
Ultra-Long Sequence Distributed Transformer
Xiao Wang
Isaac Lyngaas
A. Tsaris
Peng Chen
Sajal Dash
Mayanka Chandra Shekar
Tao Luo
Hong-Jun Yoon
Mohamed Wahib
John P. Gounley
124
4
0
04 Nov 2023
What Formal Languages Can Transformers Express? A Survey
Lena Strobl
William Merrill
Gail Weiss
David Chiang
Dana Angluin
AI4CE
113
60
0
01 Nov 2023
PartialFormer: Modeling Part Instead of Whole for Machine Translation
Tong Zheng
Bei Li
Huiwen Bao
Jiale Wang
Weiqiao Shan
Tong Xiao
Jingbo Zhu
MoE
AI4CE
47
0
0
23 Oct 2023
Sequence Length Independent Norm-Based Generalization Bounds for Transformers
Jacob Trauger
Ambuj Tewari
89
12
0
19 Oct 2023
Eureka-Moments in Transformers: Multi-Step Tasks Reveal Softmax Induced Optimization Problems
David T. Hoffmann
Simon Schrodi
Jelena Bratulić
Nadine Behrmann
Volker Fischer
Thomas Brox
116
8
0
19 Oct 2023
Who Are All The Stochastic Parrots Imitating? They Should Tell Us!
Sagi Shaier
Lawrence E Hunter
Katharina von der Wense
75
4
0
16 Oct 2023
Large-Scale OD Matrix Estimation with A Deep Learning Method
Zheli Xiong
Defu Lian
Enhong Chen
Gang Chen
Xiaomin Cheng
40
0
0
09 Oct 2023
A Simple and Robust Framework for Cross-Modality Medical Image Segmentation applied to Vision Transformers
Matteo Bastico
David Ryckelynck
Laurent Corté
Yannick Tillier
Etienne Decencière
MedIm
ViT
66
2
0
09 Oct 2023
Controllable Multi-document Summarization: Coverage & Coherence Intuitive Policy with Large Language Model Based Rewards
Litton J. Kurisinkel
Nancy F. Chen
78
1
0
05 Oct 2023
LLM Based Multi-Document Summarization Exploiting Main-Event Biased Monotone Submodular Content Extraction
Litton J. Kurisinkel
Nancy F. Chen
77
6
0
05 Oct 2023
Stack Attention: Improving the Ability of Transformers to Model Hierarchical Patterns
Brian DuSell
David Chiang
113
12
0
03 Oct 2023
Scaling Experiments in Self-Supervised Cross-Table Representation Learning
Maximilian Schambach
Dominique Paul
Wei Le
LMTD
55
2
0
29 Sep 2023
Sleep Stage Classification Using a Pre-trained Deep Learning Model
Hassan Ardeshir
Mohammad Araghi
63
1
0
12 Sep 2023
Learning multi-modal generative models with permutation-invariant encoders and tighter variational bounds
Marcel Hirt
Domenico Campolo
Victoria Leong
Juan-Pablo Ortega
DRL
89
0
0
01 Sep 2023
Homological Convolutional Neural Networks
Antonio Briola
Yuanrong Wang
Silvia Bartolucci
T. Aste
LMTD
80
7
0
26 Aug 2023
Implicit Self-supervised Language Representation for Spoken Language Diarization
Jagabandhu Mishra
S. R. Mahadeva Prasanna
66
0
0
21 Aug 2023