Learning Deep Transformer Models for Machine Translation

5 June 2019
Qiang Wang
Bei Li
Tong Xiao
Jingbo Zhu
Changliang Li
Derek F. Wong
Lidia S. Chao

Papers citing "Learning Deep Transformer Models for Machine Translation"

50 / 344 papers shown
AuthNet: Neural Network with Integrated Authentication Logic
Yuling Cai
Fan Xiang
Guozhu Meng
Yinzhi Cao
Kai Chen
AAML
103
0
0
24 May 2024
PILA: A Historical-Linguistic Dataset of Proto-Italic and Latin
Stephen Lawrence Bothwell
Brian DuSell
David Chiang
Brian Krostenko
63
1
0
25 Apr 2024
SpaceByte: Towards Deleting Tokenization from Large Language Modeling
Kevin Slagle
74
4
0
22 Apr 2024
MixLoRA: Enhancing Large Language Models Fine-Tuning with LoRA-based Mixture of Experts
Dengchun Li
Yingzi Ma
Naizheng Wang
Zhengmao Ye
Zhiyuan Cheng
...
Yan Zhang
Lei Duan
Jie Zuo
Cal Yang
Mingjie Tang
MoE
128
59
0
22 Apr 2024
Transformer-based Stagewise Decomposition for Large-Scale Multistage Stochastic Optimization
Chanyeon Kim
Jongwoon Park
Hyun-sool Bae
Woo Chang Kim
91
3
0
03 Apr 2024
DiJiang: Efficient Large Language Models through Compact Kernelization
Hanting Chen
Zhicheng Liu
Xutao Wang
Yuchuan Tian
Yunhe Wang
VLM
92
5
0
29 Mar 2024
COVID-CT-H-UNet: a novel COVID-19 CT segmentation network based on attention mechanism and Bi-category Hybrid loss
Anay Panja
Somenath Kuiry
Alaka Das
M. Nasipuri
N. Das
42
1
0
16 Mar 2024
Read between the lines -- Functionality Extraction From READMEs
Praveen Venkateswaran
Srikanth G. Tamilselvam
Dinesh Garg
27
0
0
15 Mar 2024
Spatiotemporal Pooling on Appropriate Topological Maps Represented as Two-Dimensional Images for EEG Classification
Takuto Fukushima
Ryusuke Miyamoto
51
1
0
07 Mar 2024
Mastering Memory Tasks with World Models
Mohammad Reza Samsami
Artem Zholus
Janarthanan Rajendran
Sarath Chandar
CLL, OffRL
102
28
0
07 Mar 2024
AutoAttacker: A Large Language Model Guided System to Implement Automatic Cyber-attacks
Jiacen Xu
Jack W. Stokes
Geoff McDonald
Xuesong Bai
David Marshall
Siyue Wang
Adith Swaminathan
Zhou Li
106
59
0
02 Mar 2024
Why Transformers Need Adam: A Hessian Perspective
Yushun Zhang
Congliang Chen
Tian Ding
Ziniu Li
Ruoyu Sun
Zhimin Luo
126
57
0
26 Feb 2024
Bridging Associative Memory and Probabilistic Modeling
Rylan Schaeffer
Nika Zahedi
Mikail Khona
Dhruv Pai
Sang T. Truong
...
Sarthak Chandra
Andres Carranza
Ila Rani Fiete
Andrey Gromov
Oluwasanmi Koyejo
DiffM
115
4
0
15 Feb 2024
OrderBkd: Textual backdoor attack through repositioning
Irina Alekseevskaia
Konstantin Arkhipenko
75
3
0
12 Feb 2024
NLP for Knowledge Discovery and Information Extraction from Energetics Corpora
Francis G. VanGessel
Efrem Perry
Salil Mohan
Oliver M. Barham
Mark Cavolowsky
111
0
0
10 Feb 2024
Low-rank Attention Side-Tuning for Parameter-Efficient Fine-Tuning
Ningyuan Tang
Minghao Fu
Ke Zhu
Jianxin Wu
104
10
0
06 Feb 2024
MoE-LLaVA: Mixture of Experts for Large Vision-Language Models
Bin Lin
Zhenyu Tang
Yang Ye
Jiaxi Cui
Bin Zhu
...
Jinfa Huang
Junwu Zhang
Yatian Pang
Munan Ning
Li-ming Yuan
VLM, MLLM, MoE
144
180
0
29 Jan 2024
Separable Physics-Informed Neural Networks for the solution of elasticity problems
V. A. Es'kin
Danil V. Davydov
Julia V. Guréva
Alexey O. Malkhanov
Mikhail E. Smorkalov
PINN, AI4CE
81
3
0
24 Jan 2024
MVSFormer++: Revealing the Devil in Transformer's Details for Multi-View Stereo
Chenjie Cao
Xinlin Ren
Yanwei Fu
92
29
0
22 Jan 2024
Reconstructing the Invisible: Video Frame Restoration through Siamese Masked Conditional Variational Autoencoder
Yongchen Zhou
Richard Jiang
44
0
0
18 Jan 2024
Code Simulation Challenges for Large Language Models
Emanuele La Malfa
Christoph Weinhuber
Orazio Torre
Fangru Lin
Samuele Marro
Anthony Cohn
Nigel Shadbolt
Michael Wooldridge
LLMAG, LRM
67
8
0
17 Jan 2024
SiT: Exploring Flow and Diffusion-based Generative Models with Scalable Interpolant Transformers
Nanye Ma
Mark Goldstein
M. S. Albergo
Nicholas M. Boffi
Eric Vanden-Eijnden
Saining Xie
DiffM
150
214
0
16 Jan 2024
Setting the Record Straight on Transformer Oversmoothing
G. Dovonon
M. Bronstein
Matt J. Kusner
88
6
0
09 Jan 2024
An Empirical Study of Scaling Law for OCR
Miao Rang
Zhenni Bi
Chuanjian Liu
Yunhe Wang
Kai Han
93
6
0
29 Dec 2023
Early and Accurate Detection of Tomato Leaf Diseases Using TomFormer
Asim Khan
Umair Nawaz
K. Lochan
Lakmal D. Seneviratne
Irfan Hussain
MedIm
44
5
0
26 Dec 2023
Transformer-Based Multi-Object Smoothing with Decoupled Data Association and Smoothing
Juliano Pinto
Georg Hess
Yuxuan Xia
H. Wymeersch
Lennart Svensson
VOT
64
4
0
22 Dec 2023
Cached Transformers: Improving Transformers with Differentiable Memory Cache
Zhaoyang Zhang
Wenqi Shao
Yixiao Ge
Xiaogang Wang
Liang Feng
Ping Luo
58
3
0
20 Dec 2023
Why "classic" Transformers are shallow and how to make them go deep
Yueyao Yu
Yin Zhang
ViT
104
0
0
11 Dec 2023
Introducing Rhetorical Parallelism Detection: A New Task with Datasets, Metrics, and Baselines
Stephen Lawrence Bothwell
Justin DeBenedetto
Theresa Crnkovich
Hildegund Müller
David Chiang
ObjD
79
2
0
30 Nov 2023
INarIG: Iterative Non-autoregressive Instruct Generation Model For Word-Level Auto Completion
Hengchao Shang
Zongyao Li
Daimeng Wei
Jiaxin Guo
Minghan Wang
Xiaoyu Chen
Lizhi Lei
Hao Yang
101
0
0
30 Nov 2023
Bitformer: An efficient Transformer with bitwise operation-based attention for Big Data Analytics at low-cost low-precision devices
Gaoxiang Duan
Junkai Zhang
Xiaoying Zheng
Yongxin Zhu
61
2
0
22 Nov 2023
Trustworthy Large Models in Vision: A Survey
Ziyan Guo
Li Xu
Jun Liu
MU
126
0
0
16 Nov 2023
Character-Level Bangla Text-to-IPA Transcription Using Transformer Architecture with Sequence Alignment
Jakir Hasan
Shrestha Datta
Ameya Debnath
21
0
0
07 Nov 2023
High-resolution power equipment recognition based on improved self-attention
Siyi Zhang
Cheng Liu
Xiang Li
Xin Zhai
Zhen Wei
Sizhe Li
Xun Ma
21
0
0
06 Nov 2023
Ultra-Long Sequence Distributed Transformer
Xiao Wang
Isaac Lyngaas
A. Tsaris
Peng Chen
Sajal Dash
Mayanka Chandra Shekar
Tao Luo
Hong-Jun Yoon
Mohamed Wahib
John P. Gounley
124
4
0
04 Nov 2023
What Formal Languages Can Transformers Express? A Survey
Lena Strobl
William Merrill
Gail Weiss
David Chiang
Dana Angluin
AI4CE
113
60
0
01 Nov 2023
PartialFormer: Modeling Part Instead of Whole for Machine Translation
Tong Zheng
Bei Li
Huiwen Bao
Jiale Wang
Weiqiao Shan
Tong Xiao
Jingbo Zhu
MoE, AI4CE
47
0
0
23 Oct 2023
Sequence Length Independent Norm-Based Generalization Bounds for Transformers
Jacob Trauger
Ambuj Tewari
89
12
0
19 Oct 2023
Eureka-Moments in Transformers: Multi-Step Tasks Reveal Softmax Induced Optimization Problems
David T. Hoffmann
Simon Schrodi
Jelena Bratulić
Nadine Behrmann
Volker Fischer
Thomas Brox
116
8
0
19 Oct 2023
Who Are All The Stochastic Parrots Imitating? They Should Tell Us!
Sagi Shaier
Lawrence E Hunter
Katharina von der Wense
75
4
0
16 Oct 2023
Large-Scale OD Matrix Estimation with A Deep Learning Method
Zheli Xiong
Defu Lian
Enhong Chen
Gang Chen
Xiaomin Cheng
40
0
0
09 Oct 2023
A Simple and Robust Framework for Cross-Modality Medical Image Segmentation applied to Vision Transformers
Matteo Bastico
David Ryckelynck
Laurent Corté
Yannick Tillier
Etienne Decencière
MedIm, ViT
66
2
0
09 Oct 2023
Controllable Multi-document Summarization: Coverage & Coherence Intuitive Policy with Large Language Model Based Rewards
Litton J. Kurisinkel
Nancy F. Chen
78
1
0
05 Oct 2023
LLM Based Multi-Document Summarization Exploiting Main-Event Biased Monotone Submodular Content Extraction
Litton J. Kurisinkel
Nancy F. Chen
77
6
0
05 Oct 2023
Stack Attention: Improving the Ability of Transformers to Model Hierarchical Patterns
Brian DuSell
David Chiang
113
12
0
03 Oct 2023
Scaling Experiments in Self-Supervised Cross-Table Representation Learning
Maximilian Schambach
Dominique Paul
Wei Le
LMTD
55
2
0
29 Sep 2023
Sleep Stage Classification Using a Pre-trained Deep Learning Model
Hassan Ardeshir
Mohammad Araghi
63
1
0
12 Sep 2023
Learning multi-modal generative models with permutation-invariant encoders and tighter variational bounds
Marcel Hirt
Domenico Campolo
Victoria Leong
Juan-Pablo Ortega
DRL
89
0
0
01 Sep 2023
Homological Convolutional Neural Networks
Antonio Briola
Yuanrong Wang
Silvia Bartolucci
T. Aste
LMTD
80
7
0
26 Aug 2023
Implicit Self-supervised Language Representation for Spoken Language Diarization
Jagabandhu Mishra
S. R. Mahadeva Prasanna
66
0
0
21 Aug 2023