ResearchTrend.AI

Character-Level Language Modeling with Deeper Self-Attention (arXiv:1808.04444)

9 August 2018
Rami Al-Rfou, Dokook Choe, Noah Constant, Mandy Guo, Llion Jones

Papers citing "Character-Level Language Modeling with Deeper Self-Attention"

50 / 77 papers shown
FreeMesh: Boosting Mesh Generation with Coordinates Merging
Jian Liu, Haohan Weng, Biwen Lei, Xianghui Yang, Zibo Zhao, Zhuo Chen, Song Guo, Tao Han, Chunchao Guo
19 May 2025

Self-Vocabularizing Training for Neural Machine Translation
Pin-Jie Lin, Ernie Chang, Yangyang Shi, Vikas Chandra
18 Mar 2025

OVTR: End-to-End Open-Vocabulary Multiple Object Tracking with Transformer
Jinyang Li, En Yu, Sijia Chen, Wenbing Tao
13 Mar 2025

Transformer Meets Twicing: Harnessing Unattended Residual Information
Laziz U. Abdullaev, Tan M. Nguyen
02 Mar 2025

MorphBPE: A Morpho-Aware Tokenizer Bridging Linguistic Complexity for Efficient LLM Training Across Morphologies
Ehsaneddin Asgari, Yassine El Kheir, Mohammad Ali Sadraei Javaheri
02 Feb 2025

Adaptive Large Language Models By Layerwise Attention Shortcuts
Prateek Verma, Mert Pilanci
17 Sep 2024 · KELM, OffRL

Exploring the Potential of Large Language Models for Improving Digital Forensic Investigation Efficiency
Akila Wickramasekara, F. Breitinger, Mark Scanlon
29 Feb 2024

Generative Models as a Complex Systems Science: How can we make sense of large language model behavior?
Ari Holtzman, Peter West, Luke Zettlemoyer
31 Jul 2023 · AI4CE

StageInteractor: Query-based Object Detector with Cross-stage Interaction
Yao Teng, Haisong Liu, Sheng Guo, Limin Wang
11 Apr 2023 · ObjD

Patch-Prompt Aligned Bayesian Prompt Tuning for Vision-Language Models
Xinyang Liu, Dongsheng Wang, Bowei Fang, Miaoge Li, Zhibin Duan, Yishi Xu, Bo Chen, Mingyuan Zhou
16 Mar 2023 · VLM, VPVLM

An Overview on Language Models: Recent Developments and Outlook
Chengwei Wei, Yun Cheng Wang, Bin Wang, C.-C. Jay Kuo
10 Mar 2023

LoCoNet: Long-Short Context Network for Active Speaker Detection
Xizi Wang, Feng Cheng, Gedas Bertasius, David J. Crandall
19 Jan 2023

BLOOM: A 176B-Parameter Open-Access Multilingual Language Model
BigScience Workshop: Teven Le Scao, Angela Fan, Christopher Akiki, ..., Zhongli Xie, Zifan Ye, M. Bras, Younes Belkada, Thomas Wolf
09 Nov 2022 · VLM

Mega: Moving Average Equipped Gated Attention
Xuezhe Ma, Chunting Zhou, Xiang Kong, Junxian He, Liangke Gui, Graham Neubig, Jonathan May, Luke Zettlemoyer
21 Sep 2022

Batch Layer Normalization, A new normalization layer for CNNs and RNN
A. Ziaee, Erion Çano
19 Sep 2022

Momentum Transformer: Closing the Performance Gap Between Self-attention and Its Linearization
T. Nguyen, Richard G. Baraniuk, Robert M. Kirby, Stanley J. Osher, Bao Wang
01 Aug 2022

Interaction Transformer for Human Reaction Generation
Baptiste Chopin, Hao Tang, N. Otberdout, Mohamed Daoudi, N. Sebe
04 Jul 2022 · ViT

Deep Transformer Q-Networks for Partially Observable Reinforcement Learning
Kevin Esslinger, Robert W. Platt, Chris Amato
02 Jun 2022 · OffRL

Training Language Models with Memory Augmentation
Zexuan Zhong, Tao Lei, Danqi Chen
25 May 2022 · RALM

Attention Mechanism in Neural Networks: Where it Comes and Where it Goes
Derya Soydaner
27 Apr 2022 · 3DV

Emotion-Aware Transformer Encoder for Empathetic Dialogue Generation
Raman Goel, Seba Susan, Sachin Vashisht, Armaan Dhanda
24 Apr 2022

What Language Model Architecture and Pretraining Objective Work Best for Zero-Shot Generalization?
Thomas Wang, Adam Roberts, Daniel Hesslow, Teven Le Scao, Hyung Won Chung, Iz Beltagy, Julien Launay, Colin Raffel
12 Apr 2022

TR-MOT: Multi-Object Tracking by Reference
Mingfei Chen, Yue Liao, Si Liu, Fei-Yue Wang, Lei Li
30 Mar 2022 · VOT

Parallel Instance Query Network for Named Entity Recognition
Yongliang Shen, Xiaobin Wang, Zeqi Tan, Guangwei Xu, Pengjun Xie, Fei Huang, Weiming Lu, Yueting Zhuang
20 Mar 2022

PACE: A Parallelizable Computation Encoder for Directed Acyclic Graphs
Zehao Dong, Muhan Zhang, Fuhai Li, Yixin Chen
19 Mar 2022 · CML, GNN

Signal in Noise: Exploring Meaning Encoded in Random Character Sequences with Character-Aware Language Models
Mark Chu, Bhargav Srinivasa Desikan, E. Nadler, Ruggerio L. Sardo, Elise Darragh-Ford, Douglas Guilbeault
15 Mar 2022

RelTR: Relation Transformer for Scene Graph Generation
Yuren Cong, M. Yang, Bodo Rosenhahn
27 Jan 2022 · ViT

Visual Keyword Spotting with Attention
Prajwal K R, Liliane Momeni, Triantafyllos Afouras, Andrew Zisserman
29 Oct 2021

GNN-LM: Language Modeling based on Global Contexts via GNN
Yuxian Meng, Shi Zong, Xiaoya Li, Xiaofei Sun, Tianwei Zhang, Fei Wu, Jiwei Li
17 Oct 2021 · LRM

Efficient Nearest Neighbor Language Models
Junxian He, Graham Neubig, Taylor Berg-Kirkpatrick
09 Sep 2021 · RALM

Type Anywhere You Want: An Introduction to Invisible Mobile Keyboard
Sahng-Min Yoo, Ue-Hwan Kim, Yewon Hwang, Jong-Hwan Kim
20 Aug 2021 · OffRL

Structured Denoising Diffusion Models in Discrete State-Spaces
Jacob Austin, Daniel D. Johnson, Jonathan Ho, Daniel Tarlow, Rianne van den Berg
07 Jul 2021 · DiffM

Evaluating Various Tokenizers for Arabic Text Classification
Zaid Alyafeai, Maged S. Al-Shaibani, Mustafa Ghaleb, Irfan Ahmad
14 Jun 2021

Dynamic Language Models for Continuously Evolving Content
Spurthi Amba Hombaiah, Tao Chen, Mingyang Zhang, Michael Bendersky, Marc Najork
11 Jun 2021 · CLL, KELM

Going Beyond Linear Transformers with Recurrent Fast Weight Programmers
Kazuki Irie, Imanol Schlag, Róbert Csordás, Jürgen Schmidhuber
11 Jun 2021

A Survey of Transformers
Tianyang Lin, Yuxin Wang, Xiangyang Liu, Xipeng Qiu
08 Jun 2021 · ViT

Choose a Transformer: Fourier or Galerkin
Shuhao Cao
31 May 2021

Security Vulnerability Detection Using Deep Learning Natural Language Processing
Noah Ziems, Shaoen Wu
06 May 2021

On the limit of English conversational speech recognition
Zoltán Tüske, G. Saon, Brian Kingsbury
03 May 2021

Writing in The Air: Unconstrained Text Recognition from Finger Movement Using Spatio-Temporal Convolution
Ue-Hwan Kim, Yewon Hwang, Sun-Kyung Lee, Jong-Hwan Kim
19 Apr 2021

The NLP Cookbook: Modern Recipes for Transformer based Deep Learning Architectures
Sushant Singh, A. Mahmood
23 Mar 2021 · AI4TS

CANINE: Pre-training an Efficient Tokenization-Free Encoder for Language Representation
J. Clark, Dan Garrette, Iulia Turc, John Wieting
11 Mar 2021

OperA: Attention-Regularized Transformers for Surgical Phase Recognition
Tobias Czempiel, Magdalini Paschali, D. Ostler, S. T. Kim, Benjamin Busam, Nassir Navab
05 Mar 2021 · MedIm

Linear Transformers Are Secretly Fast Weight Programmers
Imanol Schlag, Kazuki Irie, Jürgen Schmidhuber
22 Feb 2021

UnibucKernel: Geolocating Swiss German Jodels Using Ensemble Learning
Mihaela Găman, Sebastian Cojocariu, Radu Tudor Ionescu
18 Feb 2021

Argmax Flows and Multinomial Diffusion: Learning Categorical Distributions
Emiel Hoogeboom, Didrik Nielsen, P. Jaini, Patrick Forré, Max Welling
10 Feb 2021 · DiffM

PopMAG: Pop Music Accompaniment Generation
Yi Ren, Jinzheng He, Xu Tan, Tao Qin, Zhou Zhao, Tie-Yan Liu
18 Aug 2020

Conv-Transformer Transducer: Low Latency, Low Frame Rate, Streamable End-to-End Speech Recognition
Wenyong Huang, Wenchao Hu, Y. Yeung, Xiao Chen
13 Aug 2020

Learning Sparse Prototypes for Text Generation
Junxian He, Taylor Berg-Kirkpatrick, Graham Neubig
29 Jun 2020

Recurrent Quantum Neural Networks
Johannes Bausch
25 Jun 2020