arXiv:1808.04444
Character-Level Language Modeling with Deeper Self-Attention
9 August 2018
Rami Al-Rfou
Dokook Choe
Noah Constant
Mandy Guo
Llion Jones
Papers citing "Character-Level Language Modeling with Deeper Self-Attention"
50 / 77 papers shown
Title
FreeMesh: Boosting Mesh Generation with Coordinates Merging
Jian Liu
Haohan Weng
Biwen Lei
Xianghui Yang
Zibo Zhao
Zhuo Chen
Song Guo
Tao Han
Chunchao Guo
20
0
0
19 May 2025
Self-Vocabularizing Training for Neural Machine Translation
Pin-Jie Lin
Ernie Chang
Yangyang Shi
Vikas Chandra
68
0
0
18 Mar 2025
OVTR: End-to-End Open-Vocabulary Multiple Object Tracking with Transformer
Jinyang Li
En Yu
Sijia Chen
Wenbing Tao
75
1
0
13 Mar 2025
Transformer Meets Twicing: Harnessing Unattended Residual Information
Laziz U. Abdullaev
Tan M. Nguyen
41
2
0
02 Mar 2025
MorphBPE: A Morpho-Aware Tokenizer Bridging Linguistic Complexity for Efficient LLM Training Across Morphologies
Ehsaneddin Asgari
Yassine El Kheir
Mohammad Ali Sadraei Javaheri
74
0
0
02 Feb 2025
Adaptive Large Language Models By Layerwise Attention Shortcuts
Prateek Verma
Mert Pilanci
KELM
OffRL
58
0
0
17 Sep 2024
Exploring the Potential of Large Language Models for Improving Digital Forensic Investigation Efficiency
Akila Wickramasekara
F. Breitinger
Mark Scanlon
52
8
0
29 Feb 2024
Generative Models as a Complex Systems Science: How can we make sense of large language model behavior?
Ari Holtzman
Peter West
Luke Zettlemoyer
AI4CE
34
14
0
31 Jul 2023
StageInteractor: Query-based Object Detector with Cross-stage Interaction
Yao Teng
Haisong Liu
Sheng Guo
Limin Wang
ObjD
34
8
0
11 Apr 2023
Patch-Prompt Aligned Bayesian Prompt Tuning for Vision-Language Models
Xinyang Liu
Dongsheng Wang
Bowei Fang
Miaoge Li
Zhibin Duan
Yishi Xu
Bo Chen
Mingyuan Zhou
VLM
VPVLM
29
5
0
16 Mar 2023
An Overview on Language Models: Recent Developments and Outlook
Chengwei Wei
Yun Cheng Wang
Bin Wang
C.-C. Jay Kuo
33
42
0
10 Mar 2023
LoCoNet: Long-Short Context Network for Active Speaker Detection
Xizi Wang
Feng Cheng
Gedas Bertasius
David J. Crandall
26
15
0
19 Jan 2023
BLOOM: A 176B-Parameter Open-Access Multilingual Language Model
BigScience Workshop:
Teven Le Scao
Angela Fan
Christopher Akiki
...
Zhongli Xie
Zifan Ye
M. Bras
Younes Belkada
Thomas Wolf
VLM
118
2,319
0
09 Nov 2022
Mega: Moving Average Equipped Gated Attention
Xuezhe Ma
Chunting Zhou
Xiang Kong
Junxian He
Liangke Gui
Graham Neubig
Jonathan May
Luke Zettlemoyer
33
183
0
21 Sep 2022
Batch Layer Normalization, A new normalization layer for CNNs and RNN
A. Ziaee
Erion Çano
19
13
0
19 Sep 2022
Momentum Transformer: Closing the Performance Gap Between Self-attention and Its Linearization
T. Nguyen
Richard G. Baraniuk
Robert M. Kirby
Stanley J. Osher
Bao Wang
34
9
0
01 Aug 2022
Interaction Transformer for Human Reaction Generation
Baptiste Chopin
Hao Tang
N. Otberdout
Mohamed Daoudi
N. Sebe
ViT
38
27
0
04 Jul 2022
Deep Transformer Q-Networks for Partially Observable Reinforcement Learning
Kevin Esslinger
Robert W. Platt
Chris Amato
OffRL
35
35
0
02 Jun 2022
Training Language Models with Memory Augmentation
Zexuan Zhong
Tao Lei
Danqi Chen
RALM
247
128
0
25 May 2022
Attention Mechanism in Neural Networks: Where it Comes and Where it Goes
Derya Soydaner
3DV
44
150
0
27 Apr 2022
Emotion-Aware Transformer Encoder for Empathetic Dialogue Generation
Raman Goel
Seba Susan
Sachin Vashisht
Armaan Dhanda
25
9
0
24 Apr 2022
What Language Model Architecture and Pretraining Objective Work Best for Zero-Shot Generalization?
Thomas Wang
Adam Roberts
Daniel Hesslow
Teven Le Scao
Hyung Won Chung
Iz Beltagy
Julien Launay
Colin Raffel
48
168
0
12 Apr 2022
TR-MOT: Multi-Object Tracking by Reference
Mingfei Chen
Yue Liao
Si Liu
Fei Wang
Lei Li
VOT
57
9
0
30 Mar 2022
Parallel Instance Query Network for Named Entity Recognition
Yongliang Shen
Xiaobin Wang
Zeqi Tan
Guangwei Xu
Pengjun Xie
Fei Huang
Weiming Lu
Yueting Zhuang
24
57
0
20 Mar 2022
PACE: A Parallelizable Computation Encoder for Directed Acyclic Graphs
Zehao Dong
Muhan Zhang
Fuhai Li
Yixin Chen
CML
GNN
33
17
0
19 Mar 2022
Signal in Noise: Exploring Meaning Encoded in Random Character Sequences with Character-Aware Language Models
Mark Chu
Bhargav Srinivasa Desikan
E. Nadler
Ruggerio L. Sardo
Elise Darragh-Ford
Douglas Guilbeault
20
0
0
15 Mar 2022
RelTR: Relation Transformer for Scene Graph Generation
Yuren Cong
M. Yang
Bodo Rosenhahn
ViT
100
136
0
27 Jan 2022
Visual Keyword Spotting with Attention
Prajwal K R
Liliane Momeni
Triantafyllos Afouras
Andrew Zisserman
16
13
0
29 Oct 2021
GNN-LM: Language Modeling based on Global Contexts via GNN
Yuxian Meng
Shi Zong
Xiaoya Li
Xiaofei Sun
Tianwei Zhang
Fei Wu
Jiwei Li
LRM
29
37
0
17 Oct 2021
Efficient Nearest Neighbor Language Models
Junxian He
Graham Neubig
Taylor Berg-Kirkpatrick
RALM
195
103
0
09 Sep 2021
Type Anywhere You Want: An Introduction to Invisible Mobile Keyboard
Sahng-Min Yoo
Ue-Hwan Kim
Yewon Hwang
Jong-Hwan Kim
OffRL
17
1
0
20 Aug 2021
Structured Denoising Diffusion Models in Discrete State-Spaces
Jacob Austin
Daniel D. Johnson
Jonathan Ho
Daniel Tarlow
Rianne van den Berg
DiffM
44
852
0
07 Jul 2021
Evaluating Various Tokenizers for Arabic Text Classification
Zaid Alyafeai
Maged S. Al-Shaibani
Mustafa Ghaleb
Irfan Ahmad
37
41
0
14 Jun 2021
Dynamic Language Models for Continuously Evolving Content
Spurthi Amba Hombaiah
Tao Chen
Mingyang Zhang
Michael Bendersky
Marc Najork
CLL
KELM
40
37
0
11 Jun 2021
Going Beyond Linear Transformers with Recurrent Fast Weight Programmers
Kazuki Irie
Imanol Schlag
Róbert Csordás
Jürgen Schmidhuber
33
57
0
11 Jun 2021
A Survey of Transformers
Tianyang Lin
Yuxin Wang
Xiangyang Liu
Xipeng Qiu
ViT
53
1,088
0
08 Jun 2021
Choose a Transformer: Fourier or Galerkin
Shuhao Cao
42
226
0
31 May 2021
Security Vulnerability Detection Using Deep Learning Natural Language Processing
Noah Ziems
Shaoen Wu
19
55
0
06 May 2021
On the limit of English conversational speech recognition
Zoltán Tüske
G. Saon
Brian Kingsbury
22
50
0
03 May 2021
Writing in The Air: Unconstrained Text Recognition from Finger Movement Using Spatio-Temporal Convolution
Ue-Hwan Kim
Yewon Hwang
Sun-Kyung Lee
Jong-Hwan Kim
33
19
0
19 Apr 2021
The NLP Cookbook: Modern Recipes for Transformer based Deep Learning Architectures
Sushant Singh
A. Mahmood
AI4TS
60
94
0
23 Mar 2021
CANINE: Pre-training an Efficient Tokenization-Free Encoder for Language Representation
J. Clark
Dan Garrette
Iulia Turc
John Wieting
36
210
0
11 Mar 2021
OperA: Attention-Regularized Transformers for Surgical Phase Recognition
Tobias Czempiel
Magdalini Paschali
D. Ostler
S. T. Kim
Benjamin Busam
Nassir Navab
MedIm
42
86
0
05 Mar 2021
Linear Transformers Are Secretly Fast Weight Programmers
Imanol Schlag
Kazuki Irie
Jürgen Schmidhuber
46
225
0
22 Feb 2021
UnibucKernel: Geolocating Swiss German Jodels Using Ensemble Learning
Mihaela Găman
Sebastian Cojocariu
Radu Tudor Ionescu
22
4
0
18 Feb 2021
Argmax Flows and Multinomial Diffusion: Learning Categorical Distributions
Emiel Hoogeboom
Didrik Nielsen
P. Jaini
Patrick Forré
Max Welling
DiffM
222
396
0
10 Feb 2021
PopMAG: Pop Music Accompaniment Generation
Yi Ren
Jinzheng He
Xu Tan
Tao Qin
Zhou Zhao
Tie-Yan Liu
33
115
0
18 Aug 2020
Conv-Transformer Transducer: Low Latency, Low Frame Rate, Streamable End-to-End Speech Recognition
Wenyong Huang
Wenchao Hu
Y. Yeung
Xiao Chen
25
50
0
13 Aug 2020
Learning Sparse Prototypes for Text Generation
Junxian He
Taylor Berg-Kirkpatrick
Graham Neubig
27
23
0
29 Jun 2020
Recurrent Quantum Neural Networks
Johannes Bausch
21
152
0
25 Jun 2020