ResearchTrend.AI
Exploring the Limits of Language Modeling

7 February 2016
Rafal Jozefowicz
Oriol Vinyals
M. Schuster
Noam M. Shazeer
Yonghui Wu

Papers citing "Exploring the Limits of Language Modeling"

50 / 167 papers shown
Provably Secure Public-Key Steganography Based on Admissible Encoding
Xinsong Zhang
Kejiang Chen
Na Zhao
Wenbo Zhang
N. Yu
29
0
0
28 Apr 2025
LZ Penalty: An information-theoretic repetition penalty for autoregressive language models
Antonio A. Ginart
Naveen Kodali
J. Lee
Caiming Xiong
Shri Kiran Srinivasan
John Emmons
29
0
0
28 Apr 2025
Deep Learning-based Intrusion Detection Systems: A Survey
Zhiwei Xu
Yujuan Wu
Shiheng Wang
Jiabao Gao
Tian Qiu
Ziqi Wang
Hai Wan
Xibin Zhao
26
1
0
10 Apr 2025
Node Embeddings via Neighbor Embeddings
Jan Niklas Böhm
Marius Keute
Alica Guzmán
Sebastian Damrich
Andrew Draganov
D. Kobak
GNN
62
0
0
31 Mar 2025
A Comprehensive LLM-powered Framework for Driving Intelligence Evaluation
Shanhe You
Xuewen Luo
Xinhe Liang
Jiashu Yu
Chen Zheng
Jiangtao Gong
77
1
0
07 Mar 2025
Does Self-Attention Need Separate Weights in Transformers?
Md. Kowsher
Nusrat Jahan Prottasha
Chun-Nam Yu
O. Garibay
Niloofar Yousefi
244
0
0
30 Nov 2024
Multi-Objective Evolutionary Neural Architecture Search for Recurrent Neural Networks
Reinhard Booysen
Anna Sergeevna Bosman
40
1
0
17 Mar 2024
Autocompletion of Chief Complaints in the Electronic Health Records using Large Language Models
K. M. S. Islam
A. S. Nipu
Praveen Madiraju
Priya Deshpande
LM&MA
37
7
0
11 Jan 2024
PIXAR: Auto-Regressive Language Modeling in Pixel Space
Yintao Tai
Xiyang Liao
Alessandro Suglia
Antonio Vergari
MLLM
26
7
0
06 Jan 2024
Towards a Unified Framework of Contrastive Learning for Disentangled Representations
Stefan Matthes
Zhiwei Han
Hao Shen
37
4
0
08 Nov 2023
Debiasing, calibrating, and improving Semi-supervised Learning performance via simple Ensemble Projector
Khanh-Binh Nguyen
27
2
0
24 Oct 2023
A Comprehensive Review of Generative AI in Healthcare
Yasin Shokrollahi
Sahar Yarmohammadtoosky
Matthew M. Nikahd
Pengfei Dong
Xianqi Li
Linxia Gu
MedIm
AI4CE
27
19
0
01 Oct 2023
Hierarchical Attention Encoder Decoder
Asier Mujika
BDL
27
3
0
01 Jun 2023
What is the best recipe for character-level encoder-only modelling?
Kris Cao
42
2
0
09 May 2023
SUR-adapter: Enhancing Text-to-Image Pre-trained Diffusion Models with Large Language Models
Shan Zhong
Zhongzhan Huang
Wushao Wen
Jinghui Qin
Liang Lin
26
40
0
09 May 2023
A Comprehensive Survey on Knowledge Distillation of Diffusion Models
Weijian Luo
DiffM
MedIm
52
33
0
09 Apr 2023
Accommodating Audio Modality in CLIP for Multimodal Processing
Ludan Ruan
Anwen Hu
Yuqing Song
Liang Zhang
S. Zheng
Qin Jin
VLM
24
10
0
12 Mar 2023
LLaMA: Open and Efficient Foundation Language Models
Hugo Touvron
Thibaut Lavril
Gautier Izacard
Xavier Martinet
Marie-Anne Lachaux
...
Faisal Azhar
Aurelien Rodriguez
Armand Joulin
Edouard Grave
Guillaume Lample
ALM
PILM
58
12,368
0
27 Feb 2023
Minimal Width for Universal Property of Deep RNN
Changhoon Song
Geonho Hwang
Jun ho Lee
Myung-joo Kang
25
9
0
25 Nov 2022
Word-Level Representation From Bytes For Language Modeling
Chul Lee
Qipeng Guo
Xipeng Qiu
23
1
0
23 Nov 2022
Collateral facilitation in humans and language models
J. Michaelov
Benjamin Bergen
22
11
0
09 Nov 2022
Are Deep Sequence Classifiers Good at Non-Trivial Generalization?
Francesco Cazzaro
A. Quattoni
X. Carreras
MQ
26
0
0
24 Oct 2022
C2KD: Cross-Lingual Cross-Modal Knowledge Distillation for Multilingual Text-Video Retrieval
Andrew Rouditchenko
Yung-Sung Chuang
Nina Shvetsova
Samuel Thomas
Rogerio Feris
Brian Kingsbury
Leonid Karlinsky
David Harwath
Hilde Kuehne
James R. Glass
VLM
34
4
0
07 Oct 2022
Improving Self-Supervised Learning by Characterizing Idealized Representations
Yann Dubois
Tatsunori Hashimoto
Stefano Ermon
Percy Liang
SSL
83
40
0
13 Sep 2022
Contrastive Learning as Goal-Conditioned Reinforcement Learning
Benjamin Eysenbach
Tianjun Zhang
Ruslan Salakhutdinov
Sergey Levine
SSL
OffRL
37
140
0
15 Jun 2022
Flamingo: a Visual Language Model for Few-Shot Learning
Jean-Baptiste Alayrac
Jeff Donahue
Pauline Luc
Antoine Miech
Iain Barr
...
Mikolaj Binkowski
Ricardo Barreira
Oriol Vinyals
Andrew Zisserman
Karen Simonyan
MLLM
VLM
46
3,360
0
29 Apr 2022
MILES: Visual BERT Pre-training with Injected Language Semantics for Video-text Retrieval
Yuying Ge
Yixiao Ge
Xihui Liu
Alex Jinpeng Wang
Jianping Wu
Ying Shan
Xiaohu Qie
Ping Luo
VLM
18
44
0
26 Apr 2022
SUPERB-SG: Enhanced Speech processing Universal PERformance Benchmark for Semantic and Generative Capabilities
Hsiang-Sheng Tsai
Heng-Jui Chang
Wen-Chin Huang
Zili Huang
Kushal Lakhotia
...
Hsuan-Jui Chen
Shang-Wen Li
Shinji Watanabe
Abdel-rahman Mohamed
Hung-yi Lee
26
109
0
14 Mar 2022
Interpolation-based Contrastive Learning for Few-Label Semi-Supervised Learning
Xihong Yang
Xiaochang Hu
Sihang Zhou
Xinwang Liu
En Zhu
SSL
184
43
0
24 Feb 2022
ACORT: A Compact Object Relation Transformer for Parameter Efficient Image Captioning
J. Tan
Y. Tan
C. Chan
Joon Huang Chuah
VLM
ViT
29
15
0
11 Feb 2022
Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, A Large-Scale Generative Language Model
Shaden Smith
M. Patwary
Brandon Norick
P. LeGresley
Samyam Rajbhandari
...
M. Shoeybi
Yuxiong He
Michael Houston
Saurabh Tiwary
Bryan Catanzaro
MoE
90
732
0
28 Jan 2022
Video Transformers: A Survey
Javier Selva
A. S. Johansen
Sergio Escalera
Kamal Nasrollahi
T. Moeslund
Albert Clapés
ViT
22
103
0
16 Jan 2022
Bridging Video-text Retrieval with Multiple Choice Questions
Yuying Ge
Yixiao Ge
Xihui Liu
Dian Li
Ying Shan
Xiaohu Qie
Ping Luo
BDL
29
108
0
13 Jan 2022
Between words and characters: A Brief History of Open-Vocabulary Modeling and Tokenization in NLP
Sabrina J. Mielke
Zaid Alyafeai
Elizabeth Salesky
Colin Raffel
Manan Dey
...
Arun Raja
Chenglei Si
Wilson Y. Lee
Benoît Sagot
Samson Tan
32
142
0
20 Dec 2021
Improving language models by retrieving from trillions of tokens
Sebastian Borgeaud
A. Mensch
Jordan Hoffmann
Trevor Cai
Eliza Rutherford
...
Simon Osindero
Karen Simonyan
Jack W. Rae
Erich Elsen
Laurent Sifre
KELM
RALM
90
1,024
0
08 Dec 2021
Leveraging Sequence Embedding and Convolutional Neural Network for Protein Function Prediction
Wei-Cheng Tseng
Po-Han Chi
Jiahong Wu
Min Sun
24
0
0
01 Dec 2021
VIOLET : End-to-End Video-Language Transformers with Masked Visual-token Modeling
Tsu-jui Fu
Linjie Li
Zhe Gan
Kevin Qinghong Lin
Wei Wang
Lijuan Wang
Zicheng Liu
VLM
53
218
0
24 Nov 2021
Say What? Collaborative Pop Lyric Generation Using Multitask Transfer Learning
Naveen Ram
Tanay Gummadi
Rahul Bhethanabotla
Richard J. Savery
Gil Weinberg
20
9
0
15 Nov 2021
Self-Normalized Importance Sampling for Neural Language Modeling
Zijian Yang
Yingbo Gao
Alexander Gerstenberger
Jintao Jiang
Ralf Schluter
Hermann Ney
21
1
0
11 Nov 2021
An Empirical Study of Training End-to-End Vision-and-Language Transformers
Zi-Yi Dou
Yichong Xu
Zhe Gan
Jianfeng Wang
Shuohang Wang
...
Pengchuan Zhang
Lu Yuan
Nanyun Peng
Zicheng Liu
Michael Zeng
VLM
38
369
0
03 Nov 2021
GNN-LM: Language Modeling based on Global Contexts via GNN
Yuxian Meng
Shi Zong
Xiaoya Li
Xiaofei Sun
Tianwei Zhang
Fei Wu
Jiwei Li
LRM
24
37
0
17 Oct 2021
Language Modelling via Learning to Rank
A. Frydenlund
Gagandeep Singh
Frank Rudzicz
47
7
0
13 Oct 2021
AI Chains: Transparent and Controllable Human-AI Interaction by Chaining Large Language Model Prompts
Tongshuang Wu
Michael Terry
Carrie J. Cai
LLMAG
AI4CE
LRM
37
447
0
04 Oct 2021
Survey: Transformer based Video-Language Pre-training
Ludan Ruan
Qin Jin
VLM
ViT
72
44
0
21 Sep 2021
Identifiable Energy-based Representations: An Application to Estimating Heterogeneous Causal Effects
Yao Zhang
Jeroen Berrevoets
M. Schaar
CML
36
5
0
06 Aug 2021
Different kinds of cognitive plausibility: why are transformers better than RNNs at predicting N400 amplitude?
J. Michaelov
Megan D. Bardolph
S. Coulson
Benjamin Bergen
18
22
0
20 Jul 2021
A Survey on Low-Resource Neural Machine Translation
Rui Wang
Xu Tan
Renqian Luo
Tao Qin
Tie-Yan Liu
3DV
33
58
0
09 Jul 2021
Charformer: Fast Character Transformers via Gradient-based Subword Tokenization
Yi Tay
Vinh Q. Tran
Sebastian Ruder
Jai Gupta
Hyung Won Chung
Dara Bahri
Zhen Qin
Simon Baumgartner
Cong Yu
Donald Metzler
51
152
0
23 Jun 2021
Recurrent Neural Network from Adder's Perspective: Carry-lookahead RNN
Haowei Jiang
Fei-wei Qin
Jin Cao
Yong Peng
Yanli Shao
LRM
ODL
16
42
0
22 Jun 2021
Investigating Alternatives to the Root Mean Square for Adaptive Gradient Methods
Brett Daley
Chris Amato
ODL
26
0
0
10 Jun 2021