ALBERT: A Lite BERT for Self-supervised Learning of Language Representations

26 September 2019

Papers citing "ALBERT: A Lite BERT for Self-supervised Learning of Language Representations"

Showing 50 of 2,916 papers.

Matching with Transformers in MELT. S. Hertling, Jan Portisch, Heiko Paulheim. 44 9 0 (15 Sep 2021)
Allocating Large Vocabulary Capacity for Cross-lingual Language Model Pre-training. Bo Zheng, Li Dong, Shaohan Huang, Saksham Singhal, Wanxiang Che, Ting Liu, Xia Song, Furu Wei. VLM. 26 22 0 (15 Sep 2021)
EfficientBERT: Progressively Searching Multilayer Perceptron via Warm-up Knowledge Distillation. Chenhe Dong, Guangrun Wang, Hang Xu, Jiefeng Peng, Xiaozhe Ren, Xiaodan Liang. 41 28 0 (15 Sep 2021)
Transformer-based Language Models for Factoid Question Answering at BioASQ9b. Urvashi Khanna, Diego Mollá Aliod. 41 5 0 (15 Sep 2021)
Incorporating Residual and Normalization Layers into Analysis of Masked Language Models. Goro Kobayashi, Tatsuki Kuribayashi, Sho Yokoi, Kentaro Inui. 171 46 0 (15 Sep 2021)
Towards Document-Level Paraphrase Generation with Sentence Rewriting and Reordering. Zhe Lin, Yitao Cai, Xiaojun Wan. 45 13 0 (15 Sep 2021)
Will this Question be Answered? Question Filtering via Answer Model Distillation for Efficient Question Answering. Siddhant Garg, Alessandro Moschitti. 34 26 0 (14 Sep 2021)
Explainable Identification of Dementia from Transcripts using Transformer Networks. Loukas Ilias, D. Askounis. 31 39 0 (14 Sep 2021)
ePiC: Employing Proverbs in Context as a Benchmark for Abstract Language Understanding. Sayan Ghosh, Shashank Srivastava. 29 11 0 (14 Sep 2021)
STraTA: Self-Training with Task Augmentation for Better Few-shot Learning. Tu Vu, Minh-Thang Luong, Quoc V. Le, Grady Simon, Mohit Iyyer. 131 61 0 (13 Sep 2021)
Packed Levitated Marker for Entity and Relation Extraction. Deming Ye, Yankai Lin, Peng Li, Maosong Sun. 146 106 0 (13 Sep 2021)
Compute and Energy Consumption Trends in Deep Learning Inference. Radosvet Desislavov, Fernando Martínez-Plumed, José Hernández-Orallo. 35 113 0 (12 Sep 2021)
"Let Your Characters Tell Their Story": A Dataset for Character-Centric Narrative Understanding. Faeze Brahman, Meng Huang, Oyvind Tafjord, Chao Zhao, Mrinmaya Sachan, Snigdha Chaturvedi. 32 53 0 (12 Sep 2021)
FBERT: A Neural Transformer for Identifying Offensive Content. Diptanu Sarkar, Marcos Zampieri, Tharindu Ranasinghe, Alexander Ororbia. VLM. 41 55 0 (10 Sep 2021)
Distantly-Supervised Named Entity Recognition with Noise-Robust Learning and Language Model Augmented Self-Training. Yu Meng, Yunyi Zhang, Jiaxin Huang, Xuan Wang, Yu Zhang, Heng Ji, Jiawei Han. 51 71 0 (10 Sep 2021)
On the validity of pre-trained transformers for natural language processing in the software engineering domain. Julian von der Mosel, Alexander Trautsch, Steffen Herbold. 45 67 0 (10 Sep 2021)
Knowledge-Aware Meta-learning for Low-Resource Text Classification. Huaxiu Yao, Yingxin Wu, Maruan Al-Shedivat, Eric Xing. VLM CLIP. 72 11 0 (10 Sep 2021)
EfficientCLIP: Efficient Cross-Modal Pre-training by Ensemble Confident Learning and Language Modeling. Jue Wang, Haofan Wang, Jincan Deng, Weijia Wu, Debing Zhang. VLM CLIP. 72 19 0 (10 Sep 2021)
Query-driven Segment Selection for Ranking Long Documents. Youngwoo Kim, Razieh Rahimi, Hamed Bonab, James Allan. RALM. 30 5 0 (10 Sep 2021)
Augmenting BERT-style Models with Predictive Coding to Improve Discourse-level Representations. Vladimir Araujo, Andrés Villa, Marcelo Mendoza, Marie-Francine Moens, Alvaro Soto. 44 7 0 (10 Sep 2021)
Is Attention Better Than Matrix Decomposition? Zhengyang Geng, Meng-Hao Guo, Hongxu Chen, Xia Li, Ke Wei, Zhouchen Lin. 62 139 0 (09 Sep 2021)
Bag of Tricks for Optimizing Transformer Efficiency. Ye Lin, Yanyang Li, Tong Xiao, Jingbo Zhu. 36 6 0 (09 Sep 2021)
Graph Based Network with Contextualized Representations of Turns in Dialogue. Bongseok Lee, Y. Choi. 66 69 0 (09 Sep 2021)
What's Hidden in a One-layer Randomly Weighted Transformer? Sheng Shen, Z. Yao, Douwe Kiela, Kurt Keutzer, Michael W. Mahoney. 39 4 0 (08 Sep 2021)
A Bayesian Framework for Information-Theoretic Probing. Tiago Pimentel, Ryan Cotterell. 35 24 0 (08 Sep 2021)
Self- and Pseudo-self-supervised Prediction of Speaker and Key-utterance for Multi-party Dialogue Reading Comprehension. Yiyang Li, Hai Zhao. 32 23 0 (08 Sep 2021)
Label Verbalization and Entailment for Effective Zero- and Few-Shot Relation Extraction. Oscar Sainz, Oier López de Lacalle, Gorka Labaka, Ander Barrena, Eneko Agirre. 16 117 0 (08 Sep 2021)
ArchivalQA: A Large-scale Benchmark Dataset for Open Domain Question Answering over Historical News Collections. Jiexin Wang, Adam Jatowt, Masatoshi Yoshikawa. 52 33 0 (08 Sep 2021)
Self-supervised Contrastive Cross-Modality Representation Learning for Spoken Question Answering. Chenyu You, Nuo Chen, Yuexian Zou. SSL. 32 63 0 (08 Sep 2021)
NumGPT: Improving Numeracy Ability of Generative Pre-trained Models. Zhihua Jin, Xin Jiang, Xingbo Wang, Qun Liu, Yong Wang, Xiaozhe Ren, Huamin Qu. 24 19 0 (07 Sep 2021)
IndicBART: A Pre-trained Model for Indic Natural Language Generation. Raj Dabre, Himani Shrotriya, Anoop Kunchukuttan, Ratish Puduppully, Mitesh M. Khapra, Pratyush Kumar. 57 71 0 (07 Sep 2021)
Sent2Span: Span Detection for PICO Extraction in the Biomedical Text without Span Annotations. Shifeng Liu, Yifang Sun, Bing Li, Wei Wang, Florence T. Bourgeois, A. Dunn. 24 14 0 (06 Sep 2021)
STaCK: Sentence Ordering with Temporal Commonsense Knowledge. Deepanway Ghosal, Navonil Majumder, Rada Mihalcea, Soujanya Poria. 50 11 0 (06 Sep 2021)
Re-entry Prediction for Online Conversations via Self-Supervised Learning. Lingzhi Wang, Xingshan Zeng, Huang Hu, Kam-Fai Wong, Daxin Jiang. 40 6 0 (05 Sep 2021)
FewshotQA: A simple framework for few-shot learning of question answering tasks using pre-trained text-to-text models. Rakesh Chada, P. Natarajan. 41 45 0 (04 Sep 2021)
Frustratingly Simple Pretraining Alternatives to Masked Language Modeling. Atsuki Yamaguchi, G. Chrysostomou, Katerina Margatina, Nikolaos Aletras. 32 25 0 (04 Sep 2021)
Do Prompt-Based Models Really Understand the Meaning of their Prompts? Albert Webson, Ellie Pavlick. LRM. 66 359 0 (02 Sep 2021)
So Cloze yet so Far: N400 Amplitude is Better Predicted by Distributional Information than Human Predictability Judgements. J. Michaelov, S. Coulson, Benjamin Bergen. 24 44 0 (02 Sep 2021)
Fight Fire with Fire: Fine-tuning Hate Detectors using Large Samples of Generated Hate Speech. Tomer Wullach, A. Adler, Einat Minkov. 11 41 0 (01 Sep 2021)
Does Knowledge Help General NLU? An Empirical Study. Ruochen Xu, Yuwei Fang, Chenguang Zhu, Michael Zeng. ELM. 34 9 0 (01 Sep 2021)
What Have Been Learned & What Should Be Learned? An Empirical Study of How to Selectively Augment Text for Classification. Biyang Guo, S. Han, Hailiang Huang. 19 5 0 (01 Sep 2021)
It's not Rocket Science : Interpreting Figurative Language in Narratives. Tuhin Chakrabarty, Yejin Choi, Vered Shwartz. 29 55 0 (31 Aug 2021)
Effectiveness of Deep Networks in NLP using BiDAF as an example architecture. Soumyendu Sarkar. 34 2 0 (31 Aug 2021)
Thermostat: A Large Collection of NLP Model Explanations and Analysis Tools. Nils Feldhus, Robert Schwarzenberg, Sebastian Möller. 37 14 0 (31 Aug 2021)
DNNFusion: Accelerating Deep Neural Networks Execution with Advanced Operator Fusion. Wei Niu, Jiexiong Guan, Yanzhi Wang, G. Agrawal, Bin Ren. AI4CE. 35 147 0 (30 Aug 2021)
ASR-GLUE: A New Multi-task Benchmark for ASR-Robust Natural Language Understanding. Lingyun Feng, Jianwei Yu, Deng Cai, Songxiang Liu, Haitao Zheng, Yan Wang. ELM. 79 14 0 (30 Aug 2021)
Shatter: An Efficient Transformer Encoder with Single-Headed Self-Attention and Relative Sequence Partitioning. Ran Tian, Joshua Maynez, Ankur P. Parikh. ViT. 42 2 0 (30 Aug 2021)
Generating Answer Candidates for Quizzes and Answer-Aware Question Generators. Kristiyan Vachev, Momchil Hardalov, Georgi Karadzhov, Georgi Georgiev, Ivan Koychev, Preslav Nakov. AI4Ed. 31 5 0 (29 Aug 2021)
Span Fine-tuning for Pre-trained Language Models. Rongzhou Bao, Zhuosheng Zhang, Hai Zhao. 19 2 0 (29 Aug 2021)
Analyzing and Mitigating Interference in Neural Architecture Search. Jin Xu, Xu Tan, Kaitao Song, Renqian Luo, Yichong Leng, Tao Qin, Tie-Yan Liu, Jian Li. MoMe. 39 29 0 (29 Aug 2021)