Coloring the Blank Slate: Pre-training Imparts a Hierarchical Inductive Bias to Sequence-to-sequence Models

17 March 2022
Aaron Mueller, Robert Frank, Tal Linzen, Luheng Wang, Sebastian Schuster

Papers citing "Coloring the Blank Slate: Pre-training Imparts a Hierarchical Inductive Bias to Sequence-to-sequence Models"

29 / 29 papers shown

Findings of the BabyLM Challenge: Sample-Efficient Pretraining on Developmentally Plausible Corpora
Alex Warstadt, Aaron Mueller, Leshem Choshen, E. Wilcox, Chengxu Zhuang, ..., Rafael Mosquera, Bhargavi Paranjape, Adina Williams, Tal Linzen, Ryan Cotterell
10 Apr 2025 · 120 citations

How Does Code Pretraining Affect Language Model Task Performance?
Jackson Petty, Sjoerd van Steenkiste, Tal Linzen
06 Sep 2024 · 12 citations

Transformers Generalize Linearly
Jackson Petty, Robert Frank
24 Sep 2021 · 16 citations

Causal Analysis of Syntactic Agreement Mechanisms in Neural Language Models
Matthew Finlayson, Aaron Mueller, Sebastian Gehrmann, Stuart M. Shieber, Tal Linzen, Yonatan Belinkov
10 Jun 2021 · 110 citations

The Low-Dimensional Linear Geometry of Contextualized Word Representations
Evan Hernandez, Jacob Andreas
15 May 2021 · 42 citations

Counterfactual Interventions Reveal the Causal Effect of Relative Clause Representations on Agreement Prediction
Shauli Ravfogel, Grusha Prasad, Tal Linzen, Yoav Goldberg
14 May 2021 · 59 citations

mT5: A massively multilingual pre-trained text-to-text transformer
Linting Xue, Noah Constant, Adam Roberts, Mihir Kale, Rami Al-Rfou, Aditya Siddhant, Aditya Barua, Colin Raffel
22 Oct 2020 · 2,533 citations

Can neural networks acquire a structural bias from raw linguistic data?
Alex Warstadt, Samuel R. Bowman
14 Jul 2020 · 54 citations

Finding Universal Grammatical Relations in Multilingual BERT
Ethan A. Chi, John Hewitt, Christopher D. Manning
09 May 2020 · 151 citations

A Systematic Assessment of Syntactic Generalization in Neural Language Models
Jennifer Hu, Jon Gauthier, Peng Qian, Ethan Gotlieb Wilcox, R. Levy
07 May 2020 · 220 citations

How Can We Accelerate Progress Towards Human-like Linguistic Generalization?
Tal Linzen
03 May 2020 · 194 citations

Cross-Linguistic Syntactic Evaluation of Word Prediction Models
Aaron Mueller, Garrett Nicolai, Panayiota Petrou-Zeniou, N. Talmina, Tal Linzen
01 May 2020 · 56 citations

Multilingual Denoising Pre-training for Neural Machine Translation
Yinhan Liu, Jiatao Gu, Naman Goyal, Xian Li, Sergey Edunov, Marjan Ghazvininejad, M. Lewis, Luke Zettlemoyer
22 Jan 2020 · 1,806 citations

Does syntax need to grow on trees? Sources of hierarchical inductive bias in sequence-to-sequence networks
R. Thomas McCoy, Robert Frank, Tal Linzen
10 Jan 2020 · 108 citations

Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
Colin Raffel, Noam M. Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, Peter J. Liu
23 Oct 2019 · 20,053 citations

Quantity doesn't buy quality syntax with neural language models
Marten van Schijndel, Aaron Mueller, Tal Linzen
31 Aug 2019 · 68 citations

RoBERTa: A Robustly Optimized BERT Pretraining Approach
Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, M. Lewis, Luke Zettlemoyer, Veselin Stoyanov
26 Jul 2019 · 24,351 citations

What Does BERT Look At? An Analysis of BERT's Attention
Kevin Clark, Urvashi Khandelwal, Omer Levy, Christopher D. Manning
11 Jun 2019 · 1,592 citations

BERT Rediscovers the Classical NLP Pipeline
Ian Tenney, Dipanjan Das, Ellie Pavlick
15 May 2019 · 1,471 citations

Studying the Inductive Biases of RNNs with Synthetic Variations of Natural Languages
Shauli Ravfogel, Yoav Goldberg, Tal Linzen
15 Mar 2019 · 70 citations

Right for the Wrong Reasons: Diagnosing Syntactic Heuristics in Natural Language Inference
R. Thomas McCoy, Ellie Pavlick, Tal Linzen
04 Feb 2019 · 1,234 citations

Assessing BERT's Syntactic Abilities
Yoav Goldberg
16 Jan 2019 · 495 citations

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Jacob Devlin, Ming-Wei Chang, Kenton Lee, Kristina Toutanova
11 Oct 2018 · 94,511 citations

What do RNN Language Models Learn about Filler-Gap Dependencies?
Ethan Gotlieb Wilcox, R. Levy, Takashi Morita, Richard Futrell
31 Aug 2018 · 168 citations

Targeted Syntactic Evaluation of Language Models
Rebecca Marvin, Tal Linzen
27 Aug 2018 · 415 citations

Revisiting the poverty of the stimulus: hierarchical generalization without a hierarchical bias in recurrent neural networks
R. Thomas McCoy, Robert Frank, Tal Linzen
25 Feb 2018 · 81 citations

Attention Is All You Need
Ashish Vaswani, Noam M. Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan Gomez, Lukasz Kaiser, Illia Polosukhin
12 Jun 2017 · 130,942 citations

Assessing the Ability of LSTMs to Learn Syntax-Sensitive Dependencies
Tal Linzen, Emmanuel Dupoux, Yoav Goldberg
04 Nov 2016 · 903 citations

Sequence to Sequence Learning with Neural Networks
Ilya Sutskever, Oriol Vinyals, Quoc V. Le
10 Sep 2014 · 20,528 citations