The Information Pathways Hypothesis: Transformers are Dynamic Self-Ensembles

2 June 2023 · Md Shamim Hussain, Mohammed J Zaki, D. Subramanian

Papers citing "The Information Pathways Hypothesis: Transformers are Dynamic Self-Ensembles"

10 of 10 papers shown. Each entry gives the title, then the authors, any topic tags, the listing's three counters, and the publication date.
Disrupting Diffusion-based Inpainters with Semantic Digression
  Geonho Son, Juhun Lee, Simon S. Woo · DiffM · 42 / 3 / 0 · 14 Jul 2024

Triplet Interaction Improves Graph Transformers: Accurate Molecular Graph Learning with Triplet Graph Transformers
  Md Shamim Hussain, Mohammed J Zaki, D. Subramanian · ViT · 41 / 6 / 0 · 07 Feb 2024

GRPE: Relative Positional Encoding for Graph Transformer
  Wonpyo Park, Woonggi Chang, Donggeon Lee, Juntae Kim, Seung-won Hwang · 41 / 75 / 0 · 30 Jan 2022

Train Short, Test Long: Attention with Linear Biases Enables Input Length Extrapolation
  Ofir Press, Noah A. Smith, M. Lewis · 253 / 701 / 0 · 27 Aug 2021

Combiner: Full Attention Transformer with Sparse Computation Cost
  Hongyu Ren, H. Dai, Zihang Dai, Mengjiao Yang, J. Leskovec, Dale Schuurmans, Bo Dai · 87 / 77 / 0 · 12 Jul 2021

Shortformer: Better Language Modeling using Shorter Inputs
  Ofir Press, Noah A. Smith, M. Lewis · 230 / 89 / 0 · 31 Dec 2020

Big Bird: Transformers for Longer Sequences
  Manzil Zaheer, Guru Guruganesh, Kumar Avinava Dubey, Joshua Ainslie, Chris Alberti, ..., Philip Pham, Anirudh Ravula, Qifan Wang, Li Yang, Amr Ahmed · VLM · 288 / 2,023 / 0 · 28 Jul 2020

The Lottery Ticket Hypothesis for Pre-trained BERT Networks
  Tianlong Chen, Jonathan Frankle, Shiyu Chang, Sijia Liu, Yang Zhang, Zhangyang Wang, Michael Carbin · 156 / 345 / 0 · 23 Jul 2020

Efficient Content-Based Sparse Attention with Routing Transformers
  Aurko Roy, M. Saffar, Ashish Vaswani, David Grangier · MoE · 255 / 580 / 0 · 12 Mar 2020

Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning
  Y. Gal, Zoubin Ghahramani · UQCV, BDL · 287 / 9,156 / 0 · 06 Jun 2015