The Information Pathways Hypothesis: Transformers are Dynamic Self-Ensembles

2 June 2023 · Md Shamim Hussain, Mohammed J Zaki, D. Subramanian

Papers citing "The Information Pathways Hypothesis: Transformers are Dynamic Self-Ensembles"

10 of 10 papers shown. Each entry gives the title, then the authors, any topic tags, the listing's three counters, and the publication date.
Disrupting Diffusion-based Inpainters with Semantic Digression
  Geonho Son, Juhun Lee, Simon S. Woo · DiffM · 42 / 3 / 0 · 14 Jul 2024

Triplet Interaction Improves Graph Transformers: Accurate Molecular Graph Learning with Triplet Graph Transformers
  Md Shamim Hussain, Mohammed J Zaki, D. Subramanian · ViT · 41 / 6 / 0 · 07 Feb 2024

GRPE: Relative Positional Encoding for Graph Transformer
  Wonpyo Park, Woonggi Chang, Donggeon Lee, Juntae Kim, Seung-won Hwang · 41 / 75 / 0 · 30 Jan 2022

Train Short, Test Long: Attention with Linear Biases Enables Input Length Extrapolation
  Ofir Press, Noah A. Smith, M. Lewis · 253 / 701 / 0 · 27 Aug 2021

Combiner: Full Attention Transformer with Sparse Computation Cost
  Hongyu Ren, H. Dai, Zihang Dai, Mengjiao Yang, J. Leskovec, Dale Schuurmans, Bo Dai · 87 / 77 / 0 · 12 Jul 2021

Shortformer: Better Language Modeling using Shorter Inputs
  Ofir Press, Noah A. Smith, M. Lewis · 230 / 89 / 0 · 31 Dec 2020

Big Bird: Transformers for Longer Sequences
  Manzil Zaheer, Guru Guruganesh, Kumar Avinava Dubey, Joshua Ainslie, Chris Alberti, ..., Philip Pham, Anirudh Ravula, Qifan Wang, Li Yang, Amr Ahmed · VLM · 288 / 2,023 / 0 · 28 Jul 2020

The Lottery Ticket Hypothesis for Pre-trained BERT Networks
  Tianlong Chen, Jonathan Frankle, Shiyu Chang, Sijia Liu, Yang Zhang, Zhangyang Wang, Michael Carbin · 156 / 345 / 0 · 23 Jul 2020

Efficient Content-Based Sparse Attention with Routing Transformers
  Aurko Roy, M. Saffar, Ashish Vaswani, David Grangier · MoE · 255 / 580 / 0 · 12 Mar 2020

Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning
  Y. Gal, Zoubin Ghahramani · UQCV, BDL · 287 / 9,156 / 0 · 06 Jun 2015