Optimizing Deeper Transformers on Small Datasets


30 December 2020
Peng-Tao Xu
Dhruv Kumar
Wei Yang
Wenjie Zi
Keyi Tang
Chenyang Huang
Jackie C.K. Cheung
S. Prince
Yanshuai Cao
    AI4CE

Papers citing "Optimizing Deeper Transformers on Small Datasets"

19 / 19 papers shown
A Comprehensive LLM-powered Framework for Driving Intelligence Evaluation
Shanhe You
Xuewen Luo
Xinhe Liang
Jiashu Yu
Chen Zheng
Jiangtao Gong
69
0
0
07 Mar 2025
Delving into Differentially Private Transformer
Youlong Ding
Xueyang Wu
Yining Meng
Yonggang Luo
Hao Wang
Weike Pan
39
5
0
28 May 2024
Hierarchical Classification System for Breast Cancer Specimen Report (HCSBC) -- an end-to-end model for characterizing severity and diagnosis
Thiago Santos
Harish Kamath
Christopher R. McAdams
Mary S. Newell
Marina B. Mosunjac
...
Geoffrey H. Smith
Constance Lehman
J. Gichoya
Imon Banerjee
Hari M. Trivedi
22
0
0
02 Nov 2023
RESDSQL: Decoupling Schema Linking and Skeleton Parsing for Text-to-SQL
Haoyang Li
Jing Zhang
Cuiping Li
Hong Chen
26
172
0
12 Feb 2023
Structured Case-based Reasoning for Inference-time Adaptation of Text-to-SQL parsers
Abhijeet Awasthi
Soumen Chakrabarti
Sunita Sarawagi
25
5
0
10 Jan 2023
Optimizing Deep Transformers for Chinese-Thai Low-Resource Translation
Wenjie Hao
Hongfei Xu
Lingling Mu
Hongying Zan
MoE
18
4
0
24 Dec 2022
Language models are good pathologists: using attention-based sequence reduction and text-pretrained transformers for efficient WSI classification
Juan Pisula
Katarzyna Bozek
VLM
MedIm
33
3
0
14 Nov 2022
Diverse Parallel Data Synthesis for Cross-Database Adaptation of Text-to-SQL Parsers
Abhijeet Awasthi
Ashutosh Sathe
Sunita Sarawagi
35
4
0
29 Oct 2022
State-of-the-art generalisation research in NLP: A taxonomy and review
Dieuwke Hupkes
Mario Giulianelli
Verna Dankers
Mikel Artetxe
Yanai Elazar
...
Leila Khalatbari
Maria Ryskina
Rita Frieske
Ryan Cotterell
Zhijing Jin
114
93
0
06 Oct 2022
Relaxed Attention for Transformer Models
Timo Lohrenz
Björn Möller
Zhengyang Li
Tim Fingscheidt
KELM
29
11
0
20 Sep 2022
Dynamic Linear Transformer for 3D Biomedical Image Segmentation
Zheyu Zhang
Ulas Bagci
ViT
MedIm
22
12
0
01 Jun 2022
Zero-shot Code-Mixed Offensive Span Identification through Rationale Extraction
Manikandan Ravikiran
Bharathi Raja Chakravarthi
22
3
0
12 May 2022
DeepNet: Scaling Transformers to 1,000 Layers
Hongyu Wang
Shuming Ma
Li Dong
Shaohan Huang
Dongdong Zhang
Furu Wei
MoE
AI4CE
17
156
0
01 Mar 2022
mRAT-SQL+GAP: A Portuguese Text-to-SQL Transformer
M. A. José
Fabio Gagliardi Cozman
LMTD
26
11
0
07 Oct 2021
PICARD: Parsing Incrementally for Constrained Auto-Regressive Decoding from Language Models
Torsten Scholak
Nathan Schucher
Dzmitry Bahdanau
154
374
0
10 Sep 2021
LGESQL: Line Graph Enhanced Text-to-SQL Model with Mixed Local and Non-Local Relations
Ruisheng Cao
Lu Chen
Zhi Chen
Yanbin Zhao
Su Zhu
Kai Yu
22
159
0
02 Jun 2021
Escaping the Big Data Paradigm with Compact Transformers
Ali Hassani
Steven Walton
Nikhil Shah
Abulikemu Abuduweili
Jiachen Li
Humphrey Shi
54
462
0
12 Apr 2021
Rewiring the Transformer with Depth-Wise LSTMs
Hongfei Xu
Yang Song
Qiuhui Liu
Josef van Genabith
Deyi Xiong
37
6
0
13 Jul 2020
On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima
N. Keskar
Dheevatsa Mudigere
J. Nocedal
M. Smelyanskiy
P. T. P. Tang
ODL
284
2,890
0
15 Sep 2016