arXiv:1704.05426
A Broad-Coverage Challenge Corpus for Sentence Understanding through Inference
18 April 2017
Adina Williams
Nikita Nangia
Samuel R. Bowman
Papers citing "A Broad-Coverage Challenge Corpus for Sentence Understanding through Inference"
50 / 2,728 papers shown
jiant: A Software Toolkit for Research on General-Purpose Text Understanding Models
Yada Pruksachatkun
Philip Yeres
Haokun Liu
Jason Phang
Phu Mon Htut
Alex Wang
Ian Tenney
Samuel R. Bowman
SSeg
14
94
0
04 Mar 2020
CLUECorpus2020: A Large-scale Chinese Corpus for Pre-training Language Model
Liang Xu
Xuanwei Zhang
Qianqian Dong
SSL
19
70
0
03 Mar 2020
PhoBERT: Pre-trained language models for Vietnamese
Dat Quoc Nguyen
A. Nguyen
174
343
0
02 Mar 2020
UniLMv2: Pseudo-Masked Language Models for Unified Language Model Pre-Training
Hangbo Bao
Li Dong
Furu Wei
Wenhui Wang
Nan Yang
...
Yu Wang
Songhao Piao
Jianfeng Gao
Ming Zhou
H. Hon
AI4CE
44
392
0
28 Feb 2020
Train Large, Then Compress: Rethinking Model Size for Efficient Training and Inference of Transformers
Zhuohan Li
Eric Wallace
Sheng Shen
Kevin Lin
Kurt Keutzer
Dan Klein
Joseph E. Gonzalez
22
148
0
26 Feb 2020
Sparse Sinkhorn Attention
Yi Tay
Dara Bahri
Liu Yang
Donald Metzler
Da-Cheng Juan
23
331
0
26 Feb 2020
MiniLM: Deep Self-Attention Distillation for Task-Agnostic Compression of Pre-Trained Transformers
Wenhui Wang
Furu Wei
Li Dong
Hangbo Bao
Nan Yang
Ming Zhou
VLM
47
1,214
0
25 Feb 2020
Improving BERT Fine-Tuning via Self-Ensemble and Self-Distillation
Yige Xu
Xipeng Qiu
L. Zhou
Xuanjing Huang
17
66
0
24 Feb 2020
The Break-Even Point on Optimization Trajectories of Deep Neural Networks
Stanislaw Jastrzebski
Maciej Szymczak
Stanislav Fort
Devansh Arpit
Jacek Tabor
Kyunghyun Cho
Krzysztof J. Geras
55
155
0
21 Feb 2020
Contextual Lensing of Universal Sentence Representations
J. Kiros
21
5
0
20 Feb 2020
The Microsoft Toolkit of Multi-Task Deep Neural Networks for Natural Language Understanding
Xiaodong Liu
Yu Wang
Jianshu Ji
Hao Cheng
Xueyun Zhu
...
Pengcheng He
Weizhu Chen
Hoifung Poon
Guihong Cao
Jianfeng Gao
AI4CE
31
60
0
19 Feb 2020
Low-Rank Bottleneck in Multi-head Attention Models
Srinadh Bhojanapalli
Chulhee Yun
A. S. Rawat
Sashank J. Reddi
Sanjiv Kumar
24
94
0
17 Feb 2020
Stress Test Evaluation of Transformer-based Models in Natural Language Understanding Tasks
Carlos Aspillaga
Andrés Carvallo
Vladimir Araujo
ELM
47
31
0
14 Feb 2020
HULK: An Energy Efficiency Benchmark Platform for Responsible Natural Language Processing
Xiyou Zhou
Zhiyu Zoey Chen
Xiaoyong Jin
Wenjie Wang
22
32
0
14 Feb 2020
ReClor: A Reading Comprehension Dataset Requiring Logical Reasoning
Weihao Yu
Zihang Jiang
Yanfei Dong
Jiashi Feng
LRM
25
245
0
11 Feb 2020
Adversarial Filters of Dataset Biases
Ronan Le Bras
Swabha Swayamdipta
Chandra Bhagavatula
Rowan Zellers
Matthew E. Peters
Ashish Sabharwal
Yejin Choi
36
220
0
10 Feb 2020
Pre-training Tasks for Embedding-based Large-scale Retrieval
Wei-Cheng Chang
Felix X. Yu
Yin-Wen Chang
Yiming Yang
Sanjiv Kumar
RALM
13
301
0
10 Feb 2020
Multilingual Alignment of Contextual Word Representations
Steven Cao
Nikita Kitaev
Dan Klein
44
192
0
10 Feb 2020
BERT-of-Theseus: Compressing BERT by Progressive Module Replacing
Canwen Xu
Wangchunshu Zhou
Tao Ge
Furu Wei
Ming Zhou
234
198
0
07 Feb 2020
Are Pre-trained Language Models Aware of Phrases? Simple but Strong Baselines for Grammar Induction
Taeuk Kim
Jihun Choi
Daniel Edmiston
Sang-goo Lee
22
90
0
30 Jan 2020
Exploiting Cloze Questions for Few Shot Text Classification and Natural Language Inference
Timo Schick
Hinrich Schütze
258
1,591
0
21 Jan 2020
Length-controllable Abstractive Summarization by Guiding with Summary Prototype
Itsumi Saito
Kyosuke Nishida
Kosuke Nishida
Atsushi Otsuka
Hisako Asano
J. Tomita
Hiroyuki Shindo
Yuji Matsumoto
14
33
0
21 Jan 2020
Multi-level Head-wise Match and Aggregation in Transformer for Textual Sequence Matching
Shuohang Wang
Yunshi Lan
Yi Tay
Jing Jiang
Jingjing Liu
ViT
27
7
0
20 Jan 2020
LP-SparseMAP: Differentiable Relaxed Optimization for Sparse Structured Prediction
Vlad Niculae
André F. T. Martins
TPM
24
19
0
13 Jan 2020
AdaBERT: Task-Adaptive BERT Compression with Differentiable Neural Architecture Search
Daoyuan Chen
Yaliang Li
Minghui Qiu
Zhen Wang
Bofang Li
Bolin Ding
Hongbo Deng
Jun Huang
Wei Lin
Jingren Zhou
MQ
24
104
0
13 Jan 2020
TextNAS: A Neural Architecture Search Space tailored for Text Representation
Yujing Wang
Yaming Yang
Yiren Chen
Jing Bai
Ce Zhang
Guinan Su
Xiaoyu Kou
Yunhai Tong
Mao Yang
Lidong Zhou
13
55
0
23 Dec 2019
Are Transformers universal approximators of sequence-to-sequence functions?
Chulhee Yun
Srinadh Bhojanapalli
A. S. Rawat
Sashank J. Reddi
Sanjiv Kumar
26
336
0
20 Dec 2019
Cross-Lingual Ability of Multilingual BERT: An Empirical Study
Karthikeyan K
Zihan Wang
Stephen D. Mayhew
Dan Roth
LRM
36
333
0
17 Dec 2019
On the relationship between multitask neural networks and multitask Gaussian Processes
Karthikeyan K
S. Bharti
Piyush Rai
BDL
13
0
0
12 Dec 2019
FlauBERT: Unsupervised Language Model Pre-training for French
Hang Le
Loïc Vial
Jibril Frej
Vincent Segonne
Maximin Coavoux
Benjamin Lecouteux
A. Allauzen
Benoît Crabbé
Laurent Besacier
D. Schwab
AI4CE
49
395
0
11 Dec 2019
Unsupervised Transfer Learning via BERT Neuron Selection
M. Valipour
E. Lee
Jaime R. Jamacaro
C. Bessega
29
5
0
10 Dec 2019
COSTRA 1.0: A Dataset of Complex Sentence Transformations
P. Barancíková
Ondrej Bojar
24
7
0
03 Dec 2019
Do Attention Heads in BERT Track Syntactic Dependencies?
Phu Mon Htut
Jason Phang
Shikha Bordia
Samuel R. Bowman
32
136
0
27 Nov 2019
Taking a Stance on Fake News: Towards Automatic Disinformation Assessment via Deep Bidirectional Transformer Language Models for Stance Detection
Chris Dulhanty
Jason L. Deglint
Ibrahim Ben Daya
A. Wong
24
22
0
27 Nov 2019
Distributionally Robust Neural Networks for Group Shifts: On the Importance of Regularization for Worst-Case Generalization
Shiori Sagawa
Pang Wei Koh
Tatsunori B. Hashimoto
Percy Liang
OOD
16
1,200
0
20 Nov 2019
Sieving Fake News From Genuine: A Synopsis
Shahid Alam
Abdulaziz Ravshanbekov
GNN
9
6
0
19 Nov 2019
Multi-task Sentence Encoding Model for Semantic Retrieval in Question Answering Systems
Qiang Huang
Jianhui Bu
Weijian Xie
Shengwen Yang
Weijia Wu
Liping Liu
27
17
0
18 Nov 2019
Quick and (not so) Dirty: Unsupervised Selection of Justification Sentences for Multi-hop Question Answering
Vikas Yadav
Steven Bethard
Mihai Surdeanu
27
75
0
17 Nov 2019
KEPLER: A Unified Model for Knowledge Embedding and Pre-trained Language Representation
Xiaozhi Wang
Tianyu Gao
Zhaocheng Zhu
Zhengyan Zhang
Zhiyuan Liu
Juan-Zi Li
Jian Tang
15
647
0
13 Nov 2019
Learning from Data-Rich Problems: A Case Study on Genetic Variant Calling
Ren Yi
Pi-Chuan Chang
Gunjan Baid
Andrew Carroll
22
2
0
12 Nov 2019
Contract Discovery: Dataset and a Few-Shot Semantic Retrieval Challenge with Competitive Baselines
Łukasz Borchmann
Dawid Wisniewski
Andrzej Gretkowski
Izabela Kosmala
Dawid Jurkiewicz
Lukasz Szalkiewicz
Gabriela Pałka
Karol Kaczmarek
Agnieszka Kaliska
Filip Graliński
AILaw
27
0
0
10 Nov 2019
CamemBERT: a Tasty French Language Model
Louis Martin
Benjamin Muller
Pedro Ortiz Suarez
Yoann Dupont
Laurent Romary
Eric Villemonte de la Clergerie
Djamé Seddah
Benoît Sagot
42
956
0
10 Nov 2019
Learning to Few-Shot Learn Across Diverse Natural Language Classification Tasks
Trapit Bansal
Rishikesh Jha
Andrew McCallum
SSL
21
118
0
10 Nov 2019
Increasing Robustness to Spurious Correlations using Forgettable Examples
Yadollah Yaghoobzadeh
Soroush Mehri
Remi Tachet
Timothy J. Hazen
Alessandro Sordoni
OOD
21
18
0
10 Nov 2019
Don't Say That! Making Inconsistent Dialogue Unlikely with Unlikelihood Training
Margaret Li
Stephen Roller
Ilia Kulikov
Sean Welleck
Y-Lan Boureau
Kyunghyun Cho
Jason Weston
17
180
0
10 Nov 2019
Stylized Text Generation Using Wasserstein Autoencoders with a Mixture of Gaussian Prior
Amirpasha Ghabussi
Lili Mou
Olga Vechtomova
22
2
0
10 Nov 2019
Multi-Perspective Inferrer: Reasoning Sentences Relationship from Holistic Perspective
Zhenfu Cheng
Zaixiang Zheng
Xinyu Dai
Shujian Huang
Jiajun Chen
28
0
0
09 Nov 2019
MKD: a Multi-Task Knowledge Distillation Approach for Pretrained Language Models
Linqing Liu
Haiquan Wang
Jimmy J. Lin
R. Socher
Caiming Xiong
12
21
0
09 Nov 2019
SMART: Robust and Efficient Fine-Tuning for Pre-trained Natural Language Models through Principled Regularized Optimization
Haoming Jiang
Pengcheng He
Weizhu Chen
Xiaodong Liu
Jianfeng Gao
T. Zhao
40
559
0
08 Nov 2019
What Would Elsa Do? Freezing Layers During Transformer Fine-Tuning
Jaejun Lee
Raphael Tang
Jimmy J. Lin
34
121
0
08 Nov 2019