1909.03004
Cited By
Show Your Work: Improved Reporting of Experimental Results
6 September 2019
Jesse Dodge
Suchin Gururangan
Dallas Card
Roy Schwartz
Noah A. Smith
Papers citing "Show Your Work: Improved Reporting of Experimental Results"
50 / 65 papers shown
Title
N-Gram Induction Heads for In-Context RL: Improving Stability and Reducing Data Needs
Ilya Zisman
Alexander Nikulin
Andrei Polubarov
Nikita Lyubaykin
Vladislav Kurenkov
Igor Kiselev
OffRL
56
2
0
04 Nov 2024
Can Unconfident LLM Annotations Be Used for Confident Conclusions?
Kristina Gligorić
Tijana Zrnic
Cinoo Lee
Emmanuel J. Candès
Dan Jurafsky
72
6
0
27 Aug 2024
Better than classical? The subtle art of benchmarking quantum machine learning models
Joseph Bowles
Shahnawaz Ahmed
Maria Schuld
42
65
0
11 Mar 2024
Collaboration or Corporate Capture? Quantifying NLP's Reliance on Industry Artifacts and Contributions
Will Aitken
Mohamed Abdalla
K. Rudie
Catherine Stinson
33
0
0
06 Dec 2023
torchdistill Meets Hugging Face Libraries for Reproducible, Coding-Free Deep Learning Studies: A Case Study on NLP
Yoshitomo Matsubara
VLM
34
1
0
26 Oct 2023
Target Variable Engineering
Jessica Clark
35
0
0
13 Oct 2023
GPT-4 as an Agronomist Assistant? Answering Agriculture Exams Using Large Language Models
B. Silva
Leonardo Nunes
Roberto Estevão
Vijay Aski
Ranveer Chandra
ELM
LM&MA
43
12
0
10 Oct 2023
An Easy Rejection Sampling Baseline via Gradient Refined Proposals
Edward Raff
Mark McLean
James Holt
20
0
0
30 Sep 2023
Position: Key Claims in LLM Research Have a Long Tail of Footnotes
Anna Rogers
A. Luccioni
53
19
0
14 Aug 2023
On the Limitations of Simulating Active Learning
Katerina Margatina
Nikolaos Aletras
31
11
0
21 May 2023
Measuring and Mitigating Local Instability in Deep Neural Networks
Arghya Datta
Subhrangshu Nandi
Jingcheng Xu
Greg Ver Steeg
He Xie
Anoop Kumar
Aram Galstyan
20
3
0
18 May 2023
When Good and Reproducible Results are a Giant with Feet of Clay: The Importance of Software Quality in NLP
Sara Papi
Marco Gaido
Andrea Pilzer
Matteo Negri
59
10
0
28 Mar 2023
Beyond Triplet: Leveraging the Most Data for Multimodal Machine Translation
Yaoming Zhu
Zewei Sun
Shanbo Cheng
Yuyang Huang
Liwei Wu
Mingxuan Wang
28
10
0
20 Dec 2022
We need to talk about random seeds
Steven Bethard
31
8
0
24 Oct 2022
Towards a Standardised Performance Evaluation Protocol for Cooperative MARL
R. Gorsane
Omayma Mahjoub
Ruan de Kock
Roland Dubb
Siddarth S. Singh
Arnu Pretorius
OffRL
39
49
0
21 Sep 2022
Making Intelligence: Ethical Values in IQ and ML Benchmarks
Borhane Blili-Hamelin
Leif Hancox-Li
41
16
0
01 Sep 2022
Efficient Methods for Natural Language Processing: A Survey
Marcos Vinícius Treviso
Ji-Ung Lee
Tianchu Ji
Betty van Aken
Qingqing Cao
...
Emma Strubell
Niranjan Balasubramanian
Leon Derczynski
Iryna Gurevych
Roy Schwartz
33
109
0
31 Aug 2022
Resolving the Human Subjects Status of Machine Learning's Crowdworkers
Divyansh Kaushik
Zachary Chase Lipton
A. London
25
2
0
08 Jun 2022
deep-significance - Easy and Meaningful Statistical Significance Testing in the Age of Neural Networks
Dennis Ulmer
Christian Hardmeier
J. Frellsen
48
42
0
14 Apr 2022
Reducing Model Jitter: Stable Re-training of Semantic Parsers in Production Environments
Christopher Hidey
Fei Liu
Rahul Goel
32
4
0
10 Apr 2022
Design-Bench: Benchmarks for Data-Driven Offline Model-Based Optimization
Brandon Trabucco
Xinyang Geng
Aviral Kumar
Sergey Levine
OffRL
32
95
0
17 Feb 2022
Adaptive Fine-Tuning of Transformer-Based Language Models for Named Entity Recognition
Felix Stollenwerk
12
3
0
05 Feb 2022
Data-driven Model Generalizability in Crosslinguistic Low-resource Morphological Segmentation
Zoey Liu
Emily Tucker Prudhommeaux
43
4
0
05 Jan 2022
Reduced, Reused and Recycled: The Life of a Dataset in Machine Learning Research
Bernard Koch
Emily L. Denton
A. Hanna
J. Foster
53
140
0
03 Dec 2021
How not to Lie with a Benchmark: Rearranging NLP Leaderboards
Tatiana Shavrina
Valentin Malykh
ALM
ELM
423
10
0
02 Dec 2021
AI and the Everything in the Whole Wide World Benchmark
Inioluwa Deborah Raji
Emily M. Bender
Amandalynne Paullada
Emily L. Denton
A. Hanna
30
291
0
26 Nov 2021
'Just What do You Think You're Doing, Dave?' A Checklist for Responsible Data Use in NLP
Anna Rogers
Timothy Baldwin
Kobi Leins
104
64
0
14 Sep 2021
Greenformers: Improving Computation and Memory Efficiency in Transformer Models via Low-Rank Approximation
Samuel Cahyawijaya
26
12
0
24 Aug 2021
Underreporting of errors in NLG output, and what to do about it
Emiel van Miltenburg
Miruna Clinciu
Ondrej Dusek
Dimitra Gkatzia
Stephanie Inglis
...
Saad Mahamood
Emma Manning
S. Schoch
Craig Thomson
Luou Wen
27
38
0
02 Aug 2021
The Benchmark Lottery
Mostafa Dehghani
Yi Tay
A. Gritsenko
Zhe Zhao
N. Houlsby
Fernando Diaz
Donald Metzler
Oriol Vinyals
42
89
0
14 Jul 2021
Anticipating Safety Issues in E2E Conversational AI: Framework and Tooling
Emily Dinan
Gavin Abercrombie
A. S. Bergman
Shannon L. Spruit
Dirk Hovy
Y-Lan Boureau
Verena Rieser
43
105
0
07 Jul 2021
Is Automated Topic Model Evaluation Broken?: The Incoherence of Coherence
Alexander Miserlis Hoyle
Pranav Goel
Denis Peskov
Andrew Hian-Cheong
Jordan L. Boyd-Graber
Philip Resnik
41
128
0
05 Jul 2021
How Robust are Model Rankings: A Leaderboard Customization Approach for Equitable Evaluation
Swaroop Mishra
Anjana Arunkumar
34
24
0
10 Jun 2021
Good for Misconceived Reasons: An Empirical Revisiting on the Need for Visual Context in Multimodal Machine Translation
Zhiyong Wu
Lingpeng Kong
W. Bi
Xiang Li
B. Kao
LRM
23
77
0
30 May 2021
Measuring Shifts in Attitudes Towards COVID-19 Measures in Belgium Using Multilingual BERT
Kristen M. Scott
Pieter Delobelle
Bettina Berendt
26
3
0
20 Apr 2021
Perspectives on Machine Learning from Psychology's Reproducibility Crisis
Samuel J. Bell
Onno P. Kampman
17
15
0
18 Apr 2021
Making Attention Mechanisms More Robust and Interpretable with Virtual Adversarial Training
Shunsuke Kitada
Hitoshi Iyatomi
AAML
28
8
0
18 Apr 2021
Multilingual and Cross-Lingual Intent Detection from Spoken Data
D. Gerz
Pei-hao Su
Razvan Kusztos
Avishek Mondal
M. Lis
Eshan Singhal
N. Mrksic
Tsung-Hsien Wen
Ivan Vulić
17
35
0
17 Apr 2021
What Will it Take to Fix Benchmarking in Natural Language Understanding?
Samuel R. Bowman
George E. Dahl
ELM
ALM
30
156
0
05 Apr 2021
UNICORN on RAINBOW: A Universal Commonsense Reasoning Model on a New Multitask Benchmark
Nicholas Lourie
Ronan Le Bras
Chandra Bhagavatula
Yejin Choi
LRM
30
137
0
24 Mar 2021
Dutch Humor Detection by Generating Negative Examples
Thomas Winters
Pieter Delobelle
16
10
0
26 Oct 2020
Dynamic Contextualized Word Embeddings
Valentin Hofmann
J. Pierrehumbert
Hinrich Schütze
39
51
0
23 Oct 2020
UmlsBERT: Clinical Domain Knowledge Augmentation of Contextual Embeddings Using the Unified Medical Language System Metathesaurus
George Michalopoulos
Yuanxin Wang
H. Kaka
Helen H. Chen
Alexander Wong
28
122
0
20 Oct 2020
Unsupervised Bitext Mining and Translation via Self-trained Contextual Embeddings
Phillip Keung
Julian Salazar
Y. Lu
Noah A. Smith
SSL
27
25
0
15 Oct 2020
Extracting a Knowledge Base of Mechanisms from COVID-19 Papers
Tom Hope
Aida Amini
David Wadden
Madeleine van Zuylen
Sravanthi Parasa
Eric Horvitz
Daniel S. Weld
Roy Schwartz
Hannaneh Hajishirzi
34
29
0
08 Oct 2020
Zero-Shot Stance Detection: A Dataset and Model using Generalized Topic Representations
Emily Allaway
Kathleen McKeown
19
177
0
07 Oct 2020
Easy, Reproducible and Quality-Controlled Data Collection with Crowdaq
Qiang Ning
Hao Wu
Pradeep Dasigi
Dheeru Dua
Matt Gardner
Robert L Logan IV
Ana Marasović
Zhenjin Nie
30
16
0
06 Oct 2020
Understanding tables with intermediate pre-training
Julian Martin Eisenschlos
Syrine Krichene
Thomas Müller
LMTD
15
119
0
01 Oct 2020
Parsing with Multilingual BERT, a Small Corpus, and a Small Treebank
Ethan C. Chau
Lucy H. Lin
Noah A. Smith
19
15
0
29 Sep 2020
Improving Low Compute Language Modeling with In-Domain Embedding Initialisation
Charles F Welch
Rada Mihalcea
Jonathan K. Kummerfeld
AI4CE
19
4
0
29 Sep 2020