2008.02637
Cited By
Question and Answer Test-Train Overlap in Open-Domain Question Answering Datasets
6 August 2020
Patrick Lewis
Pontus Stenetorp
Sebastian Riedel
OOD
ELM
Papers citing
"Question and Answer Test-Train Overlap in Open-Domain Question Answering Datasets"
50 / 132 papers shown
Can We Trust AI Benchmarks? An Interdisciplinary Review of Current Issues in AI Evaluation
Maria Eriksson
Erasmo Purificato
Arman Noroozian
Joao Vinagre
Guillaume Chaslot
Emilia Gomez
David Fernandez Llorca
ELM
139
1
0
10 Feb 2025
Language model developers should report train-test overlap
Andy K. Zhang
Kevin Klyman
Yifan Mai
Yoav Levine
Yian Zhang
Rishi Bommasani
Percy Liang
VLM
ELM
37
8
0
10 Oct 2024
UncertaintyRAG: Span-Level Uncertainty Enhanced Long-Context Modeling for Retrieval-Augmented Generation
Zixuan Li
Jing Xiong
Fanghua Ye
Chuanyang Zheng
Xun Wu
...
Xiaodan Liang
Chengming Li
Zhenan Sun
Lingpeng Kong
Ngai Wong
RALM
UQLM
32
2
0
03 Oct 2024
RAG-QA Arena: Evaluating Domain Robustness for Long-form Retrieval Augmented Question Answering
Rujun Han
Yuhao Zhang
Peng Qi
Yumo Xu
Jenyuan Wang
Lan Liu
William Yang Wang
Bonan Min
Vittorio Castelli
RALM
43
17
0
19 Jul 2024
Out-of-Distribution Detection through Soft Clustering with Non-Negative Kernel Regression
Aryan Gulati
Xingjian Dong
Carlos Hurtado
Sarath Shekkizhar
Swabha Swayamdipta
Antonio Ortega
OODD
23
2
0
18 Jul 2024
Can Long-Context Language Models Subsume Retrieval, RAG, SQL, and More?
Jinhyuk Lee
Anthony Chen
Zhuyun Dai
Dheeru Dua
Devendra Singh Sachan
...
Jeremy R. Cole
Sebastian Riedel
Iftekhar Naim
Ming-Wei Chang
Kelvin Guu
RALM
LRM
58
30
0
19 Jun 2024
Benchmark Data Contamination of Large Language Models: A Survey
Cheng Xu
Shuhao Guan
Derek Greene
Mohand-Tahar Kechadi
ELM
ALM
38
39
0
06 Jun 2024
MASSIVE Multilingual Abstract Meaning Representation: A Dataset and Baselines for Hallucination Detection
Michael Regan
Shira Wein
George Baker
Emilio Monti
46
3
0
29 May 2024
BLADE: Enhancing Black-box Large Language Models with Small Domain-Specific Models
Haitao Li
Qingyao Ai
Jia Chen
Qian Dong
Zhijing Wu
Yiqun Liu
Chong Chen
Qi Tian
AILaw
62
13
0
27 Mar 2024
Calibrating Large Language Models Using Their Generations Only
Dennis Ulmer
Martin Gubri
Hwaran Lee
Sangdoo Yun
Seong Joon Oh
UQLM
432
18
1
09 Mar 2024
WebCiteS: Attributed Query-Focused Summarization on Chinese Web Search Results with Citations
Haolin Deng
Chang Wang
Xin Li
Dezhang Yuan
Junlang Zhan
Tianhua Zhou
Jin Ma
Jun Gao
Ruifeng Xu
HILM
66
2
0
04 Mar 2024
Improving Sequential Recommendations with LLMs
Artun Boz
Wouter Zorgdrager
Zoe Kotti
Jesse Harte
Panos Louridas
Dietmar Jannach
Vassilios Karakoidas
Marios Fragkoulis
KELM
LRM
70
4
0
02 Feb 2024
LoRAMoE: Alleviate World Knowledge Forgetting in Large Language Models via MoE-Style Plugin
Shihan Dou
Enyu Zhou
Yan Liu
Songyang Gao
Jun Zhao
...
Jiang Zhu
Rui Zheng
Tao Gui
Qi Zhang
Xuanjing Huang
CLL
MoE
KELM
22
29
0
15 Dec 2023
Do Smaller Language Models Answer Contextualised Questions Through Memorisation Or Generalisation?
Tim Hartill
Joshua Bensemann
Michael Witbrock
Patricia Riddle
KELM
30
0
0
21 Nov 2023
Carpe Diem: On the Evaluation of World Knowledge in Lifelong Language Models
Yujin Kim
Jaehong Yoon
Seonghyeon Ye
Sangmin Bae
Namgyu Ho
Sung Ju Hwang
Se-Young Yun
KELM
40
10
0
14 Nov 2023
Hint-enhanced In-Context Learning wakes Large Language Models up for knowledge-intensive tasks
Yifan Wang
Qingyan Guo
Xinzhe Ni
Chufan Shi
Lemao Liu
Haiyun Jiang
Yujiu Yang
ReLM
RALM
25
8
0
03 Nov 2023
Give Me the Facts! A Survey on Factual Knowledge Probing in Pre-trained Language Models
Paul Youssef
Osman Alperen Koraş
Meijie Li
Jörg Schlötterer
Christin Seifert
KELM
24
16
0
25 Oct 2023
CoMPosT: Characterizing and Evaluating Caricature in LLM Simulations
Myra Cheng
Tiziano Piccardi
Diyi Yang
LLMAG
18
67
0
17 Oct 2023
Battle of the Large Language Models: Dolly vs LLaMA vs Vicuna vs Guanaco vs Bard vs ChatGPT -- A Text-to-SQL Parsing Comparison
Shuo Sun
Yuchen Zhang
Jiahuan Yan
Yuze Gao
Donovan Ong
Bin Chen
Jian Su
ELM
ALM
43
12
0
16 Oct 2023
Anonymity at Risk? Assessing Re-Identification Capabilities of Large Language Models
Alex Nyffenegger
Matthias Stürmer
Joel Niklaus
34
6
0
22 Aug 2023
Position: Key Claims in LLM Research Have a Long Tail of Footnotes
Anna Rogers
A. Luccioni
53
19
0
14 Aug 2023
Answering Unseen Questions With Smaller Language Models Using Rationale Generation and Dense Retrieval
Tim Hartill
Diana Benavides-Prado
Michael Witbrock
Patricia J. Riddle
ReLM
LRM
28
1
0
09 Aug 2023
Teaching Smaller Language Models To Generalise To Unseen Compositional Questions
Tim Hartill
N. Tan
Michael Witbrock
Patricia J. Riddle
ReLM
KELM
LRM
34
2
0
02 Aug 2023
On Evaluation of Document Classification using RVL-CDIP
Stefan Larson
Gordon Lim
Kevin Leach
36
3
0
21 Jun 2023
When to Read Documents or QA History: On Unified and Selective Open-domain QA
Kyungjae Lee
Sanghyun Han
Seung-won Hwang
Moontae Lee
RALM
24
4
0
07 Jun 2023
Out-of-Distribution Generalization in Text Classification: Past, Present, and Future
Linyi Yang
Yangqiu Song
Xuan Ren
Chenyang Lyu
Yidong Wang
Lingqiao Liu
Jindong Wang
Jennifer Foster
Yue Zhang
OOD
37
2
0
23 May 2023
What's the Meaning of Superhuman Performance in Today's NLU?
Simone Tedeschi
Johan Bos
T. Declerck
Jan Hajič
Daniel Hershcovich
...
Simon Krek
Steven Schockaert
Rico Sennrich
Ekaterina Shutova
Roberto Navigli
ELM
LM&MA
VLM
ReLM
LRM
39
26
0
15 May 2023
Learning to Generalize for Cross-domain QA
Yingjie Niu
Linyi Yang
Ruihai Dong
Yue Zhang
24
6
0
14 May 2023
Speak, Memory: An Archaeology of Books Known to ChatGPT/GPT-4
Kent K. Chang
Mackenzie Cramer
Sandeep Soni
David Bamman
RALM
148
112
0
28 Apr 2023
Koala: An Index for Quantifying Overlaps with Pre-training Corpora
Thuy-Trang Vu
Xuanli He
Gholamreza Haffari
Ehsan Shareghi
CLL
27
13
0
26 Mar 2023
REPLUG: Retrieval-Augmented Black-Box Language Models
Weijia Shi
Sewon Min
Michihiro Yasunaga
Minjoon Seo
Rich James
M. Lewis
Luke Zettlemoyer
Wen-tau Yih
RALM
VLM
KELM
83
580
0
30 Jan 2023
Understanding Finetuning for Factual Knowledge Extraction from Language Models
Mehran Kazemi
Sid Mittal
Deepak Ramachandran
KELM
34
10
0
26 Jan 2023
What Are You Token About? Dense Retrieval as Distributions Over the Vocabulary
Ori Ram
L. Bezalel
Adi Zicher
Yonatan Belinkov
Jonathan Berant
Amir Globerson
44
36
0
20 Dec 2022
Self-Prompting Large Language Models for Zero-Shot Open-Domain QA
Junlong Li
Jinyuan Wang
Zhuosheng Zhang
Hai Zhao
LRM
39
32
0
16 Dec 2022
CREPE: Open-Domain Question Answering with False Presuppositions
Xinyan Velocity Yu
Sewon Min
Luke Zettlemoyer
Hannaneh Hajishirzi
22
45
0
30 Nov 2022
Empowering Language Models with Knowledge Graph Reasoning for Question Answering
Ziniu Hu
Yichong Xu
Wenhao Yu
Shuohang Wang
Ziyi Yang
Chenguang Zhu
Kai-Wei Chang
Yizhou Sun
KELM
RALM
LRM
19
26
0
15 Nov 2022
DisentQA: Disentangling Parametric and Contextual Knowledge with Counterfactual Question Answering
Ella Neeman
Roee Aharoni
Or Honovich
Leshem Choshen
Idan Szpektor
Omri Abend
KELM
CML
23
78
0
10 Nov 2022
Eliciting Knowledge from Large Pre-Trained Models for Unsupervised Knowledge-Grounded Conversation
Yanyang Li
Jianqiao Zhao
M. Lyu
Liwei Wang
24
15
0
03 Nov 2022
An Efficient Memory-Augmented Transformer for Knowledge-Intensive NLP Tasks
Yuxiang Wu
Yu Zhao
Baotian Hu
Pasquale Minervini
Pontus Stenetorp
Sebastian Riedel
RALM
KELM
51
43
0
30 Oct 2022
Rich Knowledge Sources Bring Complex Knowledge Conflicts: Recalibrating Models to Reflect Conflicting Evidence
Hung-Ting Chen
Michael J.Q. Zhang
Eunsol Choi
RALM
HILM
47
92
0
25 Oct 2022
Enhancing Tabular Reasoning with Pattern Exploiting Training
Abhilash Shankarampeta
Vivek Gupta
Shuo Zhang
LMTD
RALM
ReLM
68
6
0
21 Oct 2022
Pre-training Language Models with Deterministic Factual Knowledge
Shaobo Li
Xiaoguang Li
Lifeng Shang
Chengjie Sun
Bingquan Liu
Zhenzhou Ji
Xin Jiang
Qun Liu
KELM
47
11
0
20 Oct 2022
Benchmarking Long-tail Generalization with Likelihood Splits
Ameya Godbole
Robin Jia
ALM
29
8
0
13 Oct 2022
Fighting FIRe with FIRE: Assessing the Validity of Text-to-Video Retrieval Benchmarks
Pedro Rodriguez
Mahmoud Azab
Becka Silvert
Renato Sanchez
Linzy Labson
Hardik Shah
Seungwhan Moon
50
1
0
10 Oct 2022
A Unified Encoder-Decoder Framework with Entity Memory
Zhihan Zhang
Wenhao Yu
Chenguang Zhu
Meng Jiang
39
11
0
07 Oct 2022
State-of-the-art generalisation research in NLP: A taxonomy and review
Dieuwke Hupkes
Mario Giulianelli
Verna Dankers
Mikel Artetxe
Yanai Elazar
...
Leila Khalatbari
Maria Ryskina
Rita Frieske
Ryan Cotterell
Zhijing Jin
121
94
0
06 Oct 2022
Improving the Domain Adaptation of Retrieval Augmented Generation (RAG) Models for Open Domain Question Answering
Shamane Siriwardhana
Rivindu Weerasekera
Elliott Wen
Tharindu Kaluarachchi
R. Rana
Suranga Nanayakkara
VLM
19
161
0
06 Oct 2022
Recitation-Augmented Language Models
Zhiqing Sun
Xuezhi Wang
Yi Tay
Yiming Yang
Denny Zhou
RALM
201
60
0
04 Oct 2022
Zero-Shot Retrieval with Search Agents and Hybrid Environments
Michelle Chen Huebscher
Christian Buck
Massimiliano Ciaramita
S. Rothe
35
9
0
30 Sep 2022
Evaluation of Question Answering Systems: Complexity of judging a natural language
Amer Farea
Zhen Yang
Kien Duong
Nadeesha Perera
F. Emmert-Streib
ELM
31
3
0
10 Sep 2022