Deep Retrieval at CheckThat! 2025: Identifying Scientific Papers from Implicit Social Media Mentions via Hybrid Retrieval and Re-Ranking

29 May 2025

Pascal J. Sager, Ashwini Kamaraj, Benjamin F. Grewe, Thilo Stadelmann

Main: 5 pages, 3 figures; Bibliography: 4 pages; Appendix: 7 pages

Abstract

We present the methodology and results of the Deep Retrieval team for subtask 4b of the CLEF CheckThat! 2025 competition, which focuses on retrieving relevant scientific literature for given social media posts. To address this task, we propose a hybrid retrieval pipeline that combines lexical precision, semantic generalization, and deep contextual re-ranking, enabling robust retrieval that bridges the informal-to-formal language gap. Specifically, we combine BM25-based keyword matching with a FAISS vector store using a fine-tuned INF-Retriever-v1 model for dense semantic retrieval. BM25 returns the top 30 candidates, and semantic search yields 100 candidates, which are then merged and re-ranked via a large language model (LLM)-based cross-encoder.
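The pipeline described above can be illustrated with a minimal, dependency-free sketch. It is not the team's implementation: the fine-tuned INF-Retriever-v1 model and FAISS index are replaced by a toy character-trigram embedding, the LLM-based cross-encoder is replaced by a hypothetical `overlap_score` function, and `final_k` is an assumed cutoff that the abstract does not specify. Only the candidate sizes (30 from BM25, 100 from dense search) and the merge-then-re-rank structure come from the abstract.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25_top_k(query, docs, k, k1=1.5, b=0.75):
    """Plain BM25 over whitespace-level tokens; returns indices of matching docs."""
    tokenized = [tokenize(d) for d in docs]
    avg_len = sum(len(t) for t in tokenized) / len(tokenized)
    df = Counter(term for toks in tokenized for term in set(toks))
    n = len(docs)
    scored = []
    for i, toks in enumerate(tokenized):
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scored.append((score, i))
    ranked = sorted(scored, key=lambda x: (-x[0], x[1]))
    return [i for s, i in ranked[:k] if s > 0]


def embed(text):
    """Toy stand-in for a dense encoder: L2-normalised character trigram counts."""
    s = " ".join(tokenize(text))
    grams = Counter(s[i:i + 3] for i in range(len(s) - 2))
    norm = math.sqrt(sum(v * v for v in grams.values())) or 1.0
    return {g: v / norm for g, v in grams.items()}


def dense_top_k(query, docs, k):
    """Cosine-similarity search over the toy embeddings (stand-in for a vector store)."""
    q = embed(query)
    sims = []
    for i, doc in enumerate(docs):
        d = embed(doc)
        sims.append((sum(w * d.get(g, 0.0) for g, w in q.items()), i))
    ranked = sorted(sims, key=lambda x: (-x[0], x[1]))
    return [i for _, i in ranked[:k]]


def overlap_score(query, doc):
    """Hypothetical re-ranker: token Jaccard overlap (stand-in for a cross-encoder)."""
    q, d = set(tokenize(query)), set(tokenize(doc))
    return len(q & d) / len(q | d) if q | d else 0.0


def hybrid_retrieve(query, docs, score_pair, bm25_k=30, dense_k=100, final_k=5):
    """Merge lexical and dense candidates, then re-rank the union with score_pair."""
    merged = bm25_top_k(query, docs, bm25_k) + dense_top_k(query, docs, dense_k)
    candidates = list(dict.fromkeys(merged))  # de-duplicate, keep first-seen order
    reranked = sorted(candidates, key=lambda i: (-score_pair(query, docs[i]), i))
    return reranked[:final_k]


papers = [
    "Efficacy of mRNA vaccines against SARS-CoV-2 variants",
    "Deep learning for protein structure prediction",
    "Sea level rise projections under climate change",
]
post = "new study says the mrna vaccines still work against variants"
print(hybrid_retrieve(post, papers, overlap_score))
```

In a realistic setting, `dense_top_k` would query a vector index built from encoder embeddings and `score_pair` would call a cross-encoder that reads the post and the paper jointly; the merge step stays the same because it only needs document identifiers.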


@article{sager2025_2505.23250,
  title={Deep Retrieval at CheckThat! 2025: Identifying Scientific Papers from Implicit Social Media Mentions via Hybrid Retrieval and Re-Ranking},
  author={Pascal J. Sager and Ashwini Kamaraj and Benjamin F. Grewe and Thilo Stadelmann},
  journal={arXiv preprint arXiv:2505.23250},
  year={2025}
}
