Relevant or Random: Can LLMs Truly Perform Analogical Reasoning?

Analogical reasoning is a unique ability of humans to address unfamiliar challenges by transferring strategies from relevant past experiences. One key finding in psychology is that recalling relevant past experiences, rather than irrelevant ones, helps humans better handle new tasks. Coincidentally, the NLP community has also recently found that self-generating relevant examples in the context can help large language models (LLMs) solve a given problem better than hand-crafted prompts. However, it is not yet clear whether relevance is the key factor eliciting such capability, i.e., can LLMs benefit more from self-generated relevant examples than from irrelevant ones? In this work, we systematically explore whether LLMs can truly perform analogical reasoning on a diverse set of reasoning tasks. With extensive experiments and analysis, we show that self-generated random examples can, surprisingly, achieve comparable or even better performance on certain tasks, e.g., a 4% performance boost on GSM8K with random biological examples. We find that the accuracy of self-generated examples is the key factor, and we subsequently design two novel methods with improved performance and significantly reduced inference costs. Overall, we aim to advance a deeper understanding of LLM analogical reasoning and hope this work stimulates further research in the design of self-generated contexts.
@article{qin2025_2404.12728,
  title={Relevant or Random: Can LLMs Truly Perform Analogical Reasoning?},
  author={Chengwei Qin and Wenhan Xia and Tan Wang and Fangkai Jiao and Yuchen Hu and Bosheng Ding and Ruirui Chen and Shafiq Joty},
  journal={arXiv preprint arXiv:2404.12728},
  year={2025}
}