In this paper, we introduce the Multilingual Moral Reasoning Benchmark (MMRB) to evaluate the moral reasoning abilities of large language models (LLMs) across five typologically diverse languages and three levels of contextual complexity: sentence, paragraph, and document. Our results show that moral reasoning performance degrades as context complexity increases, particularly for low-resource languages such as Vietnamese. We further fine-tune the open-source LLaMA-3-8B model on curated monolingual data for both alignment and poisoning experiments. Surprisingly, low-resource languages exert a stronger influence on multilingual moral reasoning than high-resource ones, highlighting their critical role in multilingual NLP.
@article{zhou2025_2504.19759,
  title={Moral Reasoning Across Languages: The Critical Role of Low-Resource Languages in LLMs},
  author={Huichi Zhou and Zehao Xu and Munan Zhao and Kaihong Li and Yiqiang Li and Hongtao Wang},
  journal={arXiv preprint arXiv:2504.19759},
  year={2025}
}