The rapid advancement of artificial intelligence, particularly autonomous agentic systems based on Large Language Models (LLMs), presents new opportunities to accelerate drug discovery by improving in silico modeling and reducing dependence on costly experimental trials. Current AI agent-based systems demonstrate proficiency in solving programming challenges and conducting research, indicating an emerging potential to develop software capable of addressing complex problems such as pharmaceutical design and drug discovery. This paper introduces DO Challenge, a benchmark designed to evaluate the decision-making abilities of AI agents on a single, complex problem resembling virtual screening scenarios. The benchmark challenges systems to independently develop, implement, and execute efficient strategies for identifying promising molecular structures from extensive datasets, while navigating chemical space, selecting models, and managing limited resources in a multi-objective context. We also discuss insights from the DO Challenge 2025, a competition based on the proposed benchmark, which showcased diverse strategies explored by human participants. Furthermore, we present the Deep Thought multi-agent system, which demonstrated strong performance on the benchmark, outperforming most human teams. Among the language models tested, Claude 3.7 Sonnet, Gemini 2.5 Pro, and o3 performed best in primary agent roles, while GPT-4o and Gemini 2.0 Flash were effective in auxiliary roles. While promising, the system's performance still fell short of expert-designed solutions and showed high instability, highlighting both the potential and current limitations of AI-driven methodologies in transforming drug discovery and broader scientific research.
@article{smbatyan2025_2504.19912,
  title={Can AI Agents Design and Implement Drug Discovery Pipelines?},
  author={Khachik Smbatyan and Tsolak Ghukasyan and Tigran Aghajanyan and Hovhannes Dabaghyan and Sergey Adamyan and Aram Bughdaryan and Vahagn Altunyan and Gagik Navasardyan and Aram Davtyan and Anush Hakobyan and Aram Gharibyan and Arman Fahradyan and Artur Hakobyan and Hasmik Mnatsakanyan and Narek Ginoyan and Garik Petrosyan},
  journal={arXiv preprint arXiv:2504.19912},
  year={2025}
}