Large Language Models as Master Key: Unlocking the Secrets of Materials Science with GPT

Data is of increasing significance in the exploration of cutting-edge materials, and a growing number of datasets have been generated either manually or through automated approaches. However, the materials science field struggles to effectively utilize this abundance of data, especially in applied disciplines where materials are evaluated based on device performance rather than their intrinsic properties. This article presents a new NLP task called structured information inference (SII) to address the complexity of information extraction at the device level in materials science. We accomplished this task by fine-tuning GPT-3 on an existing perovskite solar cell FAIR (Findable, Accessible, Interoperable, Reusable) dataset, achieving an F1 score of 91.8%, and we updated the dataset with all related scientific papers published to date. The resulting data are formatted and normalized, enabling their direct use as input in subsequent data analysis. This feature allows materials scientists to develop their own models by selecting high-quality review papers within their domain. Furthermore, we designed experiments to predict the electrical performance of solar cells and to design materials or devices with target parameters using LLMs. We obtained performance comparable to that of traditional machine learning methods without feature selection, demonstrating the potential of LLMs to learn scientific knowledge and design new materials like a materials scientist.
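As a rough illustration of the SII setup described above, and not the authors' released pipeline, a fine-tune of this kind could be prepared as in the sketch below using the OpenAI Python SDK. The file name `sii_train.jsonl`, the prompt template, the JSON field names, and the `davinci-002` base model (the paper itself used GPT-3) are all assumptions for the sake of the example.

```python
# Minimal sketch of a structured-information-inference (SII) fine-tune,
# assuming the OpenAI Python SDK (>=1.0) and a completion-style base model.
# File name, prompt template, and JSON schema are illustrative assumptions.
import json
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# 1) Prepare prompt/completion pairs: paper text in, a normalized device
#    record out (hypothetical schema for a perovskite solar cell).
examples = [
    {
        "prompt": "Extract the device record from this passage:\n"
                  "<paper text>\n\n###\n\n",
        "completion": json.dumps({
            "perovskite_composition": "FAPbI3",
            "electron_transport_layer": "SnO2",
            "PCE_percent": 23.1,
        }) + " END",
    },
]
with open("sii_train.jsonl", "w") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")

# 2) Upload the training file and launch the fine-tuning job.
train_file = client.files.create(file=open("sii_train.jsonl", "rb"),
                                 purpose="fine-tune")
job = client.fine_tuning.jobs.create(training_file=train_file.id,
                                     model="davinci-002")
print(job.id, job.status)
```

Once the job finishes, the resulting model can be queried with the same prompt template, and because the completions are normalized JSON records, the outputs can feed directly into downstream analysis as the abstract describes.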