On the Evaluation of Engineering Artificial General Intelligence

15 May 2025
Sandeep Neema
Susmit Jha
Adam Nagel
Ethan Lew
Chandrasekar Sureshkumar
Aleksa Gordic
Chase Shimmin
Hieu Nguygen
Paul Eremenko
Abstract

We discuss the challenges and propose a framework for evaluating engineering artificial general intelligence (eAGI) agents. We consider eAGI as a specialization of artificial general intelligence (AGI), deemed capable of addressing a broad range of problems in the engineering of physical systems and associated controllers. We exclude software engineering for a tractable scoping of eAGI and expect dedicated software engineering AI agents to address the software implementation challenges. Similar to human engineers, eAGI agents should possess a unique blend of background knowledge (recall and retrieve) of facts and methods, demonstrate familiarity with tools and processes, exhibit deep understanding of industrial components and well-known design families, and be able to engage in creative problem solving (analyze and synthesize), transferring ideas acquired in one context to another. Given this broad mandate, evaluating and qualifying the performance of eAGI agents is a challenge in itself and, arguably, a critical enabler to developing eAGI agents. In this paper, we address this challenge by proposing an extensible evaluation framework that specializes and grounds Bloom's taxonomy (a framework for evaluating human learning that has also recently been used for evaluating LLMs) in an engineering design context. Our proposed framework advances the state of the art in benchmarking and evaluation of AI agents in the following ways: (a) it develops a rich taxonomy of evaluation questions spanning from methodological knowledge to real-world design problems; (b) it motivates a pluggable evaluation framework that can evaluate not only textual responses but also structured design artifacts such as CAD models and SysML models; and (c) it outlines an automatable procedure to customize the evaluation benchmark to different engineering contexts.
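To make the pluggable-evaluation idea from the abstract concrete, here is a minimal Python sketch of a harness that tags questions with Bloom's-taxonomy levels and dispatches each response to an artifact-specific scorer. All names (BloomLevel, Question, EvaluationHarness, register, score) are illustrative assumptions, not the interfaces defined in the paper.

```python
# Hypothetical sketch of a pluggable eAGI evaluation harness.
# The class and method names are assumptions for illustration only.
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, Dict


class BloomLevel(Enum):
    """Levels of Bloom's taxonomy used to tag evaluation questions."""
    REMEMBER = 1
    UNDERSTAND = 2
    APPLY = 3
    ANALYZE = 4
    EVALUATE = 5
    CREATE = 6


@dataclass
class Question:
    prompt: str
    bloom_level: BloomLevel
    artifact_type: str  # e.g. "text", "cad", "sysml"


@dataclass
class EvaluationHarness:
    """Dispatches each (question, response) pair to an artifact-specific scorer."""
    scorers: Dict[str, Callable[[Question, object], float]] = field(default_factory=dict)

    def register(self, artifact_type: str, scorer: Callable[[Question, object], float]) -> None:
        self.scorers[artifact_type] = scorer

    def score(self, question: Question, response: object) -> float:
        if question.artifact_type not in self.scorers:
            raise ValueError(f"No scorer registered for {question.artifact_type!r}")
        return self.scorers[question.artifact_type](question, response)


# Example usage: a trivial text scorer plus a placeholder for structured artifacts.
harness = EvaluationHarness()
harness.register("text", lambda q, r: 1.0 if isinstance(r, str) and r.strip() else 0.0)
harness.register("sysml", lambda q, r: 0.0)  # placeholder: a real scorer would parse the model

q = Question("State the governing equation of a mass-spring-damper system.",
             BloomLevel.REMEMBER, "text")
print(harness.score(q, "m*x'' + c*x' + k*x = F(t)"))  # -> 1.0
```

The point of the registry is that new artifact types (CAD, SysML, simulation traces) can be supported by adding a scorer, without changing the question taxonomy or the harness itself.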

@article{neema2025_2505.10653,
  title={On the Evaluation of Engineering Artificial General Intelligence},
  author={Sandeep Neema and Susmit Jha and Adam Nagel and Ethan Lew and Chandrasekar Sureshkumar and Aleksa Gordic and Chase Shimmin and Hieu Nguygen and Paul Eremenko},
  journal={arXiv preprint arXiv:2505.10653},
  year={2025}
}