ARSENAL: Automatic Requirements Specification Extraction from Natural Language

Natural language (supplemented with diagrams and some mathematical notation) is convenient for succinctly communicating technical descriptions among the various stakeholders (e.g., customers, designers, implementers) involved in the design of software systems. However, natural language descriptions can be informal, incomplete, imprecise, and ambiguous, and they cannot easily be processed by design and analysis tools. Formal languages, on the other hand, express design requirements in precise and unambiguous mathematical notation, but are more difficult to master and use. We propose a methodology for connecting semi-formal requirements to formal descriptions through an intermediate representation. We have implemented this methodology in a research prototype called Automatic Requirements Specification Extraction from Natural Language (ARSENAL). The main novelty of ARSENAL lies in its ability to automatically generate a complete, fully specified formal model from natural language requirements. Currently, ARSENAL generates formal models in linear-time temporal logic (LTL), but the approach can be adapted to other models, e.g., probabilistic relational models such as Markov Logic Networks (MLNs). The formal models of the requirements can be used to check important design and system properties, e.g., consistency, satisfiability, and realizability. ARSENAL has a modular and flexible architecture that facilitates porting it from one domain to another. We evaluated ARSENAL on complex requirements from two real-world case studies: the Time-Triggered Ethernet (TTEthernet) communication platform used in space, and the FAA-Isolette infant incubators used in neonatal intensive care units (NICUs). We systematically evaluated several aspects of ARSENAL: the accuracy of the natural language processing stage, the degree of automation, and robustness to noise.
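To make the idea of an LTL model of a requirement concrete, the sketch below hand-encodes one hypothetical requirement, loosely in the style of an incubator specification: "If the current temperature is below the lower alarm limit, the alarm shall be on in the next step." It does not reproduce ARSENAL's pipeline or its intermediate representation, which the abstract does not describe. The formula classes, the `holds` and `show` helpers, the limit `LOWER_ALARM_LIMIT`, and the state variables `temp` and `alarm` are all illustrative assumptions. It also assumes finite-trace semantics (standard LTL is defined over infinite traces), so `X` is false at the last state.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Union

State = Dict[str, float]  # hypothetical: one snapshot of system variables

@dataclass(frozen=True)
class Atom:
    name: str                       # printable label for the proposition
    pred: Callable[[State], bool]   # evaluates the proposition on one state

@dataclass(frozen=True)
class Implies:
    lhs: "Formula"
    rhs: "Formula"

@dataclass(frozen=True)
class Next:       # X phi: phi holds at the next position
    arg: "Formula"

@dataclass(frozen=True)
class Globally:   # G phi: phi holds at every remaining position
    arg: "Formula"

Formula = Union[Atom, Implies, Next, Globally]

def holds(f: Formula, trace: List[State], i: int = 0) -> bool:
    """Evaluate f at position i of a FINITE trace (an assumption, see lead-in)."""
    if isinstance(f, Atom):
        return f.pred(trace[i])
    if isinstance(f, Implies):
        return (not holds(f.lhs, trace, i)) or holds(f.rhs, trace, i)
    if isinstance(f, Next):
        # Strong next: false when there is no next state.
        return i + 1 < len(trace) and holds(f.arg, trace, i + 1)
    if isinstance(f, Globally):
        return all(holds(f.arg, trace, j) for j in range(i, len(trace)))
    raise TypeError(f"unknown formula: {f!r}")

def show(f: Formula) -> str:
    """Render a formula in common textual LTL notation."""
    if isinstance(f, Atom):
        return f.name
    if isinstance(f, Implies):
        return f"({show(f.lhs)} -> {show(f.rhs)})"
    if isinstance(f, Next):
        return f"X {show(f.arg)}"
    if isinstance(f, Globally):
        return f"G {show(f.arg)}"
    raise TypeError(f"unknown formula: {f!r}")

LOWER_ALARM_LIMIT = 35.0  # hypothetical threshold, not from the case study

too_cold = Atom("temp < lower_alarm", lambda s: s["temp"] < LOWER_ALARM_LIMIT)
alarm_on = Atom("alarm_on", lambda s: s["alarm"] == 1)

# Hypothetical requirement: G (too_cold -> X alarm_on)
REQ = Globally(Implies(too_cold, Next(alarm_on)))

ok_trace = [{"temp": 36.0, "alarm": 0}, {"temp": 34.0, "alarm": 0}, {"temp": 35.0, "alarm": 1}]
bad_trace = [{"temp": 34.0, "alarm": 0}, {"temp": 36.0, "alarm": 0}]

if __name__ == "__main__":
    print(show(REQ))               # G (temp < lower_alarm -> X alarm_on)
    print(holds(REQ, ok_trace))    # True
    print(holds(REQ, bad_trace))   # False
```

Checking a single trace is only an illustration of what the formula means; properties such as consistency, satisfiability, and realizability are decided over all possible behaviors, typically with a model checker or synthesis tool rather than trace evaluation.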