Automating the Enterprise with Foundation Models

Abstract

Automating enterprise workflows could unlock $4 trillion/year in productivity gains. Despite being of interest to the data management community for decades, the ultimate vision of end-to-end workflow automation has remained elusive. Current solutions rely on process mining and robotic process automation (RPA), in which a bot is hard-coded to follow a set of predefined rules for completing a workflow. Through case studies of a hospital and large B2B enterprise, we find that the adoption of RPA has been inhibited by high set-up costs (12-18 months), unreliable execution (60% initial accuracy), and burdensome maintenance (requiring multiple FTEs).
Multimodal foundation models (FMs) such as GPT-4 offer a promising new approach for end-to-end workflow automation given their generalized reasoning and planning abilities. To study these capabilities we propose ECLAIR, a system to automate enterprise workflows with minimal human supervision. We conduct initial experiments showing that multimodal FMs can address the limitations of traditional RPA with (1) near-human-level understanding of workflows (93% accuracy on a workflow understanding task) and (2) instant set-up with minimal technical barrier (based solely on a natural language description of a workflow, ECLAIR achieves end-to-end completion rates of 40%). We identify human-AI collaboration, validation, and self-improvement as open challenges, and suggest ways they can be solved with data management techniques. Code is available at: https://github.com/HazyResearch/eclair-agents
