Using reinforcement learning to autonomously identify sources of error for agents in group missions

Abstract

When agents swarm to execute a mission, some of them frequently exhibit sudden failure, as observed from the command base. It is generally difficult to determine whether a failure is caused by actuators (hypothesis $h_a$) or sensors (hypothesis $h_s$) by relying solely on the communication between the command base and the agent concerned. However, by instigating collisions between the agents, the cause of failure can be identified; in other words, we expect to detect corresponding displacements under $h_a$ but not under $h_s$. In this study, we considered the question as to whether artificial intelligence can autonomously generate an action plan $\boldsymbol{g}$ to pinpoint the cause as described above. Because the expected response to $\boldsymbol{g}$ generally depends upon the adopted hypothesis [let the difference be denoted by $D(\boldsymbol{g})$], a formulation that uses $D(\boldsymbol{g})$ to pinpoint the cause can be made. Although a plan $\boldsymbol{g}^*$ that maximizes $D(\boldsymbol{g})$ would be suitable for this task, such an optimization is difficult to achieve using the conventional gradient method: $D(\boldsymbol{g})$ becomes nonzero only in rare events such as collisions with other agents, and most swarm actions $\boldsymbol{g}$ give $D(\boldsymbol{g})=0$. In other words, $D(\boldsymbol{g})$ has zero gradient throughout almost the entire space of $\boldsymbol{g}$, so the gradient method is not applicable. To overcome this problem, we formulated the action plan using Q-table reinforcement learning. Surprisingly, the optimal action plan generated via reinforcement learning presented a human-like solution: pinpointing the problem by colliding other agents with the failed agent. Using this simple prototype, we demonstrated the potential of Q-table reinforcement learning methods for planning autonomous actions that pinpoint the causes of failure.
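
The abstract does not spell out the form of $D(\boldsymbol{g})$. One natural reading, stated here purely as an assumption rather than the paper's definition, is the distance between the responses $y(\boldsymbol{g})$ that the command base would expect under the two hypotheses, with the probing plan chosen to maximize that distance:

```latex
% Assumed formalization (not taken from the paper):
% y(g) is the response observed at the command base for swarm plan g.
D(\boldsymbol{g}) =
  \bigl\lVert
  \mathbb{E}\!\left[ y(\boldsymbol{g}) \mid h_a \right]
  -
  \mathbb{E}\!\left[ y(\boldsymbol{g}) \mid h_s \right]
  \bigr\rVert ,
\qquad
\boldsymbol{g}^{*} = \arg\max_{\boldsymbol{g}} D(\boldsymbol{g}).
```

Under this reading, a collision is informative precisely because the two conditional expectations diverge there (a working actuator is displaced; a failed one is not), while elsewhere they coincide and $D(\boldsymbol{g})=0$.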
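
As a loose illustration of why Q-table reinforcement learning suits this sparse-signal setting (this is a sketch, not the authors' implementation), the snippet below runs tabular epsilon-greedy Q-learning in a hypothetical toy environment: a probing agent in a one-dimensional corridor receives a reward only when it collides with the failed agent, mimicking a $D(\boldsymbol{g})$ that is zero almost everywhere. All names and parameters (`N`, `ACTIONS`, `alpha`, `gamma`, `eps`) are assumptions made for illustration.

```python
import random
from collections import defaultdict

# Hypothetical 1-D corridor: the prober starts at cell 0 and the failed
# agent sits at cell N-1.  The reward is sparse, standing in for D(g):
# it is nonzero only when a collision actually occurs.
N = 10                      # corridor length (assumed)
ACTIONS = (-1, +1)          # step left / step right
EPISODE_LEN = 30

def step(pos, action):
    """Advance the prober; reward is 1 only on collision with the failed agent."""
    nxt = max(0, min(N - 1, pos + action))
    collided = (nxt == N - 1)
    reward = 1.0 if collided else 0.0   # stands in for D(g) > 0
    return nxt, reward, collided

Q = defaultdict(float)      # tabular Q-values keyed by (state, action)
alpha, gamma, eps = 0.1, 0.95, 0.1

for episode in range(2000):
    pos = 0
    for t in range(EPISODE_LEN):
        # epsilon-greedy action selection
        if random.random() < eps:
            a = random.choice(ACTIONS)
        else:
            a = max(ACTIONS, key=lambda a_: Q[(pos, a_)])
        nxt, r, done = step(pos, a)
        # Standard Q-learning update; being gradient-free, it is not
        # stalled by the flat zero-reward regions of the plan space.
        best_next = max(Q[(nxt, a_)] for a_ in ACTIONS)
        Q[(pos, a)] += alpha * (r + gamma * best_next - Q[(pos, a)])
        pos = nxt
        if done:
            break

# Greedy rollout: the learned plan g* drives the prober into the failed agent.
pos, plan = 0, []
for _ in range(2 * N):
    if pos == N - 1:
        break
    a = max(ACTIONS, key=lambda a_: Q[(pos, a_)])
    plan.append(a)
    pos, _, _ = step(pos, a)
print("learned action plan g*:", plan)
```

The point of the toy is the contrast drawn in the abstract: a gradient on the reward surface is zero almost everywhere, but value backups propagate the rare collision reward to earlier states, so a deliberate "go and collide" plan still emerges.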
