In this article, we develop a modular framework for the application of
Reinforcement Learning to the problem of Optimal Trade Execution. The framework
is designed with flexibility in mind, in order to ease the implementation of
different simulation setups. Rather than focusing on agents and optimization
methods, we focus on the environment and break down the necessary requirements
to simulate Optimal Trade Execution under a Reinforcement Learning framework,
such as data pre-processing, construction of observations, action processing,
child order execution, simulation of benchmarks, and reward calculations. We
give examples of each component, explore the difficulties entailed by their
individual implementations \& by the interactions between them, and discuss the
different phenomena that each component induces in the simulation, highlighting
the divergences between the simulation and the behavior of a real market. We
showcase our modular implementation through a setup that, following a
Time-Weighted Average Price (TWAP) order submission schedule, allows the agent
to exclusively place limit orders, simulates their execution via iterating over
snapshots of the Limit Order Book (LOB), and calculates rewards as the \%
improvement over the price achieved by a TWAP benchmark algorithm following the
same schedule. We also develop evaluation procedures that incorporate iterative
re-training and evaluation of a given agent over intervals of a training
horizon, mimicking how an agent may behave when being continuously retrained as
new market data becomes available, and emulating the monitoring practices that
algorithm providers are bound to perform under current regulatory frameworks.
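
As an illustration of the reward component, a minimal sketch of the
percentage-improvement reward for a sell order could take the following form,
where $\bar{p}^{\text{agent}}$ and $\bar{p}^{\text{TWAP}}$ denote the
volume-weighted average execution prices obtained by the agent and by the TWAP
benchmark over the same schedule (this notation is ours and serves only as an
example):
\[
  r \;=\; 100 \times \frac{\bar{p}^{\text{agent}} - \bar{p}^{\text{TWAP}}}{\bar{p}^{\text{TWAP}}},
\]
with the sign of the numerator reversed for a buy order, so that positive
rewards always correspond to an improvement over the benchmark.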