We investigate causal computations, which take sequences of inputs to sequences of outputs such that the $n$th output depends only on the first $n$ inputs. We model these in category theory via a construction taking a Cartesian category $C$ to another category $St(C)$ equipped with a novel trace-like operation called "delayed trace", which lacks the yanking and dinaturality axioms of the usual trace. The delayed trace operation provides a feedback mechanism in $St(C)$ with an implicit guardedness guarantee. When $C$ is equipped with a Cartesian differential operator, we construct a differential operator for $St(C)$ using an abstract version of backpropagation through time, a technique from machine learning based on unrolling of functions. This yields a range of properties for backpropagation through time, including a chain rule and a Schwarz theorem. Our differential operator can also compute the derivative of a stateful network without requiring the network to be unrolled.
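As an informal illustration of the causality condition and of feedback through a one-step delay, the following Python sketch may help. It is not the categorical construction itself: the names `delayed_feedback` and `running_sum_step` are hypothetical, and modelling a computation as a step function on a state is an assumption made only for this example.

```python
from typing import Callable, List, Tuple

# A one-step function maps (state, input) to (new_state, output).
Step = Callable[[float, float], Tuple[float, float]]


def delayed_feedback(step: Step, init_state: float) -> Callable[[List[float]], List[float]]:
    """Turn a one-step function into a function on whole sequences.

    The state produced at step k is fed back as the state consumed at
    step k + 1, so the feedback loop always passes through a delay.
    """
    def run(inputs: List[float]) -> List[float]:
        state = init_state
        outputs = []
        for x in inputs:
            state, y = step(state, x)  # output k sees only inputs 0..k
            outputs.append(y)
        return outputs
    return run


def running_sum_step(state: float, x: float) -> Tuple[float, float]:
    """Hypothetical step: accumulate the input and emit the running total."""
    new_state = state + x
    return new_state, new_state


running_sum = delayed_feedback(running_sum_step, 0.0)

# Changing a later input leaves the earlier outputs unchanged.
a = running_sum([1.0, 2.0, 3.0])
b = running_sum([1.0, 2.0, 99.0])
print(a[:2] == b[:2])
```

Because every use of the fed-back state goes through the loop's one-step delay, the resulting sequence function is well defined for any step function, which loosely mirrors the guardedness guarantee mentioned above.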