Truckload brokerages, a $100 billion/year industry in the U.S., play the critical role of matching shippers with carriers, often to move loads several days into the future. Brokerages not only have to find companies that will agree to move a load; they often also have to find a price that both the shipper and the carrier will agree to. The price varies not only by shipper and carrier but also by traffic lane and by other variables such as commodity type. Brokerages have to learn about shipper and carrier response functions by offering a price and observing whether each accepts the quote. We propose a knowledge gradient policy with bootstrap aggregation for high-dimensional contextual settings that guides price experimentation by maximizing the value of information. The learning policy is tested using a carefully calibrated fleet simulator that includes a stochastic lookahead policy that simulates fleet movements, as well as stochastic models of driver assignments and of the carrier's load commitment policies with advance booking.
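The abstract names the policy but not its mechanics. The sketch below is one possible way, not necessarily the authors' method, to compute a knowledge-gradient (KG) score from a bootstrap-aggregated response model. It rests on simplifying assumptions introduced here: a single binary acceptance outcome, a price-only logistic response with no lane or commodity context, a small hypothetical price grid (`PRICES`), and bootstrap models treated as equally likely hypotheses that are reweighted by Bayes' rule after a simulated quote. All function names are hypothetical. The KG of a price is the expected increase in the best achievable expected revenue after observing one accept/reject outcome at that price.

```python
import math
import random

# Hypothetical grid of normalized offer prices (an assumption, not from the paper).
PRICES = [1.0, 1.5, 2.0, 2.5, 3.0]


def logistic(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(data, epochs=300, lr=0.1):
    """Fit P(accept | price) = logistic(a + b * price) by batch gradient ascent."""
    a, b = 0.0, 0.0
    for _ in range(epochs):
        ga = gb = 0.0
        for price, accepted in data:
            err = accepted - logistic(a + b * price)
            ga += err
            gb += err * price
        a += lr * ga / len(data)
        b += lr * gb / len(data)
    return a, b


def bootstrap_ensemble(data, n_models=25, seed=0):
    """Bagging: refit the response model on samples drawn with replacement."""
    rng = random.Random(seed)
    return [fit_logistic([rng.choice(data) for _ in data]) for _ in range(n_models)]


def expected_revenue(models, weights, price):
    """Belief-weighted expected revenue, price * P(accept), over the ensemble."""
    return sum(w * price * logistic(a + b * price)
               for (a, b), w in zip(models, weights))


def best_value(models, weights):
    """Best expected revenue over the price grid under the given weights."""
    return max(expected_revenue(models, weights, p) for p in PRICES)


def knowledge_gradient(models, weights, x):
    """Expected gain in best expected revenue from one quote at price x."""
    q = [logistic(a + b * x) for a, b in models]  # P(accept) under each model
    value = 0.0
    for y in (1, 0):
        lik = [qi if y == 1 else 1.0 - qi for qi in q]
        p_y = sum(w * l for w, l in zip(weights, lik))  # predictive P(outcome y)
        if p_y <= 0.0:
            continue
        posterior = [w * l / p_y for w, l in zip(weights, lik)]
        value += p_y * best_value(models, posterior)
    return value - best_value(models, weights)


def choose_price(models):
    """Offer the price whose outcome is expected to be most valuable to learn."""
    weights = [1.0 / len(models)] * len(models)
    kg = {x: knowledge_gradient(models, weights, x) for x in PRICES}
    return max(kg, key=kg.get), kg


# Usage on synthetic quotes with an assumed true response logistic(4 - 2 * price).
rng = random.Random(1)
data = []
for _ in range(40):
    p = rng.choice(PRICES)
    data.append((p, 1 if rng.random() < logistic(4.0 - 2.0 * p) else 0))

models = bootstrap_ensemble(data)
price, kg = choose_price(models)
print(price, {x: round(v, 4) for x, v in kg.items()})
```

Because the posterior weights average back to the prior weights, the expected post-observation maximum can never fall below the current maximum, so every KG score is nonnegative. A pure-learning policy offers the price with the largest score; an online variant would typically add the expected immediate revenue of each price to its (scaled) KG score.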