arXiv:1802.08976

Reinforcement Learning for Dynamic Bidding in Truckload Markets: an Application to Large-Scale Fleet Management with Advance Commitments

25 February 2018
Yingfei Wang
J. Nascimento
Warren B. Powell
Abstract

Truckload brokerages, a $100 billion/year industry in the U.S., play the critical role of matching shippers with carriers, often to move loads several days into the future. Brokerages not only have to find companies that will agree to move a load; they often have to find a price that both the shipper and the carrier will agree to. The price varies not only by shipper and carrier but also by traffic lane and other variables such as commodity type. Brokerages have to learn about shipper and carrier response functions by offering a price and observing whether each accepts the quote. We propose a knowledge gradient policy with bootstrap aggregation for high-dimensional contextual settings to guide price experimentation by maximizing the value of information. The learning policy is tested using a carefully calibrated fleet simulator that includes a stochastic lookahead policy that simulates fleet movements, as well as the stochastic modeling of driver assignments and the carrier's load commitment policies with advance booking.
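To make the idea of "maximizing the value of information" in price experimentation more concrete, here is a minimal Python sketch. It is not the authors' method. It makes several simplifying assumptions: it uses a single price feature with no contextual variables, and it models only one party's acceptance with a logistic curve. It treats revenue as price times acceptance probability. It also treats a bootstrap (bagged) ensemble of logistic fits as a discrete belief that is reweighted by Bayes' rule after each accept/reject outcome. All function names, prices, and the simulated "true" acceptance curve are hypothetical.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logistic(data, steps=500, lr=0.5):
    """Fit P(accept | price) = sigmoid(a + b * price) by gradient ascent
    on the average log-likelihood. `data` holds (price, accepted) pairs."""
    a, b = 0.0, 0.0
    n = len(data)
    for _ in range(steps):
        ga = gb = 0.0
        for price, y in data:
            err = y - sigmoid(a + b * price)
            ga += err
            gb += err * price
        a += lr * ga / n
        b += lr * gb / n
    return a, b


def bootstrap_ensemble(data, n_models=20, seed=0):
    """Bagging: one logistic model per bootstrap resample of the observations."""
    rng = random.Random(seed)
    return [fit_logistic([rng.choice(data) for _ in data]) for _ in range(n_models)]


def expected_revenue(models, weights, price):
    # Posterior-mean revenue: price times the ensemble-weighted acceptance probability.
    return price * sum(w * sigmoid(a + b * price) for w, (a, b) in zip(weights, models))


def knowledge_gradient(models, weights, prices, x):
    """Expected gain in the best posterior-mean revenue from one more quote at price x."""
    best_now = max(expected_revenue(models, weights, p) for p in prices)
    value_after = 0.0
    for y in (1, 0):  # 1 = quote accepted, 0 = quote rejected
        # Joint weight of each bootstrap model and outcome y at price x.
        joint = [w * (sigmoid(a + b * x) if y == 1 else 1.0 - sigmoid(a + b * x))
                 for w, (a, b) in zip(weights, models)]
        prob_y = sum(joint)  # predictive probability of outcome y
        if prob_y <= 0.0:
            continue
        post = [j / prob_y for j in joint]  # Bayes-reweighted ensemble
        value_after += prob_y * max(expected_revenue(models, post, p) for p in prices)
    return value_after - best_now


# Hypothetical usage: simulated quotes against an assumed "true" acceptance curve.
rng = random.Random(1)
prices = [0.5, 0.75, 1.0, 1.25, 1.5]
true_a, true_b = 4.0, -4.0
data = []
for _ in range(40):
    p = rng.choice(prices)
    data.append((p, 1 if rng.random() < sigmoid(true_a + true_b * p) else 0))

models = bootstrap_ensemble(data)
weights = [1.0 / len(models)] * len(models)
scores = {x: knowledge_gradient(models, weights, prices, x) for x in prices}
next_price = max(scores, key=scores.get)
print("KG scores:", scores, "-> next quote:", next_price)
```

The knowledge gradient of a candidate price is the expected improvement in the best achievable posterior-mean revenue after observing one more accept/reject at that price. Because the expected posterior equals the current belief, this quantity is never negative (by convexity of the max). In this sketch the ensemble stays fixed and only its weights change. A fuller treatment might refit the bagged models after each observation and condition on shipper, carrier, lane, and commodity features.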
