Shrinking the Upper Confidence Bound: A Dynamic Product Selection Problem for Urban Warehouses
30 Pages Posted: 8 Apr 2019 Last revised: 11 May 2019
Date Written: February 26, 2019
Abstract
The recent rising popularity of ultra-fast delivery services on retail platforms fuels the increasing use of urban warehouses, whose proximity to customers makes fast deliveries viable. The space limit in urban warehouses poses a problem for the online retailers: the number of products (SKUs) they carry is no longer "the more, the better", yet it can still be significantly large, reaching hundreds or thousands in a product category. In this paper, we study algorithms for dynamically selecting a large number of products (i.e., SKUs) with top customer purchase probabilities on the fly, from an ocean of potential products to offer on retailers' ultra-fast delivery platforms.
We distill the product selection problem into a semi-bandit model with linear generalization. There are in total N different arms, each with a feature vector of dimension d. The player pulls K arms in each period and observes the bandit feedback from each of the pulled arms. We focus on the setting where K is much greater than the number of total time periods T or the dimension of product features d. We first analyze a standard UCB algorithm and show its regret bound can be expressed as the sum of a T-independent part Õ(Kd3/2) and a T-dependent part Õ(d √(KT)), which we refer to as "fixed cost" and "variable cost" respectively. To reduce the fixed cost for large K values, we propose a novel online learning algorithm, which iteratively shrinks the upper confidence bounds within each period, and show its fixed cost is reduced by a factor of d to Õ(K √(d)). Moreover, we test the algorithms on an industrial dataset from Alibaba Group. Experimental results show that our new algorithm reduces the total regret of the standard UCB algorithm by at least 10%.
Keywords: sequential decision making, adaptive product selection, online learning, online retailing, stochastic optimization, regret analysis
Suggested Citation: Suggested Citation