Regret bound in restless (state-dependent) multi-armed bandits
Speaker: Kai
Title: Regret bound in restless (state-dependent) multi-armed bandits
Date: 29 Sep 2022 17:30-19:00 EDT
Location: SEC Level 3 NW Terrace
Food: Malaysian
Upper confidence bound (UCB)-style algorithms are commonly used to prove regret bounds for multi-armed bandits. One important assumption is that we can find the optimal arm (or combination of arms) to pull within the confidence bounds. However, when the arms' rewards and transitions depend on their states, finding the optimal arms is computationally intractable. I will talk about how this computational issue affects the regret analysis of state-dependent multi-armed bandits, including a comparison of UCB algorithms with Thompson sampling, frequentist versus Bayesian regret, and how Lagrangian relaxation can be used in state-dependent multi-armed bandits.
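For reference, here is a minimal sketch of the "optimal arm within the confidence bounds" step in the classical stateless setting, using the standard UCB1 rule (the Bernoulli arms and the pull interface are my own illustrative assumptions, not from the talk). In the restless setting the corresponding argmax ranges over policies of state-dependent MDPs, which is the source of the intractability the abstract refers to.

    import math
    import random

    def ucb1(num_arms, horizon, pull):
        """UCB1 for a stateless bandit: pull(arm) returns a reward in [0, 1]."""
        counts = [0] * num_arms
        sums = [0.0] * num_arms
        # Pull each arm once to initialize the empirical means.
        for arm in range(num_arms):
            sums[arm] += pull(arm)
            counts[arm] += 1
        for t in range(num_arms, horizon):
            # Optimism: pick the arm maximizing empirical mean + exploration bonus.
            # In restless bandits this argmax becomes a search over policies and
            # is computationally intractable in general.
            ucb = [sums[a] / counts[a] + math.sqrt(2 * math.log(t + 1) / counts[a])
                   for a in range(num_arms)]
            arm = max(range(num_arms), key=lambda a: ucb[a])
            sums[arm] += pull(arm)
            counts[arm] += 1
        return counts

    # Hypothetical Bernoulli arms, purely for illustration.
    means = [0.2, 0.5, 0.8]
    counts = ucb1(len(means), 10_000, lambda a: float(random.random() < means[a]))
    print(counts)  # the best arm (index 2) should dominate the pull counts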
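As a rough sketch of the Lagrangian relaxation idea in restless bandits (the notation is my own assumption: N arms with states s_i^t and binary activations a_i^t, a budget of K activations per round, and multiplier \lambda), the hard per-round budget constraint is relaxed to hold only on average, after which the Lagrangian separates into N single-arm problems plus a constant:

    \[
    \max_{\pi}\; \mathbb{E}\Big[\sum_{i=1}^{N} r_i(s_i^t, a_i^t)\Big]
    \quad \text{s.t.} \quad \sum_{i=1}^{N} a_i^t = K \;\; \forall t
    \]
    \[
    L(\pi, \lambda) \;=\; \sum_{i=1}^{N} \mathbb{E}\big[\, r_i(s_i^t, a_i^t) - \lambda\, a_i^t \,\big] \;+\; \lambda K .
    \]

Each summand involves only one arm's state and action, so each arm can be solved as its own small MDP with an activation charge \lambda, which is what makes the relaxed problem tractable.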