Error message

Seminar
Speaker
P. N. Karthik (National University of Singapore)
Date & Time
Mon, 14 August 2023, 14:00 to 15:00
Venue
Emmy Noether Seminar Room & Online
Resources
Abstract

In this talk, I will describe some recent results on the problem of identifying the best arm in a restless multi-armed bandit with fixed confidence. Formally, the setting comprises a multi-armed bandit with finitely many arms, in which each arm is a homogenous and discrete-time Markov chain taking values in a common, finite state space. The state transitions on any Markov chain are governed by an ergodic transition probability matrix (TPM) that is parameterised by an unknown, real-valued parameter. Given a reward function defined on the common state space of the arms, the best arm is the arm with the largest stationary reward. The goal is to find the best arm by minimising the expected stopping time, subject to an upper bound on the error probability (fixed-confidence regime). For this problem, our results are in the form of (a) an asymptotic lower bound on the growth rate of the expected stopping time is, where the asymptotics is as the error probability vanishes, and (b) a policy for best arm identification whose expected stopping time satisfies an asymptotic growth rate that matches with the lower bound and is hence asymptotically optimal. We use results from best policy identification in MDPs to establish asymptotic optimality. Prior works deal with independent observations from the arms, rested Markov arms, and restless Markov arms with known arm TPMs. Our work is the first to study best arm identification in restless bandits with unknown arm TPMs.

This is joint work with Vincent Tan (NUS), Ali Tajer (RPI), and Arpan Mukherjee (RPI).

Zoom link: https://icts-res-in.zoom.us/j/83701102480?pwd=eHpjUHJiT3BYRXlnb2ErK09rWXBWQT09
Meeting ID: 837 0110 2480
Passcode: 141415

This is part of the Bangalore Probability Seminar Series. For details of past and upcoming seminars kindly see  Link