In a multi-armed bandit (MAB) problem a gambler needs to choose at each round of play one of K arms, each characterized by an unknown reward distribution. Reward realizations are only observed when an arm is selected, and the gambler’s objective is to maximize his cumulative expected earnings over some given horizon of play T. To do this, the gambler needs to acquire information about arms (exploration) while simultaneously optimizing immediate rewards (exploitation); the price paid due to this trade off is often referred to as the regret, and the main question is how small can this price be as a function of the horizon length T. This problem has been studied extensively when the reward distributions do not change over time; an assumption that supports a sharp characterization of the regret, yet is often violated in practical settings. In this paper, we focus on a MAB formulation which allows for a broad range of temporal uncertainties in the rewards, while still maintaining mathematical tractability. We fully characterize the (regret) complexity of this class of MAB problems by establishing a direct link between the extent of allowable reward “variation” and the minimal achievable regret, and by establishing a connection between the adversarial and the stochastic MAB frameworks.
-
Faculty
- Academic Areas
- Awards & Honors
- Seminars
-
Conferences
- Accounting Summer Camp
- California Econometrics Conference
- California Quantitative Marketing PhD Conference
- California School Conference
- China India Insights Conference
- Homo economicus, Evolving
-
Initiative on Business and Environmental Sustainability
- Political Economics (2023–24)
- Scaling Geologic Storage of CO2 (2023–24)
- A Resilient Pacific: Building Connections, Envisioning Solutions
- Adaptation and Innovation
- Changing Climate
- Civil Society
- Climate Impact Summit
- Climate Science
- Corporate Carbon Disclosures
- Earth’s Seafloor
- Environmental Justice
- Finance
- Marketing
- Operations and Information Technology
- Organizations
- Sustainability Reporting and Control
- Taking the Pulse of the Planet
- Urban Infrastructure
- Watershed Restoration
- Junior Faculty Workshop on Financial Regulation and Banking
- Ken Singleton Celebration
- Marketing Camp
- Quantitative Marketing PhD Alumni Conference
- Rising Scholars Conference
- Theory and Inference in Accounting Research
- Voices
- Publications
- Books
- Working Papers
- Case Studies
-
Research Labs & Initiatives
- Cities, Housing & Society Lab
- Corporate Governance Research Initiative
- Corporations and Society Initiative
- Golub Capital Social Impact Lab
- Policy and Innovation Initiative
- Rapid Decarbonization Initiative
- Stanford Latino Entrepreneurship Initiative
- Value Chain Innovation Initiative
- Venture Capital Initiative
- Behavioral Lab
- Data, Analytics & Research Computing