Speaker Bio: Chengchun Shi is an Associate Professor at the London School of Economics and Political Science. He serves as an associate editor of JRSSB, JASA (Theory & Methods), and the Journal of Nonparametric Statistics. His research focuses on developing statistical learning methods in reinforcement learning, with applications to healthcare, ridesharing, video-sharing, and neuroimaging. He received the Royal Statistical Society Research Prize in 2021 and the IMS Tweedie Award in 2024.
Talk Abstract: Time series experiments, in which experimental units receive a sequence of treatments over time, are prevalent in technological companies, including ride-sharing platforms and trading companies. These companies frequently employ such experiments for A/B testing, to evaluate the performance of a newly developed policy, product, or treatment relative to a baseline control. Many existing solutions require that the experimental environment be fully observed to ensure the collected data satisfies the Markov assumption. This condition, however, is often violated in real-world scenarios. This gap between theoretical assumptions and practical realities challenges the reliability of existing approaches and calls for more rigorous investigation of A/B testing procedures.
In this paper, we study the optimal experimental design for A/B testing in partially observable environments. We introduce a controlled (vector) autoregressive moving average model to effectively capture a rich class of partially observable environments. Within this framework, we derive closed-form expressions, i.e., efficiency indicators, to assess the statistical efficiency of various sequential experimental designs in estimating the average treatment effect (ATE). A key innovation of our approach lies in the introduction of a weak signal assumption, which significantly simplifies the computation of the asymptotic mean squared errors of ATE estimators in time series experiments. We then develop two data-driven algorithms to estimate the optimal design: one utilizing constrained optimization and the other employing reinforcement learning. We demonstrate the superior performance of our designs using a dispatch simulator and two real datasets from a ride-sharing company.
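To make the setting concrete, the sketch below simulates a scalar controlled ARMA(1,1) outcome process and compares a naive difference-in-means ATE estimate under two simple treatment-switching designs. The model form, parameter values, block length, and design rules are illustrative assumptions for intuition only; they are not taken from the talk or the underlying paper.

import numpy as np

def simulate_ate(design, T=500, tau=0.5, phi=0.6, theta=0.3, sigma=1.0, seed=0):
    """Simulate Y_t = tau*A_t + phi*Y_{t-1} + eps_t + theta*eps_{t-1}
    under a given treatment design and return a difference-in-means ATE estimate."""
    rng = np.random.default_rng(seed)
    eps = rng.normal(0.0, sigma, T + 1)
    A = design(T)                      # treatment sequence in {0, 1}
    Y = np.zeros(T)
    prev_y = 0.0
    for t in range(T):
        Y[t] = tau * A[t] + phi * prev_y + eps[t + 1] + theta * eps[t]
        prev_y = Y[t]
    return Y[A == 1].mean() - Y[A == 0].mean()

# Two hypothetical sequential designs: alternate every period vs. blocks of 50.
alternating = lambda T: np.arange(T) % 2
switchback  = lambda T: (np.arange(T) // 50) % 2

for name, design in [("alternating", alternating), ("switchback", switchback)]:
    estimates = [simulate_ate(design, seed=s) for s in range(200)]
    print(f"{name:11s} mean={np.mean(estimates):+.3f} sd={np.std(estimates):.3f}")

The spread of the estimates across replications is a rough proxy for the kind of design-dependent efficiency that the talk's closed-form indicators are meant to quantify analytically.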