
Reinforcement learning in modern biostatistics: benefits, challenges and new proposals / Deliu, Nina. - (2021 May 24).

Reinforcement learning in modern biostatistics: benefits, challenges and new proposals

DELIU, NINA
24/05/2021

Abstract

Applications of reinforcement learning (RL) to support, manage and improve decision-making are becoming increasingly popular across medicine and healthcare domains where the problem has a sequential nature. By continuously interacting with the underlying environment, RL techniques learn by trial and error how to take better actions so as to maximize an outcome of interest over time. However, while RL offers a powerful new framework, it also poses unique challenges for data analysis and interpretability, which call for new statistical techniques in both predictive and descriptive learning. Notably, several methodological challenges, to which the biostatistical community could make a crucial contribution, limit the use of RL in real life. Aiming to bridge the statistics and RL communities, we start by assimilating the existing RL terminologies, notations and approaches into a coherent body of work, and by translating them from a machine learning (ML) to a statistical perspective. Then, through a comprehensive methodological review, we report and discuss state-of-the-art RL-based research in healthcare. Two main applied domains emerge: 1) adaptive interventions (AIs), encompassing both dynamic treatment regimes and just-in-time adaptive interventions in mobile health (mHealth); and 2) adaptive designs of clinical trials, specifically dose-finding designs and adaptive randomization. We illustrate existing RL-based methods in these areas, discussing their benefits and the open problems that may impede their application in real life. A major barrier to adopting RL in real-world experiments is the lack of clarity on how statistical analyses and inference are affected.
In clinical trials, for example, RL may serve the practical (and more ethical) goal of improving patients' benefit, as it can maximize clinical outcomes by adaptively randomizing participants to the best evidence-based treatment; but with respect to the scientific goal of, say, establishing whether one treatment is more effective than a control, little is known about its inferential properties. Through a simulation study, we investigate the challenges of conducting hypothesis testing on data collected through a class of RL algorithms, multi-armed bandits (MABs), showing how MAB algorithms can harm the type-I error and power of traditional statistical tests. This empirical evaluation points to two alternative routes toward improved statistical hypothesis testing: 1) modifying the test statistic to account for the adaptive nature of the data collection; and 2) modifying the algorithm or framework so that it is more sensitive to statistical inference as well as to reward maximization. Focusing on Thompson Sampling (a randomized MAB strategy), we show how a modified version of it achieves an optimal trade-off between these two objectives. These findings provide insight into how such challenges can be surmounted by bridging machine learning, statistics, and the applied sciences to conduct adaptive experiments in the real world that simultaneously help individuals and advance scientific research. Finally, we combine our methodological findings with a motivating mHealth study for improving physical activity, illustrating the tremendous opportunities for collaboration between statistics and RL researchers in developing adaptive interventions for the rapidly growing area of mHealth.
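To make the inferential issue concrete, the following is a minimal illustrative sketch of standard Beta-Bernoulli Thompson Sampling on a two-armed bandit (not the modified algorithm proposed in the thesis; arm means, horizon, and priors are illustrative assumptions). It shows the adaptive allocation mechanism that skews arm sample sizes and thereby complicates traditional statistical tests on the collected data.

```python
import numpy as np

rng = np.random.default_rng(0)

def thompson_sampling(true_means, n_rounds=1000):
    """Beta-Bernoulli Thompson Sampling; returns the pull count per arm."""
    k = len(true_means)
    successes = np.ones(k)  # Beta(1, 1) uniform priors
    failures = np.ones(k)
    pulls = np.zeros(k, dtype=int)
    for _ in range(n_rounds):
        # Draw a plausible mean for each arm from its posterior,
        # then play the arm whose draw is largest.
        theta = rng.beta(successes, failures)
        arm = int(np.argmax(theta))
        reward = rng.binomial(1, true_means[arm])
        successes[arm] += reward
        failures[arm] += 1 - reward
        pulls[arm] += 1
    return pulls

# Two arms with a small true difference, as in a two-arm trial.
pulls = thompson_sampling([0.5, 0.6])
```

Because allocation concentrates on the empirically better arm, the two treatment groups end up with unequal, data-dependent sample sizes, which is precisely why the type-I error and power of standard tests degrade on such data.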
Files in this item

Tesi_dottorato_Deliu.pdf (open access)
Type: Doctoral thesis
License: All rights reserved
Size: 2.98 MB
Format: Adobe PDF
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11573/1581572