Safe Deep Reinforcement Learning Control of Type 1 Diabetes / Baldisseri, Federico; Lops, Giada; Atanasious, Mohab M. H.; Menegatti, Danilo; Becchetti, Valentina; Delli Priscoli, Francesco; Mascolo, Saverio; Wrona, Andrea. - (2026). (Paper presented at the 2026 European Control Conference (ECC), held in Reykjavík).
Safe Deep Reinforcement Learning Control of Type 1 Diabetes
Federico BALDISSERI;Giada LOPS;Mohab M. H. ATANASIOUS;Danilo MENEGATTI;Valentina BECCHETTI;Francesco DELLI PRISCOLI;Saverio MASCOLO;Andrea WRONA
2026
Abstract
Achieving safe and autonomous glycemic regulation for Type-1 Diabetes care is an urgent challenge. Although Reinforcement Learning (RL) has emerged as a promising paradigm, practical deployment is hindered by the risk of uncontrolled hyperglycemia or hypoglycemia. This work adapts two safe deep RL approaches to automated insulin delivery. The first is a Lagrangian constrained Markov decision process formulation, solved via a primal–dual scheme with adaptive multipliers, which delivers constraint satisfaction in expectation; the second adopts a Barrier–Lyapunov Actor–Critic framework that embeds discrete-time control-barrier conditions and a Lyapunov decrease condition into the learning updates, ensuring stepwise feasibility and promoting stability by design. Simulations under randomized meal timing and size, benchmarked against a standard clinical-practice protocol and an unconstrained deep RL baseline, indicate improved time-in-range with fewer hypoglycemic events.


