Understanding Deep RL agent decisions: a novel interpretable approach with trainable prototypes / Borzillo, Caterina; Ragno, Alessio; Capobianco, Roberto. - (2023). (Paper presented at the XAI.it 2023 conference: Italian Workshop on Explainable Artificial Intelligence 2023, held in Rome).
Understanding Deep RL agent decisions: a novel interpretable approach with trainable prototypes
Caterina Borzillo; Alessio Ragno; Roberto Capobianco
2023
Abstract
Deep reinforcement learning (DRL) models have shown great promise in many applications, but their adoption in critical domains is limited by their opaque decision-making. Explainable AI (XAI) techniques aim to address this challenge by making black-box models more transparent and interpretable. However, most current interpretable systems target supervised learning, leaving reinforcement learning relatively unexplored. This paper extends PW-Net, an interpretable wrapper model for DRL agents inspired by image classification methodologies. We introduce Shared-PW-Net, an interpretable deep learning model with a fully trainable prototype layer. Unlike PW-Net, Shared-PW-Net does not rely on pre-existing prototypes; instead, it builds on the concept of ProtoPool to learn general prototypes that are automatically assigned to actions during training. We also propose a novel prototype initialization method that significantly improves the model’s performance. Through extensive experiments, we show that Shared-PW-Net matches the reward performance of existing methods without requiring human intervention. The fully trainable prototype layer, together with the new initialization approach, yields a clearer and more interpretable decision-making process. The code for this work is publicly available.
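To make the idea of "general prototypes assigned to actions" more concrete, here is a minimal forward-pass sketch of a ProtoPool-style shared prototype head sitting on top of a DRL agent's latent features. It is not the authors' implementation. Every name (`SharedPrototypeHead`, `log_similarity`, the shapes and hyperparameters) is hypothetical. The sketch makes three simplifications: a ProtoPNet-style log similarity, a plain temperature softmax standing in for ProtoPool's Gumbel-softmax slot assignment, and per-action scores computed by summing that action's slots in place of a learned last layer. Prototypes are initialized at random; the paper's own initialization method is not reproduced here. Training would require an autodiff framework.

```python
import numpy as np


def log_similarity(z: np.ndarray, prototypes: np.ndarray, eps: float = 1e-4) -> np.ndarray:
    """ProtoPNet-style similarity, largest when a latent vector sits on a prototype.

    z: (batch, d) latent features; prototypes: (num_protos, d).
    Returns an array of shape (batch, num_protos).
    """
    # Squared Euclidean distance between every latent vector and every prototype.
    d2 = ((z[:, None, :] - prototypes[None, :, :]) ** 2).sum(axis=-1)
    return np.log((d2 + 1.0) / (d2 + eps))


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    # Numerically stable softmax along `axis`.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


class SharedPrototypeHead:
    """Hypothetical ProtoPool-style head: one pool of prototypes shared by all actions.

    Each action owns `slots` slots, and each slot holds a soft assignment (a
    distribution) over the shared pool, so one prototype can serve several actions.
    """

    def __init__(self, latent_dim: int, num_protos: int, num_actions: int,
                 slots: int, rng: np.random.Generator):
        # Random initialization (assumption; not the paper's initialization method).
        self.prototypes = rng.normal(size=(num_protos, latent_dim))
        # Trainable assignment logits with shape (num_actions, slots, num_protos).
        self.assign_logits = rng.normal(size=(num_actions, slots, num_protos))

    def forward(self, z: np.ndarray, tau: float = 1.0) -> np.ndarray:
        sim = log_similarity(z, self.prototypes)              # (batch, protos)
        q = softmax(self.assign_logits / tau, axis=-1)        # (actions, slots, protos)
        # Slot activation = expected similarity under that slot's assignment.
        slot_act = np.einsum("bp,asp->bas", sim, q)           # (batch, actions, slots)
        # Per-action score: sum of the action's slot activations (simplification).
        return slot_act.sum(axis=-1)                          # (batch, actions)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    head = SharedPrototypeHead(latent_dim=8, num_protos=6, num_actions=3, slots=2, rng=rng)
    z = rng.normal(size=(4, 8))   # stand-in for latent features from a frozen agent
    scores = head.forward(z)
    print(scores.shape)           # (4, 3)
```

Lowering `tau` during training would push each slot toward a single prototype, which is what makes the final action-to-prototype assignment readable. Because all actions draw from one pool, the number of prototypes no longer has to equal the number of actions times a fixed per-action count.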