Features and explainable methods for cytokines analysis of Dry Eye Disease in HIV infected patients

Curia, Francesco

doi:10.1016/j.health.2021.100001

Clinical Decision Support Systems (CDSS) that use machine learning techniques, and artificial intelligence (AI) more broadly, must be interpretable and transparent. Without transparency, a tool meant to support decisions can instead become a source of indecision and an obstacle. This work tackles a complex and clinically important problem, the pathology known as Dry Eye Disease (DED), starting from a case-control study of an HIV-positive population and a healthy control population. The case study is addressed on two fronts. First, an ensemble-based clustering algorithm is built. Second, this algorithm is broken down so that each component can be analyzed, making the analysis method transparent and interpretable. Specifically, an ensemble of clustering algorithms (k-means, agglomerative, spectral, and BIRCH) is presented; the algorithms are combined and used at two levels. At the first level, labels are obtained from each clustering algorithm to recognize significant patterns in the two populations affected by DED, with and without HIV. At the second level, the first-level labels are used as inputs on which the clustering algorithms are run again; in the final phase, their outputs serve as a training data set for a supervised method (e.g., logistic regression, decision trees, neural networks), so that every single component can be evaluated separately through feature importance techniques (e.g., decision trees, LASSO regression, Gini Importance (GI), Variable Importance (VI)). In this way, each clustering algorithm used at the first level can be treated as a new feature at the next level, and its individual contribution can be evaluated. Furthermore, each feature is interpreted through specific feature-relevance methods, to make the decision support tool as complete as possible.
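The two-level scheme described above can be sketched roughly as follows. This is a minimal sketch, not the author's implementation, and it rests on several assumptions: it uses scikit-learn, replaces the cytokine data with a small synthetic two-group matrix, fixes k = 2 clusters, standardizes the features, and, as one possible reading of the final phase, fits a decision tree on the first-level labels with the second-level k-means partition as target, so that the tree's Gini importances score each first-level clustering algorithm as a feature. The helper names `make_clusterers` and `cluster_level` are hypothetical.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering, Birch, KMeans, SpectralClustering
from sklearn.metrics import homogeneity_score
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
# Hypothetical stand-in for a cytokine matrix: 40 subjects x 5 features drawn
# as two shifted Gaussian groups. This is NOT the study's data.
X = np.vstack([rng.normal(0.0, 1.0, (20, 5)), rng.normal(3.0, 1.0, (20, 5))])
X = (X - X.mean(axis=0)) / X.std(axis=0)  # assumption: standardize features
group = np.repeat([0, 1], 20)  # hypothetical known grouping, used only for scoring


def make_clusterers(k=2, seed=0):
    """Hypothetical helper: the four clustering algorithms named in the text."""
    return {
        "kmeans": KMeans(n_clusters=k, n_init=10, random_state=seed),
        "agglomerative": AgglomerativeClustering(n_clusters=k),
        "spectral": SpectralClustering(n_clusters=k, random_state=seed),
        "birch": Birch(n_clusters=k),
    }


def cluster_level(data, k=2):
    """Fit every clusterer on `data`; return names and an (n_samples, n_clusterers) label matrix."""
    models = make_clusterers(k)
    labels = np.column_stack([m.fit_predict(data) for m in models.values()])
    return list(models), labels


# Level 1: each algorithm clusters the (standardized) raw features.
names1, labels1 = cluster_level(X)
# Level 2: the level-1 label matrix is itself clustered again.
names2, labels2 = cluster_level(labels1.astype(float))

# Final phase (assumed reading): each level-1 clusterer is one feature, and the
# level-2 k-means partition is the target the decision tree must reproduce.
target = labels2[:, names2.index("kmeans")]
tree = DecisionTreeClassifier(random_state=0).fit(labels1, target)
importance = dict(zip(names1, tree.feature_importances_))  # Gini importance per clusterer

# Homogeneity of each level-1 partition with respect to the known groups.
homogeneity = {n: homogeneity_score(group, labels1[:, i]) for i, n in enumerate(names1)}
print(importance)
print(homogeneity)
```

Because cluster indices are arbitrary, a swapped 0/1 labeling is still the same partition; with k = 2 a tree only flips the direction of its split, so its importances are unaffected by that permutation.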
The performance of the methods used in training, both supervised and unsupervised, is evaluated with appropriate metrics, such as the well-known measures of precision, recall, accuracy, and homogeneity. The clustering methods provide results both on the groups created and on the influence of the features (cytokines) in the two populations examined. The experimental results concerning the association between the development of DED and the presence or absence of HIV in these patients, and the influence that certain factors have on this problem, are interpreted with methods from the branch known as Explainable AI (e.g., Local Interpretable Model-agnostic Explanations (LIME), Shapley values, Individual Conditional Expectation (ICE)). Besides explaining the influence exerted by certain features, these methods provide both a global and a local view of how each factor influences the final probability associated with the possible development of the pathology. In practice, this method can support the clinical diagnosis of the patients examined by showing how each factor may contribute to the possible development of the disease, so that each factor can be considered individually in treatment. To date, the analytical techniques applied to this pathology have always provided generalized results, whereas breaking the problem down and isolating its components could provide valuable information to clinicians.
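For the evaluation and explanation step, a minimal sketch under stated assumptions: it uses scikit-learn, synthetic data (only the first of three hypothetical "cytokine" features drives the outcome), and logistic regression as the supervised model. It reports precision, recall, and accuracy on a held-out split, then builds ICE curves by hand: each test instance's predicted probability is traced while one feature is swept over a grid (the local view), and averaging those curves gives the partial dependence (the global view). The helper `ice_curves` is hypothetical; LIME and Shapley values need separate packages (`lime`, `shap`) and are not shown.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_score, recall_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
# Synthetic, hypothetical data: 200 subjects x 3 features; only feature 0
# influences the binary outcome. This is NOT the study's data.
X = rng.normal(size=(200, 3))
y = (X[:, 0] + 0.5 * rng.normal(size=200) > 0).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0, stratify=y)
model = LogisticRegression().fit(X_tr, y_tr)
pred = model.predict(X_te)
scores = {
    "precision": precision_score(y_te, pred),
    "recall": recall_score(y_te, pred),
    "accuracy": accuracy_score(y_te, pred),
}


def ice_curves(model, X, feature, grid):
    """Individual Conditional Expectation: one curve per row of X.

    For every instance, overwrite column `feature` with each grid value
    (other features fixed) and record the predicted probability of class 1.
    """
    curves = np.empty((X.shape[0], grid.size))
    for j, value in enumerate(grid):
        X_mod = X.copy()
        X_mod[:, feature] = value
        curves[:, j] = model.predict_proba(X_mod)[:, 1]
    return curves


grid = np.linspace(-2.0, 2.0, 9)
ice = ice_curves(model, X_te, feature=0, grid=grid)  # local view: one curve per subject
pdp = ice.mean(axis=0)  # global view: average of the ICE curves (partial dependence)
print(scores)
print(np.round(pdp, 2))
```

Computing ICE directly keeps the definition visible; scikit-learn's `sklearn.inspection` module offers the same curves ready-made.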

Features and explainable methods for cytokines analysis of Dry Eye Disease in HIV infected patients / Curia, Francesco. - In: HEALTHCARE ANALYTICS. - ISSN 2772-4425. - 1:(2021). [10.1016/j.health.2021.100001]