Calculation of exact Shapley values for explaining support vector machine models using the radial basis function kernel

Mastropietro, Andrea; Feldmann, Christian; Bajorath, Jürgen

doi:10.1038/s41598-023-46930-2

Machine learning (ML) algorithms are extensively used in pharmaceutical research. Most ML models have a black-box character, which prevents the interpretation of their predictions. However, rationalizing model decisions is of critical importance if predictions are to aid experimental design. Accordingly, in interdisciplinary research, there is growing interest in explaining ML models. Methods devised for this purpose are part of the explainable artificial intelligence (XAI) spectrum of approaches. In XAI, the Shapley value concept originating from cooperative game theory has become popular for identifying features determining predictions. The Shapley value concept has been adapted as a model-agnostic approach for explaining predictions. Since the computational time required for Shapley value calculations scales exponentially with the number of features used, local approximations such as Shapley additive explanations (SHAP) are usually required in ML. The support vector machine (SVM) algorithm is one of the most popular ML methods in pharmaceutical research and beyond. SVM models are often explained using SHAP. However, there is only limited correlation between SHAP and exact Shapley values, as previously demonstrated for SVM calculations using the Tanimoto kernel, which limits SVM model explanation. Since the Tanimoto kernel is a special kernel function mostly applied for assessing chemical similarity, we have developed the Shapley value-expressed radial basis function (SVERAD), a computationally efficient approach for the calculation of exact Shapley values for SVM models based upon radial basis function kernels that are widely applied in different areas. SVERAD is shown to produce meaningful explanations of SVM predictions.
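To illustrate why exact Shapley values become expensive as the number of features grows, the following is a minimal brute-force sketch in Python. It is not the SVERAD algorithm described in the article. The function names (`exact_shapley`, `rbf_coalition_value`) and the choice of coalition value (an RBF kernel computed only over the features in the coalition) are assumptions made purely for illustration. The enumeration visits every subset of the remaining features for each feature, so the cost grows exponentially with the feature count.

```python
from itertools import combinations
from math import exp, factorial


def exact_shapley(n_features, value):
    """Brute-force Shapley values.

    `value` maps a frozenset of feature indices (a coalition) to a float.
    Every coalition is enumerated, so the cost is exponential in n_features.
    """
    players = range(n_features)
    phi = [0.0] * n_features
    for i in players:
        others = [j for j in players if j != i]
        for size in range(len(others) + 1):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions of this size
            weight = (factorial(size) * factorial(n_features - size - 1)
                      / factorial(n_features))
            for subset in combinations(others, size):
                s = frozenset(subset)
                # Marginal contribution of feature i to coalition s
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def rbf_coalition_value(x, y, gamma):
    """Hypothetical value function (an assumption, not taken from the article):
    the RBF kernel exp(-gamma * ||x - y||^2) restricted to coalition features."""
    def value(coalition):
        sq_dist = sum((x[k] - y[k]) ** 2 for k in coalition)
        return exp(-gamma * sq_dist)
    return value


# Two small binary feature vectors that differ in features 1 and 2
x = [1, 0, 1, 1]
y = [1, 1, 0, 1]
value = rbf_coalition_value(x, y, gamma=0.5)
phi = exact_shapley(len(x), value)
```

Under this illustrative value function, features on which the two vectors agree never change the kernel value and therefore receive a Shapley value of zero, while the contributions of all features sum to the difference between the value of the full feature set and that of the empty set.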

Calculation of exact Shapley values for explaining support vector machine models using the radial basis function kernel / Mastropietro, Andrea; Feldmann, Christian; Bajorath, Jürgen. - In: SCIENTIFIC REPORTS. - ISSN 2045-2322. - 13:1(2023). [10.1038/s41598-023-46930-2]

Year of publication	2023
Keywords	chemoinformatics; support vector machines; explainable artificial intelligence; Shapley values
Type	01 Journal publication::01a Journal article

Files attached to this record

File	Size	Format
Mastropietro_Calculation_2023.pdf (open access). Note: https://www.nature.com/articles/s41598-023-46930-2.pdf. Type: Publisher's version (published version with the publisher's layout). License: Creative Commons	1.5 MB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11573/1691795
