Estimation of missing Ellenberg indicator values for tree species in South-eastern Europe. A comparison of methods / Leccese, L.; Fanelli, G.; Cambria, V. E.; Massimi, M.; Attorre, F.; Alfo, M.; Acic, S.; Bergmeier, E.; Carni, A.; Cuk, M.; Custerevska, R.; Dimopoulos, P.; Hoda, P.; Mullaj, A.; Silc, U.; Skvorc, Z.; Stancic, Z.; Dajic Stevanovic, Z.; Tzonev, R.; Vassilev, K.; Malatesta, L.; De Sanctis, M. - In: ECOLOGICAL INDICATORS. - ISSN 1470-160X. - 160:(2024). [10.1016/j.ecolind.2024.111851]
Estimation of missing Ellenberg indicator values for tree species in South-eastern Europe. A comparison of methods
Fanelli G.; Cambria V. E.; Attorre F.; Alfo M.; Malatesta L.; De Sanctis M.
2024
Abstract
Ellenberg indicator values (EIVs) are widely used in vegetation ecology, but values for many species in Southeastern Europe are not available because their ecology is incompletely known; it is therefore of paramount importance to estimate the missing values in existing databases. Either the entire EIV set of a species may be missing, or a single EIV may be missing for a species whose other indicator values are available. Our aim here is to provide a simple method to impute missing values for species that lack one or more EIVs. For this purpose, we adopt a multiple imputation procedure and compare a number of imputation methods on the basis of two datasets: i) “indices”, the set of 9 Ellenberg indicator values taken from the literature, available for 10,824 species, and ii) “vegetation”, a set describing the physical and climatic characteristics (Light, Temperature, Continentality, Soil moisture, Nitrogen, Soil pH, Hemeroby index, Humidity, Organic matter) of 29,935 relevés from Southeastern Europe in which at least one tree species is present. The imputation methods we considered are k-Nearest Neighbour, multiple linear regression (with or without collinearity correction), the Reprediction Algorithm, Weighted Averaging (WA) and Weighted Averaging Partial Least Squares (WAPLS) regression. To compare them, we applied multiple imputation with each method to a set of species with known EIVs and measured how far the imputed values deviated from the observed (“true”) ones, using the mean squared error (MSE) as a performance measure together with expert judgement of ecological consistency. Models based on regression and k-Nearest Neighbour appear to outperform the others, whereas the Reprediction Algorithm, in all its forms, produced less satisfactory results.
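The k-Nearest Neighbour approach and the hide-and-reimpute evaluation described above can be illustrated with a minimal single-imputation sketch in Python. It assumes the “indices” data are held as a species-by-indicator matrix with NaN marking missing EIVs, measures distance as the root-mean-square difference over the indicators two species share, and uses an arbitrary `k`; the function names `knn_impute` and `masked_mse` are hypothetical and not taken from the paper. A full multiple-imputation procedure would repeat a stochastic version of this step several times and pool the results, which is not shown here.

```python
import numpy as np


def knn_impute(X, k=5):
    """Fill NaN entries of a species-by-indicator matrix with the mean value
    of the k nearest species that have that indicator recorded.

    Assumption for this sketch: distance = root-mean-square difference over
    the indicators that both species have recorded.
    """
    X = np.asarray(X, dtype=float)
    filled = X.copy()
    observed = ~np.isnan(X)
    for i, j in zip(*np.where(~observed)):
        # Donor species: every other species with indicator j recorded.
        donors = [d for d in np.where(observed[:, j])[0] if d != i]
        scored = []
        for d in donors:
            shared = observed[i] & observed[d]
            if not shared.any():
                continue  # no indicator in common, distance undefined
            diff = X[i, shared] - X[d, shared]
            scored.append((np.sqrt(np.mean(diff ** 2)), d))
        if not scored:
            continue  # no usable donor: leave the value missing
        scored.sort(key=lambda t: t[0])
        nearest = [d for _, d in scored[:k]]
        filled[i, j] = X[nearest, j].mean()
    return filled


def masked_mse(X_complete, frac=0.1, k=5, seed=0):
    """Hide a random fraction of known values, impute them, and return the
    mean squared error between imputed and observed ("true") values."""
    X_complete = np.asarray(X_complete, dtype=float)
    rng = np.random.default_rng(seed)
    mask = rng.random(X_complete.shape) < frac
    X_missing = X_complete.copy()
    X_missing[mask] = np.nan
    X_hat = knn_impute(X_missing, k=k)
    scored = mask & ~np.isnan(X_hat)  # ignore entries that stayed missing
    return float(np.mean((X_hat[scored] - X_complete[scored]) ** 2))


# Toy matrix: 5 species x 3 indicators, one value missing.
toy = np.array([[5, 5, 5],
                [5, 5, 6],
                [5, 5, np.nan],
                [1, 1, 1],
                [1, 1, 2]])
print(knn_impute(toy, k=2)[2, 2])  # 5.5
```

Averaging several donors rather than copying the single nearest species reduces the influence of one atypical neighbour; in practice `k` would itself be chosen with the same masked-MSE check.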
Imputation of missing values is generally based on expert knowledge or on some variant of weighted averaging (also known as Hill's method). Here we show that other methods may be more effective and deserve due consideration by vegetation scientists, since they may allow EIVs to be applied in other biogeographic regions.
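Weighted averaging, the conventional baseline mentioned above, can be sketched in two steps: each relevé receives the cover-weighted mean of the known EIVs of its species, and a species lacking an EIV then receives the cover-weighted mean of the values of the relevés in which it occurs. This is a minimal Python sketch under those assumptions (hypothetical function names, cover values used as weights, plain WA without further refinements); it is not the procedure used in the paper.

```python
import numpy as np


def releve_means(cover, species_eiv):
    """Cover-weighted mean indicator value of each relevé, computed only from
    species whose value is known (NaN where no such species is present)."""
    cover = np.asarray(cover, dtype=float)
    species_eiv = np.asarray(species_eiv, dtype=float)
    known = ~np.isnan(species_eiv)
    weights = cover[:, known]
    totals = weights.sum(axis=1)
    with np.errstate(invalid="ignore", divide="ignore"):
        means = weights @ species_eiv[known] / totals
    return np.where(totals > 0, means, np.nan)


def wa_species_values(cover, site_values):
    """Weighted-averaging estimate for every species: the mean of the values
    of the relevés where it occurs, weighted by its cover in each relevé."""
    cover = np.asarray(cover, dtype=float)
    site_values = np.asarray(site_values, dtype=float)
    usable = ~np.isnan(site_values)  # drop relevés without a site value
    cover, site_values = cover[usable], site_values[usable]
    totals = cover.sum(axis=0)
    with np.errstate(invalid="ignore", divide="ignore"):
        est = site_values @ cover / totals
    return np.where(totals > 0, est, np.nan)  # NaN for species never recorded


# Toy data: 3 relevés x 3 species (cover values); species 2 lacks an EIV.
cover = np.array([[1, 1, 0],
                  [0, 1, 1],
                  [1, 0, 1]])
eiv = np.array([2.0, 4.0, np.nan])
sites = releve_means(cover, eiv)        # relevé values 3.0, 4.0, 2.0
print(wa_species_values(cover, sites))  # estimates 2.5, 3.5 and 3.0
```

In the toy run the two species with known values (2 and 4) are re-estimated as 2.5 and 3.5: plain weighted averaging pulls estimates towards the middle of the gradient, which is why weighted-averaging calibrations commonly add a deshrinking step.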
File: Leccese_Estimation-of-missing_2024.pdf (open access; type: publisher's version, i.e. the published version with the publisher's layout; licence: Creative Commons; format: Adobe PDF; size: 2.57 MB)
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.