The problem of multivariate regression modelling in the presence of heterogeneous data is dealt to address the relevant issue of the influence of such heterogeneity in assessing the linear relations between responses and explanatory variables. In spite of its popularity, clusterwise regression is not designed to identify the linear relationships within homogeneous' clusters exhibiting internal cohesion and external separation. A within-clusterwise regression is introduced to achieve this aim and, since the possible presence of a linear relation between' clusters should be also taken into account, a general regression model is introduced to account for both the between-cluster and the within-cluster regression variation. Some decompositions of the variance of the responses accounted for are also given, the least-squares estimation of the parameters is derived, together with an appropriate coordinate descent algorithms and the performance of the proposed methodology is evaluated in different datasets.
Multivariate linear regression for heterogeneous data / Vicari, Donatella; Vichi, Maurizio. - In: JOURNAL OF APPLIED STATISTICS. - ISSN 0266-4763. - STAMPA. - 40:6(2013), pp. 1209-1230. [10.1080/02664763.2013.784896]
Multivariate linear regression for heterogeneous data
VICARI, Donatella;VICHI, Maurizio
2013
Abstract
The problem of multivariate regression modelling in the presence of heterogeneous data is dealt to address the relevant issue of the influence of such heterogeneity in assessing the linear relations between responses and explanatory variables. In spite of its popularity, clusterwise regression is not designed to identify the linear relationships within homogeneous' clusters exhibiting internal cohesion and external separation. A within-clusterwise regression is introduced to achieve this aim and, since the possible presence of a linear relation between' clusters should be also taken into account, a general regression model is introduced to account for both the between-cluster and the within-cluster regression variation. Some decompositions of the variance of the responses accounted for are also given, the least-squares estimation of the parameters is derived, together with an appropriate coordinate descent algorithms and the performance of the proposed methodology is evaluated in different datasets.I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.