A new genetic algorithm for clustering binary data with application to traffic road accidents in Christchurch / S., Saharan; Baragona, Roberto. - In: FAR EAST JOURNAL OF THEORETICAL STATISTICS. - ISSN 0972-0863. - PRINT. - 45:1(2013), pp. 67-89.

A new genetic algorithm for clustering binary data with application to traffic road accidents in Christchurch

BARAGONA, Roberto
2013

Abstract

The analysis of road traffic accidents is increasingly important because of the cost of accidents and its bearing on public road safety. The availability of large data sets makes it viable to study the factors that may affect the frequency and severity of accidents. We deal with a binary data set of road traffic accidents recorded in Christchurch, New Zealand, from 2000 to 2009: 50 factors for 26440 records classified into 4 severity levels. We used cluster analysis to measure the similarity of the factors, both on the whole data set and separately for each severity level, to outline the association between accident type and the factors involved. Several algorithms based on the well-known k-means algorithm and its variants have been designed specifically for binary data. However, they are known to often depend on initial values and to tend to deliver a local optimum as a solution. A novel genetic algorithm is proposed to improve the performance of the incremental k-means algorithm (Ordonez, Clustering binary data streams with k-means, ACM SIGMOD Workshop on DMKD, [9], San Diego, CA). The objective function is based on a few sufficient statistics that can be calculated easily and quickly on binary data. The results offer insight into the similarity or dissimilarity between factors and accident severity levels. They suggest that the factors recorded in concurrence with fatal and serious accidents are few and distant from each other, whereas a large number of similar factors are recorded in concurrence with accidents classified as either minor or non-injury.
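To illustrate why sufficient statistics are cheap to compute on binary data, the following is a minimal sketch and not the authors' objective function, which this record does not specify. It assumes a squared-error criterion around cluster centroids and uses the fact that x squared equals x for 0/1 values, so per-cluster counts and column sums are enough. The function names `sufficient_stats` and `within_cluster_sse` and the toy matrix are hypothetical.

```python
import numpy as np

def sufficient_stats(X, labels, k):
    """Per-cluster counts N (k,) and column sums M (k, d) for a 0/1 matrix X."""
    X = np.asarray(X, dtype=np.int64)
    labels = np.asarray(labels)
    N = np.bincount(labels, minlength=k)
    M = np.zeros((k, X.shape[1]), dtype=np.int64)
    np.add.at(M, labels, X)  # accumulate each row into its cluster's sum
    return N, M

def within_cluster_sse(N, M):
    """Squared error around centroids M/N (assumed criterion).

    For binary x, the sum of squares in a cluster equals the sum M,
    so the error per cluster and column is M - M**2 / N.
    """
    nonempty = N > 0  # skip empty clusters to avoid division by zero
    Nf = N[nonempty, None].astype(float)
    Mf = M[nonempty].astype(float)
    return float(np.sum(Mf - Mf**2 / Nf))

# Toy data (hypothetical): 4 binary records, 3 factors, 2 clusters.
X = np.array([[1, 0, 1],
              [1, 0, 0],
              [0, 1, 1],
              [0, 1, 1]])
labels = np.array([0, 0, 1, 1])

N, M = sufficient_stats(X, labels, k=2)
sse = within_cluster_sse(N, M)
print(N, M, sse)
```

Because the criterion depends only on `N` and `M`, a candidate partition can be scored without revisiting individual records. This is what makes repeated evaluation inside an iterative or evolutionary search inexpensive.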
binary data; cluster analysis; genetic algorithms; k-means algorithm; road traffic accidents
01 Journal publication::01a Journal article
Use this identifier to cite or link to this document: https://hdl.handle.net/11573/537054