Ranking to Learn: / Roffo, Giorgio; Melzi, Simone. - 10312:(2017), pp. 19-35. (Paper presented at the 5th International Workshop on New Frontiers in Mining Complex Patterns, held in Riva del Garda) [10.1007/978-3-319-61461-8_2].

Ranking to Learn:

Melzi, Simone
2017

Abstract

In an era where accumulating data is easy and storing it is inexpensive, feature selection plays a central role in reducing the dimensionality of large amounts of otherwise meaningless data. In this paper, we propose a graph-based method for feature selection that ranks features by identifying the most important ones within an arbitrary set of cues. The problem is mapped onto an affinity graph in which features are the nodes, and the solution is obtained by assessing the importance of nodes through indicators of centrality, in particular Eigenvector Centrality (EC). The gist of EC is to estimate the importance of a feature as a function of the importance of its neighbors. Ranking central nodes identifies candidate features, which turn out to be effective from a classification point of view, as shown in a thorough experimental section. Our approach has been tested on 7 diverse datasets from recent literature (e.g., biological data and object recognition, among others) and compared against filter, embedded, and wrapper methods. The results are remarkable in terms of accuracy, stability, and low execution time. © Springer International Publishing AG 2017.
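The idea of ranking features by eigenvector centrality can be illustrated with a minimal sketch. The abstract does not specify how the affinity graph is weighted, so the code below assumes, purely for illustration, that the affinity between two features is the absolute value of their Pearson correlation; this is not necessarily the weighting used in the paper. The function name `rank_features_by_ec` is hypothetical.

```python
import numpy as np


def rank_features_by_ec(X):
    """Rank the columns (features) of X by eigenvector centrality.

    X is an (n_samples, n_features) array with no constant columns.
    Returns (ranking, scores), where ranking lists feature indices
    from most to least central.
    """
    # ASSUMPTION: affinity = |Pearson correlation| between features.
    # The abstract does not state the actual weighting scheme.
    A = np.abs(np.corrcoef(X, rowvar=False))
    np.fill_diagonal(A, 0.0)  # no self-loops in the affinity graph

    # A is symmetric and non-negative, so its leading eigenvector can be
    # taken with non-negative entries; eigh returns eigenvalues in
    # ascending order, so the last column is the leading eigenvector.
    _, eigenvectors = np.linalg.eigh(A)
    scores = np.abs(eigenvectors[:, -1])

    # A feature scores high when it is strongly tied to other
    # high-scoring features, which is the gist of EC.
    ranking = np.argsort(-scores)
    return ranking, scores
```

Calling `rank_features_by_ec(X)` on a data matrix returns the feature indices ordered by centrality; keeping the first few entries of `ranking` gives a candidate feature subset to pass to a classifier.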
5th International Workshop on New Frontiers in Mining Complex Patterns
Data mining; Feature selection; High dimensionality; Ranking
04 Publication in conference proceedings::04b Conference paper in volume
Use this identifier to cite or link to this document: https://hdl.handle.net/11573/1410181
Citations
  • Scopus 79