In this paper, we investigate a new class of graphs, called “Incompatibility Graphs”. They arise from Box Clustering, a branch of the more general methodology known as Logical Analysis of Data. In Box Clustering, the input of an instance is a “training” dataset for Supervised Classification, consisting of a finite set of observations in a d-dimensional space which are classified either as “positive” or as “negative”. Incompatibility Graphs are introduced as a convenient tool for representing and analyzing the relations between positive and negative observations. Besides their importance for the applications in data mining, these graphs have an intrinsic interest from a theoretical viewpoint. Actually, in the two dimensional case, they cannot have as induced subgraphs matchings of cardinality 3, paths of length 8 or more, and cycles of length 7 or more. Therefore, they generalize some important classes of graphs, namely, chordal and weakly chordal graphs. But Incompatibility Graphs feature many more forbidden subgraphs. Through an appropriate computer program, based on mixed integer linear programming, we have obtained a full catalogue of the 105329 forbidden subgraphs with at most 10 vertices. Incompatibility Graphs are also shown to have small radius. Moreover, the special structure of the Incompatibility Graphs can be exploited to efficiently solve some key-problems related to Box Clustering, such as the Maximum Box and the Minimum Covering by Boxes problems. In fact, we show that these two problems can be formulated as a vertex packing and a vertex coloring one, respectively, in an Incompatibility Graph, and that one can solve in polynomial time the former and, for two important subclasses of instances, also the latter.

Incompatibility Graphs in Data Mining / E., Boros; Ricca, Federica; Simeone, Bruno; V., Spinelli. - STAMPA. - Rapporto Tecnico Dip. di Scienze Statistiche, Sapienza, Università di Roma, n. 10/2011 - ISSN 2279-798X:(2011).

Incompatibility Graphs in Data Mining

RICCA, Federica;SIMEONE, Bruno;
2011

Abstract

In this paper, we investigate a new class of graphs, called “Incompatibility Graphs”. They arise from Box Clustering, a branch of the more general methodology known as Logical Analysis of Data. In Box Clustering, the input of an instance is a “training” dataset for Supervised Classification, consisting of a finite set of observations in a d-dimensional space which are classified either as “positive” or as “negative”. Incompatibility Graphs are introduced as a convenient tool for representing and analyzing the relations between positive and negative observations. Besides their importance for the applications in data mining, these graphs have an intrinsic interest from a theoretical viewpoint. Actually, in the two dimensional case, they cannot have as induced subgraphs matchings of cardinality 3, paths of length 8 or more, and cycles of length 7 or more. Therefore, they generalize some important classes of graphs, namely, chordal and weakly chordal graphs. But Incompatibility Graphs feature many more forbidden subgraphs. Through an appropriate computer program, based on mixed integer linear programming, we have obtained a full catalogue of the 105329 forbidden subgraphs with at most 10 vertices. Incompatibility Graphs are also shown to have small radius. Moreover, the special structure of the Incompatibility Graphs can be exploited to efficiently solve some key-problems related to Box Clustering, such as the Maximum Box and the Minimum Covering by Boxes problems. In fact, we show that these two problems can be formulated as a vertex packing and a vertex coloring one, respectively, in an Incompatibility Graph, and that one can solve in polynomial time the former and, for two important subclasses of instances, also the latter.
2011
File allegati a questo prodotto
Non ci sono file associati a questo prodotto.

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11573/420690
 Attenzione

Attenzione! I dati visualizzati non sono stati sottoposti a validazione da parte dell'ateneo

Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus ND
  • ???jsp.display-item.citation.isi??? ND
social impact