Adaptive Positive-Unlabelled Learning via Markov Diffusion

Paola Stolfi; Andrea Mastropietro; Giuseppe Pasculli; Paolo Tieri; Davide Vergni

Positive-Unlabelled (PU) learning is the machine learning setting in which only a set of positive instances is labelled, while the rest of the data set is unlabelled. The unlabelled instances may be either unspecified positive samples or true negative samples. Over the years, many solutions have been proposed to deal with PU learning. Some techniques treat the unlabelled samples as negative, reducing the problem to binary classification with a noisy negative set, while others aim to detect sets of likely negative examples and then apply a supervised machine learning strategy (two-step techniques). The approach proposed in this work falls into the latter category and works in a semi-supervised fashion: motivated and inspired by previous works, a Markov diffusion process with restart is used to assign pseudo-labels to unlabelled instances. Afterward, a machine learning model is trained on the newly assigned classes. The principal aim of the algorithm is to identify a set of instances that is likely to contain positive instances that were originally unlabelled.
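
To make the idea of a Markov diffusion process with restart more concrete, the following is a minimal illustrative sketch and not the authors' implementation. It assumes the instances are already connected in a weighted similarity graph, and it uses a standard random walk with restart seeded at the labelled positives. The function name, the restart probability of 0.3, the toy graph, and the ranking step are all assumptions made only for this illustration.

```python
import numpy as np

def random_walk_with_restart(adjacency, seeds, restart=0.3, tol=1e-10, max_iter=1000):
    """Stationary visiting probabilities of a walk that restarts at the seeds.

    Hypothetical helper for illustration; parameter values are assumptions.
    """
    adjacency = np.asarray(adjacency, dtype=float)
    n = adjacency.shape[0]
    col_sums = adjacency.sum(axis=0)
    col_sums[col_sums == 0] = 1.0           # avoid division by zero for isolated nodes
    transition = adjacency / col_sums        # column-stochastic transition matrix
    p0 = np.zeros(n)
    p0[list(seeds)] = 1.0 / len(seeds)       # restart distribution: uniform over seeds
    p = p0.copy()
    for _ in range(max_iter):
        p_next = (1 - restart) * transition @ p + restart * p0
        if np.abs(p_next - p).sum() < tol:   # L1 convergence check
            return p_next
        p = p_next
    return p

# Toy graph (assumed): two tight clusters {0, 1, 2} and {3, 4, 5}
# joined by one weak edge between nodes 2 and 3.
A = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5)]:
    A[i, j] = A[j, i] = 1.0
A[2, 3] = A[3, 2] = 0.1

labelled_positives = [0]                     # the only labelled instance
scores = random_walk_with_restart(A, labelled_positives)

# Rank the unlabelled instances by diffusion score: high scores suggest
# hidden positives, low scores suggest candidate negatives.
unlabelled = [i for i in range(6) if i not in labelled_positives]
ranked = sorted(unlabelled, key=lambda i: -scores[i])
```

In this toy setting the unlabelled nodes in the seed's cluster receive higher scores than those in the other cluster, so a ranking of this kind could supply the pseudo-labels on which a supervised model is later trained. How scores are actually turned into pseudo-labels in the proposed method is not specified here.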

Adaptive Positive-Unlabelled Learning via Markov Diffusion / Stolfi, Paola; Mastropietro, Andrea; Pasculli, Giuseppe; Tieri, Paolo; Vergni, Davide. - (2021).