
Broadening deep learning horizons: models for RGB and depth images adaptation / Russo, Paolo. - (2020 Feb 28).

Broadening deep learning horizons: models for RGB and depth images adaptation

RUSSO, PAOLO
28/02/2020

Abstract

Deep Learning has revolutionized the field of Computer Vision. Very deep models with a huge number of parameters have been successfully applied to large image datasets for difficult tasks such as object classification, person re-identification, and semantic segmentation. The results have been two-fold: on one hand, astonishing performance, with accuracy often comparable to or better than that of a human counterpart; on the other, the development of robust, complex and powerful visual features which exhibit the ability to generalize to new visual tasks.

Still, the success of Deep Learning methods relies on the availability of large datasets: whenever the available labeled data is limited or redundant, a deep neural network model will typically overfit the training data, showing poor performance on new, unseen data. A typical solution used by the Deep Learning community in those cases is to rely on Transfer Learning techniques; among the several available methods, the most successful one has been to pre-train the deep model on a big heterogeneous dataset (like ImageNet) and then to fine-tune it on the available training data. Among several fields of application, this approach has been heavily used by the robotics community for object recognition on depth images. Depth images are usually provided by depth sensors (e.g., Kinect) and their availability is somewhat scarce: the biggest publicly available depth image dataset includes 50,000 samples, making the use of a pre-trained network the only successful way to exploit deep models on depth data. Without any doubt, this method provides suboptimal results, as the network is pre-trained on traditional RGB images, whose perceptual information is very different from that of depth maps; better results could be obtained if a sufficiently large depth dataset were available, enabling the training of a deep model from scratch.

Another frequent issue is the difference in statistical properties between training and test data (the domain gap). In this case, even in the presence of enough training data, the generalization ability of the model will be poor, thus requiring a Domain Adaptation method to reduce the domain gap; this can improve both the robustness of the model and its final classification performance.

In this thesis both problems have been tackled by developing a series of Deep Learning solutions for Domain Adaptation and Transfer Learning tasks on the RGB and depth image domains. A new synthetic depth image dataset is presented, showing the performance of a deep model trained from scratch on depth-only data. At the same time, a new, powerful depth-to-RGB mapping module is analyzed, to optimize the classification accuracy on depth image tasks while using ImageNet-pretrained deep models. The study of the depth domain ends with a recurrent neural network for egocentric action recognition capable of exploiting depth images as an additional source of attention. A novel GAN model and a hybrid pixel/feature adaptation architecture for RGB images have then been developed: the former targets single-domain adaptation tasks, while the latter addresses multi-domain adaptation and generalization tasks. Finally, a preliminary approach to the problem of multi-source Domain Adaptation on a semantic segmentation task is examined, based on the combination of a multi-branch segmentation model and an adversarial technique, capable of exploiting all the available synthetic training datasets and of increasing the overall performance.
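To make the pre-train-and-fine-tune recipe described above concrete, the following is a minimal sketch, assuming PyTorch/torchvision (an assumption; this is not the thesis code). It loads an ImageNet-pretrained ResNet-18, swaps the classification head for the target task, and fine-tunes all weights at a small learning rate; the class count and the depth data are hypothetical placeholders.

    import torch
    import torch.nn as nn
    from torchvision import models

    # Load a ResNet-18 pre-trained on ImageNet (the "big heterogeneous dataset").
    model = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)

    # Replace the 1000-way ImageNet head with one sized for the target task;
    # 51 classes is a hypothetical placeholder (e.g., a small RGB-D benchmark).
    num_classes = 51
    model.fc = nn.Linear(model.fc.in_features, num_classes)

    # Fine-tune all weights with a small learning rate, preserving the
    # pre-trained features while adapting them to depth data.
    optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)
    criterion = nn.CrossEntropyLoss()

    def finetune_step(images, labels):
        """One step on a batch of depth images encoded as 3-channel tensors,
        shape (B, 3, 224, 224), so they fit the RGB input layer."""
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
        return loss.item()

Depth maps are typically replicated or colorized into three channels so that they match the RGB input layer of the pre-trained network; this input mismatch is exactly what a depth-to-RGB mapping module, as studied above, is designed to address.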
The performance obtained with the proposed algorithms is often better than or equivalent to that of the currently available state-of-the-art methods on several datasets and domains, demonstrating the effectiveness of our approach. Moreover, our analysis shows that the creation of ad-hoc domain adaptation and transfer learning techniques is essential to obtain the best accuracy in the presence of any domain gap, at little or negligible additional computational cost.
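As a generic illustration of the adversarial feature-alignment idea underlying the adaptation methods summarized above, here is a minimal DANN-style sketch with a gradient reversal layer (again assuming PyTorch; it is not the specific GAN or multi-branch architecture proposed in the thesis).

    import torch
    import torch.nn as nn

    class GradReverse(torch.autograd.Function):
        """Identity in the forward pass; flips (and scales) gradients backward."""
        @staticmethod
        def forward(ctx, x, lambd):
            ctx.lambd = lambd
            return x.view_as(x)

        @staticmethod
        def backward(ctx, grad_output):
            # The reversed gradient pushes the feature extractor to confuse
            # the domain discriminator, aligning source and target features.
            return -ctx.lambd * grad_output, None

    class DomainDiscriminator(nn.Module):
        """Binary source-vs-target classifier fed through gradient reversal."""
        def __init__(self, feat_dim=512, lambd=1.0):
            super().__init__()
            self.lambd = lambd
            self.net = nn.Sequential(
                nn.Linear(feat_dim, 256),
                nn.ReLU(),
                nn.Linear(256, 2),
            )

        def forward(self, features):
            return self.net(GradReverse.apply(features, self.lambd))

    # The full objective combines the supervised task loss on labeled source
    # data with the domain-confusion loss on source and target features:
    #   total_loss = task_loss + domain_loss

Trained this way, the shared feature extractor minimizes the task loss while maximizing the discriminator's confusion; this adversarial mechanism is what reduces the domain gap.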
Attached files
File: Tesi_dottorato_Russo.pdf
Access: open access
Type: Doctoral thesis
License: All rights reserved
Size: 12.97 MB
Format: Adobe PDF
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11573/1365047