A Novel DWT-based Encoder for Human Pose Estimation / DE MAGISTRIS, Giorgio; Romano, Matteo; Starczewski, Janusz; Napoli, Christian. - 3360:(2022), pp. 33-40. (Paper presented at the conference SYSYEM 2022: 8th Scholar’s Yearly Symposium of Technology, Engineering and Mathematics, held in Brunek, Italy).
A Novel DWT-based Encoder for Human Pose Estimation
Giorgio De Magistris
Conceptualization
Christian Napoli
Supervision
2022
Abstract
The proposed approach to pose estimation builds a Convolutional Neural Network with an encoder-decoder structure, a spatial pyramid based on the WASP structure in its bottleneck, and a Discrete Wavelet Transform (DWT) encoder. These techniques have already shown their ability to address a main problem in the state of the art: the different fields of view (FoV) required to analyze the different possible sizes of a given subject. We aim to fix the shortcomings of the encoding part of modern CNN-based networks by using a DWT encoder and WASP. This work also aims to show, from a more general point of view, the possible advantages of a DWT encoder in any CNN-based approach to pose estimation and object detection in any form, for example with several subjects in the same image or within a video, given the almost redundant use of the best-known CNN encoding structures such as ResNet-101, U-Net, or VGG16-19. We run our tests on a U-Net-based CNN in order to evaluate the importance of the DWT encoder's results in the decoding part as well, by cropping them at the last layers of the network. This is necessary because border pixels that could be useful for evaluating the result are lost during encoding.

File | Size | Format
---|---|---
DeMagistris_A-novel_2022.pdf (open access; type: publisher's version, published with the publisher's layout; license: Creative Commons) | 2.91 MB | Adobe PDF
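To make the idea of a DWT encoder in the abstract more concrete, the following is a minimal sketch of a single-level 2D wavelet decomposition used as a downsampling step. It is an illustration under stated assumptions, not the authors' implementation: the Haar wavelet, the function names `haar_dwt2` and `haar_idwt2`, and the subband names are all hypothetical choices made here, and the record does not say which wavelet or layer layout the paper uses.

```python
import numpy as np


def haar_dwt2(x):
    """Single-level 2D Haar DWT of an (H, W) array with even H and W.

    Returns four (H/2, W/2) subbands. Assumption: Haar wavelet with
    orthonormal scaling; subband naming conventions vary between libraries.
    """
    x = np.asarray(x, dtype=np.float64)
    if x.shape[0] % 2 or x.shape[1] % 2:
        raise ValueError("height and width must be even")
    a = x[0::2, 0::2]  # top-left pixel of each 2x2 block
    b = x[0::2, 1::2]  # top-right
    c = x[1::2, 0::2]  # bottom-left
    d = x[1::2, 1::2]  # bottom-right
    ll = (a + b + c + d) / 2.0  # approximation (scaled local average)
    lh = (a - b + c - d) / 2.0  # left-minus-right differences
    hl = (a + b - c - d) / 2.0  # top-minus-bottom differences
    hh = (a - b - c + d) / 2.0  # diagonal differences
    return ll, lh, hl, hh


def haar_idwt2(ll, lh, hl, hh):
    """Inverse of haar_dwt2: rebuilds the (2H, 2W) array from four subbands."""
    h, w = ll.shape
    x = np.empty((2 * h, 2 * w), dtype=np.float64)
    x[0::2, 0::2] = (ll + lh + hl + hh) / 2.0
    x[0::2, 1::2] = (ll - lh + hl - hh) / 2.0
    x[1::2, 0::2] = (ll + lh - hl - hh) / 2.0
    x[1::2, 1::2] = (ll - lh - hl + hh) / 2.0
    return x


if __name__ == "__main__":
    image = np.random.default_rng(0).random((8, 8))
    subbands = haar_dwt2(image)
    # Stacking the subbands as channels halves the resolution without
    # discarding information, unlike max pooling.
    features = np.stack(subbands, axis=0)
    print(features.shape)
    print(np.allclose(haar_idwt2(*subbands), image))
```

Unlike pooling or strided convolution, this decomposition is invertible: the four half-resolution subbands together carry the same information as the input, which is one plausible reason to consider a DWT in the encoding path.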