Improving the quality of PV plant performance analysis by increasing data integrity and reliability: a data-driven approach using Machine Learning techniques / OVIEDO HERNANDEZ, Guillermo. - (2021 Dec 03).
Improving the quality of PV plant performance analysis by increasing data integrity and reliability: a data-driven approach using Machine Learning techniques
OVIEDO HERNANDEZ, GUILLERMO
03/12/2021
Abstract
PV modules are engineered to produce electricity for 30+ years and are being deployed worldwide in ever more numerous and ever larger PV plants. Continuous quality assurance and performance analysis are the cornerstones of long-term reliability and are needed to maximize financial and energy returns. In today’s highly competitive Operation and Maintenance (O&M) market, deploying and maintaining extensive networks of on-site sensors for remote monitoring proves challenging. Within this framework, data-driven solutions play a leading role in turning raw field data into reliable, actionable insights. PV plant data from SCADA and monitoring systems is constantly subject to quality issues, and the associated uncertainty is directly reflected in the quality and reliability of the performance metrics derived from it. In this work, the impact of the quality of the most relevant input parameters (i.e., output energy and irradiation) on the calculation of key performance indicators (KPIs) is evaluated, and different data cleaning and imputation techniques are benchmarked. The main objective of this work is to improve the quality of PV performance analysis by minimizing the negative effects of using incomplete and/or corrupted time series as input for the calculation of PV plant KPIs (such as Performance Ratio and Availability). This objective is achieved through the assessment of different data sources of differing intrinsic quality. Chapter 2 explains the methodology and the data used. Then, in chapter 3, as a preliminary data analysis, raw data from on-site sensors is compared with satellite-derived data to define and validate its uncertainty values. Special emphasis is given to irradiance sensors (pyranometers and reference cells), since plane-of-array (POA) irradiance is one of the variables with the greatest impact on performance evaluation.
Later, in chapter 4, a consistent data quality analysis is proposed to assess the sensors’ health status before proceeding with the corresponding cleaning procedure. At this stage, the concept of a ‘virtual sensor’ is introduced, which solves the problem of incomplete raw data by generating gap-free time series that efficiently combine on-site measurements with satellite data. Finally, in chapter 5, the advantage of performing data imputation with Machine Learning (ML) techniques is demonstrated by applying three well-performing algorithms (Random Forest, Bagging and Gradient Boosting Regressor) to replace missing data with highly accurate predicted values.

| File | Size | Format |
|---|---|---|
| Tesi_dottorato_Oviedo.pdf (open access; type: doctoral thesis; license: Creative Commons) | 7.5 MB | Adobe PDF |
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.