Failure diagnosis and trend-based performance loss routines for the detection and classification of incidents in large-scale photovoltaic systems

Fault detection and classification in photovoltaic (PV) systems through real‐time monitoring is a fundamental task that ensures quality of operation and significantly improves the performance and reliability of operating systems. Different statistical and comparative approaches have already been proposed in the literature for fault detection; however, accurate classification of fault and loss incidents based on PV performance time series remains a key challenge. Failure diagnosis and trend‐based performance loss routines were developed in this work for detecting PV underperformance and accurately identifying the different fault types and loss mechanisms. The proposed routines focus mainly on the differentiation of failures (e.g., inverter faults) from irreversible (e.g., degradation) and reversible (e.g., snow and soiling) performance loss factors based on statistical analysis. The proposed routines were benchmarked using historical inverter data obtained from a 1.8 MWp PV power plant. The results demonstrated the effectiveness of the routines for detecting failures and loss mechanisms and the capability of the pipeline for distinguishing underperformance issues using anomaly detection and change‐point (CP) models. Finally, a CP model was used to extract significant changes in time series data, to detect soiling and cleaning events and to estimate both the performance loss and degradation rates of fielded PV systems.


| INTRODUCTION
As photovoltaic (PV) systems are rapidly becoming an important part of the energy mix, it is important to ensure their reliability and maximize their energy output to reduce the levelized cost of electricity (LCoE). This can be achieved through advanced data analytics. 1 To reliably generate electricity over an extended lifetime, cutting-edge software and data-driven algorithms can be employed to monitor the health state of PV plants and detect failures in real time, thus minimizing downtimes. This increases the energy yield and the profitability of the system. 2 Currently, most utility-scale PV power plants around the world are monitored 24/7, generating high volumes of data.
Depending on the level of monitoring, the analysis of field data can indicate different failures and/or loss mechanisms and it can provide insights on corrective, preventive, and predictive maintenance. 3 However, to optimize the operation and maintenance (O&M) strategies and maximize revenues, PV plant owners and asset managers must rely on efficient ways for analyzing and interpreting the data streams.
By doing this, fault incidents can be detected early and classified into different fault categories, thus enabling operators to take appropriate actions to mitigate the losses. 2,4 Typical automated failure detection methods can be categorized as model-based, image-based, and data-driven. [5][6][7][8][9][10][11][12][13][14][15][16][17][18][19] Such implementations mainly utilize electrical and meteorological measurements, signals, or images to capture the behavior of PV systems, uncover patterns, and identify failures. Most of the failure detection methods define threshold levels (TLs), which are used to compare a PV performance model against measurements in order to identify fault conditions. [20][21][22][23][24] Other methodologies perform residual analysis and statistical tests to detect faults in PV systems. [22][23][24][25][26][27][28][29] Even though some commonly occurring PV failures can be detected by the proposed algorithms, the categorization of incidents into different failure types and loss mechanisms remains a challenging task. 30 Although several machine learning-based models have been reported in the literature, 10,13,[31][32][33][34][35][36][37][38][39] it is unknown whether any are currently used (or even applicable) for real-time or post-processing monitoring applications. On the other hand, the industry has been applying numerous statistical, empirical, and physics-based approaches to analyze the health state or performance losses of PV power plants. For example, Python libraries have been developed to simulate PV performance (e.g., pvlib-python 40 ) or to evaluate PV performance based on statistical time series analysis (e.g., RdTools 41,42 ). In pvlib-python, it is possible to estimate the fraction of DC power lost due to snow coverage, soiling, and shading.
However, besides the site's meteorological measurements, other assumptions and parameters are required, which might not be readily available (e.g., particulate matter data, number of strings per module, and tilt). Similarly, the RdTools open-source library has the capability of statistically obtaining rates of performance degradation and soiling loss, which are useful for overall health state assessment. However, RdTools does not differentiate the performance loss rate (PLR) from the degradation rate (R D ), although a new capability was recently added for simultaneously extracting soiling losses and PLR. 43 Other existing methods rely on I-V curves and weather data and require train and test datasets for failure classification based on the exhibited profiles and fault patterns. 33 Current best practices lack a methodology for the accurate differentiation of failures and reversible/irreversible performance losses.
Reversible and irreversible performance losses are referred to as "trend-based" performance losses because their effect is gradual or seasonal, in contrast to failure-based losses that occur abruptly; they can therefore be detected and quantified through time series analysis. Differentiating and quantifying individual trend-based performance losses in real field data is a challenging task due to the complex interactions of PV behavior with changing environmental conditions as well as measurement noise, erroneous data, and uncertainties. To address this gap, failure diagnosis routines (FDRs) and trend-based performance loss routines (TLRs) were developed based on statistical residual 25,44 and change-point (CP) 45 techniques. The proposed routines operate on PV operational data and meteorological measurements and complement the previously developed data quality routines (DQRs). 4 All routines were combined into a complete diagnostic pipeline for differentiating common PV failures and performance losses (such as zero and reduced power production, degradation, soiling, and snow losses) from a single performance metric. The diagnostic pipeline can be used for batch time series analyses of large fleets and for real-time monitoring, given that the technical specifications of the PV systems are available (e.g., system characteristics and meteorological data). The pipeline was benchmarked experimentally using historical inverter-level measurements from a PV power plant in Larissa, Greece.

| METHODOLOGY
The proposed routines operate on time series of meteorological and electrical measurements. DQRs are initially applied for data validation and filtering of outliers, while power simulation models are used for predicting the performance of the system in the absence of any failures or degradation. FDRs are then used for detecting and classifying failures, while TLRs are utilized for detecting performance losses. Statistical analysis is then performed to distinguish faults from reversible and irreversible mechanisms. The proposed methodology, illustrated in Figure 1, was validated using the maintenance logs.

| DQRs
Initially, the methodology for data processing and quality verification was applied to the PV dataset to identify and remove invalid data points before simulating the plant performance. This can also provide insights and information about possible failures/performance losses and their type (e.g., near zero power production incidents). 2 More details about the DQR process are available in Livera et al. 4

| PV system performance models
The Huld et al. 46 model was used to predict the DC power of the test PV system. The model requires in-plane irradiance and module temperature measurements. It was selected due to its high accuracy for c-Si PV modules under different sky conditions (e.g., clear-sky, cloudy, and partly cloudy conditions). 47 For predicting the DC current and voltage measurements, the empirical parametric models described by Livera et al. 33 were used.
The simulation models for predicting the power, current, and voltage measurements were trained based on a 10%:90% train and test split. The train set contained fault-free data over a 6-month period and was used to derive the model coefficients. The rest of the dataset, 54 months of data containing both faulted and fault-free periods, was used for testing. The goodness of the models' fit was evaluated using the correlation coefficient (R), the coefficient of determination (R 2 ), the mean absolute percentage error (MAPE), and the root mean square error (RMSE). 48 The Huld et al. model 46 was then used as a reference model in the failure diagnostic procedure by performing comparisons between the actual/measured and predicted/simulated power.
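For reference, the Huld et al. model can be sketched as a stand-alone function following the pvlib-style formulation. This is a minimal illustration, not the plant's fitted model: the six coefficients k1-k6 are the quantities derived during training, and the values used below are illustrative only.

```python
from math import log

def huld_dc_power(g_poa, t_mod, pdc0, k):
    """Huld et al. DC power model (pvlib-style formulation).

    g_poa : in-plane irradiance [W/m^2]
    t_mod : module temperature [degC]
    pdc0  : DC power at Standard Test Conditions [W]
    k     : sequence of six empirical coefficients k1..k6
            (fitted on a fault-free training set)
    """
    if g_poa <= 0:
        return 0.0
    gp = g_poa / 1000.0   # irradiance normalized to STC (1000 W/m^2)
    tp = t_mod - 25.0     # temperature difference from STC (25 degC)
    lg = log(gp)
    return gp * (pdc0
                 + k[0] * lg + k[1] * lg ** 2
                 + tp * (k[2] + k[3] * lg + k[4] * lg ** 2)
                 + k[5] * tp ** 2)
```

At STC inputs (1000 W/m^2, 25 degC), all correction terms vanish and the function returns pdc0, which makes the fitted coefficients easy to sanity-check.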
The performance ratio (PR), power, current, and voltage residuals, defined as the difference between the predicted and measured values, were also calculated.

| FDRs
The developed FDRs include a failure detection and a classification stage. The detection stage is based on a comparative assessment between the predicted and measured DC energy yield. A failure is detected when the absolute error (AE), defined as the absolute difference between the predicted and measured DC power, exceeds a specified TL. The TL is calculated by multiplying the power of the array at Standard Test Conditions (STC) by the combined yield uncertainty of the performance model. The combined uncertainty is calculated from the partial derivatives of the model with respect to its inputs. 49 Residual analysis was performed to verify the fault occurrences.
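The detection rule reduces to a one-line check. The sketch below is a hypothetical minimal form; the 5% combined uncertainty used in the example is an arbitrary illustrative value, not the value derived in this work.

```python
def threshold_level(p_stc_kw, combined_uncertainty):
    """TL = array power at STC times the combined yield uncertainty
    of the performance model (e.g., 0.05 for a 5% uncertainty)."""
    return p_stc_kw * combined_uncertainty

def fault_detected(p_pred_kw, p_meas_kw, tl):
    """AE = |predicted - measured| DC power; a fault is flagged
    when the AE exceeds the threshold level."""
    return abs(p_pred_kw - p_meas_kw) > tl
```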
In this context, the power, current, and voltage residuals from the seasonal naive method were analyzed using Shewhart charts (also known as control charts). 26,50 In a Shewhart chart, a sequence of samples is plotted against time, and upper and lower control limits (UCL and LCL) are calculated based on the three-sigma rule, that is, UCL, LCL = μ0 ± 3σ0, where μ0 is the process mean and σ0 is the standard deviation. 25 The UCL and LCL were calculated over weekly sliding windows using data under normal operation. 25 Under normal operation, the residuals are close to zero, have constant variance, and are normally distributed and within the estimated limits. During fault conditions, the residuals significantly deviate from zero and exceed the estimated control limits.
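The three-sigma limits and the breach test can be sketched as follows. This is a simplified version that computes the limits once from a single fault-free window; the actual routines recompute them over weekly sliding windows.

```python
from statistics import mean, stdev

def shewhart_limits(residuals):
    """Three-sigma Shewhart control limits estimated from a window of
    fault-free residuals: LCL, UCL = mu0 -/+ 3*sigma0."""
    mu, sigma = mean(residuals), stdev(residuals)
    return mu - 3 * sigma, mu + 3 * sigma

def out_of_control(residuals, lcl, ucl):
    """Indices of residual samples breaching the control limits,
    i.e., candidate fault occurrences."""
    return [i for i, r in enumerate(residuals) if not lcl <= r <= ucl]
```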
Once fault conditions are identified, classification algorithms based on logic tree structures 33 are used to categorize the detected incidents into (a) near zero power production (0% to 15% of predicted power due to inverter shutdown failures, grid problems, ground faults, etc.) and (b) nonzero production (due to snow coverage, degradation, soiling, etc.). 51 Nonzero production incidents due to failures were further categorized into three groups: (a) reduced current production class, (b) reduced voltage production class, and (c) reduced current-voltage production class. More details are given in Section 2.5.
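A minimal sketch of such a logic tree is given below. The function name, flag inputs, and class labels are illustrative stand-ins: the actual routines derive the current/voltage fault flags from the full residual analysis rather than taking them as arguments.

```python
def classify_incident(measured_kw, predicted_kw, current_fault, voltage_fault):
    """Simplified logic-tree categorization of a detected incident.

    current_fault / voltage_fault: flags indicating that the DC current
    or voltage residuals breached their Shewhart control limits.
    """
    ratio = measured_kw / predicted_kw if predicted_kw > 0 else 0.0
    if ratio <= 0.15:
        # 0%-15% of predicted power: inverter shutdown, grid problem, etc.
        return "near zero power production"
    if current_fault and voltage_fault:
        return "reduced current-voltage production"
    if current_fault:
        return "reduced current production"
    if voltage_fault:
        return "reduced voltage production"
    # nonzero production without electrical fault signature
    return "nonzero production (loss mechanism)"
```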

| TLRs
Trend-based performance losses refer to linear and nonlinear drops in performance time series and profiles that may reduce the produced power of a PV system by up to 20%. 56 However, in some cases, such as heavy snowfall or a sandstorm, this range can be exceeded. 54 Such phenomena are categorized as nonzero production incidents, and they can result in reversible or irreversible performance losses depending on the caused damage. 57 Most of the irreversible losses can be classified as material/component degradation of the PV module and balance of system. 57 In this study, degradation, soiling, and snow coverage were investigated by applying a statistical method to a single PV performance metric. A statistical CP algorithm was utilized to identify the number and location(s) of CP(s) in a given profile by capturing linear and complex trends as well as abrupt profile changes. 58-60 A change point (also called a switch or break point) refers to a change in the statistical properties of a time series or trend (e.g., mean, variance, and slope). 58 A time series with m CPs splits the data into m + 1 segments. The detected changes can be continuous or discontinuous.

[Figure 1: Flowchart of the proposed methodology describing the five consecutive steps for differentiating failures and trend-based performance losses]
In the case of performance losses, continuous CPs indicate a variation in the rate at which soiling accumulates or a nonlinear degradation pattern. 60 In the case of nonlinear degradation, changes in the variability of the PR time series are detected, with the different segments exhibiting different slopes. 61 On the other hand, discontinuous CPs can indicate soiling cleaning events, snow shedding, or corrective maintenance actions. 60

[Table 1: CP anomalies and affected parameters; for example, continuous CPs with a decreasing trend over years (progressive power drop) reduce P DC and I DC (V DC may also be affected). Abbreviations: G I, in-plane irradiance; I DC, DC current; P DC, DC power; T amb, ambient air temperature; T mod, module temperature; V DC, DC voltage. (a) Module temperature measurements can be simulated using an empirical model (e.g., the Sandia module temperature model, 52 the Ross thermal model, 53 or the open-source Faiman module temperature model available in the pvlib-python library 40 ). (b) In some specific cases (e.g., heavy snowfall or sandstorm, severe potential-induced degradation), trend-based performance losses can cause power reductions greater than 20%. 54,55]

The TLRs consist of the Facebook Prophet (FBP) 59 model, which performs CP analysis, allows adjusting the trend flexibility, and offers additional functionalities (e.g., forecasting). 58 This model was also applied to PV performance time series and exhibited low prediction error under different conditions (e.g., two- and three-step degradation profiles, a range of PV module technologies, seasonality in different climate zones, and different aggregations). 64 The FBP algorithm detects the number and location(s) of CPs by capturing statistical changes in the slopes of predefined segments. It initially distributes "potential" CPs uniformly along the selected range of the time series' trend, and it then compares the slopes in order to extract the most significant CPs by performing comparisons against a set TL. 58,59 The FBP algorithm calibration procedure was performed as reported by Theristis et al. 58 In this work, 100 "potential" CPs were distributed on the PR time series (n_changepoints = 100), with the changepoint_range argument adjusted as part of this calibration.
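With the Prophet library itself, this roughly corresponds to fitting `Prophet(n_changepoints=100)` and inspecting the fitted per-candidate slope adjustments. The pruning step then reduces to a threshold filter, sketched below with illustrative names and values (the 0.0001 default mirrors the critical threshold used in this work).

```python
def significant_changepoints(candidate_cps, deltas, threshold=1e-4):
    """Keep only candidate CPs whose fitted slope-adjustment magnitude
    exceeds the critical threshold; the remaining candidates are treated
    as insignificant and discarded.

    candidate_cps : candidate CP locations (e.g., dates), uniformly
                    distributed along the trend
    deltas        : corresponding slope adjustments from the trend fit
    """
    return [cp for cp, d in zip(candidate_cps, deltas) if abs(d) >= threshold]
```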

| Categorization
Categorization of nonzero power production incidents was performed using a single PV performance metric.

The Seasonal Hybrid Extreme Studentized Deviates (S-H-ESD) algorithm was initially applied to detect data anomalies in the PR time series. 67 Such anomalies can be caused by abrupt reversible events, for example, snowfall and shedding or a sandstorm followed by rain. In this case, the categorization procedure uses information from weather parameters to enable differentiation of snow and soiling. During snowfall periods, the PV system power production is reduced, while the module and ambient temperature measurements are lower than the typical operating temperature ranges. It is worth noting here that this is a seasonally repeated performance loss that affects all three electrical parameters (current, voltage, and hence power). 70 Additionally, when snow sheds, an increase in the recorded voltage measurements is initially observed, followed by a stepwise reduction. 70 The changepoint_prior_scale setting of FBP is then readjusted to estimate R D and to detect soiling and cleaning events. All quantitative metrics, specifications, and models used for differentiating fault types from loss mechanisms in PV systems are summarized in Figure 2.
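The idea behind S-H-ESD can be illustrated with a much-simplified stand-in: remove a seasonal component, then flag residuals with an extreme robust z-score. This sketch is not the full algorithm (which uses STL decomposition and the generalized ESD test with medians and MAD), but it conveys the seasonal-deseasonalize-then-test structure.

```python
from statistics import median

def simple_seasonal_anomalies(series, period, z_max=3.0):
    """Simplified seasonal anomaly detector (stand-in for S-H-ESD).

    1. Estimate a seasonal component as the per-phase median.
    2. Compute residuals by removing the seasonal component.
    3. Flag residuals whose robust z-score (via MAD) exceeds z_max.
    """
    seasonal = [median(series[i::period]) for i in range(period)]
    resid = [x - seasonal[i % period] for i, x in enumerate(series)]
    med = median(resid)
    mad = median(abs(r - med) for r in resid) or 1e-12  # guard zero MAD
    return [i for i, r in enumerate(resid)
            if abs(r - med) / (1.4826 * mad) > z_max]
```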

| Benchmarking and validation
The diagnostic architecture was benchmarked using time series data from the 1.8 MWp PV power plant. In order to evaluate the detection and classification accuracy of the FDRs, a 2 × 2 confusion matrix (see Table 2) was used. 75 The TP represents the normal operation data points that were correctly detected/classified, while the FP represents the actual fault data points that were incorrectly detected/classified as normal operation points. 76 In this context, the accuracy metric is defined as the ratio of the number of correct predictions (TP + TN) to the number of total predictions (TP + TN + FP + FN). 75 The maintenance log of the system was used to verify actual normal and fault operation.
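The evaluation above can be sketched as follows. Note the convention from the text: "normal operation" is the positive class, so TP counts normal points correctly classified as normal and FP counts fault points misclassified as normal.

```python
def detection_metrics(actual_normal, predicted_normal):
    """2x2 confusion matrix and accuracy for the FDR evaluation.

    actual_normal / predicted_normal: boolean sequences where True
    means 'normal operation' (the positive class, per the paper).
    """
    tp = sum(a and p for a, p in zip(actual_normal, predicted_normal))
    tn = sum(not a and not p for a, p in zip(actual_normal, predicted_normal))
    fp = sum(not a and p for a, p in zip(actual_normal, predicted_normal))
    fn = sum(a and not p for a, p in zip(actual_normal, predicted_normal))
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    return tp, tn, fp, fn, accuracy
```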

| RESULTS
The results correspond to data from one of the subsystems (Inverter 1) of the 1.8 MW plant.

| Data processing and quality verification application
The DQRs were applied to the PV dataset to filter out nighttime data points (i.e., G I < 20 W/m 2 ). The power measurements were then normalized to the system's nominal capacity. The DQR process included neither data imputation nor correction, in order to fully capture the exhibited profiles during fault conditions and loss events.
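This preprocessing step can be sketched as a short filter (a hypothetical minimal form; the full DQRs also apply physical limits and statistical tests):

```python
def preprocess(records, p_nom_kw, g_min=20.0):
    """Drop nighttime points (G_I < 20 W/m^2) and normalize power to the
    nominal capacity. No imputation or correction is applied, so fault
    and loss profiles survive intact.

    records : iterable of (g_poa [W/m^2], p_dc [kW]) pairs
    """
    out = []
    for g_poa, p_kw in records:
        if g_poa < g_min:
            continue  # nighttime / below the irradiance threshold
        out.append((g_poa, p_kw / p_nom_kw))
    return out
```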
The DQRs methodology was also used to detect invalid measurements (which may indicate equipment malfunctions and/or faulty operation of the PV system) and to provide insights and information about possible failures/losses. 2 This was achieved by visually inspecting diagnostic plots and by applying physical limits and statistical and comparative tests on the acquired measurements and calculated PV performance parameters. Visual inspection of the Inverter 1 power data (see Figure 3) was deemed sufficient for observing PV operational problems (e.g., reduced power production) and data issues (e.g., gaps). 77

After evaluating the goodness of fit, a Shewhart chart 26,50 was constructed to verify the system's normal operation. As shown in Figure 5A, the power residuals were distributed within the estimated UCL and LCL during normal operation, while their mean was approximately zero. The variance of the residuals can be treated as constant, as shown in the histogram of the residuals (see Figure 5B). The histogram suggests that the residuals approximately follow a normal distribution. However, significant correlation in the residual series was observed at different lags in the auto-correlation function (ACF) plot (Figure 5C), signifying the need for better seasonal adjustment.

| Near zero power production incidents
During fault conditions, the measured DC power was significantly lower than the predicted power, and the comparative FDR algorithm flagged the corresponding incidents. Subsequently, the power residuals were analyzed using weekly sliding windows. As shown in Figure 7A, two data anomalies in the residuals pattern were detected in October, indicating two fault occurrences. During the fault conditions, the residuals were not distributed within the estimated control limits, and the mean of the residuals significantly deviated from zero for two specific cases (see October 20, 23, and 24). Furthermore, the residuals still follow a normal distribution during the 2-week period, as shown in Figure 7B, but they are more spread out due to the larger standard deviation.
During near zero power production conditions, the affected parameters were mainly the DC power and current (reductions from 85% to 100%). The AC output power was also affected, and it was nearly zero. Such faults can affect either a part/subsystem or the whole PV system. Finally, the classification algorithm that considers the amount of power reduction, the affected electrical parameters, and the results of the statistical analysis achieved an accuracy of 97.3%. It is worth noting here that the classification stage achieved the same accuracy as the detection stage because this is a binary classification problem that involves classifying the data points into two groups: normal operation and fault data points. As previously indicated in Figure 7A, a sudden change in the residual profile was detected on October 20, indicating a fault event.
During that day, the current and power residuals exceeded the estimated control limits. Thus, the incident was classified as nonzero power production (reduction between 20% and 85%) due to "fault occurrences-reduced current class." Finally, the algorithm's classification accuracy could not be assessed because the maintenance log reported only fault issues at the inverter level.

| Nonzero power production incidents and categorization
The weekly PR time series (see Figure 9) was constructed using the acquired power and irradiance measurements. The TLR methodology was then used for extracting soiling losses.
Over the evaluation period, the FBP model detected 24 discontinuous CPs (i.e., cleaning events; indicated by red vertical lines in Figure 11). Because there were indications of both reversible (e.g., data anomalies in the PR time series, soiling) and irreversible (e.g., degradation) mechanisms, a time series investigation for CP detection was performed.
The FBP algorithm was used to detect changes in the variability of the weekly PR time series and to differentiate the loss factors based on the CPs sequence and the corresponding rate of change. Initially, the FBP algorithm distributed 100 "potential" CPs uniformly along the PR time series trend (see Figure 12).
In order to derive the optimal number of CPs, the small sample hypothesis t test 79 was used. Based on the results of the statistical significance test, the critical threshold value for CPs was set to 0.0001 resulting in 39 CPs (see Figure 13). The FBP model also captured the data anomalies detected by S-H-ESD.
Different trend-based performance losses will exhibit different sequences (e.g., continuous or discontinuous CPs) and rates of change. For example, a heavy snowfall or a sandstorm will exhibit an abrupt, discontinuous change. Out of the 39 detected CPs (see Figure 14), FBP extracted 17 significant CPs (see Figure 15): 12 due to loss mechanisms and cleanings and 5 due to fault occurrences, whereas 10 of them were negative (see Figure 16).
The rates of change were then coupled to weather data (e.g., see snowfall measurements indicated by green points in Figure 16) to determine the CP root cause. One significant CP was misclassified (the CP detected in December 2018), while the remaining 11 CPs were correctly distinguished as reversible loss mechanisms and differentiated from the five CPs due to faults (circled in purple in Figure 16). The TLR classification was verified against the maintenance log resulting in an accuracy of 91.66%. The CPs detected in December-January months (six detected in total, one misclassified) were attributed to snow coverage that caused gradual decrease of PV performance. From the remaining CPs due to loss and cleaning events, five were attributed to cleaning events, while one CP was due to soiling.

| EXHIBITED FAILURES AND PERFORMANCE LOSSES OF THE TEST PV PLANT
Over the evaluation period, the test subsystem produced 4068 MWh.
The energy loss was then approximated as the difference between the predicted and measured DC energy yield, resulting in 73.39 MWh. Each category of energy loss is illustrated as a pie chart in Figure 17. Most of the energy loss was attributed to near zero power production incidents at 43.40% (e.g., inverter faults), while nonzero production incidents accounted for 42.87% (e.g., reduced current production), from which 33.47% was due to performance losses (soiling, degradation, and snow). Finally, 13.73% was attributed to other incidents and the power model's error.

[Figure 16: Weekly performance ratio (PR) time series (black dots) of the test PV system along with the detected significant change-points (CPs) indicated by dashed lines. Red dashed lines indicate snow events, blue dashed lines indicate cleaning events, and the orange line indicates soiling. Data anomalies detected by the Seasonal Hybrid Extreme Studentized Deviates (S-H-ESD) algorithm, including heavy snowfalls, are circled in purple. Weekly snowfall measurements are indicated by green points.]

[Figure 17: Pie chart depicting the fraction of the total energy lost in the test subsystem due to different fault and loss incidents.]
The proposed diagnostic architecture has the following limitations: (a) the CP model's flexibility needs to be recalibrated depending on the application and when using different performance metrics, and (b) the methodology for extracting the significant number of CPs is not fully automated. Furthermore, given the available field data, the "actual" loss of energy generation for the test subsystem could not be estimated; only the energy lost during the period from the acknowledgment time until the resolution time was estimated. Despite these limitations, this is the first attempt at differentiating faults from reversible and irreversible performance losses using a statistical approach and a single performance metric. Based on the results of such field data analysis, O&M teams can be informed about underperformance issues and act accordingly to recover some of the performance and financial losses.

| CONCLUSIONS
An analytical architecture capable of detecting underperformance issues in PV systems due to failures and loss mechanisms was presented in this work. The proposed architecture operates entirely on acquired raw field measurements. It mainly focuses on differentiating commonly exhibited failures from trend-based performance losses using a single performance metric.
The developed pipeline was experimentally validated using historical inverter data obtained from a large-scale PV system installed in Greece. The results demonstrated the effectiveness of the routines for detecting failures and loss mechanisms and the capability of the pipeline for distinguishing underperformance issues using residual, anomaly detection, and CP techniques. A CP model, namely, the FBP, was also used to extract significant changes in time series data, to detect soiling cleaning events and to estimate both reversible and irreversible performance losses in PV systems.
Finally, only inverter data were available in this study, and differentiation of faults from loss factors was performed at the inverter level. Future work will extend the fault and loss categories to include more root causes, such as partial shading, vegetation, and PV module-level failures, which, in turn, require string/module-level monitoring. The results from the application of the proposed diagnostic pipeline can be used for monitoring applications and for optimizing O&M activities.