Multivariate modeling for microalgae growth in outdoor photobioreactors

Abstract An empirical model for predicting microalgal growth in outdoor photobioreactor cultivation is implemented using Principal Component Analysis (PCA) and Partial Least Squares (PLS) regression. Experimental biomass production data were collected over 1 year of operation of a bubble column prototype, while monitoring light and temperature and varying the cultivation conditions. PCA isolates 2 Principal Components that explain 80% of the variance and are associated with Environmental Conditions and Cultivation Conditions. Moreover, the PLS regression model gave good results in terms of responses (R² = 0.84) and residuals. It follows the experimental trends of outputs such as specific growth rate (μ (d⁻¹)) and productivity calculated at Cmax (Pmax (g L⁻¹ d⁻¹)), and it also gave good prediction results in its validation test. This method could easily be used for other purposes by changing the input values to those of the specific cultivation used (including CO2 uptake or wastewater dilution ratio in the culture medium) and obtaining the desired variables as outputs (lipid production rate, etc.).


Introduction
Recently, the global market has been shifting its focus towards new green products, drawing attention to microalgae as a viable alternative source of high-value-added products [1,2] and renewable biofuels such as biodiesel, biohydrogen and biomethane [3]. A large number of microalgal species can attain a lipid content of 20-50% DCW (Dry Cell Weight), depending on the cultivation conditions [3][4][5][6][7][8][9]. With the aim of reducing costs and environmental impact [10], several studies have focused on using microalgal cultivations for CO2 capture from industrial processes [7,11]. Others have added wastewater to algal nutrient media to reduce the costs of both biomass production and wastewater treatment [12][13][14][15]. For photoautotrophic production, the most popular cultivation systems are open ponds and closed photobioreactors. However, a large number of factors can affect the growth of microalgae, and these can be divided into three categories [7]:
Abiotic factors, such as light, temperature, nutrients (CO2, N, P, K, etc.), O2, pH, salinity and toxins.
Biotic factors, including bacteria, viruses, fungi and other species competing with microalgae.
Operational factors, such as mixing and stirring conditions, dilution ratio, vessel width and depth, and harvest frequency.
For photoautotrophic cultivations, light availability is the most important factor affecting cell growth. It is difficult to control in outdoor cultures because solar radiation varies over the day and the seasons and is distributed non-homogeneously inside the photobioreactors. This heterogeneous light distribution can cause photolimitation and/or photoinhibition, depending on both light irradiance and microalgal concentration, and thus significantly affects photoproduction. Besides light, temperature is also an important limiting factor for growing algae in both indoor and outdoor systems. Many microalgae can tolerate temperature increases or decreases, but deviating from the optimum temperature by only 2-4 °C may lead to culture loss [16]. Moreover, overheating may occur in outdoor systems, making cooling systems necessary to keep temperatures below 28-30 °C [17]. Given these considerations, over the last two decades an increasing number of researchers have tried to predict microalgal growth and metabolite production under transient conditions of light intensity and temperature. They have used empirical models [18,19] or semi-empirical models [20,21]. Light models linked to microalgal growth differ in their ability to take into account light gradients [18,22], light cycles [23,24] and physical phenomena (such as scattering) [22,25] that occur in outdoor cultures. For temperature, modeling approaches can be divided into coupled and uncoupled, depending on whether or not the models account for the potential interdependence of light and temperature effects on growth [26]. The predictive power of such models is usually achieved through a large number of adjustable parameters. These parameters are difficult to relate to physical and chemical phenomena and thus have poor identifiability [27]. 
These problems inevitably hinder model validation by an end user. They make such models unsuitable for accurately predicting microalgal growth under slightly different outdoor conditions. For these reasons, there is a need to develop models that an end user can easily handle without losing accuracy [25]. In this work, an empirical model to predict microalgae growth in an outdoor cultivation system is implemented using multivariate statistics, to overcome the problems of complex mathematical models. In particular, the multivariate statistical projection methods PCA (Principal Component Analysis) and PLS (Partial Least Squares) are used for this purpose. PCA and PLS were initially used in process control for their ability to compress multidimensional data and extract the most useful information. They do so by projecting the data into a low-dimensional space whose new reference axes are the principal components [28]. For algae production specifically, both techniques have been used to analyze the water chemistry in three wastewater stabilization ponds with excessive algae growth and fluctuating pH. That work found correlations between variables (pH, temperature, light, dissolved oxygen, etc.) and developed a multivariate regression model with pH as the dependent variable [29]. Similar applications have been used to identify correlations between different algal species in Lake Wingra [30] and to illustrate the influence of environmental variables on phytoplankton composition in the Vaal River [31]. Unlike previous works, this paper develops an innovative use of PCA and PLS, not only to reduce data redundancy but also to predict microalgal growth in a specific period of the year with defined weather and cultivation conditions. 
One year of experimental data from a pilot-scale phototrophic plant was analyzed with PCA and PLS multivariate regression to obtain the specific growth rate (μ (d⁻¹)) and the productivity calculated at Cmax (Pmax (g L⁻¹ d⁻¹)).

Microalgal outdoor cultivation
In this work, two algal strains, Tetradesmus obliquus and Graesiella emersonii, were selected and cultivated in an outdoor photobioreactor (PBR) pilot plant [17,32]. Each inoculum was prepared using local tap water [33] instead of distilled water [34,35]. The pilot plant was installed in Rome (Italy) at the Bio-P s.r.l. site (N 41°55' 5" E 12°35' 35"). It was equipped with 10 column photobioreactors, each with an operating volume of 21 L (internal diameter = 14 cm, height = 150 cm), anchored to a metal support structure. Each reactor was connected to an air line (for mixing) and to a CO2 line (for pH control). The air flow was generated by a membrane compressor (AIRMAC 40W).
Mixing inside the reactors was achieved with toroidal spargers, while pure CO2 was injected on demand from cylinders. When a given concentration was reached (1-3 g L⁻¹), the microalgal suspension was collected and sent to a 95 Lh

Monitoring of microalgae growth
During each experiment, microalgae concentration was determined daily by both cell count and dry weight for both species. For dry weight measurements, each sample was first washed with 1 mL of sodium acetate buffer solution (sodium acetate 0.5 M, pH = 4.8) to dissolve any salts that could have biased the measurement. Then 10 mL of each sample was filtered through 0.70 μm microfiber filters (VWR). The filters were dried at 105 °C for half an hour and weighed. Cell counting was performed with an optical microscope (Motic EF-N PLAN) in a 10⁻⁴ mL Thoma chamber. The specific growth rates were obtained as:
$$\mu = \frac{\ln(x_2/x_1)}{t_2 - t_1}$$
where x1 and x2 are the cell concentration values (10⁶ cells mL⁻¹) at times t1 and t2 (d) during the exponential growth phase. The productivity calculated at Cmax was obtained as:
$$P_{max} = \frac{C_{max} - C_{init}}{t_{max} - t_0}$$
where Cinit is the concentration at inoculation time t0 and tmax is the time at which the maximum concentration Cmax is reached. Consequently, Pmax does not refer to the final/total productivity, which is related to the batch duration.
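The two growth metrics can be sketched in Python as follows. This is a minimal sketch. It assumes the standard two-point definitions, μ = ln(x2/x1)/(t2 − t1) over the exponential phase and Pmax = (Cmax − Cinit)/(tmax − t0). The function names and input layout are hypothetical and do not reproduce the authors' code.

```python
import math


def specific_growth_rate(t1, x1, t2, x2):
    """Specific growth rate mu (d^-1) between two points of the exponential phase.

    t1, t2: sampling times (d); x1, x2: cell concentrations (10^6 cells/mL).
    Assumes the two-point form mu = ln(x2/x1) / (t2 - t1).
    """
    if t2 <= t1 or x1 <= 0 or x2 <= 0:
        raise ValueError("need t2 > t1 and positive concentrations")
    return math.log(x2 / x1) / (t2 - t1)


def productivity_at_cmax(times, conc):
    """Pmax (g/L/d): biomass gained up to the day Cmax is reached, per day elapsed.

    times: days from inoculation (times[0] is the inoculation day t0).
    conc: dry-weight concentrations (g/L) sampled at those times.
    """
    i_max = max(range(len(conc)), key=lambda i: conc[i])  # index of Cmax
    if i_max == 0:
        raise ValueError("Cmax coincides with the inoculum; no growth")
    return (conc[i_max] - conc[0]) / (times[i_max] - times[0])
```

Consider an illustrative series (not data from this study) that rises from 0.3 g L⁻¹ to a peak of 1.5 g L⁻¹ on day 5. It gives Pmax = 0.24 g L⁻¹ d⁻¹, however long the batch continues afterwards.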

Monitoring and control of variables
During the experiments, pH and temperature were continuously monitored by probes inside the reactors and controlled by an active feedback control system. Both pH and temperature data were continuously recorded for analysis and displayed on a PC interface using LabVIEW software. The pH was maintained at its set point (pH = 8) with CO2, which was injected on demand directly inside the reactors. The temperature was kept below the maximum threshold (T = 30 °C) by a water spray system designed and built for this purpose. Probes were placed inside and outside the reactors to measure internal and external temperatures. Illuminance was measured every day (at 10 am, 2 pm and 5.30 pm) with a luxmeter (LM-8000, LT-Lutron). Each reading was converted to the corresponding Photosynthetic Photon Flux Density (PPFD) (μE m⁻² s⁻¹) by multiplying it by the conversion factor for sunlight (0.0185) [36]. Each measurement was taken at three points for each reactor, at different heights from the ground: at the bottom (20 cm), at mid-height (80 cm) and at the top (140 cm). Light measurements at the bottom and middle were normalized with respect to the top value, since they were always lower. As a reference for light fluctuations, light was also measured at a fixed point not affected by any shading (the "unshaded reference point").
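The light pre-processing described above can be sketched as below. It assumes the sunlight conversion factor of 0.0185 quoted in the text, so PPFD (μE m⁻² s⁻¹) = lux × 0.0185. The function names are hypothetical.

```python
# Conversion factor for a sunlight source, as quoted in the text:
# PPFD (uE m^-2 s^-1) = illuminance (lux) * 0.0185
LUX_TO_PPFD_SUNLIGHT = 0.0185


def lux_to_ppfd(lux: float) -> float:
    """Convert a luxmeter reading (lux) to PPFD (uE m^-2 s^-1) for sunlight."""
    return lux * LUX_TO_PPFD_SUNLIGHT


def normalize_to_top(bottom: float, middle: float, top: float) -> tuple[float, float]:
    """Express bottom (20 cm) and middle (80 cm) readings as fractions of the top (140 cm) reading."""
    if top <= 0:
        raise ValueError("top reading must be positive")
    return bottom / top, middle / top
```

For illustrative readings of 4000, 7000 and 10000 lux (bottom, middle, top), the top converts to about 185 μE m⁻² s⁻¹. The normalized bottom and middle values are 0.4 and 0.7.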

Multivariate Statistical Analysis
In the present study, two multivariate methods, PCA and PLS, were used with MINITAB and OriginPro (OriginLab Corporation) software. They served first to investigate the effects of variables on microalgal growth and then to develop an empirical model for growth estimation. For a short explanation of the PCA and PLS methods, see the Appendix.

Input & Output Data
Before showing PCA and PLS results, the values of the two inputs and outputs, represent- understand the experiment's duration and the biomass production. Other factors, such as CO2 and pH, can also influence microalgae growth. In the pilot plant used in this study, however, a control system kept the pH constant at around 8, which is also the optimal pH for these species. For this reason, pH was assumed not to influence our tests. Furthermore, CO2 was supplied on demand by the pH control system and was therefore assumed to be sufficient, and not a limiting nutrient, for microalgae growth throughout the cultivation. In This difference is mainly due to two reasons. First, water in a closed vessel directly irradiated by the sun, without an active cooling system, heats up to a temperature higher than that of the air (Timax > Temax). This follows from the heat transfer efficiency and the thermal capacity of water. These effects are further amplified by the triggering of the Non-Photochemical Quenching (NPQ) mechanism. This defence mechanism protects microalgae from the negative effects of excessive solar light absorption by dissipating the excess light energy as heat, giving the appearance of an exothermic reaction. The second reason is the temperature control system with water spray cooling. When active (June-September), it maintains the internal temperature at its set point T = 30 °C. In all four subplots, however, the seasonal temperature trend is visible: temperature reaches a maximum of about Tmax = 38 °C in July and a minimum of about Tmin = 4 °C in December.
In 59") [37]. As expected, PPFD reaches its maximum (260 μE m⁻² s⁻¹) during the summer, as does the Daily Illumination Time (15.6 h), which begins to decrease in autumn. In Fig. 3 the output values of each experiment are plotted in terms of specific growth rate (μ (d⁻¹)) and productivity calculated at Cmax (Pmax (g L⁻¹ d⁻¹)). Pmax shows a positive trend towards the summer months, with the exception of experiment E, which was conducted at half the normal NaNO3 concentration. μ shows no clear trend because of the high standard deviations between replicates (especially for C and E), due to the many factors influencing outdoor microalgal growth. The bars represent the standard deviations of the replicates for each microalga in each experiment, considered separately. For both outputs, the number of replicates (n) is the number of reactors used in each experiment (n = 6 for A, B, C, D, E; n = 3 for F; n = 7 for G; n = 4 for H; n = 9 for I; n = 8 for L).
The Pmax and μ values in Fig. 3 [42]. The PCA results, with 2 PCs retained, allowed an easy interpretation of the variables' effects on the PCs in a two-dimensional plot.
In Fig. 4 the Loading Plot (Fig. 4.a) and the Score Plot (Fig. 4.b) correlation each other (verified also physically), due to the small angles between the vectors. Fig. 4.a also shows that the variables with a nearly vertical orientation (NaNO3, Cinit) are irrelevant to PC1 but significantly influence PC2. In particular, the NaNO3 vector is longer than the Cinit vector, making NaNO3 the most influential variable on PC2. Furthermore, these two variables are not correlated with each other, since they physically represent two different conditions. The vectors that most influence PC1 are environmental variables, so PC1 can be named "Environmental Conditions". For the same reason, PC2 is called "Cultivation Conditions". In Fig. 4.b a dashed circle represents the Mahalanobis distance limit. It indicates that no outliers are present [28].
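A PCA of this kind can be sketched with NumPy as below, including the Mahalanobis-distance outlier check behind the dashed circle of Fig. 4.b. This minimal sketch is not the MINITAB/OriginPro procedure used in the study. It autoscales the inputs, centring each variable and dividing it by its standard deviation, because the predictors have different units. For the circle it uses a hypothetical 95% chi-square limit with 2 degrees of freedom, a simple approximation. The function names are illustrative.

```python
import math

import numpy as np


def pca_with_mahalanobis(X, n_components=2):
    """PCA via SVD of autoscaled data.

    X: (n_observations, n_variables) array, e.g. experiments x predictors.
    Returns scores, loadings, the explained-variance ratio of every component,
    and the squared Mahalanobis distance of each observation in score space.
    """
    X = np.asarray(X, dtype=float)
    # Autoscale: the variables have different units (degC, g/L, uE m^-2 s^-1)
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    _, s, Vt = np.linalg.svd(Z, full_matrices=False)
    explained = s**2 / np.sum(s**2)          # fraction of variance per PC
    loadings = Vt[:n_components].T           # (n_variables, n_components)
    scores = Z @ loadings                    # (n_observations, n_components)
    # Scores of different PCs are uncorrelated, so the Mahalanobis distance
    # reduces to a sum of squared scores scaled by each score's variance.
    d2 = np.sum(scores**2 / scores.var(axis=0, ddof=1), axis=1)
    return scores, loadings, explained, d2


def mahalanobis_limit_2d(confidence=0.95):
    """Chi-square quantile for 2 degrees of freedom: -2 ln(1 - confidence)."""
    return -2.0 * math.log(1.0 - confidence)
```

Observations with `d2` above `mahalanobis_limit_2d()` would fall outside the circle. The rows of `loadings` give the vector directions drawn in a loading plot such as Fig. 4.a. The first two entries of `explained` give the share of variance captured by PC1 and PC2.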
As with many other multivariate statistical methods, the results obtained (in terms of plots, tables and numbers) have to be interpreted and are sometimes not univocal. The first principal component (PC1) is the most important one, because it explains the largest share of the variability in the data that can be represented in a single dimension. Even so, the second one (PC2) may be influenced by variables that matter more from another standpoint (e.g. economically). In our case PC2, influenced by the Cultivation Condition variables, is less important for explaining the data variance but more important from an economic point of view. Indeed, the amount of nutrients (NaNO3 in our study) can affect not only the growth rate and productivity but also the Operating Expenditures (OPEX) of the process. For this reason, a correct and deeper interpretation of PCA results, rather than a superficial analysis, is essential.

Model Selection and Predictors Evaluation
PCA results showed the connections between variables and their effects on the PCs, grouping both observations and variables. An empirical model representing the effect of the variables on biomass growth is then developed with PLS.
As a first step, Fig. 5.a plots the PRESS (Predicted Residual Error Sum of Squares) values for each predictor (a predictor here has the same meaning as a component in PCA). The minimum PRESS is obtained at the 6th predictor. Consequently, the PLS model needs six predictors to describe most of the variance (84%). This is shown in Fig. 5.b, where the R-Squared (R²) value for each predictor is plotted for the μ response (the same trend is observed for Pmax). R² gives the proportion of variation in each response that is explained by the predictors, indicating how well each model fits the data (the higher the value, the better the fit). Fig. 5.b shows both the absolute R² (bars) and the cumulative R² (dots), which lead to the same result: the maximum R² is reached at the 6th predictor (84%). As shown in Fig. 5.b, using Cross Validation (CV) improves the representation of variability. This study used the leave-one-out variant. It omits each observation in turn, rebuilds the predictive model from the remaining data and uses this model to predict the omitted observation. Finally, it estimates the predicted residual error with PRESS.
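The component selection by leave-one-out PRESS can be sketched with NumPy as below. This minimal sketch assumes a single response (PLS1, fitted separately for μ and for Pmax) and the NIPALS algorithm on mean-centred data. The study itself used MINITAB, whose exact settings (e.g. variable scaling) are not stated, and the function names are hypothetical.

```python
import numpy as np


def pls1_fit(X, y, n_comp):
    """NIPALS PLS1 on mean-centred data; returns a function predicting y for new X."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    E, f = X - x_mean, y - y_mean             # residuals, deflated after each component
    W, P, q = [], [], []
    for _ in range(n_comp):
        w = E.T @ f
        w /= np.linalg.norm(w)                # weight vector
        t = E @ w                             # score vector
        tt = t @ t
        p = E.T @ t / tt                      # X loading
        q_a = (f @ t) / tt                    # y loading
        E = E - np.outer(t, p)
        f = f - q_a * t
        W.append(w)
        P.append(p)
        q.append(q_a)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    # Regression coefficients expressed in the original (centred) X space
    B = W @ np.linalg.solve(P.T @ W, q)
    return lambda X_new: (np.asarray(X_new, dtype=float) - x_mean) @ B + y_mean


def loo_press(X, y, max_comp):
    """PRESS for 1..max_comp components using leave-one-out cross-validation."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    press = np.zeros(max_comp)
    for a in range(1, max_comp + 1):
        for i in range(n):
            keep = np.arange(n) != i          # omit observation i
            model = pls1_fit(X[keep], y[keep], a)
            press[a - 1] += (y[i] - model(X[i:i + 1])[0]) ** 2
    return press
```

`int(np.argmin(press)) + 1` then gives the number of components at the PRESS minimum. In Fig. 5.a this minimum fell at six predictors. `max_comp` should not exceed min(n − 2, number of variables). Beyond that, the deflated training matrices become numerically zero.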

Response Analysis
In Fig. 6, predicted responses are plotted against experimental data for both outputs, for both direct fitting and the CV procedure. Both plots in Fig. 6  On the other hand, for Pmax the possible outliers can be explained by the different initial cultivation conditions (Cinit and NaNO3init) used in the first experiments, as described in Section 3.2.
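The goodness-of-fit measures behind Figs. 5.b and 6 can be computed as below. The sketch covers R² from directly fitted predictions, a cross-validated (predicted) R² from PRESS, and a simple residual check. The threshold of 2 for standardized residuals is a common rule of thumb chosen here for illustration, not a criterion taken from the study. The function names are hypothetical.

```python
import numpy as np


def r_squared(y_obs, y_pred):
    """Fraction of the response variance explained by the predictions."""
    y_obs = np.asarray(y_obs, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_obs - y_pred) ** 2)
    ss_tot = np.sum((y_obs - y_obs.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


def predicted_r_squared(y_obs, press):
    """Cross-validated R^2 (often called Q^2): 1 - PRESS / total sum of squares."""
    y_obs = np.asarray(y_obs, dtype=float)
    return 1.0 - press / np.sum((y_obs - y_obs.mean()) ** 2)


def flag_large_residuals(y_obs, y_pred, threshold=2.0):
    """Indices whose residual exceeds `threshold` residual standard deviations."""
    res = np.asarray(y_obs, dtype=float) - np.asarray(y_pred, dtype=float)
    z = res / res.std(ddof=1)
    return np.flatnonzero(np.abs(z) > threshold)
```

Comparing `r_squared` on the direct fit with `predicted_r_squared` on the leave-one-out PRESS gives the direct-fitting versus CV comparison shown in Fig. 6. The flagged indices mark candidates for the kind of outliers discussed for Pmax.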

Empiric Model Prediction Results
In this subsection the prediction results of the PLS model are shown. In particular, in Tab.  (μ (d⁻¹)) and productivity calculated at Cmax (Pmax (g L⁻¹ d⁻¹)). Predictors: maximum and minimum internal and external temperatures averaged over each experiment, Timax,avg, Timin,avg, Temax,avg, Temin,avg (°C); initial inoculum concentration Cinit (g L⁻¹) and initial nitrate concentration NaNO3,init (g L⁻¹); photosynthetic photon flux density PPFD(μEm −
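Daily records can be collapsed into one row of predictors per experiment, as the caption above describes for the temperatures. The sketch below shows one way to do this. The record field names are hypothetical. Averaging PPFD over the experiment in the same way is an assumption, since the caption is truncated at that point.

```python
from statistics import mean


def experiment_predictors(daily, c_init, nano3_init):
    """Build one predictor row for an experiment.

    daily: list of dicts, one per cultivation day, with hypothetical keys
           'Ti_max', 'Ti_min', 'Te_max', 'Te_min' (degC) and 'ppfd' (uE m^-2 s^-1).
    c_init, nano3_init: initial inoculum and nitrate concentrations (g/L).
    """
    if not daily:
        raise ValueError("experiment has no daily records")
    # Average each daily quantity over the whole experiment (PPFD averaging is assumed)
    row = {
        key + "_avg": mean(day[key] for day in daily)
        for key in ("Ti_max", "Ti_min", "Te_max", "Te_min", "ppfd")
    }
    row["C_init"] = c_init
    row["NaNO3_init"] = nano3_init
    return row
```

Stacking one such row per experiment gives the input matrix on which the PCA and PLS sketches operate.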

Conclusions
The innovative use of PCA and PLS for modeling and predicting microalgae growth in a phototrophic outdoor pilot plant has shown many positive features. These include the ability to analyse numerous datasets subject to high variability without losing predictive ability. Model prediction results showed acceptable values for both responses, μ and Pmax, enabling the end user to establish how much biomass will be obtained in certain