Unorganized machines and linear multivariate regression model applied to atmospheric pollutant forecasting

Air pollution is a relevant issue studied worldwide, and its prediction is important for social and economic management. Linear multivariate regression models (LMR) and artificial neural networks (ANN) are widely applied to forecasting pollutant concentrations. However, unorganized machines are scarcely used. The present investigation proposes the application of unorganized machines (echo state networks, ESN, and extreme learning machines, ELM) to forecast hourly concentrations of particulate matter with aerodynamic diameter up to 10 μm (PM10), carbon monoxide (CO), and ozone (O3) in the metropolitan region of Recife, Pernambuco, Brazil. The results were compared with a multilayer perceptron neural network (MLP) and the LMR. The predictions were made with and without meteorological variables (wind speed, temperature, and relative humidity) as input data. The results showed that the inclusion of these variables can increase the overall performance of the models for one-step-ahead forecasting horizons. The ELM and the LMR achieved the best overall results.


Introduction
Air pollution is a widely studied subject, mainly due to its impact on human health. It is considered a major environmental risk to public health by the World Health Organization (WHO), which estimates around eight million deaths per year due to exposure to air pollution (Jasarevic & Lindmeier, 2015). The association between air pollution concentrations and severe health problems is a well-studied subject worldwide, as can be seen in Langrish et al. (2012), Wu et al. (2012), Tadano, Siqueira, and Alves (2016), Polezer et al. (2018), Ardiles et al. (2018), and Kachba et al. (2020).
Poor air quality causes chronic diseases and premature mortality (Cabaneros, Calautit, & Hughes, 2019). In this sense, it is very important for policy-makers and urban planners to have advance information about air quality in order to find solutions that avoid or minimize the effects of air pollution on human health.
Therefore, air quality management is essential to improve the quality of life in urban areas. A major problem that aggravates the air quality of these areas is poorly designed building structures, combined with high population density. The ability to predict episodes of critically high atmospheric pollution a few days in advance would be an efficient way to help authorities take preventive and evasive action and protect citizens' health (Nagendra & Khare, 2006; Caselli, Trizio, Gennaro, & Ielpo, 2009; Moustris, Ziomas, & Paliatsos, 2010).
The most commonly used tools for air quality analysis and forecasting are statistical regression models. Researchers have recently been applying artificial neural networks (ANN) and comparing their performance with classical statistical models (Baawain & Al-Serihi, 2014; Taylor, Retalis, & Flocas, 2016). To improve the prediction of air pollutant concentrations, some researchers included meteorological variables (Moustris, Larissi, Nastos, Koukouletsos, & Paliatsos, 2013). Grivas and Chaloulakou (2006) used an optimization procedure based on a genetic algorithm to select variables, which resulted in better performance than the multiple linear regression models they developed. Stadlober, Hörmann, and Pfeiler (2008) used linear regression models, combining daily pollutant concentrations with the meteorological variables of previous days (wind speed, precipitation, and temperature), aiming to model the daily mean concentrations of particulate matter with aerodynamic diameter up to 10 µm (PM10). The results showed that this approach supported future forecasting of pollutants. Caselli et al. (2009) used neural networks (multilayer perceptron, MLP, and radial basis function, RBF) and a multivariate regression model (MRM) to predict daily PM10 concentrations, considering meteorological information (temperature, wind speed, pressure, and relative humidity) as input data for one, two, and three days in advance. The authors concluded that the ANN showed more accurate results than the multivariate regression model, mainly for one-day forecasting. They also highlighted that the MRM failed to predict high PM10 values. Moustris et al. (2010) showed that the MLP has a good capacity to forecast the air quality of three consecutive days. They forecasted maximum daily values of the European Regional Pollution Index (ERPI) for CO, NO2, SO2, and O3 three days in advance using meteorological variables such as air temperature, relative humidity, and wind speed and direction.
They also predicted the number of hours during the day with the concentration of at least one of the pollutants above the threshold. Moustris, Nastos, Larissi, and Paliatsos (2012) predicted 24-hour ozone concentrations in Athens using multiple linear regression and the MLP, with wind speed, temperature, and ozone concentrations as explanatory variables. They concluded that the ANN performed better. Moustris et al. (2013) performed 24-hour-ahead forecasting of the daily concentrations of PM10 and of the number of hours exceeding the PM10 concentration threshold during the day in five regions within the greater Athens area. The authors used the MLP and considered maximum hourly NO2 concentration, air temperature, relative humidity, and wind speed and direction as input variables. They concluded that the ANN can successfully predict daily threshold exceedances and can be used to forecast the concentration of PM10 one day in advance. Sayegh, Munir, and Habeebullah (2014) studied different approaches, such as linear and nonlinear regressions and machine learning methods, to evaluate which gives a more accurate prediction of PM10 concentration in urban areas. They used meteorological variables (relative humidity, temperature, and wind speed) and the concentrations of other pollutants (CO, SO2, and NOx). The results showed that multiple linear regression and quantile regression were the best models to describe PM10 variation. Mattos Neto, Madeiro, Ferreira, and Cavalcanti (2014) developed a neural network architecture, an MLP with parameter optimization by genetic algorithm, and tested it on the prediction of PM10 and of particulate matter with aerodynamic diameter up to 2.5 µm (PM2.5). The experimental study considered the random behavior of pollutant concentrations. The results showed a consistent improvement when compared to other prediction techniques.
Zhang and Ding (2017) proposed to predict the concentration of air pollutants using an extreme learning machine (ELM) with meteorological parameters (temperature, relative humidity, and wind speed and direction) and the concentrations of NO2, NOx, O3, PM2.5, and SO2 in Hong Kong. The results showed that the ELM had slightly better performance than multiple linear regression and the MLP.
Some recent studies applied ensembles to increase the quality of pollutant forecasting (Siwek & Osowski, 2012; Firmino, Mattos Neto, & Ferreira, 2014; Gong & Ordieres-Meré, 2016; Wang & Song, 2018; Bai, Zeng, Li, & Zhang, 2019). This methodology combines the outputs of several previously tuned models. The authors usually use a neural network, mainly the MLP, as the combiner to increase the mapping capability of the single models. As discussed by Bai et al. (2019), this approach can be considered state of the art.
However, according to the available literature, few researchers have used unorganized machines to predict concentrations of air pollution (Tadano, Siqueira, Alves, & Marinho, 2017; Zhang & Ding, 2017). Recent studies have applied echo state networks (ESN) and the ELM to problems relating air quality to health (Polezer et al., 2018; Araujo, Belotti, Alves, Tadano, & Siqueira, 2020; Kachba et al., 2020). The use of such architectures is relevant since they have characteristics desirable in any forecasting problem: ease of understanding and implementation, and fast convergence allied to good performance. These works showed that the structures can outperform established models, such as the MLP, Radial Basis Function networks (RBF), and Generalized Linear Models (GLM). Furthermore, some linear approaches are not capable of forecasting certain series, especially those with missing values, a reality in developing countries like Brazil.
Acta Scientiarum. Technology, v. 42, e48203, 2020

We highlight that the use of single neural models must be evaluated, since the computational cost to adjust their free coefficients is lower than that of ensembles, which require the adjustment of both the single models and the combiner.
Therefore, this research aims to compare the performance of the unorganized machines (ELM and ESN) with the MLP and traditional linear multivariate regression modeling to forecast air pollution (PM10, CO, and O3) in the metropolitan region of Recife, Pernambuco, Brazil. We compared the performance with and without meteorological variables (relative humidity, temperature, and wind speed) to identify whether they improve the forecasting of pollutant concentrations.
The rest of the paper is organized in five sections, in which we discuss the databases addressed, the LMR model, the neural networks used, the computational results and critical analysis, and the conclusions.

Figure 1 shows the location of the Gaibú station on the right side, with coordinates 8º 19' 44.37" S and 34º 57' 22.66" W. On the lower left side of Figure 1, the Abreu e Lima Refinery (RNEST) is highlighted at the PE-60 highway; it is one of the most significant sources of air pollution in the studied area.

Materials and methods
The state of Pernambuco lies within an intertropical zone, presenting predominantly high temperatures. In the region of Recife, for example, the average temperature is 25ºC, with a maximum of 32ºC. In inland cities, the temperature during the winter months (between May and July) can drop considerably, reaching 8ºC in some places (Governo do Estado de Pernambuco, 2018). In the summer, the rainfall index reaches 515 mm on the coast, where the city of Ipojuca is located, while reaching 725 mm during the winter (Agência Pernambucana de Águas e Clima [APAC], 2018). Table 1 shows the mean, standard deviation, and coefficient of variation for all input data. The coefficient of variation, calculated as the ratio between the standard deviation and the mean, provides a measure of the variation of each data set relative to its mean over the studied period (July 17th, 2015, to July 17th, 2017). We may highlight that the mean concentration of PM10 is above the World Health Organization (WHO, 2006) guideline value of a 20 µg m-3 annual mean and presents a higher variation during the year when compared to the other pollutants (higher coefficient of variation).
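As a simple illustration, the coefficient of variation reported in Table 1 can be computed as follows (a minimal numpy sketch; the function name is ours, and the use of the sample standard deviation is an assumption):

```python
import numpy as np

def coefficient_of_variation(x):
    """CV = standard deviation / mean (the statistic of Table 1).

    Assumes the sample standard deviation (ddof=1); the paper does not
    state which estimator was used.
    """
    x = np.asarray(x, dtype=float)
    return np.std(x, ddof=1) / np.mean(x)
```

A series with larger spread relative to its mean, such as the PM10 concentrations, yields a larger CV.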
We also obtained the correlation coefficients between all input data. Among the meteorological variables, relative humidity was the only one inversely proportional to the concentration of pollutants, showing correlation coefficients (R) with PM10, CO, and O3 of -0.560, -0.093, and -0.351, respectively. These results confirm that the higher the relative humidity, the lower the concentration of air pollutants, especially PM10, due to wet deposition. It is important to highlight that, during winter, the relative humidity usually decreases, characterizing a dry season, which tends to increase respiratory diseases. The dry weather also increases the suspension of air pollutants, such as PM10. Therefore, we can suppose that dryness, combined with higher concentrations of air pollutants, may lead to an increase in respiratory diseases.

Linear multivariate regression model
Regression analysis is a statistical technique to model and investigate the relationship between two or more variables. A variety of engineering and science problems involve the exploration of this relationship (Montgomery & Runger, 2003). This kind of relationship is represented by a mathematical model, an equation associating the dependent variable (response variable, Y), which we wish to predict, with the independent variables (predictor or explanatory variables, xj). Therefore, regression analysis aims to uncover the behavior between the dependent variable and the independent ones. When the response variable Y depends on two or more independent variables, the relationship is given by Equation 1:

Y = β_0 + β_1 x_1 + β_2 x_2 + ... + β_k x_k + ε   (1)

where: β_0 represents the intercept, ε is the error term, and the other parameters β_j, j = 1, 2, 3, ..., k, are denominated the regression coefficients. These parameters can be estimated using the least squares method.
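As an illustration, the least squares estimation of the coefficients in Equation 1 can be sketched with numpy (a minimal example with hypothetical function names, not the software used in the study):

```python
import numpy as np

def fit_lmr(X, y):
    """Estimate [beta_0, beta_1, ..., beta_k] of Eq. 1 by ordinary least squares."""
    # Prepend a column of ones so the first coefficient is the intercept beta_0.
    Xa = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(Xa, y, rcond=None)
    return beta

def predict_lmr(beta, X):
    """Apply the fitted linear model to new predictor values."""
    Xa = np.column_stack([np.ones(len(X)), X])
    return Xa @ beta
```

On noiseless data generated by a linear model, this recovers the coefficients exactly; on real pollutant data, the residual term ε of Equation 1 remains.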

Artificial neural networks
Artificial neural networks (ANN) have recently been widely applied and developed in atmospheric investigations due to their computational efficiency and generalization capability. A neural network is a simplified mathematical model of the natural neural system of superior organisms. It is an effective alternative to traditional linear models for time series forecasting (Goyal & Kumar, 2011). The main advantage of this tool is its flexibility, which permits wide application in several fields (Hooyberghs, Mensink, Dumont, Fierens, & Brasseur, 2005). Another advantage is the ability to simulate nonlinear behaviors based on the input variables of the model (Dorling, Foxall, Mandic, & Cawley, 2003).
The multilayer perceptron (MLP) is the most used architecture to solve the kind of problem addressed in this research. Notwithstanding, the literature presents many other proposals such as Elman recurrent neural networks, pruned neural networks, radial basis, and others (Shahraiyni & Sodoudi, 2016). In this research, we considered the traditional MLP and two proposals of unorganized machines: extreme learning machine (ELM) and echo state network (ESN).
Feedforward neural networks (FNN) are those in which the information flows from the input layer to the output layer in only one direction. The main characteristic of these models is the universal approximation capability, which allows the neural models to approximate any nonlinear, continuous, limited, and differentiable function (Haykin, 2009; Siqueira, Boccato, Attux, & Lyra, 2014). The multilayer perceptron (MLP) is the most representative network of the FNN. Its popularity is related to the systematic training process that tunes the weights of the artificial neurons based on the backpropagation algorithm, allied with the good results reported in the literature (Hippert, Pedreira, & Souza, 2001; Haykin, 2009). If the model is correctly adjusted, it can provide adequate estimations for a set of unknown inputs, an ability known as generalization capability. Cybenko (1989) proved that an MLP with only one intermediate layer is capable of approximating any nonlinear function, provided that it is limited (the function to be approximated must be inserted in a space with boundaries for the variables) and has inputs defined in a compact space. Also, the activation functions must be infinitely differentiable (present all the derivatives needed to calculate the gradient of the cost function) (Haykin, 2009).
The unorganized machines (UM) are recent architectures of neural networks (Siqueira et al., 2014; Silva, Siqueira, Okida, Stevan Jr., & Siqueira, 2019). The difference between them and a fully trained proposal (such as the MLP) is the training process. In their case, it is simple and computationally efficient, as the hidden neurons stand untuned (Huang, Zhu, & Siew, 2006). Thus, the training process is reduced to finding the coefficients of a linear combiner, which is the output layer. The extreme learning machines (ELM) and the echo state networks (ESN) are examples of UM.
The ELM is a single-layer feedforward neural model, quite similar to the classic MLP. Let x be a vector containing the input signal. The output of the hidden layer of the ELM is given by Equation 2:

x_h = f(W_h x + b)   (2)

in which W_h is the matrix containing the weights of this layer, f(·) is the activation function, and b is the bias vector.
The output of the network is given by Equation 3:

y = W_out x_h   (3)

where: W_out is the output weight matrix. Due to the unorganized nature of the ELM, the training process is reduced to finding the coefficients of W_out (Huang et al., 2006). This task can be solved using the Moore-Penrose generalized inverse operation, given by Equation 4:

W_out = X_h^+ d   (4)

where: X_h is the matrix with the outputs of the intermediate layer for the Ts training samples, X_h^+ is the pseudoinverse of X_h, and d is the vector with the desired responses.

On the other hand, the ESN is a recurrent neural network (RNN) characterized by the presence of feedback loops in the intermediate layer, which is denominated the dynamic reservoir (Jaeger, 2001). Following the same idea of the ELM, the weights of the reservoir are not adjusted.
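The ELM training procedure of Equations 2 to 4 can be sketched in a few lines of numpy (a minimal illustration, not the implementation used in the experiments; the tanh activation and the uniform initialization range are assumptions):

```python
import numpy as np

def train_elm(X, d, n_hidden, seed=0):
    """Train an ELM: random untuned hidden layer, pseudoinverse readout (Eq. 4)."""
    rng = np.random.default_rng(seed)
    # Hidden weights and biases are drawn once and never adjusted (Eq. 2).
    W_h = rng.uniform(-1.0, 1.0, size=(n_hidden, X.shape[1]))
    b = rng.uniform(-1.0, 1.0, size=n_hidden)
    H = np.tanh(X @ W_h.T + b)        # hidden-layer outputs for all samples
    W_out = np.linalg.pinv(H) @ d     # Moore-Penrose solution of Eq. 4
    return W_h, b, W_out

def predict_elm(model, X):
    """Network output y = W_out x_h (Eq. 3)."""
    W_h, b, W_out = model
    return np.tanh(X @ W_h.T + b) @ W_out
```

Because only the linear readout is solved, training cost is that of one pseudoinverse, which is the computational advantage of the UM discussed above.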
Jaeger (2001) observed a few drawbacks in the training process of classic RNN based on the backpropagation algorithm, such as instability, difficulties in the manipulation of the cost function, and an elevated computational cost. In this pioneering work (Jaeger, 2001), the author proved that if some conditions are satisfied, the reservoir can stand untrained and the model still presents memory, requiring only the adjustment of the output layer (Siqueira, Boccato, Luna, Attux, & Lyra, 2018; Silva et al., 2019). Additionally, Schäfer and Zimmermann (2007) proved that RNN are universal approximators, like the feedforward methods, being adequate for solving nonlinear mapping problems such as time series forecasting.
The first design proposed for the weights of the reservoir (W_Je) generates them randomly according to a predefined distribution, as given by Equation 5:

w_ij = 0.4, with probability 0.025
w_ij = −0.4, with probability 0.025   (5)
w_ij = 0, with probability 0.95

Finally, the adjustment process of the output layer is the same as that of the ELM, described in Equation 4.

Computational results and discussion
We developed two approaches considering a lag of up to 6 hours for all models (LMR, MLP, ELM, and ESN). In the first approach, the predictions were made using only the endogenous variables (PM10, CO, and O3). In the second, the meteorological information (relative humidity, RH; average temperature, AT; and wind speed, WS) was considered as exogenous variables, along with the endogenous ones.
During the adjustment phase of the models' free parameters (training phase), we used meteorological variables and pollutant concentrations from July 17th, 2015, at 0 h, to March 19th, 2017, at 23 h, comprising 14,688 hours (612 days). For the test set, we selected the data from March 20th, 2017, at 0 h, to July 17th, 2017, at 23 h, comprising 2,880 hours (120 days).

Performance metrics
We used the mean square error (MSE) and the index of agreement (IA) to quantify the performance of the models. The MSE is defined in Equation 6:

MSE = (1/N) Σ_{i=1..N} (d_i − y_i)^2   (6)

where: y_i is the output of the model and d_i is the i-th desired response.
The index of agreement (IA) is a standardized measure of the degree of model prediction error and varies between 0 and 1. It is calculated according to Equation 7:

IA = 1 − [Σ_{i=1..N} (d_i − y_i)^2] / [Σ_{i=1..N} (|y_i − d̄| + |d_i − d̄|)^2]   (7)

where: d̄ is the mean of all the samples of the desired output.
It is important to mention that the higher the value of the IA, the better the model performance. This means that a good time series prediction methodology will present an IA close to 1. On the other hand, the MSE must be minimized.
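Both metrics can be written in a few lines (a minimal numpy sketch of Equations 6 and 7; `mse` and `index_of_agreement` are illustrative names, the latter being the standard Willmott formulation):

```python
import numpy as np

def mse(d, y):
    """Mean square error (Eq. 6): average squared deviation from the target."""
    d, y = np.asarray(d, dtype=float), np.asarray(y, dtype=float)
    return np.mean((d - y) ** 2)

def index_of_agreement(d, y):
    """Index of agreement (Eq. 7), bounded in [0, 1]; 1 means a perfect fit."""
    d, y = np.asarray(d, dtype=float), np.asarray(y, dtype=float)
    dbar = d.mean()
    num = np.sum((d - y) ** 2)
    den = np.sum((np.abs(y - dbar) + np.abs(d - dbar)) ** 2)
    return 1.0 - num / den
```

A perfect prediction gives IA = 1 and MSE = 0, matching the criteria stated above.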

Approach considering endogenous variables
For the time series considered, we used the Time Series Lag Operator (TSLO) given by Equation 8:

L^K(y_t) = y_{t−K}   (8)

defined for all t > 0, wherein K indicates the desired lag applied to the time series. That means the TSLO operates on each element of a time series to produce the K-th previous element. For example, L^3(PM10) indicates the 3rd lag of the PM10 concentration time series. We obtained 63 regression equations using PM10 with up to 6 lag hours as independent variables, covering all possible combinations. The same method was applied to CO and O3. The best results achieved by the LMR for PM10 used the lags L^1(PM10), L^2(PM10), and L^3(PM10). For carbon monoxide, the selected lags were from the 1st to the 4th, while ozone used L^1(O3) and L^4(O3). The neural networks were evaluated over similar configurations before the selection of the best lags. In all cases, the first lag achieved the best performance.
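The lag operator and the exhaustive enumeration of the 63 lag combinations can be sketched as follows (an illustrative numpy/itertools example, not the authors' implementation):

```python
import numpy as np
from itertools import combinations

def lag_matrix(series, lags):
    """Design matrix whose columns are L^k(series) for each k in lags (Eq. 8),
    aligned with the target vector (the unlagged series)."""
    series = np.asarray(series, dtype=float)
    k_max = max(lags)
    # Row i pairs target series[k_max + i] with its k-th previous values.
    cols = [series[k_max - k : len(series) - k] for k in lags]
    return np.column_stack(cols), series[k_max:]

def all_lag_subsets(max_lag=6):
    """Every nonempty subset of {1, ..., max_lag}: 2^6 - 1 = 63 candidates."""
    lags = range(1, max_lag + 1)
    return [c for r in lags for c in combinations(lags, r)]
```

Fitting one regression per subset returned by `all_lag_subsets` reproduces the 63-equation search described above for each pollutant.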
We also implemented the classic autoregressive (AR) model, adjusted by the Yule-Walker equations, to provide a comparison with a classical approach (Box, Jenkins, Reinsel, & Ljung, 2015). The order of the selected model was the same as that of the ANN.
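The Yule-Walker adjustment of an AR model can be sketched as follows (a minimal numpy illustration under the assumption of biased autocovariance estimates; not the exact procedure used in the paper):

```python
import numpy as np

def yule_walker(series, order):
    """Solve the Yule-Walker equations for the AR(order) coefficients."""
    x = np.asarray(series, dtype=float) - np.mean(series)
    n = len(x)
    # Biased sample autocovariances r(0), ..., r(order).
    r = np.array([np.dot(x[: n - k], x[k:]) / n for k in range(order + 1)])
    # Toeplitz system R a = [r(1), ..., r(order)].
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    return np.linalg.solve(R, r[1:])
```

For a long series generated by a known AR process, the estimated coefficients approach the true ones.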
Moreover, we used the Mutual Information (MI) filter to verify the nonlinear dependence between the pollutant concentrations and the climate variables. In all cases, the values of the MI coefficient were higher than the threshold provided by the bootstrap method (Luna & Ballini, 2011). Table 2 shows the computational results obtained using the cited approaches. The numbers in bold indicate the best performances for each measure. The Friedman test was applied to verify whether the differences in the errors were significant. In all cases, the p-value was close to zero, indicating that a change of predictor leads to significantly different results.
Comparing the MSE and the IA provided by the models, the ELM achieved the best overall results for the concentrations of PM10 and CO, followed by the ESN. For O3, the MLP reached the best results, followed by the LMR, indicating that the MLP and the LMR should be considered for O3 forecasting. These results also show that the forecasting capability of the unorganized machines is high despite their hidden layers being untuned. For CO, the performances of the UM are quite similar.

Approach considering exogenous variables
There is a specific interest in studying the influence of meteorological variables on the concentration of air pollutants, since the influence of wind speed (WS), relative humidity (RH), and temperature (AT) on them is noticeable (Moustris et al., 2013; Sayegh et al., 2014; Zhang & Ding, 2017). Therefore, we considered these meteorological variables as input data.
Based on this, we used the Code::Blocks IDE to find the best linear combination of the independent variables. The exhaustive search for the LMR comprised 15,600,380 candidate equations, considering up to the 6th lag hour of the current pollutant and the variables mentioned above (WS, RH, and AT). The new number of inputs was 17 for PM10, 20 for CO, and 11 for O3.
Initially, the simulations involving the neural models (MLP, ELM, and ESN) considered the 1st lag of each pollutant separately together with all meteorological variables, totaling 4 inputs. Later, we used all variables (meteorological and pollutant lags), presenting 6 inputs to the models. The best configuration was the same for all models: only 4 inputs.
We implemented the linear ARMAX model as a means of comparison (Box et al., 2015). The free coefficients were adjusted using the maximum likelihood estimator. Furthermore, we used the same forward inputs and one recurrence, defined by previous empirical tests. Table 3 shows the models' performance. The Friedman test ensures the statistical difference of the results. Interestingly, all the results related to the LMR improved when using the meteorological variables as inputs. On the other hand, the performance of the neural models degraded. We can state that the simple inclusion of variables may not improve the performance. However, considering only the forecasting and nonlinear mapping capability of the neural models, the UM achieved, in general, the best performances when compared to the MLP, except for O3. Figures 2, 3, and 4 allow a visual appreciation of the results, comparing the output responses of the best predictors. According to them, the methods could predict the considered pollutants (PM10, O3, and CO). We highlight that the inclusion of atmospheric variables improved the LMR performance, while the ANN achieved their best results without atmospheric variables.

Conclusion
The UM showed good performance when compared to the MLP. Still, when using meteorological data, the LMR presented the best results in most cases. The IA of the ELM for O3 and PM10 indicated better results. The simplicity of the training process of the UM, when compared to classic fully trained approaches, provides a linear computational cost in spite of the nonlinearity of the models. More accurate studies on variable selection would improve their performance. Therefore, this study should be repeated using different worldwide data sets. Finally, it is important to analyze these approaches as part of an ensemble or hybrid method.

Acknowledgements

University (POLI-UPE), which allowed the mathematical simulations. Furthermore, the authors thank the Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq) for the financial support, universal process 405580/2018-5.