Calibration and testing of an agrometeorological model for the estimation of soybean yields in different Brazilian regions

This study was designed to calibrate and test an agrometeorological model over 18 growing seasons in three soybean production areas in Brazil: Passo Fundo (Rio Grande do Sul State), Londrina (Paraná State), and Dourados (Mato Grosso do Sul State). The soybean potential yield (Yp) was determined by two methods: estimated using the FAO Agroecological Zone Model or based on the maximum yield published by the Brazilian Institute of Geography and Statistics (IBGE), increased by 10, 20 and 30%. The estimate of actual yield (Ya) was calculated by correcting Yp for the relative water deficit at different growth stages. The results showed that the best performance was obtained when the Yp was represented by the maximum yield increased by a certain percentage. The model showed satisfactory Ya estimates for the three locations, generating R values of 0.64, 0.46 and 0.70, with mean absolute errors (MAE) of 303, 289 and 259 kg ha for Passo Fundo, Londrina and Dourados, respectively. In a global analysis, the performance of the model was satisfactory, with an R, agreement modified index (d1), confidence index (C) and MAE of 0.64, 0.52, 0.71 and 300 kg ha, respectively.


Introduction
Agriculture is an important part of the Brazilian economy, in particular the jobs, income and revenues that are generated through the export of agricultural commodities.Today, in a modern and global agriculture, increases in crop yield as well as the assessment of climate risk systems define the capacity of a country to become competitive in the international market (EMBRAPA, 2006).
Soybeans are the most important Brazilian agricultural commodity.To improve the methods for estimating soybean yield and production, there is a need for technologies that make cropforecasting systems more accurate.According to Oliveira et al. (2003), the official agencies responsible for gathering information about agricultural crop yields are still using subjective methods, such as interviews with farmers and agrochemical/fertilizer sales representatives.
For improved yield estimates, crop simulation models are important tools that make it possible to evaluate the influence of the weather on crop production.According to Confalonieri (2012), these models have been used to identify the best time for sowing, the climate risk associated with the crop and the yield gaps as a function of the interaction between the genotype and the environment under climate change scenarios.
Crop simulation based on agrometeorological models seeks to represent and understand biomass accumulation by plants in their different phenological stages because the impact of climate factors varies with the crop phase (ASSAD et al., 2007;POPP et al., 2003).
To estimate the soybean yield in an accurate way, some authors (ASSAD et al., 2007;BERKA et al., 2003;CONFALONE et al., 2010;MELO et al., 2008) have used agrometeorological models that consider water deficits as the main factor in crop yield reduction.Water deficiency, according to Geerts and Raes (2009), has caused significant impacts on agriculture around the world, mainly for rain-fed crops.In addition, these authors emphasize the importance of soil water availability as a tool for the feasibility of agricultural projects in some regions where water deficiency is substantial.
For soybeans, the sub-periods that are more sensitive to water deficits, and consequently to yield break, are flowering and grain filling (DOORENBOS; KASSAM, 1994).When water deficit occurs during these phases, the potential yield will be reduced to an attainable yield.
According to Pereira et al. (2002), Andarzian et al. (2008) andLobell et al. (2009), the crop potential yield (Yp) is defined as the productivity obtained by a variety that is completely adapted to the environment of cultivation from sowing/planting through physiological maturity and for which water conditions, plant nutrition and phytosanitary problems are not limiting factors.
Therefore, the interannual variability of Yp is a function of meteorological conditions during the crop cycle.Recent studies showed a significant range in values for soybean Yp, from 3 ton ha -1 , as estimated by the DSSAT CROPGRO-Soybean Model for 21 regions in India (BHATIA et al., 2008), to yields close to 8 ton ha -1 for some soybean cultivars, as estimated by the STICS crop model in eastern Canada (JÉGO et al., 2010).
Due to the variety of methodologies used to estimate soybean yield, the aim of this paper was to calibrate and test the performance of the FAO agrometeorological model to identify the most suitable methodology to calculate the potential yield.This method was then used to estimate the actual yield during the crop seasons from 1990 to 2007 in Passo Fundo (Rio Grande do Sul State), Londrina (Paraná State) and Dourados (Mato Grosso do Sul State) in southern Brazil.

Material and methods
This study used daily meteorological data obtained during the period from 1990 to 2007 for the variables of maximum and minimum air temperature (°C), sunshine hours and rainfall (mm) in Passo Fundo, Rio Grande do Sul State (28°15'46"S, 52°24'24"W, 684 m), Londrina, Paraná State (23°08'47"S, 51º19'11"W, 610 m) and Dourados, Mato Grosso do Sul State (22°13'18"S, 54°48'23"W, 430 m), which are traditional regions for soybean production in Brazil.These data were obtained from Embrapa Trigo (Passo Fundo, Rio Grande do Sul State), IAPAR (Londrina, Paraná State) and Embrapa Agropecuária Oeste (Dourados, Mato Grosso do Sul State).The weather dataset was converted into a ten-day scale, which divides each month into 3 periods.To estimate the soybean yield, two methods were used to determine Yp.The first method is the Agroecological Zone Method (AZM) (DOORENBOS; KASSAM, 1979), which is shown in equation 1. where: Yp is the final potential yield, calculated as the sum of the 10-day periods; i is the 10-day period considered, varying from 1 to 12, totaling 120 days; PPBp is the total potential yield in each 10-day period, expressed as dry matter of a standard crop (kg DM ha -1 ); c resp is the coefficient for maintenance respiration correction; c harv is the harvest index; c LAI is the coefficient for leaf area index correction, and c hum is the coefficient to add humidity to the grains.All coefficients are dimensionless.
The PPBp value was obtained by the sum of two components, PPBn (standard gross photosynthesis during cloudy periods) and PPBc (standard gross photosynthesis during clear sky conditions), considering that the energy available for photosynthesis is different in each distinct condition and thus affects the plant's energy absorption.The PPBc and PPBn values and all C coefficients were determined according to the methodology described by Pereira et al. (2002).
The second method assumed that Yp would be the highest value observed (YmO) in the historical series published by IBGE (1990IBGE ( -2007)), increased by 10, 20 or 30% to minimize factors that are not considered by the agrometeorological model, such as soil conditions and crop management.A similar procedure for potential yield determination was used by Kanemasu (1983).According to this method, Yp is determined by the following equation: where x is the percentage added to the maximum observed yield (10, 20 or 30%).
For the calculation of the actual yield (Ya), the relationship between relative yield break and relative water deficit, which are well correlated variables, was used (PATANÈ et al., 2011).
The actual yield was estimated by the product of the relative water deficit and the water response factor (ky) of each soybean phenological phase, as proposed by Doorenbos and Kassam (1994).The variable ky represents the relationship between the relative yield break and the relative water deficit ( ETc ETa 1  ), where ETc and ETa represent the maximum and actual crop evapotranspiration, respectively.
The maximum crop evapotranspiration (ETc) was estimated by the product of the crop coefficient (kc), proposed by Doorenbos and Kassam (1994), for each phenological phase of the soybean crop and the reference evapotranspiration (ETo), which was calculated by the Thornthwaite method using the effective temperature and was parameterized following Camargo et al. (1999).The actual evapotranspiration (ETa) was calculated using the crop water balance according to the method of Thornthwaite and Mather (1955).
The soil water holding capacity was defined as a soil with a medium texture and the capacity to hold 1 mm of water per 1 cm of effective depth.According to the results obtained by Bordin et al. (2008), the roots of the soybean can effectively reach a depth of approximately 50 cm, resulting in a total water holding capacity of 50 mm.
Because excess water can damage the soybean yield in the final phases of the cycle (from R1 to R8), a relative factor of water excess (FEx), which was suggested by Brunini et al. (1982) and parameterized by Camargo et al. (1986), was considered.This factor was calculated by: where ke is the water excess sensitivity coefficient and ∏ (fe) is the product of the 10-day water excess index corresponding to the period from the beginning of the flowering to the beginning of physiological maturity.
To calculate the 10-day (fe), the following equation was used: where EX is the water excess (mm), calculated by the crop water balance.
According to Brunini et al. (1982), the only restriction on using the water excess index is when the ETc is higher than the EX, in which case fe = 1.
The inclusion of the water excess factor (FEx) in the soybean crop simulation model is an important step because, according to Rhine et al. ( 2010), the occurrence of water excess during the reproductive phase reduces the number of flowers and pods and, thus, the final yield.The complete model used to estimate Ya is presented below: where: Ya m represents the final actual yield of the crop; m is the phenological phase considered; ky is the crop response factor to the water deficit, and; Ya m-1 is the actual yield of the previous phenological phase (for the first phenological phase, it is equal to Yp).
The values of the crop coefficients (kc), the water response factor (ky) and the water excess sensitivity coefficient (ke) used in this study were the same for all locations.
Because the current methods of the crop yield survey do not consider the date of sowing, the agrometeorological model was used to simulate different sowing dates, as recommended by the Soybean Crop Agricultural Zone, for the three locations in this study (BRASIL, 2009).
For Passo Fundo and Londrina, the sowing time varied from October 21 st to December 11 th .In Dourados, the sowing time was set during November only.After the simulations of Yp and Ya for each location and sowing date, the final Ya for each year was calculated by the arithmetic average of all sowing dates.
To evaluate the agrometeorological model with different Yp values, a simple linear regression analysis was carried out for each location.Then, the model was calibrated to adjust kc, ky and ke to minimize the differences between the observed and estimated Ya.
The performance of the agrometeorological model was evaluated based on the statistical indices related to precision, represented by the coefficient of determination (R 2 ), and accuracy, represented by the agreement index, d j (WILLMOTT et al., 1985), which was calculated using the following equation: where: Oi and Om are the actual and mean soybean observed yield, respectively; Pi is the estimated soybean yield, and j is an arbitrary positive power.
Usually, the agreement index (d j ) is calculated with j = 2; however, Legates and McCabe Jr. (1999) recommended a modified index of agreement (d j ), calculated with j = 1.The advantage of d 1 is that errors are given their appropriate weighting and are not inflated by their square values.For that reason, d 1 was used.
The confidence index, represented by C (CAMARGO; SENTELHAS, 1997), was calculated between the product of the correlation coefficient (r) and the original agreement index (d 2 ).The classification of the model performance by the C index is presented in Table 1.The mean absolute error (MAE, kg ha -1 ) and the mean percentage error (MPE, %) were also evaluated.
In addition to the analysis using the statistical index, the observed and estimated Ya data were also evaluated using Student's t-test with a 5% probability to determine whether the differences between the data were significant.

Results and discussion
The soybean phenological phases were subdivided according to the classification proposed by Fehr and Caviness (1977).The number of days considered in each phenological phase was: 10 (S -V2); 50 (V2 -R1); 30 (R1 -R5); and 30 (R6 -R8).In Table 2, the kc, ky and ke calibrated values are presented for each phenological phase.These coefficients were calibrated according to the best fit between the estimated and observed Ya to obtain the highest values of R 2 and the smallest values of MAE for each location.Table 3 presents the statistical coefficients obtained by the relationship between the observed and estimated Ya with the different methods to determine Yp.As shown in Table 3, the best way to determine Yp for estimating Ya varied among locations.In Passo Fundo, the best results were obtained with the addition of 20% to the maximum observed yield (YmO), while in Londrina, the best statistical indices were obtained with a 30% addition to YmO.In Dourados, the best fit was found with the addition of 10% to the YmO.
For Passo Fundo, the best indices related to the precision (R 2 ), accuracy (d 1 ) and confidence (C) were 0.64, 0.63, and 0.70, respectively.The mean absolute error associated with the estimates was 303.1 kg ha -1 , which corresponds to approximately 5 bags of 60 kg per hectare, corresponding to a percentage error of 8%.
The best statistical results for Londrina were as follows: R 2 = 0.46, d 1 = 0.89, and C = 0.63.The mean absolute error was 289.3 kg ha -1 , which corresponds to almost 5 bags of 60 kg per hectare, corresponding to a mean percentage error of -8%.
In Dourados, the best statistical indices associated with the estimates were as follows: R 2 = 0.70, d 1 = 0.81, and C = 0.80.The mean absolute error was 258.8 kg ha -1 , which corresponds to approximately 4 bags of 60 kg per hectare, corresponding to a mean percentage error of 1%.
According to the performance classification of the models proposed by Camargo and Sentelhas (1997) that is presented in Table 1, the calibrated agrometeorological model performance was good for Passo Fundo, reasonable for Londrina and very good for Dourados.
Figure 1a presents the relationship between the observed and estimated Ya, which was calculated considering the potential yield as the highest YmO increased by 20% for Passo Fundo. Figure 1b presents the interannual variability of the soybean actual yield for the same location, reported by IBGE and estimated by the agrometeorological model.In Figures 1a and b, it is clear that the model can reproduce the Ya variability observed in the field, mainly in the critical years 1991 and 2005, when Ya fell to less than 1000 kg ha -1 as a result of the intense water deficit observed during the growing season, which was 150 and 200 mm, respectively.
The average soybean yield data obtained by the agrometeorological model in Passo Fundo (2,116 kg ha -1 ) was similar to the data reported by IBGE for the same period (2,053 kg ha -1 ); the values were not statistically different.However, the correlation coefficient (r) between the observed and estimated yields was significant (r = 0.80).
For Dourados, the results obtained by the model did not show a clear tendency to overestimate or underestimate high or low yields.Thus, this was the location where the model performed the best.Figure 2a presents the relation obtained by the simple linear regression between the observed and estimated yields in Dourados, and Figure 2b presents the interannual variability.similar to the observed yield (2150 kg ha -1 ).These yields were not significantly different but had a significant correlation coefficient (r = 0.84).The lowest yield in the series for Dourados (Figure 2) was obtained in the 2004-2005 season.This occurred, according to Minuzzi et al. (2010), because of the low percentage of seed germination during this season, which was induced by the high soil temperature and low water availability.It is clear that for years in which the model estimated the yield with high accuracy (for example, 1999, 2000, 2001 and 2004), the climate factors were primarily responsible for yield break.In years when the errors were higher, soybean yield was affected by other factors primarily related to crop management, such as fertilization and pest and disease control (LOBELL et al., 2009).Assad et al. (2007) also noted some divergence between the results generated by the FAO model and the average observed yield data for Mato Grosso do Sul.Specifically, for Dourados, the model presented errors ranging from +3 to -17%.
The simple linear regression between the estimated and observed Ya in Londrina was similar to that for Passo Fundo, with a predominance of overestimation (Figure 3a) mainly from 1997 to 2005 (Figure 3b).The average yields were 2,210 and 2,400 kg ha -1 for the estimated and observed Ya, respectively.The correlation between the estimated and observed Ya (r = 0.68) was statistically significant.
Most of the differences between the estimated and observed Ya values in Londrina arose because the model was not able to reproduce the technological level adapted in the region, even when Yp was set to 1.3*YmO.Some exceptions, such as 1991 and 2006, were related to specific scenarios for the soybean crop, where economic and cultural contexts limited soybean yield when the model projected an increase.
Using the simulation model SOYGRO for the soybean crop in Londrina, Klosowsky (1997) obtained similar errors, varying from -20 to +20%, among the different sowing dates and growing seasons.
Considering the dataset for the three locations studied (Figure 4), it is possible to observe a satisfactory performance of the model, with a small underestimation of approximately 5%, resulting in a coefficient of determination (R 2 ) of 0.64, an agreement modified index (d 1 ) of 0.52 and a confidence index (C) of 0.71.According to Camargo and Sentelhas (1997), the model can be classified as having good performance (Table 3).On average, the absolute error was approximately 5 bags of 60 kg per hectare or 300 kg ha -1 .Based on the results, the use of the agrometeorological model can be recommended for evaluating the climate risks for soybean production in different parts of Brazil because it can reproduce the yield variability observed.However, the determination of Yp should be performed in accordance with the technological level of each location/region to obtain better results (LOBELL et al., 2009).The method for determining the Yp for soybean crops varied among locations.From the methods used, the calculation of Yp by the Agrometeorological Zone Method generated the highest errors in Passo Fundo and Dourados and is not recommended.
The best way to determine Yp for the soybean crop was by adding a percentage to the maximum yield of the historical series.In Passo Fundo, this percentage was 20%, whereas in Londrina, it was 30%, and in Dourados, it was only 10%.The performance of the agrometeorological model for estimating soybean yield also varied among locations.It was classified as good in Passo Fundo, reasonable in Londrina and very good in Dourados.These results are remarkable, considering that there are several others factors that can affect soybean actual yield, such as crop management.
When the entire data set was analyzed as a whole, the performance of the crop model remained good.However, more importantly, the model demonstrated its ability to simulate all ranges of yield variation under very different environments, such as the three locations studied.Based on these results, we recommend the use of agrometeorological models for soybean yield simulation in studies with the objective of evaluating the climate risk associated with water deficit.

Conclusion
The best way to determine Yp for the soybean yield estimate was by adding a percentage to the maximum yield of the historical series.By this way, the used model demonstrated its ability to simulate all ranges of yield variation under very different climatic conditions, showing the potential of this tool for developing studies about climatic risk for soybean production in Brazil as well as for improving the yield monitoring by the state and national governmental agencies.

Figure 1 .
Figure 1.Relationship between the observed and estimated soybean actual yield (Yp = 1.2*YmO)(a) and their interannual variability (b) in Passo Fundo, Rio Grande do Sul State.

Figure 2 .
Figure 2. Relationship between the observed and estimated soybean actual yield (YP = 1.1*YmO) and their interannual variability (b) in Dourados, Mato Grosso do Sul State.During the study period, the average actual yield estimated in Dourados (2170 kg ha -1 ) was very

Table 1 .
Classification for model performance based on the confidence index (C).

Table 2 .
Crop coefficient (kc), water response factor (ky) and water excess sensitivity coefficient (ke) calibrated for each phenological phase of the soybean crop.

Table 3 .
Statistical indices of model performance for estimating actual yield (Ya) in Passo Fundo (RS), Londrina (PR) and