Agrometeorological models for groundnut crop yield forecasting in the Jaboticabal, São Paulo State region, Brazil

Forecasting is the act of estimating a future event based on current data. Ten-day period (TDP) meteorological data were used for modeling: mean air temperature, precipitation and water balance components (water deficit (DEF), water surplus (EXC) and soil water storage (SWS)). Meteorological and yield data from 1990-2004 were used for calibration, and data from 2005-2010 were used for testing. The first step was the selection of variables via correlation analysis to determine which TDPs and climatic variables have the greatest influence on crop yield. The selected variables were used to construct models by multiple linear regression, using a backward stepwise process. Among all analyzed models, the following was notable: Yield = 4.964 x [SWS of 2° TDP of December of the previous year (OPY)] – 1.123 x [SWS of 2° TDP of November OPY] + 0.949 x [EXC of 1° TDP of February of the productive year (PY)] + 2.5 x [SWS of 2° TDP of February OPY] + 19.125 x [EXC of 1° TDP of May OPY] – 3.113 x [EXC of 3° TDP of January OPY] + 1.469 x [EXC of 3° TDP of January of PY] + 3920.526, with MAPE = 5.22%, R = 0.58 and RMSEs = 111.03 kg ha⁻¹.


Introduction
The groundnut, one of the major oilseeds produced in the world, belongs to the family Leguminosae. The plant has a high energy value, is rich in nutrients and is used to produce a wide range of products derived from the grain, such as oil, peanut butter and 'in natura' peanuts (FREITAS et al., 2005).
According to the National Supply Company (CONAB, 2012), Brazilian production in 2010/2011 fell by 32% relative to 2009/2010, with a 12% reduction in the planted area. These reductions were mainly caused by climatic factors, such as excessive rainfall at the end of the peanut crop cycle.
São Paulo, and the Jaboticabal region in particular, accounts for almost 70% of the entire Brazilian production. This region is distinguished by its average yield of 3.7 t ha⁻¹, which is much higher than the national average of 2.8 t ha⁻¹ and the global average of 1.6 t ha⁻¹ (IEA, 2011).
To estimate the production of an agricultural crop in Brazil, a survey system is used that is based on the opinions of technicians and economists from each sector. This survey system is considered a subjective method because it does not allow for a quantitative analysis of the errors involved. An objective alternative is the use of crop models, known as agrometeorological models, which express the influence of meteorological elements on crop yield. According to Rossetti (2001), 95% of claims paid by agricultural insurance entities are related to drought or excess rain. Modeling is a means to quantify these climate risks, estimate yields and devise strategies to minimize their impacts in a rapid and cost-effective manner.
In São Paulo, the recommended groundnut sowing dates take into account the climatic conditions of each region. Generally, October and November are the most suitable months for sowing the groundnut crop known as the "water crop". The "dry crop" is typically sown in February (GODOY et al., 1986).
The groundnut cycle ranges from 90 to 115 days for early varieties and from 120 to 140 days for late varieties. Depending on the weather, the water requirements range from 500 to 700 mm for both cycles (DOORENBOS; KASSAM, 1979). According to Silva and Rao (2006), water deficits during the growing season generally cause flowering delays and extend the crop cycle, thereby delaying harvest and reducing yield. The phase of flowering and pod formation is highly sensitive to water stress.
The harvest must be performed in the appropriate season. Harvesting at an unsuitable time may lead to considerable losses, both in the quantity and quality of seeds. Early harvests result in considerable quantities of immature and poorly formed seeds, whereas late harvests can lead to greater deterioration of seeds (CARVALHO et al., 1976). Soil moisture, most likely resulting from rain, can severely damage the seeds and reduce their quality (TOLEDO; MARCOS FILHO, 1977); subsequent to harvest, seed germination may occur within the pod (SAVY FILHO; LAGO, 1985), resulting in the deterioration of the seeds during storage.
Studies of the water balance should be developed to facilitate an understanding of the relationship between crop and climate, which allows the crop to be adjusted to the climatic conditions, thereby avoiding the disastrous consequences of defective agricultural planning (TUBELIS, 1988). The main climate elements, according to Ometto (1981) and Vianello and Alves (1991), are precipitation, air temperature, solar radiation, atmospheric moisture, wind and atmospheric pressure. Doorenbos and Kassam (1979) reported that, to obtain a high yield, a rainfed crop requires approximately 500 to 700 mm of water for the entire period of growth. The indices of drought sensitivity (Ky) for the phases of establishment, vegetative growth, flowering and maturity are 0.2, 0.2, 0.8 and 0.6, respectively, indicating that the flowering period is the most sensitive to water deficit (DOORENBOS; KASSAM, 1979).
According to Robertson (1983), yield-climate interactions can be quantified using models and by studying the variations and effects of climate on plant performance.
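The Ky values quoted above are the yield response factors in the relation of Doorenbos and Kassam (1979), 1 - Ya/Ym = Ky (1 - ETa/ETm), where Ya and Ym are actual and maximum yield and ETa and ETm are actual and maximum evapotranspiration. The sketch below shows how the same evapotranspiration shortfall translates into different yield losses in each phase. The 25% shortfall is an illustrative assumption and is not a value from this study.

```python
# Yield response factors (Ky) by growth phase, as quoted in the text
KY = {
    "establishment": 0.2,
    "vegetative growth": 0.2,
    "flowering": 0.8,
    "maturity": 0.6,
}


def relative_yield_loss(ky, eta_over_etm):
    """Relative yield loss, 1 - Ya/Ym = Ky * (1 - ETa/ETm)."""
    return ky * (1.0 - eta_over_etm)


# Hypothetical case: actual ET reaches only 75% of maximum ET in a given phase
losses = {phase: relative_yield_loss(ky, 0.75) for phase, ky in KY.items()}
for phase, loss in losses.items():
    print(f"{phase}: {loss:.0%} relative yield loss")
```

Under this assumed shortfall, flowering shows the largest relative loss, which is the sense in which it is the most sensitive phase.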
Agricultural simulation models can be understood as empirical or mechanistic mathematical equations that aim to simplify reality and represent the biomass accumulation and plant development as well as estimate their yield according to the influencing factors. Empirical models are those based on regression analyses, whereas mechanistic models quantify and seek to understand the physical and biological interactions of plants with their development and environment (ZHU, 2010).
Knowledge of the spatiotemporal variability of long-term series of meteorological data can assist in the identification of better areas and periods for sowing, as well as provide important information on possible climatic trends. Bringing this knowledge together is a fundamental step for reducing the climatic risks associated with the agricultural sector (BLAIN, 2009).
Few agrometeorological models are available to estimate the yields of groundnut crops (ESPOSTI, 2002; MARIN et al., 2006). Assunção and Escobedo (2009) developed a model to estimate the crop yield as a function of water availability. However, most models estimate but do not forecast the yield. Challinor et al. (2003) indicated that yield estimation and forecasting can be improved by relating crop models to weather forecasting.
This study aimed to identify the climatic elements in a ten-day dataset that influence the annual groundnut yield in the Jaboticabal region. A further objective was to develop one or more agrometeorological models for regional groundnut yield forecasting.

Material and methods
We used agrometeorological data obtained from 1990 to 2010 at the Agrometeorological Station of the Department of Exact Sciences, FCAV/Unesp, Jaboticabal, located at latitude 21°14'05'', longitude 48°17'09'' and an altitude of 600 m. The mean air temperature and precipitation were used for estimating the ten-day period (TDP) reference evapotranspiration (ETo) according to the Thornthwaite (1948) method. The components of the water balance (WB) proposed by Thornthwaite and Mather (1955), namely the water deficit (DEF), water surplus (EXC) and soil water storage (SWS), were calculated using an available water capacity (AWC) of 100 mm.
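As an illustration of how DEF, EXC and SWS follow from precipitation and ETo, the sketch below implements the usual sequential bookkeeping of the Thornthwaite and Mather water balance. It is not necessarily the authors' exact implementation: the starting condition (soil at full capacity) and the input values are assumptions, and ETo is taken as given rather than computed from temperature.

```python
import math


def water_balance(precip, eto, awc=100.0):
    """Sequential Thornthwaite-Mather water balance.

    precip, eto: precipitation and reference evapotranspiration per period (mm).
    Returns one dict per period with SWS, DEF and EXC (mm).
    Assumption: the soil starts at full capacity (SWS = AWC).
    """
    sws_prev = awc
    acc_neg = 0.0  # accumulated negative (P - ETo), always <= 0
    out = []
    for p, et in zip(precip, eto):
        diff = p - et
        if diff < 0:
            # Drying: storage decays exponentially with the accumulated loss
            acc_neg += diff
            sws = awc * math.exp(acc_neg / awc)
            eta = p + (sws_prev - sws)  # actual ET = rain + water withdrawn
            deficit = et - eta
            surplus = 0.0
        else:
            # Wetting: refill storage first; the remainder becomes surplus
            sws = min(sws_prev + diff, awc)
            acc_neg = awc * math.log(sws / awc)
            deficit = 0.0
            surplus = diff - (sws - sws_prev)
        out.append({"SWS": sws, "DEF": deficit, "EXC": surplus})
        sws_prev = sws
    return out


# Hypothetical ten-day values (mm), for illustration only
wb = water_balance(precip=[50, 10, 0, 80], eto=[30, 30, 30, 30])
for period, row in enumerate(wb, start=1):
    print(period, {k: round(v, 1) for k, v in row.items()})
```

With an AWC of 100 mm, SWS is bounded between 0 and 100 mm, DEF is non-zero only in drying periods, and EXC appears only once the storage has been refilled.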
The groundnut production data were obtained from the Institute of Agricultural Economy (IEA) for a period of 20 years, from 1990 to 2010. The yield data were adjusted as proposed by Prela-Pantano et al. (2011), to remove the technological trend. This adjustment is necessary to minimize effects due to changes in the technological level employed by producers, thereby obtaining the influence of the climate variability on the yield.
A linear correlation analysis (Pearson's r) was performed between the yield and the TDP values of SWS, DEF and EXC of the harvest year and of the previous year. The variables with the best correlations were selected for modeling. The models were constructed by multiple linear regression (Y = a·X1 + b·X2 + c·X3 + … + LC), where Y is the yield of the harvest year, the independent variables X1, X2, X3, … are the climatic elements and LC is the linear coefficient (intercept).
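The screening step described above can be sketched as ranking candidate TDP variables by the absolute value of their Pearson correlation with yield. The variable names and numbers below are hypothetical and serve only to show the mechanics; the study's own selection threshold is not stated here.

```python
import math


def pearson_r(x, y):
    """Pearson linear correlation coefficient between two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def rank_by_correlation(candidates, yields):
    """Sort candidate variables by |r| with yield, strongest first."""
    scores = {name: pearson_r(vals, yields) for name, vals in candidates.items()}
    return sorted(scores.items(), key=lambda kv: abs(kv[1]), reverse=True)


# Hypothetical data: four years of yield (kg/ha) and two candidate variables (mm)
yields = [2500, 2700, 2900, 3100]
candidates = {
    "SWS_dec_tdp2_opy": [60, 70, 80, 90],
    "EXC_jan_tdp3_opy": [30, 5, 20, 0],
}
ranked = rank_by_correlation(candidates, yields)
for name, r in ranked:
    print(f"{name}: r = {r:+.2f}")
```

Ranking by |r| keeps variables with strong negative correlations, which matters here because some selected variables enter the final model with negative coefficients.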
In modeling, especially with the use of many independent variables, the main problem is the selection and combination of the variables. A backward stepwise regression method was used, based on the criterion of improvement in accuracy (adjusted R²). A total of 511 regressions were obtained and analyzed by combinatorial analysis. Only models in which the parameters and the regression were significant at the 0.05 level (P value) were selected.
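The figure of 511 regressions is consistent with fitting one model for every non-empty subset of nine candidate variables, since 2^9 - 1 = 511. That reading is an inference from the numbers (nine variables, X1 to X9, appear in the legend of Table 1) and is not stated explicitly in the text. A minimal sketch of the enumeration:

```python
from itertools import combinations

# Nine candidate predictors, named as in the legend of Table 1
variables = [f"X{i}" for i in range(1, 10)]

# Every non-empty subset of the candidates: sizes 1 through 9
subsets = [
    combo
    for size in range(1, len(variables) + 1)
    for combo in combinations(variables, size)
]

print(len(subsets))  # 511
```

Each subset would then be fitted by multiple linear regression and filtered by the significance criterion described above.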
After this step, 28 models were chosen for the region. These models were evaluated using an accuracy analysis based on the mean absolute percentage error (MAPE), a precision analysis based on the adjusted coefficient of determination (R²) and a tendency analysis based on the Root Mean Systematic Error (RMSEs) (Equations 1, 2 and 3, respectively). We used a 14-year period (1990 to 2004) for the model calibration and a 6-year period (2005 to 2010) for testing (or validation).
where: Yest_i: estimated yield in year 'i'; Yobs_i: observed yield (without technological trend) in year 'i'; YestC_i: yield estimated by simple linear regression between the observed (Yobs_i) and estimated (Yest_i) yields; N: number of years; n: number of data; and k: number of independent variables in the regression.
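The three statistics can be sketched as follows, assuming their usual definitions: MAPE = (100/N) Σ |Yest_i - Yobs_i| / Yobs_i; adjusted R² = 1 - (1 - R²)(n - 1)/(n - k - 1); and RMSEs = sqrt((1/N) Σ (YestC_i - Yobs_i)²), with YestC_i taken from the regression of estimated on observed yield. These forms are assumptions that match the symbol definitions above and may differ in detail from the paper's Equations 1 to 3.

```python
import math


def mape(y_obs, y_est):
    """Mean absolute percentage error (%)."""
    return 100.0 / len(y_obs) * sum(abs(e - o) / o for o, e in zip(y_obs, y_est))


def adjusted_r2(r2, n, k):
    """Adjusted R2 for n data points and k independent variables."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)


def linear_fit(x, y):
    """Ordinary least squares slope and intercept of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum(
        (a - mx) ** 2 for a in x
    )
    return slope, my - slope * mx


def rmse_systematic(y_obs, y_est):
    """Systematic RMSE: error of the regression line of estimated on observed."""
    slope, intercept = linear_fit(y_obs, y_est)
    y_est_c = [intercept + slope * o for o in y_obs]
    return math.sqrt(sum((c - o) ** 2 for c, o in zip(y_est_c, y_obs)) / len(y_obs))


# Hypothetical yields (kg/ha): estimates carry a constant bias of +100 kg/ha
obs = [2000.0, 2500.0, 3000.0, 3500.0]
est = [o + 100.0 for o in obs]
print(round(mape(obs, est), 2), round(rmse_systematic(obs, est), 2))
```

A constant bias shows up entirely in RMSEs, which is why this statistic is read as a measure of tendency rather than of scatter.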

Results and discussion
The period of analysis showed a wide variability in the crop yield (Figure 1A). This irregularity was due to climate variables and to changes in the level of technology available. A sixth-order polynomial adjustment was applied to minimize the effects of the level of agricultural technology (Figure 1A) and to clearly demonstrate the effects of climate on the crop yield. The adjusted yields (Figure 1B) are used throughout the remainder of the paper.
The occurrence of heavy rainfall in the 1992/1993 season (Figure 2) and the lack of heavy rain in the 1991/1992 season during the final phases of crop development (flowering and fructification) had a considerable negative effect on the groundnut yields in those years. According to Gillier and Silvestre (1996), groundnut flowering and fructification are dependent on certain climatic elements, and a change in the water availability directly and intensely affects the crop yield.
An analysis of the years corresponding to the lowest and highest yields at Jaboticabal showed that the distribution of rainfall directly affects the groundnut yield. The 'water crop', which is the more common crop in the region and is normally sown in mid-November, had its lowest yield in 1998 (Figure 3). This low yield was due to DEF in January, which affected the flowering phase, and DEF from March to mid-November, which most likely had a negative influence on the fructification and the initial phase of maturity. The 2004 season (Figure 4) had the highest yield due to the occurrence of large EXC between January and February at the appropriate times (flowering and fructification) and DEF between March and December, which favored the maturation and harvest of the crop.
A correlation analysis was performed to select the elements of the water balance (WB) within the ten-day periods (TDP) (independent variables) that had the greatest influence on the crop yield (Table 1). These selected variables were used to construct the agrometeorological models.
Table 1. Selected coefficients of correlation (r) among the most important meteorological variables and yield.
Legend: X1: soil water storage (SWS) of the 2° TDP of December of the previous year (OPY); X2: SWS of the 2° TDP of November OPY; X3: water surplus (EXC) of the 1° TDP of February; X4: SWS of the 2° TDP of February OPY; X5: water deficit (DEF) of the 2° TDP of December; X6: EXC of the 1° TDP of May OPY; X7: EXC of the 3° TDP of January OPY; X8: EXC of the 3° TDP of January; X9: yield OPY.
Rows and columns of the correlation matrix: PROD (yield), X9, X1, X2, X3, X4, X5, X6, X7, X8.
In the calibration, models '1' to '7' (Tables 2 and 3) showed high accuracy, with MAPE values below 3.25%. Considering the average yield in the region of 2780 kg ha⁻¹, this MAPE corresponds to about 90 kg ha⁻¹, or 1.5 sacks of 60 kg per hectare. The independent variable X6, which represents the EXC of the 1° TDP of May OPY, was important in all models except model '5'. This importance can be explained by the fact that this TDP coincides with the end of maturation and the early harvest period (Figure 5).
The models were also evaluated with respect to their ability to forecast yields prior to the harvest or before crop sowing. For this reason, the first four models (Table 2) are the most interesting for discussion. These models not only present the best accuracy (MAPE) but also exclude the variable X5; therefore, they can forecast the yield prior to the sowing date. Whenever possible, the variable X5 should be discarded because it uses the 2° TDP of December of the productive year, thereby preventing a forecast of the yield.
The second step of the study was to test the selected models using independent data from the period of 2005 to 2010. The first eight models showed high accuracy (Table 3), high precision and low tendency, as evidenced by MAPE values of less than 8.25%, a minimum R² of 0.56 and a minimum RMSEs of 110.86 kg ha⁻¹.
Among the top four models (Tables 2 and 3), model '1' showed the best accuracy in the tests, with a MAPE of 0.99% and a precision (R²) of 0.6 (Figure 6b). This model is interesting because it uses just four variables: X6, X7, X8 and X9 (Table 2). These variables are the EXC of the 3° TDP of January of the same year (X8), the EXC of the 3° TDP of January OPY (X7), the EXC of the 1° TDP of May OPY (X6) and the yield OPY (X9). This result is in agreement with Silva and Rao (2005): in January, the crop is generally at the beginning of the flowering phase, which requires the greatest water supply (Figure 5).
Models '3' and '4' showed good accuracy, with MAPE values of 5.22% and 5.73%, respectively. In both models the variable X9 (Table 2) was not used, so it is possible to forecast the yield using only meteorological data.
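For illustration, the regression reported in the abstract (whose MAPE of 5.22% matches the value given here for model '3') can be evaluated as below. The mapping of each coefficient to X1 through X8 follows the legend of Table 1. The function name and the input values are hypothetical, and the identification of this equation with model '3' is an inference from the matching MAPE.

```python
# Coefficients of the regression reported in the abstract; inputs are in mm
COEFFS = {
    "X1": 4.964,   # SWS, 2nd TDP of December, previous year
    "X2": -1.123,  # SWS, 2nd TDP of November, previous year
    "X3": 0.949,   # EXC, 1st TDP of February, productive year
    "X4": 2.5,     # SWS, 2nd TDP of February, previous year
    "X6": 19.125,  # EXC, 1st TDP of May, previous year
    "X7": -3.113,  # EXC, 3rd TDP of January, previous year
    "X8": 1.469,   # EXC, 3rd TDP of January, productive year
}
INTERCEPT = 3920.526


def forecast_yield(inputs):
    """Forecast yield (kg/ha) from water balance inputs keyed by variable name."""
    return INTERCEPT + sum(coef * inputs[name] for name, coef in COEFFS.items())


# Hypothetical inputs (mm), for illustration only
example = {"X1": 80, "X2": 60, "X3": 10, "X4": 90, "X6": 0, "X7": 20, "X8": 15}
print(round(forecast_yield(example), 1))
```

Because the equation does not include X9, the forecast needs no yield record from the previous year, only the water balance series.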

Conclusion
The results showed that the following meteorological elements had the greatest influence on the groundnut yield at Jaboticabal: a) soil water storage of the 2° ten-day period (TDP) of February, November and December of the previous year (OPY); and b) water surplus of the 3° TDP of January and the 1° TDP of May OPY, and of the 3° TDP of January and the 1° TDP of February of the seasonal year.
Models '1', '2' and '3' were the most accurate. Model '3' uses only meteorological data and thus does not require the yield OPY in its calculations.