Rainfall data from the European Centre for Medium-Range Weather Forecasts for monitoring meteorological drought in the state of São Paulo

This study evaluated the use of rainfall data from the European Centre for Medium-Range Weather Forecasts (ECMWF) for monitoring meteorological drought in the state of São Paulo, based on rainfall data and on the Standardized Precipitation Index (SPI). ECMWF rainfall data were obtained from the pixels corresponding to 24 meteorological stations in the state of São Paulo. The homogeneity of the surface data was evaluated with the Standard Normal Homogeneity Test (SNHT) and the Buishand and Pettitt tests. To evaluate the agreement between surface and ECMWF data, for both cumulative rainfall and SPI values, the following accuracy measures were used: the Willmott index of agreement (d2), the modified Willmott index of agreement (d1) and the mean absolute error (MAE). Agreement was higher in the dry period (June to September). In the wet period (December to March), the ECMWF overestimated rainfall at up to 75% of the localities relative to meteorological station data. The results indicated that using the SPI increases the agreement between ECMWF and meteorological station data compared with raw rainfall series. The highest agreement was found in the dry period, leading to the conclusion that the ECMWF performs better during this period.


Introduction
Drought can significantly impact several sectors of society owing to its varied temporal and geographical distribution. Droughts can affect different climatic regions, causing different socioeconomic impacts (Pappenberger, Wetterhall, Dutra, & Giuseppe, 2013). In general terms, the World Meteorological Organization (WMO, 1992) defines this phenomenon as "[…] an abnormally long period of dry weather where lack of rainfall causes hydrological imbalance". The scientific literature recognizes several types of drought (Wilhite & Glantz, 1985); meteorological drought occurs when cumulative rainfall over a given period and area is significantly below the expected climatological value (Wilhite, 2000).
Acta Scientiarum. Technology, v. 40, e34947, 2018
Regarding drought monitoring, Dutra et al. (2014) observed a decrease in the number of meteorological stations available worldwide. According to Pereira, Angelocci, and Sentelhas (2002), Brazil does not yet have a network of meteorological stations that meets its needs. Dinku et al. (2007) identify the main problem: in regions with a sparse meteorological network, a single station may be mistakenly used to represent an area whose topography makes rainfall highly heterogeneous in space and time. In such regions, rainfall data obtained by remote sensing allow hydrological models to be applied more precisely (Yilmaz, Adler, & Tian, 2010).
The ECMWF model uses information collected from surface meteorological stations, weather radars, satellites and other sources. ECMWF data correspond to individual pixels of 0.25 degrees (approximately 25 × 25 km) covering the entire globe. According to Sheffield, Wood, Chaney, and Guan (2014), using satellite remote sensing to effectively increase the density of observations beyond that of in situ stations has obvious benefits. According to Mwangi, Wetterhall, Dutra, Giuseppe, and Pappenberger (2014), combining dynamic rainfall prediction models, such as the ECMWF model, with drought indices, such as the SPI, may provide a better description of the duration, magnitude and spatial extent of drought.
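As a rough check of the pixel dimensions quoted above, the sketch below computes the approximate north-south and east-west size of a 0.25° cell. It assumes a spherical Earth with about 111.32 km per degree of latitude; the function name `pixel_size_km` is illustrative and not part of any ECMWF tool.

```python
import math

# Approximate length of one degree of latitude in km (spherical-Earth assumption).
KM_PER_DEGREE = 111.32


def pixel_size_km(lat_deg: float, res_deg: float = 0.25) -> tuple[float, float]:
    """Return the approximate (north-south, east-west) size in km of a
    res_deg x res_deg grid cell centred at latitude lat_deg."""
    north_south = res_deg * KM_PER_DEGREE
    # Meridians converge toward the poles, so the east-west extent shrinks with cos(latitude).
    east_west = north_south * math.cos(math.radians(lat_deg))
    return north_south, east_west


for lat in (-20.0, -22.0, -25.0):
    ns, ew = pixel_size_km(lat)
    print(f"lat {lat:6.1f}: {ns:.1f} km N-S x {ew:.1f} km E-W")
```

Across the latitudes of the state of São Paulo (roughly 20°S to 25°S), this gives cells of about 28 km north-south by 25 to 26 km east-west, consistent with the round figure of 25 × 25 km.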
The goal of this study was to quantify the accuracy of the ECMWF rainfall data in comparison to meteorological station data, both in the Standardized Precipitation Index (SPI) and in the raw data, for monitoring meteorological drought in the state of São Paulo, considering a monthly time scale.

Material and methods
The study was based on rainfall series from meteorological stations in the state of São Paulo and on data provided by the European Centre for Medium-Range Weather Forecasts (ECMWF). Surface data come from 24 meteorological stations of the Integrated Center for Agrometeorological Information (Ciiagro/IAC), whose locations are shown in Figure 1. Following Bardin, Camargo, and Moraes (2012), the model grid points closest to the rain gauges were used to compare the values from the ECMWF model and from the surface stations. All calculations used the period common to both datasets, 1995 to 2014, so that ECMWF and station data cover the same period at every location. All tests were performed at the 5% significance level.
Climate studies require reliable and homogeneous data, since analyses based on non-homogeneous data may lead to erroneous conclusions (Santos, Sediyama, Oliveira, & Abrahão, 2012). Domonkos (2006) notes that doubts remain about the efficiency of homogeneity tests and that no available method detects inhomogeneities perfectly. Therefore, Meirelles and Vasconcelos (2011) suggest that applying several statistical tests to detect inhomogeneities is very useful. According to Sahin and Cigizoglu (2010), the Standard Normal Homogeneity Test (SNHT; Alexandersson, 1986), the Buishand test (Buishand, 1982) and the Pettitt test (Pettitt, 1979) are regularly used in climatology to identify inhomogeneities in meteorological series.
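The station-to-pixel matching and the restriction to a common period described above can be sketched as follows. This assumes a regular grid whose nodes fall on multiples of 0.25° and monthly series stored as dictionaries keyed by (year, month); the helper names and the example coordinates are hypothetical.

```python
import math


def nearest_grid_node(lat: float, lon: float, res: float = 0.25) -> tuple[float, float]:
    """Snap a station coordinate to the nearest node of a regular grid whose
    nodes lie on multiples of `res` degrees (assumed grid layout)."""

    def snap(v: float) -> float:
        # floor(x + 0.5) rounds exact midpoints upward, avoiding round-half-to-even.
        return math.floor(v / res + 0.5) * res

    return snap(lat), snap(lon)


def common_period(station: dict, ecmwf: dict, start: int = 1995, end: int = 2014):
    """Return paired value lists for the (year, month) keys present in both
    series and within [start, end], in chronological order."""
    keys = sorted(k for k in station.keys() & ecmwf.keys() if start <= k[0] <= end)
    return [station[k] for k in keys], [ecmwf[k] for k in keys]


# Hypothetical station coordinates (decimal degrees, south and west negative).
print(nearest_grid_node(-22.70, -47.63))  # (-22.75, -47.75)
```

Restricting both series to identical (year, month) keys guarantees that every accuracy measure is computed on exactly paired values.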
The SNHT, Buishand and Pettitt tests were applied to the monthly data of all meteorological stations. The series were classified for homogeneity using the criterion proposed by Wijngaard, Klein, and Können (2003), a classification that has lent credibility to the use of climate data series in studies of variability, trends and climate extremes.
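A minimal sketch of the three homogeneity statistics and of a Wijngaard-style classification is given below, assuming one station's series for one calendar month (e.g. 20 values for 1995-2014). The formulas follow common textbook statements of the tests; the Pettitt p-value uses the usual large-sample approximation, while SNHT and Buishand critical values must be taken from published tables for the series length and are not reproduced here. With three tests, the classes are labelled as in Figure 3; function names are illustrative.

```python
import math
from statistics import mean, pstdev


def snht_t0(x):
    """SNHT statistic (Alexandersson, 1986): T0 = max over k of
    k*z1^2 + (n-k)*z2^2, with z1, z2 the means of the standardized
    series before and after k. Assumes a non-constant series."""
    n, m, s = len(x), mean(x), pstdev(x)
    z = [(v - m) / s for v in x]
    return max(
        k * (sum(z[:k]) / k) ** 2 + (n - k) * (sum(z[k:]) / (n - k)) ** 2
        for k in range(1, n)
    )


def buishand_range(x):
    """Buishand range statistic R / sqrt(n), where R is the range of the
    cumulative deviations from the mean divided by the standard deviation
    (the partial sum at k = 0 is 0)."""
    m, s = mean(x), pstdev(x)
    partial, sums = 0.0, [0.0]
    for v in x:
        partial += v - m
        sums.append(partial)
    return (max(sums) - min(sums)) / s / math.sqrt(len(x))


def pettitt(x):
    """Pettitt test: returns (K, change_index, approximate p-value), with
    U_k = sum_{i<k} sum_{j>=k} sign(x_i - x_j) and K = max |U_k|."""
    n = len(x)

    def sign(d):
        return (d > 0) - (d < 0)

    u = [sum(sign(x[i] - x[j]) for i in range(k) for j in range(k, n)) for k in range(1, n)]
    abs_u = [abs(v) for v in u]
    k_stat = max(abs_u)
    change = 1 + abs_u.index(k_stat)  # index of the first value after the break
    p = min(1.0, 2.0 * math.exp(-6.0 * k_stat**2 / (n**3 + n**2)))
    return k_stat, change, p


def wijngaard_class(rejections: int) -> str:
    """Classify a series by how many of the three tests rejected homogeneity."""
    if rejections <= 1:
        return "useful"
    if rejections == 2:
        return "doubtful"
    return "non-homogeneous"


# Synthetic series with an abrupt shift after the tenth year.
step = [10.0] * 10 + [20.0] * 10
K, change, p = pettitt(step)
print(snht_t0(step), buishand_range(step), K, change, p < 0.05)
```

In practice, the number of rejections passed to `wijngaard_class` would come from comparing each statistic with its 5% critical value.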
The Standardized Precipitation Index (SPI) represents the number of standard deviations by which a particular rainfall total departs from the series median. Its calculation fits long-term rainfall records to a cumulative probability distribution, which is then transformed into a standard normal distribution with zero mean and unit variance (Guttman, 1999, among others), allowing comparison between different locations. SPI was calculated as described in McKee, Doesken, and Kleist (1993), and the suitability of the gamma distribution for this index was confirmed by the Kolmogorov-Smirnov test as modified by Lilliefors (1967; 1969). The critical values of this goodness-of-fit test were calculated according to Blain (2014).
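The gamma-based SPI calculation described above can be sketched with the Python standard library alone. This is an illustration under stated assumptions, not the authors' code: the gamma parameters use Thom's (1958) approximation to the maximum-likelihood estimates, as is common in SPI implementations; zero-rainfall months enter through the usual mixed distribution H(x) = q + (1 - q)G(x); and the regularized incomplete gamma function is evaluated with the classic series/continued-fraction method. The Lilliefors goodness-of-fit check is not included, and all function names are hypothetical.

```python
import math
from statistics import NormalDist


def gamma_fit_thom(values):
    """Thom (1958) approximation to the gamma maximum-likelihood estimates,
    using only the positive values; returns (alpha, beta)."""
    pos = [v for v in values if v > 0]
    m = sum(pos) / len(pos)
    a = math.log(m) - sum(math.log(v) for v in pos) / len(pos)
    if a <= 0:
        raise ValueError("positive values must not all be identical")
    alpha = (1 + math.sqrt(1 + 4 * a / 3)) / (4 * a)
    return alpha, m / alpha


def gamma_p(a, x, eps=1e-14, max_iter=1000):
    """Regularized lower incomplete gamma P(a, x): series expansion for
    x < a + 1, Lentz continued fraction for the complement otherwise."""
    if x <= 0:
        return 0.0
    prefactor = math.exp(-x + a * math.log(x) - math.lgamma(a))
    if x < a + 1:
        term = total = 1.0 / a
        ap = a
        for _ in range(max_iter):
            ap += 1
            term *= x / ap
            total += term
            if abs(term) < abs(total) * eps:
                break
        return total * prefactor
    tiny = 1e-300
    b = x + 1 - a
    c, d = 1 / tiny, 1 / b
    h = d
    for i in range(1, max_iter):
        an = -i * (i - a)
        b += 2
        d = an * d + b
        d = tiny if abs(d) < tiny else d
        c = b + an / c
        c = tiny if abs(c) < tiny else c
        d = 1 / d
        delta = d * c
        h *= delta
        if abs(delta - 1) < eps:
            break
    return 1.0 - prefactor * h


def spi(values):
    """SPI for each value of one calendar month's rainfall series (e.g. all
    Januaries of 1995-2014), assuming at least two distinct positive values."""
    q = sum(1 for v in values if v == 0) / len(values)  # probability of zero rain
    alpha, beta = gamma_fit_thom(values)
    std_normal = NormalDist()
    out = []
    for v in values:
        h = q + (1 - q) * gamma_p(alpha, v / beta)
        h = min(max(h, 1e-12), 1 - 1e-12)  # keep inv_cdf finite at the extremes
        out.append(std_normal.inv_cdf(h))
    return out


# Synthetic monthly totals (mm), for illustration only.
rain = [50, 80, 120, 60, 200, 90, 30, 150, 110, 70, 95, 130, 40, 85, 100, 65, 175, 55, 105, 75]
print([round(s, 2) for s in spi(rain)])
```

Because the transformation is monotonic, the SPI preserves the ranking of the rainfall totals while placing every location on the same standard normal scale.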
The following statistical measures were used to quantify the accuracy and precision of SPI and rainfall values calculated from ECMWF data relative to those obtained at the meteorological stations: mean absolute error (MAE), the Willmott index of agreement (d2; Willmott, 1981) and the modified Willmott index of agreement (d1; Willmott, Ackleson, Davis, Feddema, & Klink, 1985). According to Willmott (1981), the index expresses the degree to which the estimated (simulated) values agree with the observed values. Because it is dimensionless, it can be applied to a wide variety of models, whatever their units. The index ranges from 0 to 1: a value of 1 indicates perfect agreement between observed and simulated data, and a value of 0 indicates complete disagreement. These measures of precision and accuracy are calculated as follows (Equation 1):
$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\lvert e_i - o_i\rvert$$
$$d_2 = 1 - \frac{\sum_{i=1}^{n}(e_i - o_i)^2}{\sum_{i=1}^{n}\left(\lvert e_i - \bar{o}\rvert + \lvert o_i - \bar{o}\rvert\right)^2}$$
$$d_1 = 1 - \frac{\sum_{i=1}^{n}\lvert e_i - o_i\rvert}{\sum_{i=1}^{n}\left(\lvert e_i - \bar{o}\rvert + \lvert o_i - \bar{o}\rvert\right)}$$
where: o - observed data; e - estimated data; ō - mean of the observed data; n - number of pairs; d2 - Willmott index of agreement; d1 - modified Willmott index of agreement. Although d2 is widely used to check the accuracy of a model, Legates and McCabe (1999) and Willmott et al. (1985) indicate that squaring the differences in the original index can yield high values even when the estimating model does not perform satisfactorily. According to Legates and McCabe (1999), the advantage of the modified index over the original is that the errors (e_i − o_i) are not squared, which, according to these authors, makes it a more rigorous index.
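The three measures can be computed directly from paired observed and estimated values, as in the minimal sketch below. It follows the standard statements of MAE and of Willmott's indices (Willmott, 1981; Willmott et al., 1985); function names are illustrative. The small example is consistent with the remark that d1 is more rigorous than d2: for the same pairs, d1 is lower.

```python
def mae(obs, est):
    """Mean absolute error between estimated and observed values."""
    return sum(abs(e - o) for o, e in zip(obs, est)) / len(obs)


def _potential_error_terms(obs, est):
    """|e_i - o_bar| + |o_i - o_bar| for each pair (denominator terms)."""
    o_bar = sum(obs) / len(obs)
    return [abs(e - o_bar) + abs(o - o_bar) for o, e in zip(obs, est)]


def willmott_d2(obs, est):
    """Original Willmott (1981) index of agreement, with squared terms."""
    num = sum((e - o) ** 2 for o, e in zip(obs, est))
    den = sum(t ** 2 for t in _potential_error_terms(obs, est))
    return 1 - num / den


def willmott_d1(obs, est):
    """Modified index (Willmott et al., 1985), with absolute instead of squared terms."""
    num = sum(abs(e - o) for o, e in zip(obs, est))
    den = sum(_potential_error_terms(obs, est))
    return 1 - num / den


obs = [1.0, 2.0, 3.0]
est = [2.0, 2.0, 4.0]
print(mae(obs, est), willmott_d2(obs, est), willmott_d1(obs, est))
# 0.6666666666666666 0.8 0.5
```

Both indices are undefined when the denominator is zero, which happens only if every observed and estimated value equals the observed mean.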

Results and discussion
According to the classification by Wijngaard et al. (2003), most localities presented homogeneous series. Of the 864 series tested (Figure 2; 24 sites, 3 tests and 12 months), 29 (3.35%, below the adopted significance level) showed heterogeneity. The heterogeneous series were irregularly distributed in space and time, with heterogeneities detected in February, April, May, June, July, August, September and November. By test, 2.43% of the series were heterogeneous under the SNHT, 5.2% under the Buishand test and only 2.43% under the Pettitt test. Following the classification proposed by Wijngaard et al. (2003), Figure 3 indicates that 97.92% of the series can be considered useful and only 2.08% doubtful. February had the highest concentration of doubtful series: at 4 locations two tests detected heterogeneity, which corresponds to 17% of the series for that month. May and September each had one doubtful location, corresponding to 4% of the locations in each month. In the other months all series were useful, since no more than one test detected heterogeneity. Series classified as doubtful can still be used if interpreted with caution. Also according to the methodology of Wijngaard et al. (2003), no series can be considered non-homogeneous. Thus, all series obtained from Ciiagro/IAC were considered suitable for the present study.
Figure 4 shows the d2 index for accumulated rainfall data. In general, March showed higher values in the northern part of the state and lower values in the southern part, while in April the index tended to increase from west to east. June showed the highest values of d2 (from 0.65 to 0.91), which, following Willmott (1981), indicates greater agreement between the meteorological station data and the ECMWF data. In November and December, the values were generally lower than in the other months.
Figure 5 shows the d2 index for SPI data. In January and February the index was generally high and homogeneous across the state, except for Adamantina (0.62) and São Carlos (0.63) in January, and Adamantina (0.63), Capão Bonito (0.55), São Carlos (0.64) and Tatuí (0.52) in February, which had lower values than the other localities. March had the lowest maximum value of d2 (0.83). In June the stations had homogeneous and high values (0.70 to 0.95), the highest of any month. July, August and September showed high values at almost all stations; the only exceptions in those three months were São Carlos and Tatuí, which showed lower agreement than the other stations. From October onward, the index decreased at all stations, indicating lower agreement between the datasets from the beginning of the rainy period in the state.
Figure 6 shows the d1 values for accumulated rainfall data. March and December had the stations with the lowest values of the index: in all regions of the state there were stations with very low values, between 0.29 and 0.56. June showed high agreement under this index, with values from 0.47 to 0.74. In July, August and September most stations showed good agreement, although a few stations showed low agreement.
Figure 7 shows the d1 index for SPI data. January and February were homogeneous among stations. In January the index varied between 0.42 and 0.66; in February the exceptions were Monte Alegre do Sul (0.72) and Piracicaba (0.71), with higher values, and Capão Bonito (0.38) and Tatuí (0.37), with lower values. March had the lowest values, with a maximum of 0.57, indicating unsatisfactory agreement at almost all stations, that is, larger differences between the datasets at these locations. In June all stations had high values; even the lowest (0.51) was high compared with other months, indicating strong agreement between the data. Spatially, stations with lower values predominated in the eastern region of the state in August. In October the stations began to show lower d1 values, which persisted until December; in that period the index varied between 0.25 and 0.66.
Figure 8 shows the MAE for SPI values, which varied widely, between 0.34 and 1.12. Since MAE values ≥ 1 may change the SPI class (McKee et al., 1993; Guttman, 1999), these values were considered unsatisfactory, whereas values less than or equal to 0.49 were considered satisfactory. Five months had at least one unsatisfactory value. Although the other months had no value equal to or higher than 1, their maximum MAE reached at least 0.93; the only exception was June, with a maximum MAE of 0.73. Even though June's maximum was below that of the other months, it can still be considered high, since an error of 0.73 is enough to shift the SPI classification by up to two classes (McKee et al., 1993; Guttman, 1999). Ilha Solteira, Jales, Pindamonhangaba, São Carlos and Tatuí did not reach satisfactory values in any month, which flags these stations for high MAE. June had the highest number of stations with satisfactory values (17), followed by August (9), September (8), February (2) and July (1); the other months had no station with values less than or equal to 0.49, for a total of 37 satisfactory occurrences (12.84% of the 288 station-month combinations).
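The reasoning about MAE thresholds can be made concrete with a small sketch. It assumes the seven-class SPI scheme commonly tabulated after McKee et al. (1993): extremely dry (≤ -2.0), severely dry (-1.5 to -1.99), moderately dry (-1.0 to -1.49), near normal (-0.99 to 0.99) and the symmetric wet classes. The label "intermediate" for MAE values between 0.49 and 1 is hypothetical, since the text does not name that range.

```python
SPI_LABELS = [
    "extremely dry", "severely dry", "moderately dry", "near normal",
    "moderately wet", "very wet", "extremely wet",
]


def spi_class(value: float) -> int:
    """Index into SPI_LABELS under the assumed seven-class scheme."""
    if value <= -2.0:
        return 0
    if value <= -1.5:
        return 1
    if value <= -1.0:
        return 2
    if value < 1.0:
        return 3
    if value < 1.5:
        return 4
    if value < 2.0:
        return 5
    return 6


def mae_rating(mae: float) -> str:
    """Thresholds used in the text: <= 0.49 satisfactory, >= 1 unsatisfactory."""
    if mae <= 0.49:
        return "satisfactory"
    if mae >= 1.0:
        return "unsatisfactory"
    return "intermediate"  # hypothetical label for the unnamed range


# An error of 0.73 (June's maximum MAE) added to an SPI of 1.49:
true_spi, err = 1.49, 0.73
shift = spi_class(true_spi + err) - spi_class(true_spi)
print(SPI_LABELS[spi_class(true_spi)], "->", SPI_LABELS[spi_class(true_spi + err)], shift)
# moderately wet -> extremely wet 2
```

Because most class widths are 0.5, an error of 0.73 placed just below a class boundary is enough to cross two boundaries, which is the situation described for June.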
Comparing accumulated rainfall with SPI values, both d1 and d2 were higher for SPI at almost all meteorological stations and in almost all months, evidencing greater agreement between meteorological station and ECMWF data when the SPI is used. The values of d1 were always lower than those of d2, corroborating Legates and McCabe (1999), who claimed that d1 is a more rigorous index than d2. Moreover, agreement between the datasets was highest in June for both indices. Willmott's indices were higher from June to September, indicating greater agreement between the data and suggesting that high index values depend on the season. In the dry period, d1 for SPI values varied between 0.34 and 0.79; the only exception was São Carlos, where the index reached 0.24 in September. In the other, rainier months, the same index varied between 0.25 and 0.68, with only three exceptions reaching higher values: Monte Alegre do Sul (0.72) and Piracicaba (0.71) in February, and Pariquera-Açu (0.70) in May. Relating the three indices for SPI values, March had a high minimum MAE (0.61), while its stations had the lowest maximum d2 and low d1 values (maximum of 0.57); March therefore produced the worst result, with high error and low agreement. Conversely, June had 17 locations with satisfactory MAE values, and its d2 (0.70 to 0.95) and d1 (minimum of 0.51) values were the highest, reflecting a high level of accuracy and precision. For accumulated rainfall data, June again showed the best agreement, with d2 between 0.65 and 0.91 and d1 between 0.47 and 0.74.
In general, smaller errors and higher agreement were found between June and September, the dry period in the state of São Paulo. The higher accuracy and precision in this period may be associated with rainfall formation factors in this region. These results are consistent with Dutra et al. (2012), who noticed a decrease in agreement between datasets in the period with the highest rainfall intensity. Likewise, Thiemig, Rojas, Zambrano-Bigiarini, Levizzani, and Roo (2012) explain that in the dry period, when low-intensity rainfall predominates, agreement between satellite product data and surface observations may be greater. Dutra et al. (2012) verified that ECMWF data presented higher values than datasets observed at meteorological stations, suggesting overestimation by the model. Kurnik, Barbosa, and Vogt (2011), when testing two rainfall datasets in the SPI calculation, also found that ECMWF values overestimated rainfall at high-rainfall sites. In addition, Guo et al. (2015) report significant overestimation in regions with large amounts of rainfall: all the satellite products they studied strongly overestimated rainfall in summer, which is associated with the higher volume and frequency of rainfall during this period.
Regarding possible overestimation in this study, Table 1 shows that the period of lower rainfall (May to October) had a lower frequency of ECMWF overestimation. From May to October, at most 6 stations (25%) had higher ECMWF values than the meteorological stations. June, which had the best result in the agreement indices, had only one station with an overestimated ECMWF value. Conversely, in the rainy months (November to April) at least 10 locations showed overestimation by ECMWF. Furthermore, in March (the month with the worst performance in the indices evaluated), 18 of the 24 locations had higher ECMWF values than the surface stations; that is, ECMWF exceeded the station values at 75% of the stations in that month. These results are consistent with Sharifi, Steinacker, and Saghafian (2016), who assessed monthly precipitation and found that ECMWF data underestimated the values observed in the field in a low-rainfall area of Iran. Similarly, Szczypta et al. (2011) observed that under more intense rainfall the estimated data tend to overestimate rainfall. This suggests that overestimation by ECMWF degrades the indices evaluated: the more frequent the overestimation, the lower the performance. For the ECMWF data, the contrast between predominant overestimation in the rainy period and predominant underestimation in the dry period was clear. The dry period (June to September) always produced better results at the monthly time scale, regardless of the index evaluated. Dutra et al. (2012) pointed out that the use of ECMWF products for monitoring purposes should be further assessed against independent datasets suited to the particular application and region studied.
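The percentages in Table 1 can be reproduced with a count like the one below. It assumes that a location counts as overestimated in a given month when its ECMWF value exceeds the station value (for example, comparing long-term monthly means over 1995-2014); the exact criterion is an assumption, and the station names in the example are placeholders.

```python
def overestimation_pct(station: dict[str, float], ecmwf: dict[str, float]) -> float:
    """Percentage of locations (common to both inputs) where the ECMWF value
    exceeds the station value for one month."""
    common = station.keys() & ecmwf.keys()
    if not common:
        raise ValueError("no common locations")
    over = sum(1 for name in common if ecmwf[name] > station[name])
    return 100.0 * over / len(common)


# Placeholder data: 24 locations, ECMWF higher at 18 of them.
station = {f"site{i:02d}": 200.0 for i in range(24)}
ecmwf = {name: (230.0 if i < 18 else 180.0) for i, name in enumerate(sorted(station))}
print(overestimation_pct(station, ecmwf))  # 75.0
```

With 24 locations, each station corresponds to about 4.2 percentage points, so 6 stations give 25% and 18 give 75%.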

Conclusion
The d1 index should be incorporated more often in studies comparing historical series, since its more rigorous evaluation reduces the risk of erroneous conclusions compared with the d2 index.
Because ECMWF and meteorological station data agree more closely when expressed as SPI, the SPI makes it feasible to use rainfall data estimated by ECMWF. The higher agreement of the data in periods of lower rainfall suggests that the SPI can be used with greater reliability in the low-rainfall period, from June to September.

Figure 1. Location of the 24 meteorological stations in the state of São Paulo.

Figure 2. Percentage of homogeneous and heterogeneous series in each month for the SNHT, Buishand and Pettitt tests.

Figure 3. Percentage of the series in useful, doubtful or nonhomogeneous classes.

Figures 4 to 8 show the spatial distribution of the indices d1, d2 and MAE throughout the state of São Paulo.

Figure 4. Willmott index of agreement for accumulated rainfall data.

Figure 5. Willmott index of agreement for SPI data.

Figure 6. Modified Willmott index of agreement for accumulated rainfall data.

Figure 7. Modified Willmott index of agreement for SPI data.

Figure 8. Mean absolute error for SPI values.

Table 1. Percentage of locations with overestimation of rainfall by ECMWF.