Homogenous regions and rainfall probability models considering El Niño and La Niña in the State of Pará in the Amazon

The determination of homogeneous precipitation regions and probability models that take the El Niño-Southern Oscillation (ENSO) phenomenon into account is important for water resource planning and for studying how climate change affects precipitation regimes. Six homogeneous regions of annual mean precipitation were determined through a cluster analysis using Ward's agglomeration method, applied to a 31-year historical series (1960-1990) at 413 satellite monitoring points in the state of Pará. The analysis was performed for the complete series and for the subsets of years in which an El Niño or a La Niña event occurred. To select the probability model, the chi-square test was applied to the 413 monitoring points spread over the six homogeneous regions for El Niño years, La Niña years and the complete set of years. The normal model had the best fit, with chi-square values below 3.84 (the tabulated critical value). The model was validated using 12 rainfall stations of the National Water Agency (ANA) distributed across the six homogeneous regions; the chi-square values for these 12 stations were also lower than 3.84. The good fit between the observed and the regionalized data demonstrates the potential of the methodology for estimating annual average precipitation probabilities.


Introduction
The determination of homogeneous regions and probability densities of rainfall under the influence of El Niño and La Niña is important to support water resource planning in the context of climate change. For example, the probability density of rainfall and its variation in relation to El Niño and La Niña are important for agricultural planning in a region. El Niño and La Niña are considered the main sources of climate variability in the Amazon, producing moderate to intense droughts and floods. Studies of the influence of El Niño and La Niña on the rainfall regime of this region are scarce because of the lack of monitoring data. This problem is partly due to the large size of the region, which increases the cost of installing and operating a monitoring network, as well as the cost and logistics of transporting the technical teams responsible for measuring and collecting the rainfall and streamflow data that are crucial for hydrological studies. One alternative is to develop models for estimating rainfall in regions without rain gauge stations.
In hydrology, the term homogeneous describes regions that have hydrological similarities. These similarities include physical, climatic, biological and geological factors, as well as human actions and their effects. Defining homogeneous regions is considered the most difficult stage of regionalization because it often requires subjective judgment. Among the methods advocated for determining homogeneity, cluster analysis has given notable results (Lyra, Garcia, Piedade, Sediyama, & Sentelhas, 2006; Yang et al., 2010; Lyra, Oliveira-Júnior, & Zeri, 2014), and it is the method analyzed and used in this study.
For example, Lyra et al. (2006) defined homogeneous regions based on the seasonality of monthly precipitation in the state of Táchira, Venezuela. They used monthly precipitation values from 25 climatological stations with record lengths of 24 to 62 years. The Ward method (Ward, 1963) was applied to group months with similar monthly precipitation values, resulting in seven homogeneous regions for the state. Modarres (2006) studied regional rainfall in Iran using a cluster analysis with the Ward method and Euclidean distance as the similarity measure to determine climatically homogeneous regions, which resulted in eight regions for the country. Raju and Kumar (2007) applied fuzzy cluster analysis (FCA) and Kohonen artificial neural networks (KANNs) to classify 159 meteorological stations in India into homogeneous regions. Eight parameters were considered: latitude, longitude, altitude, average temperature, humidity, wind velocity, sunshine hours and solar radiation. Based on the Davies-Bouldin index, which relates the internal dispersion of the clusters to the distance between clusters, 14 homogeneous regions were formed, and the FCA approach outperformed the KANNs. Gaál, Szolgay, Lapin, and Fasko (2009) used a regional frequency analysis of precipitation, based on estimating the parameters of a regional distribution function with L-moments, to define homogeneous regions. The authors used a hybrid clustering technique that combined a subjective analysis of precipitation data from rainfall stations (based on physical and geomorphological characteristics) with an objective statistical analysis (i.e., cluster analysis). Yang et al. (2010) applied a cluster analysis to 42 selected rainfall stations using four variables to describe the rainfall regime in the region: latitude, longitude, altitude and average annual rainfall. Bravo et al. (2012) grouped climatological stations into clusters based on the annual average of daily precipitation, using the k-means procedure and a principal component analysis with varimax rotation. They identified two clusters occupying northwestern and north-central Mexico and another cluster at the center of the territory. Comparison with previous studies indicated that the regions are invariant in time and space and independent of the aggregation method and the sampling of stations. Lyra et al. (2014) identified spatial and temporal rainfall patterns for the northeastern Brazilian state of Alagoas using a cluster analysis and related those patterns to the weather systems that occur over the region. They concluded that rainfall was not uniformly distributed in space and time in all regions; for example, in the two groups within the arid zone, more than 60% of the annual precipitation occurred in a 5-month period (March to July).
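The Davies-Bouldin index mentioned above can be computed directly from a labeled set of points. The sketch below follows the common textbook definition and is not necessarily the exact variant used by Raju and Kumar (2007): it assumes that the internal dispersion S_i of a cluster is the mean Euclidean distance of its members to the centroid and that the separation M_ij is the distance between centroids. The index averages, over clusters, the worst ratio (S_i + S_j)/M_ij, so lower values indicate more compact and better separated groups. The function name `davies_bouldin` is hypothetical.

```python
import numpy as np


def davies_bouldin(points, labels):
    """Davies-Bouldin index (lower is better) for a hard clustering.

    points: array (n_samples, n_features); labels: cluster id per sample.
    Assumes at least two clusters with distinct centroids.
    """
    x = np.asarray(points, dtype=float)
    labels = np.asarray(labels)
    ids = np.unique(labels)
    centroids = np.array([x[labels == c].mean(axis=0) for c in ids])
    # S_i: mean distance of the members of cluster i to its centroid
    scatter = np.array([
        np.linalg.norm(x[labels == c] - centroids[idx], axis=1).mean()
        for idx, c in enumerate(ids)
    ])
    # M_ij: Euclidean distance between centroids i and j
    sep = np.linalg.norm(centroids[:, None, :] - centroids[None, :, :], axis=2)
    k = len(ids)
    worst = np.empty(k)
    for i in range(k):
        # worst-case similarity of cluster i to any other cluster
        worst[i] = max((scatter[i] + scatter[j]) / sep[i, j]
                       for j in range(k) if j != i)
    return float(worst.mean())
```

For two pairs of points 2 units apart whose centroids are 10 units apart, each cluster has S = 1 and the index is (1 + 1)/10 = 0.2; tightening the clusters lowers it.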
Thus, the objective of this paper is to contribute to defining homogeneous rainfall regions and estimating the probability density of average annual rainfall in the Amazon (more specifically, in the state of Pará) by considering the El Niño and La Niña climate anomalies, in order to support water resource planning in the context of climate change. As explained by Silva Dias (2006) and Sanches, Verdum, and Fisch (2013), the analysis of historical series of temperature and precipitation in some regions of the world may indicate that the climate is changing. Mendonça (2006) also explains that variations in the components of the hydrological cycle (e.g., precipitation and evapotranspiration) will affect the distribution of water on the planet unevenly, so that certain regions will receive larger volumes of water and, therefore, experience more frequent torrential rainfall, flooding, landslides, mass movements and erosion.

Material and methods
For this paper, the study area is the state of Pará, which is located in the northern region of Brazil, within the Amazon. Almost all of this area is covered by humid tropical forest, except for grassland formations in the lower Trombetas River region and the Marajó archipelago (Figure 1). Raw data containing the geographic location and average monthly rainfall of the studied points were obtained from the Climate Research Center at the University of Delaware-Newark, USA (http://climate.geog.udel.edu/~climate/html_pages/sa_ts_P.html). Average monthly rainfall data from 413 sampling points in the state of Pará were used for a 31-year historical series (1960-1990). The 413 points are the nodes of a 0.5° by 0.5° latitude-longitude grid, where the average monthly precipitation at each point was interpolated with a spherical version of the Shepard algorithm (Webber & Willmott, 1998); on average, twenty neighboring stations influenced each estimate. Within the period from 1960 to 1990, El Niño events occurred in 19 years and La Niña events in 13 years; in some years (1965, 1970, 1973, 1976, 1983 and 1988), both phenomena were observed. Information on these events was obtained from the website of the Brazilian National Institute for Space Research (INPE) (http://enos.cptec.inpe.br). Altitudes (Figure 2) were obtained from Gonçalves, Blanco, Santos, and Oliveira (2016). To divide the state of Pará into homogeneous regions, cluster analysis was used, whose main objective is to group individuals (here, the sampling points) according to their characteristics. Four variables were selected: longitude, latitude, altitude and precipitation. Before the cluster analysis, these variables were standardized because they do not share the same unit of measurement; standardization ensures that each variable contributes equally to the similarity between individuals. Euclidean distance, a common measure for grouping data, was used to measure the similarity between individuals, and the groups were formed with Ward's clustering method. This hierarchical method forms groups so that the internal error between the vectors in each group and the group's mean vector is always as small as possible, which is equivalent to seeking the lowest dispersion of the data within each group. Further details of Ward's method can be found in Gonçalves et al. (2016).
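The standardization and Ward clustering steps described above can be sketched with SciPy's hierarchical clustering routines. This is a minimal illustration under stated assumptions, not the authors' code: it assumes a feature matrix with one row per grid point and one column per variable (for example longitude, latitude, altitude and annual precipitation), applies z-score standardization and builds a Ward dendrogram on Euclidean distances. When no number of groups is given, it cuts the tree where the jump between consecutive merge heights is largest, a simple heuristic in the spirit of the inertia criterion used in this work rather than a reproduction of it. The function name `ward_regions` is hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage


def ward_regions(features, n_groups=None):
    """Group points into regions with Ward's method on standardized variables.

    features: array (n_points, n_vars) with n_points >= 3, e.g. longitude,
    latitude, altitude and annual precipitation (hypothetical layout).
    Returns (labels, n_groups), with labels numbered from 1.
    """
    x = np.asarray(features, dtype=float)
    # z-scores: every variable gets mean 0 and unit standard deviation,
    # so variables measured in different units contribute equally
    z = (x - x.mean(axis=0)) / x.std(axis=0, ddof=1)
    # Ward agglomeration on Euclidean distances; column 2 of the linkage
    # matrix holds the merge heights, which never decrease for Ward
    tree = linkage(z, method="ward", metric="euclidean")
    heights = tree[:, 2]
    if n_groups is None:
        # Cutting between merge j and merge j + 1 (0-based) leaves
        # n - (j + 1) groups; choose the cut with the largest height jump
        n = z.shape[0]
        j = int(np.argmax(np.diff(heights)))
        n_groups = n - (j + 1)
    labels = fcluster(tree, t=n_groups, criterion="maxclust")
    return labels, n_groups
```

Passing `n_groups=6` reproduces the fixed-number case; leaving it as `None` lets the largest merge-height jump decide, which in practice should be checked against a sensitivity analysis such as the one described in this section.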
Based on Sharma (1996) and Gnanadesikan (1997), the grouping process can be synthesized in five stages. The first is the choice of the dissimilarity measure, which in this case was Euclidean distance. The second is the choice of grouping method (hierarchical or non-hierarchical), which in this case was hierarchical. The third is the choice of the grouping algorithm within that method, which was Ward's method. The fourth is the selection of the number of groups, which is subjective because it depends on the dissimilarity distance at which the dendrogram is cut. In our work, the number of groups was determined by transverse cuts in the dendrograms based on the mathematical criterion of inertia (i.e., the distance between regions): the number N of regions was the one that produced the greatest change in inertia relative to the preceding N + 1 regions. The cutting level in the dendrograms was based on a sensitivity analysis, choosing the level at which the highest similarity was observed. Finally, the resulting grouping was interpreted. The cluster analysis for annual average rainfall was applied to the entire historical series and to the subsets of the series in which El Niño or La Niña occurred. After the groups and their respective data (longitude, latitude and precipitation) were determined, a map of homogeneous regions in the state of Pará was obtained.
When studying rainfall, probability distribution models are used to estimate the probability of occurrence of precipitation at a given time scale (daily, monthly or yearly) and to examine the behavior and variability of rainfall. These models quantify the probability of occurrence (%) of a given rainfall depth, which supports the planning and management of water resources in flood control and irrigation projects. The probability distribution models used in this study were the Gumbel, normal and exponential models, which have been extensively analyzed in the classic hydrology literature. Figure 3 summarizes the methodology developed.
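The three candidate models can be fitted to a sample of annual rainfall totals and used to compute the probability that a given depth is exceeded. The sketch below is an illustration under stated assumptions: it uses SciPy's maximum-likelihood `fit` method for each distribution (the estimator used in this work is not specified here), the right-skewed Gumbel form (`scipy.stats.gumbel_r`) and a two-parameter (shifted) exponential. The names `MODELS`, `fit_models` and `exceedance_probability` are hypothetical.

```python
import numpy as np
from scipy import stats

# Candidate models; SciPy parameterizes each of them by (loc, scale)
MODELS = {
    "normal": stats.norm,
    "gumbel": stats.gumbel_r,    # Gumbel for maxima (right-skewed)
    "exponential": stats.expon,  # a free loc gives a shifted exponential
}


def fit_models(annual_rain_mm):
    """Maximum-likelihood (loc, scale) for every candidate model."""
    x = np.asarray(annual_rain_mm, dtype=float)
    return {name: dist.fit(x) for name, dist in MODELS.items()}


def exceedance_probability(params, depth_mm):
    """P(annual rainfall > depth_mm) under each fitted model (survival function)."""
    return {name: float(MODELS[name].sf(depth_mm, *p))
            for name, p in params.items()}
```

For a region's series `rain`, `exceedance_probability(fit_models(rain), 2500.0)` returns one exceedance probability per model; multiplying by 100 gives the probability of occurrence in percent.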

Results and discussion
Figure 4 shows the results of the clustering for precipitation in the state of Pará (Gonçalves et al., 2016) and the location of the rain gauge stations used to validate the probability models. The southwest and a small area in the northeast have the lowest precipitation values in all cases (complete series, El Niño and La Niña), as represented by region A, with minimum values ranging from 1,615 mm year⁻¹ for El Niño (Figure 4b) to 1,650 mm year⁻¹ for La Niña (Figure 4c). The minimum value for the complete series (Figure 4a) is 1,625 mm year⁻¹. Higher values occurred in the northeast, with a maximum of 3,400 mm year⁻¹ for La Niña.
Probability models were then fitted to the homogeneous regions. The chi-square (χ²) test was used to evaluate the goodness of fit of the probability densities given by the Gumbel, normal and exponential functions for annual average precipitation. For the chi-square test, the number of degrees of freedom was set to one for the exponential and normal models and to two for the Gumbel model. Thus, at a 5% significance level, the tabulated χ² values were 3.84 for the normal and exponential functions and 5.99 for the Gumbel function. Table 1 shows the results of the chi-square test for average annual precipitation in homogeneous regions A through F (series containing all years). All regions were consistent with the normal probability model, with calculated χ² values lower than 3.84 (tabulated χ²). According to Table 1 and the literature (Amin, Rizwan, & Alazba, 2016), average annual and average monthly precipitation are best simulated with the normal distribution function. Figure 5 shows the observed and estimated exceedance frequencies of annual average rainfall for the three data sets analyzed (all years, El Niño years and La Niña years); the normal function fits the observed frequencies well, except for region C in El Niño years.
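The chi-square goodness-of-fit check can be sketched as follows. The class scheme is an assumption, since the number and limits of the classes are not reported here: the sketch uses classes of equal expected probability under the fitted model and compares the statistic, the sum of (O - E)²/E over classes, with the 5% critical value. The degrees of freedom are left as an argument so that the values adopted in this work (one for the normal and exponential models, two for the Gumbel model) can be passed explicitly. The function name `chi_square_gof` is hypothetical.

```python
import numpy as np
from scipy import stats


def chi_square_gof(sample, dist, params, df, n_classes=4, alpha=0.05):
    """Chi-square goodness of fit of a fitted SciPy distribution.

    Classes have equal expected probability under the fitted model
    (an assumption of this sketch). Returns (statistic, critical, accepted).
    """
    x = np.asarray(sample, dtype=float)
    n = x.size
    # interior class limits at the 1/k, 2/k, ... quantiles of the model
    inner = dist.ppf(np.arange(1, n_classes) / n_classes, *params)
    # class index of each value: 0 .. n_classes - 1
    observed = np.bincount(np.searchsorted(inner, x, side="right"),
                           minlength=n_classes)
    expected = n / n_classes  # same expected count in every class
    statistic = float(np.sum((observed - expected) ** 2 / expected))
    critical = float(stats.chi2.ppf(1.0 - alpha, df))
    return statistic, critical, statistic < critical
```

With `df=1` the critical value is 3.84 and with `df=2` it is 5.99, matching the tabulated values quoted above; a model is accepted when the calculated statistic falls below the critical value.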
For validation, 12 rainfall stations distributed across the homogeneous regions obtained by the cluster analysis (Figure 4) were treated as target stations (Table 2). These stations have more recent rainfall data than those used for the regionalization. This validation is important because the regionalization and the fit of the probability model derive from estimated satellite data. Because the normal distribution function gave the best fit to precipitation, it was used for validation. Figure 6 shows the fit between the observed and the simulated exceedance frequency curves, and Table 3 shows the results of the normal probability model and the calculated χ² values. In some cases, the model does not estimate the rainfall exceedance frequencies well. A plausible explanation is the low density of data during El Niño and La Niña events. When all years are considered, the fit is poorer only in region A, where station Fazenda Cumarú do Norte has limited data and the normal model is less representative of average annual rainfall. For this station, the validation could not be applied to El Niño years because its series contains only two El Niño years (hence the missing panel in Figure 6). This illustrates the lack of hydrological monitoring in the Amazon, which motivates studies such as this one.
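The observed and modeled exceedance curves compared in the validation can be computed as below. The plotting position is an assumption (the Weibull formula m/(n + 1), a common choice in hydrology), and the regional mean and standard deviation passed to the normal model stand in for whatever parameters were fitted to the region containing the station. The summary measure `max_abs_difference` is a hypothetical addition for comparing the two curves and is not a statistic reported in this work; all function names are hypothetical.

```python
import numpy as np
from scipy import stats


def observed_exceedance(annual_rain_mm):
    """Empirical exceedance frequency with the Weibull plotting position.

    Values are sorted in decreasing order; the m-th largest value gets
    frequency m / (n + 1).
    """
    x = np.sort(np.asarray(annual_rain_mm, dtype=float))[::-1]
    rank = np.arange(1, x.size + 1)
    return x, rank / (x.size + 1.0)


def normal_exceedance(depth_mm, mean_mm, std_mm):
    """Exceedance probability 1 - F(depth) of a normal model."""
    return stats.norm.sf(depth_mm, loc=mean_mm, scale=std_mm)


def max_abs_difference(station_rain_mm, regional_mean_mm, regional_std_mm):
    """Largest gap between the observed and regionalized exceedance curves."""
    depth, freq_obs = observed_exceedance(station_rain_mm)
    freq_model = normal_exceedance(depth, regional_mean_mm, regional_std_mm)
    return float(np.max(np.abs(freq_obs - freq_model)))
```

For a three-year series of 1,800, 2,400 and 2,100 mm, the observed frequencies are 0.25, 0.50 and 0.75 for 2,400, 2,100 and 1,800 mm, respectively; a regional normal model with a mean of 2,100 mm and a standard deviation of 300 mm differs from them by at most about 0.09.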

Conclusion
The Ward method was effective in the spatial representation of six homogeneous regions of annual mean precipitation in the state of Pará, for El Niño years, La Niña years and the complete dataset. The methodology proved useful for determining homogeneous regions and evaluating the probability of occurrence of rainfall during El Niño and La Niña years. These results can help support water resource planning in the context of climate change in regions without available data, especially in the Amazon, which, owing to its size and conservation areas, faces challenges in maintaining hydroclimatic monitoring services.

Figure 1. Geographic location of the state of Pará.

Figure 3. Scheme of the methodological process adopted.

Figure 4. Homogeneous rainfall regions in the state of Pará, with precipitation isolines (mm) and the location of the rain gauge stations used for model validation.

Figure 5. Observed exceedance frequencies and the normal probability model for the homogeneous regions, for the complete series, El Niño years and La Niña years (model fit).

Figure 6. Observed exceedance frequencies and normal probability model validation for the homogeneous regions, for the complete series, El Niño years and La Niña years.

Table 1. Chi-square test for the normal function for average annual precipitation (model fit).

Table 2. Rain gauge stations used for validation.

Table 3. Chi-square test validation for the normal function for average annual precipitation.