The impacts of climate change on rainfall modeling in the Pantanal of Mato Grosso do Sul

Precipitation is the most significant and influential meteorological element for environmental conditions and human activities. The objective of this study was to fit eight probability distributions to monthly, seasonal and annual rainfall data in the Pantanal of Mato Grosso do Sul, Brazil, using a time series (1983-2013) from the National Water Agency (ANA). The performance of the probability distribution models was assessed by their quality of fit to the precipitation data. Goodness-of-fit tests, namely chi-square, Kolmogorov-Smirnov (KS) and Anderson-Darling (AD), and the Akaike (AIC) and Bayesian (BIC) information criteria were used, followed by the root mean square error (RMSE) and the coefficient of determination ($R^2$). The analyses were carried out on monthly, seasonal and annual scales. The 3-parameter Lognormal distribution performs best for all twelve months and provides the best fit to the monthly rainfall data. For the seasons, which characterize a dry period from May to September and a rainy period from October to April, the 3-parameter Lognormal distribution gave the best fit for spring and summer, while for winter and autumn the 2-parameter Gamma and 3-parameter Gamma distributions, respectively, performed better. For annual observations, the best-fitting function is the 3-parameter Weibull distribution.


Introduction
Considered a national heritage site and a biosphere reserve, the Pantanal is the largest floodplain on the planet and is home to an invaluable ecosystem. Located in the center of South America (SA), the Pantanal is part of the continuous floodplain of the Upper Paraguay basin, which occupies 361,666 km², where highly diverse flora and fauna are sustained by seasonal floods (Louzada, Bergier, & Assine, 2020). Its territory covers approximately 160,000 km² divided between Paraguay, Bolivia and Brazil (Silva & Abdon, 1998; Junk & Cunha, 2005; Teodoro et al., 2016). The Brazilian territory occupies about 40% of this basin. The Brazilian Pantanal is divided into 7 municipalities in the state of Mato Grosso (MT) (35%) and 9 municipalities in the state of Mato Grosso do Sul (MS) (65%) (Silva & Abdon, 1998).
Quantitative data are lacking as input to environmental planning and to the various socioeconomic and socio-environmental modeling methods that can be combined with ecological or hydrological data in integrated approaches connecting human systems to environmental indicators. This is particularly true in protected areas (PAs), which are fundamental for biodiversity conservation, yet whose impacts on nearby residents are contested (Gray et al., 2018; Santiago, Correia Filho, Oliveira-Júnior, & Silva Junior, 2019; Naidoo et al., 2019).
Recent research has begun to report the interrelationships between economic indicators, livestock and a changing climate (Araújo et al., 2018; Bergier et al., 2019), but socio-environmental and rainfall issues beyond livestock are still less understood. There is also considerable literature on environmental planning in the Pantanal, for example, several government reports describing strategies for environmental conservation and economic development (Oliveira, 2002), of which the Conservation Plan for the Upper Paraguay Basin (PCBAP, in Portuguese) is probably the most relevant (Brasil, 1997). This report identified some of the main challenges of environmental management in the Pantanal, such as erosion or pollution of water courses with sediments, and sought to recommend appropriate economic development strategies. In this sense, the report is considered a precursor to ecological-economic zoning, recommended for spatial governance driven by data from the Pantanal (Schulz et al., 2019).
The objective of this study is to model the historical rainfall series to determine the probability distribution that best fits the rainfall data in the Pantanal region, Mato Grosso do Sul, in central-western Brazil.

Study Area
The Pantanal is located between the parallels 16° and 22° S and the meridians 55° and 58° W, and occupies an area of 189,000 km² in the Central-West region of Brazil, divided between the states of MS (65%) and MT (35%), according to the Geographical Dictionary of the Brazilian Institute of Geography and Statistics (Instituto Brasileiro de Geografia e Estatística [IBGE], 2010). It has an average altitude of 110 m and a slope of 6 to 12 cm per kilometer in the east-west direction and of 1 to 2 cm per kilometer in the north-south (N-S) direction (Figure 1).
In the state of MS, the following municipalities belong to the Pantanal: Anastácio, Aquidauana, Bodoquena, Corumbá, Coxim, Miranda and Porto Murtinho. The main rivers that descend from the plateau to the plains are, from north to south, the Paraguay, Bento Gomes, Cuiabá, São Lourenço-Itiquira, Taquari, Negro, Aquidauana-Miranda, Nabileque and Apa.

Climate in the Pantanal
The climate of the Pantanal is classified as the "Aw" type, or tropical with a dry winter season (according to the Köppen-Geiger classification), with two defined seasons: i) dry (May to September) and ii) rainy (October to April). The annual average precipitation is 1,400 mm. Precipitation is in the range 800-1,600 mm, but in some years it can reach up to 2,000 mm (Alho & Silva, 2012). During the rainy season, floods occur and river water floods the low-lying areas of the Pantanal. The water level is highest in January and February, and during March it slowly decreases. In contrast, during the dry season, water levels are low and lagoons and swamps dry out (Campos, 1969; Teodoro et al., 2016). Alho, Lacher, and Gonçalves (1988) showed the different features of the Pantanal determined by the relief conditions and climatic interactions, in addition to the strong influence of neighboring biomes, such as the Cerrado, Amazonia and the Bolivian and Paraguayan Chacos.

Historical series of monthly rainfall
The historical series of rainfall data in the Pantanal region, MS (Figure 1), used in this study spans a 30-year period from 1983 to 2013. The present study uses the average monthly, seasonal and annual rainfall data. The historical data were obtained from the hydro-meteorological database of the National Water Agency (Agência Nacional de Águas [ANA], 2008), available on the Hidroweb portal, the Hydrological Information System (http://www.ana.gov.br/). Only consistent data series with a minimum of 15 years were used, and years with an annual failure percentage greater than 10% were not admitted.

Probability distribution functions
To describe the behavior of rainfall data in a particular area, it is necessary to identify the distribution that best fits the data. In this study, eight probability distributions, namely the 2-parameter Weibull (W2P), 3-parameter Weibull (W3P), 2-parameter Rayleigh (RA2P), 2-parameter Gamma (G2P), 3-parameter Gamma (G3P), 2-parameter Lognormal (LN2P), 3-parameter Lognormal (LN3P) and Maximum Gumbel (GUM) distributions, are used to model the rainfall data in the Pantanal region. The probability density functions (pdfs), corresponding cumulative distribution functions (cdfs), domains and parameters for these distributions are presented in Table 1. Several methods can be used to estimate the parameters of probability distributions, for example the maximum likelihood method, the method of moments, the least squares method and the method of L-moments. In this study, the commonly used maximum likelihood method is applied to estimate the parameters of each probability distribution. For more information about the parameter estimation method, see Rao (1973), Montgomery and Runger (1991) and Coles (2001).
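As a rough illustration of maximum likelihood estimation, the sketch below fits the 2-parameter Lognormal (LN2P), the one case among the eight distributions where the estimates have a simple closed form (mean and 1/n standard deviation of the log-transformed data). The rainfall values and the function names `fit_ln2p` and `ln2p_loglik` are hypothetical and are not taken from the ANA series; the 3-parameter fits require numerical optimisation and are not shown.

```python
import math

def fit_ln2p(data):
    """Closed-form MLE for the 2-parameter lognormal: returns (mu, sigma)."""
    logs = [math.log(x) for x in data]      # data must be strictly positive
    n = len(logs)
    mu = sum(logs) / n                      # MLE of the log-scale mean
    # The MLE of sigma divides by n, not by n - 1.
    sigma = math.sqrt(sum((v - mu) ** 2 for v in logs) / n)
    return mu, sigma

def ln2p_loglik(data, mu, sigma):
    """Log-likelihood of the LN2P model evaluated at (mu, sigma)."""
    return sum(
        -math.log(x * sigma * math.sqrt(2.0 * math.pi))
        - (math.log(x) - mu) ** 2 / (2.0 * sigma ** 2)
        for x in data
    )

# Hypothetical monthly rainfall totals in mm (illustrative only).
rain = [182.4, 215.0, 96.3, 240.1, 158.7, 301.2, 127.9, 199.5]
mu_hat, sigma_hat = fit_ln2p(rain)
print(mu_hat, sigma_hat, ln2p_loglik(rain, mu_hat, sigma_hat))
```

The maximized log-likelihood returned by `ln2p_loglik` is the quantity that later enters the information criteria.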

Goodness of fit tests and model selection criteria
Assessing the performance of different probability distribution models is necessary to provide more accurate information about rainfall at a particular location. In this study, to assess the goodness of fit (GOF) of the selected probability distributions for rainfall data, the chi-squared test, Kolmogorov-Smirnov test (KS) and Anderson-Darling test (AD) are applied first, and then the Akaike information criterion (AIC) and Bayesian information criterion (BIC) are used. The root mean square error (RMSE) and the coefficient of determination ($R^2$) are also applied. The goodness of fit tests are briefly described below. The chi-squared, Kolmogorov-Smirnov and Anderson-Darling tests are used to decide whether the rainfall data follow the specified distribution.

Chi-squared test
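The chi-squared test is applied when the data are arranged into bins (subintervals). The test statistic is given by Equation (1):

$$\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i} \tag{1}$$

where $O_i$ is the observed frequency for bin $i$ and $E_i$ is the expected frequency for bin $i$. The expected frequency is calculated by

$$E_i = n\left[\hat{F}(x_{u,i}) - \hat{F}(x_{l,i})\right]$$

where $\hat{F}(\cdot)$ denotes the estimated cdf, $x_{l,i}$ and $x_{u,i}$ are the lower and upper limits for bin $i$, and $n$ is the number of observations. The expected frequency should be at least 5. The hypothesis that the data follow the specified distribution is rejected at the significance level $\alpha$ if the test statistic is greater than the critical value $\chi^2_{1-\alpha,\,k-p-1}$ of the chi-squared distribution, where $p$ is the number of estimated parameters and $k$ is the number of bins.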
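The binned statistic can be sketched as follows. This is a minimal illustration, assuming the expected count per bin is the sample size times the cdf increment over that bin; the data, the bin edges and the Weibull parameters are hypothetical, and the critical value would still have to be looked up in a chi-squared table.

```python
import math

def chi_squared_stat(data, edges, cdf):
    """Return (chi2, k): the chi-squared statistic and the number of bins."""
    n = len(data)
    k = len(edges) - 1
    chi2 = 0.0
    for i in range(k):
        lo, hi = edges[i], edges[i + 1]
        observed = sum(1 for x in data if lo <= x < hi)   # O_i
        expected = n * (cdf(hi) - cdf(lo))                # E_i
        chi2 += (observed - expected) ** 2 / expected
    return chi2, k

def weibull_cdf(x, shape=2.0, scale=180.0):
    """2-parameter Weibull cdf; shape and scale are hypothetical values."""
    return 1.0 - math.exp(-((x / scale) ** shape))

# Hypothetical rainfall totals in mm (illustrative only).
rain = [62.0, 88.5, 101.3, 120.4, 133.0, 147.9, 155.2, 168.8,
        176.1, 190.6, 204.3, 221.7, 239.0, 262.4, 298.1, 330.5]
edges = [0.0, 110.0, 160.0, 210.0, math.inf]
chi2, k = chi_squared_stat(rain, edges, weibull_cdf)
dof = k - 2 - 1   # k bins, p = 2 estimated parameters
print(chi2, dof)
```

In practice, adjacent bins would be merged whenever an expected count falls below 5, as required by the test.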

Kolmogorov-Smirnov test
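The Kolmogorov-Smirnov (KS) test is based on the maximum difference between the sample cdf and the theoretical cdf. The KS test statistic is given by Equation (2):

$$D = \max_{1 \le i \le n} \left( \hat{F}(x_{(i)}) - \frac{i-1}{n},\; \frac{i}{n} - \hat{F}(x_{(i)}) \right) \tag{2}$$

where $\hat{F}$ is the estimated cdf, $n$ is the number of observations and $x_{(i)}$ are the observations in ascending order, so that $x_{(1)} \le x_{(2)} \le \dots \le x_{(n)}$. The hypothesis that the data follow the specified distribution is rejected at the significance level $\alpha$ if the test statistic $D$ is greater than the critical value of the KS test.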
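A minimal sketch of the KS statistic, computed as the largest gap between the fitted cdf and the steps of the sample cdf at each ordered observation. The lognormal parameters and the rainfall values are hypothetical.

```python
import math

def ks_statistic(data, cdf):
    """Largest distance between the fitted cdf and the sample cdf steps."""
    xs = sorted(data)
    n = len(xs)
    return max(
        max(cdf(x) - (i - 1) / n, i / n - cdf(x))
        for i, x in enumerate(xs, start=1)
    )

def ln2p_cdf(x, mu=5.2, sigma=0.35):
    """2-parameter lognormal cdf; mu and sigma are hypothetical values."""
    return 0.5 * (1.0 + math.erf((math.log(x) - mu) / (sigma * math.sqrt(2.0))))

# Hypothetical rainfall totals in mm (illustrative only).
rain = [96.3, 127.9, 158.7, 182.4, 199.5, 215.0, 240.1, 301.2]
d_stat = ks_statistic(rain, ln2p_cdf)
print(d_stat)
```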

Anderson-Darling test
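The Anderson-Darling (AD) test is a modification of the KS test. It is considered a particularly suitable GOF test because it gives more weight to the tails of the distribution than the KS test does. The AD test statistic is given by Equation (3):

$$A^2 = -n - \frac{1}{n} \sum_{i=1}^{n} (2i - 1)\left[ \ln \hat{F}(x_{(i)}) + \ln\left(1 - \hat{F}(x_{(n-i+1)})\right) \right] \tag{3}$$

where $\hat{F}$ is the estimated cdf and $x_{(1)} \le \dots \le x_{(n)}$ are the $n$ observations in ascending order. The hypothesis that the data follow the specified distribution is rejected at the significance level $\alpha$ if the test statistic $A^2$ is greater than the critical value of the AD test.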
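The AD statistic in its usual computational form can be sketched as below. The logarithms of the cdf and of its complement are what give the tails their extra weight. The lognormal parameters and the data are hypothetical.

```python
import math

def ad_statistic(data, cdf):
    """Anderson-Darling A^2 for a fully specified cdf."""
    xs = sorted(data)
    n = len(xs)
    total = 0.0
    for i in range(1, n + 1):
        # xs[i - 1] is x_(i); xs[n - i] is x_(n - i + 1) in 1-based notation.
        total += (2 * i - 1) * (
            math.log(cdf(xs[i - 1])) + math.log(1.0 - cdf(xs[n - i]))
        )
    return -n - total / n

def ln2p_cdf(x, mu=5.2, sigma=0.35):
    """2-parameter lognormal cdf; mu and sigma are hypothetical values."""
    return 0.5 * (1.0 + math.erf((math.log(x) - mu) / (sigma * math.sqrt(2.0))))

# Hypothetical rainfall totals in mm (illustrative only).
rain = [96.3, 127.9, 158.7, 182.4, 199.5, 215.0, 240.1, 301.2]
a2 = ad_statistic(rain, ln2p_cdf)
print(a2)
```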

Akaike information criterion
The Akaike information criterion (AIC) is a commonly used model selection criterion that is calculated from the maximized value of the log-likelihood function for the estimated model (Akaike, 1974). The AIC can be calculated as follows (Equation 4):

$$\mathrm{AIC} = -2 \ln L + 2p \tag{4}$$

where $\ln L$ is the maximized value of the log-likelihood function for the estimated model, $p$ is the number of parameters to be estimated and $n$ is the number of observed data.

Bayesian information criterion
The Bayesian information criterion (BIC) is another commonly used information criterion and it is closely related to the AIC (Schwarz, 1978). The BIC can be calculated as follows (Equation 5):

$$\mathrm{BIC} = -2 \ln L + p \ln n \tag{5}$$
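Both criteria are simple functions of the maximized log-likelihood, as the sketch below shows. The log-likelihood values are hypothetical and serve only to show how an extra parameter is penalized.

```python
import math

def aic(loglik, p):
    """Akaike information criterion: -2 ln L + 2 p."""
    return -2.0 * loglik + 2.0 * p

def bic(loglik, p, n):
    """Bayesian information criterion: -2 ln L + p ln n."""
    return -2.0 * loglik + p * math.log(n)

# Hypothetical maximized log-likelihoods for two candidate models.
n_obs = 30
candidates = {"G2P": (-152.8, 2), "G3P": (-152.1, 3)}   # name: (ln L, p)
for name, (loglik, p) in candidates.items():
    print(name, aic(loglik, p), bic(loglik, p, n_obs))
```

Because ln n exceeds 2 once n is 8 or more, the BIC penalizes an additional parameter more heavily than the AIC for series of the length used here.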

Coefficient of determination
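The coefficient of determination ($R^2$) is used to measure the linear relationship between the observed and predicted probabilities of the distribution and is calculated as follows (Equation 6):

$$R^2 = \frac{\sum_{i=1}^{n} \left(\hat{F}(x_{(i)}) - \bar{F}\right)^2}{\sum_{i=1}^{n} \left(\hat{F}(x_{(i)}) - \bar{F}\right)^2 + \sum_{i=1}^{n} \left(F_n(x_{(i)}) - \hat{F}(x_{(i)})\right)^2} \tag{6}$$

where $\hat{F}$ is the estimated cdf, $\bar{F} = \frac{1}{n}\sum_{i=1}^{n} \hat{F}(x_{(i)})$ and $F_n(x)$ is the empirical distribution function, defined as follows

$$F_n(x) = \frac{1}{n} \sum_{i=1}^{n} I(x_i \le x)$$

where $I(x_i \le x) = 1$ if $x_i \le x$ and 0 otherwise.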

Root mean square error
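The root mean square error (RMSE) is based on the difference between the observed and predicted probabilities of the distribution and is calculated as follows (Equation 7):

$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left(F_n(x_{(i)}) - \hat{F}(x_{(i)})\right)^2} \tag{7}$$

where $F_n$ is the empirical distribution function and $\hat{F}$ is the estimated cdf. Smaller values of $\chi^2$, KS, AD, AIC, BIC and RMSE and higher values of $R^2$ indicate a better fit of the theoretical distribution to the rainfall data.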
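The two probability-based measures can be sketched together. This is only an illustration under an assumption: $R^2$ is taken here as the variation of the fitted probabilities divided by that variation plus the squared residuals against the empirical distribution function. Other definitions of $R^2$ exist and the one used for the published tables may differ. The function names are hypothetical.

```python
import math

def ecdf(data, x):
    """Empirical distribution function: share of observations <= x."""
    return sum(1 for v in data if v <= x) / len(data)

def r2_and_rmse(data, cdf):
    """Return (R^2, RMSE) between empirical and fitted probabilities."""
    xs = sorted(data)
    n = len(xs)
    fitted = [cdf(x) for x in xs]
    empirical = [ecdf(xs, x) for x in xs]
    f_bar = sum(fitted) / n
    explained = sum((f - f_bar) ** 2 for f in fitted)
    residual = sum((e - f) ** 2 for e, f in zip(empirical, fitted))
    r2 = explained / (explained + residual)   # assumed form of R^2
    rmse = math.sqrt(residual / n)
    return r2, rmse

# Toy check against a uniform cdf on [0, 1] (illustrative only).
print(r2_and_rmse([0.25, 0.75], lambda x: x))
```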

Statistical properties of monthly, annual and seasonal rainfall
Tables 2 and 3 present the descriptive statistics, including the mean, standard deviation (SD), minimum, maximum, median, coefficient of variation (CV%), coefficient of skewness and coefficient of kurtosis, of the monthly and of the annual and seasonal rainfall data, respectively. January has the highest mean rainfall (200.72 mm), while July has the lowest (21.41 mm). The highest seasonal mean rainfall was observed in summer (163.57 mm) and the lowest in winter (35.34 mm). In general, the highest values of the mean, maximum, median and standard deviation were observed in spring and summer, similar to the results obtained previously by Teodoro et al. (2015) and Teodoro et al. (2016). The coefficient of skewness is positive for all months and seasons, which indicates that all distributions are skewed to the right. The skewness is greater than 1 for eight months, so these data can be regarded as highly positively skewed, and lies within the range 0.5-1 for three months, so these data are moderately positively skewed. The monthly coefficient of kurtosis is in the range 0.63-7.14. The months with high kurtosis are March to September; that is, these rainfall data sets tend to have heavy tails, or outliers. The skewness for autumn and winter is greater than 1, which indicates that these data are also highly skewed to the right. Furthermore, the seasonal coefficient of kurtosis is in the range 0.93-6.88. The highest values of the coefficient of kurtosis were observed in winter and autumn (especially in winter), indicating a leptokurtic, heavy-tailed series. The maximum seasonal CV% was observed in winter (113.34%) and the maximum monthly CV% in August (131.90%), which indicates a large fluctuation in the rainfall data in this season and month, respectively.
The boxplots of rainfall in the Pantanal region showed high variability in the time series on both seasonal and monthly scales, mainly in relation to the median and interquartile range (Figure 2a). The recorded monthly accumulations were highly variable, particularly in the summer and winter seasons (Figure 2b). The rainfall-producing systems that contribute to the total rainfall recorded in the Pantanal are the Frontal Systems (FS), South Atlantic Subtropical Anticyclone (SASA), Upper Tropospheric Cyclonic Vortex (UTCV), Chaco Low (CL), Bolivian High (BH), Low-Level Jet (LLJ), South Atlantic Convergence Zone (SACZ) and Madden-Julian Oscillation (MJO) (Rao & Hada, 1990; Gan, Kousky, & Ropelewski, 2004; Teodoro et al., 2016; Teodoro et al., 2015). Tables 4 and 5 show the maximum likelihood estimates of the parameters of the eight pdfs for the monthly data and for the seasonal and annual data, respectively.
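For readers who want to reproduce this kind of summary, the sketch below computes the CV%, skewness and kurtosis from central moments. It assumes moment-based skewness and excess kurtosis with a sample (n - 1) standard deviation; the estimators behind Tables 2 and 3 are not stated in the text and may differ. The input values are hypothetical.

```python
import math

def describe(data):
    """Mean, sample SD, CV%, moment skewness and excess kurtosis."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n   # central moments
    m3 = sum((x - mean) ** 3 for x in data) / n
    m4 = sum((x - mean) ** 4 for x in data) / n
    sd = math.sqrt(m2 * n / (n - 1))              # sample standard deviation
    return {
        "mean": mean,
        "sd": sd,
        "cv_percent": 100.0 * sd / mean,
        "skewness": m3 / m2 ** 1.5,
        "excess_kurtosis": m4 / m2 ** 2 - 3.0,
    }

# Hypothetical dry-season monthly totals in mm (illustrative only).
stats = describe([3.1, 8.4, 12.0, 15.7, 21.3, 26.9, 40.2, 95.6])
print(stats)
```

A single very wet month in an otherwise dry series, as in this toy sample, is enough to push the skewness well above zero and to inflate the CV%.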

Probability distributions of monthly rainfall
When rainfall is modeled by several pdfs, more than one model may be acceptable. In this study, the selection of the most appropriate pdf is based on the total score from all GOF tests and model selection criteria. This approach is chosen when it is not clear which model to choose, or when it is difficult to choose the best model. The pdfs are ranked based on their performance as measured by the results of the GOF tests and model selection criteria. For each GOF test and model selection criterion, the pdfs are ranked from one (best fit) to eight (poorest fit). The smallest total ranking score indicates the best fit.
The values of the GOF tests and model selection criteria for monthly rainfall are summarized in Tables 6(a, b, c, d, e, f, g, h, i, j, k and l). The ranks of the pdfs according to the GOF values are also presented in Tables 6(a-l). Table 7 presents the best ranked pdfs using different GOF tests and model selection criteria for each month and, finally, Table 8 presents the top three pdfs for monthly rainfall.
The results from Tables 6-8 show that the LN3P, G3P and GUM are the top three pdfs for modelling the monthly rainfall data in the Pantanal region. The results show the flexibility and efficiency of the LN3P distribution. The LN3P performs best according to the chi-square test, the AD test (except in April, where the GUM performs best and the LN3P is second best), the RMSE and $R^2$. With the minimum total rank, the LN3P performs best for all twelve months and thus provides the best fit to the monthly rainfall data in the Pantanal region. The G3P ranks second best for seven months (January, February, June, August, September, November, December), and the GUM ranks second best for four months (March, April, May, October). These distributions can be considered as possible alternatives for modeling the monthly rainfall data in the Pantanal region.
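The total-score ranking can be sketched as follows, assuming one rank per criterion is summed for each pdf and that ties are not treated specially. The criterion values are hypothetical, and only three pdfs and three criteria are shown for brevity.

```python
def total_ranks(scores, higher_is_better=("R2",)):
    """scores: {criterion: {pdf: value}} -> {pdf: sum of ranks}."""
    totals = {}
    for criterion, values in scores.items():
        # Rank 1 goes to the smallest value, except for criteria such as
        # R2 where the largest value is best. Ties keep insertion order.
        reverse = criterion in higher_is_better
        ordered = sorted(values, key=values.get, reverse=reverse)
        for rank, name in enumerate(ordered, start=1):
            totals[name] = totals.get(name, 0) + rank
    return totals

# Hypothetical criterion values for three candidate pdfs.
scores = {
    "KS":  {"LN3P": 0.05, "G3P": 0.07, "GUM": 0.09},
    "AIC": {"LN3P": 310.2, "G3P": 309.8, "GUM": 315.0},
    "R2":  {"LN3P": 0.99, "G3P": 0.98, "GUM": 0.95},
}
totals = total_ranks(scores)
best = min(totals, key=totals.get)
print(totals, best)
```

A pdf does not need to win every criterion to come first overall; in this toy case the second pdf has the lowest AIC but a higher total rank.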

Probability distributions of seasonal and annual rainfall
The values of GOF tests and model selection criteria and the ranks of the pdfs for seasonal and annual data are summarized in Tables 9, 10, 11 and 12. Table 13 presents the best ranked pdfs using different GOF tests and model selection criteria for each season and annual data. Table 14 presents the top three pdfs for seasonal and annual rainfall.
The results from Tables 9-11 and 14, 15 indicate that the LN3P provides the best fit to the rainfall data for summer and spring. This distribution ranks first and performs best according to all GOF tests and model selection criteria. The second best for these two seasons is the G3P. The G3P performs best for autumn and the G2P for winter. These pdfs can be selected as the most suitable for modelling the rainfall for autumn and winter, respectively, while a second potential alternative is the LN3P for autumn and the G3P for winter. Figure 3 shows the histograms and the eight pdfs fitted to the seasonal rainfall data. Graphically it can be observed that the LN3P provides the best fit for summer and spring, whereas the LN2P provides the poorest fit. Similarly, the G2P and G3P fit best for winter and autumn, respectively. Figure 4 shows the histograms and the eight pdfs fitted to the annual rainfall data. Graphically it can be observed that the W3P provides the best fit, the W2P fits the histogram to a lesser degree and the LN2P provides the poorest fit. For annual rainfall data the W3P, W2P and G3P are the top three pdfs. The W3P ranks first and performs best according to four GOF measures, namely the chi-square test, the AD test, the RMSE and $R^2$. The W2P performs best according to the AIC and BIC, while the W3P performs second best. The W3P can be considered the most suitable pdf for modeling annual rainfall data in the Pantanal region.

Table 7. Best ranked probability distributions using different goodness of fit tests and model selection criteria for months.

Period   Chi-square   KS     AD     AIC    BIC    RMSE   R²
Summer   LN3P         LN3P   LN3P   LN3P   LN3P   LN3P   LN3P
Autumn   G3P          W3P    LN3P   G3P    W2P    LN3P   LN3P
Winter   G3P          LN3P   G2P    G2P    G2P    G3P    G3P
Spring   LN3P         LN3P   LN3P   LN3P   LN3P   LN3P   LN3P
Annual   W3P          G3P    W3P    W2P    W2P    W3P    W3P

One of the concerns about rain is the intensity and frequency of its occurrence, because of its potentially harmful effects in cases of excess or scarcity. Knowledge of the probabilities of occurrence of rain is therefore of paramount importance in planning related activities and in monitoring hydrological processes in hydrographic basins, and it supports the planning of water resources and the optimization of the calendar of agricultural activities (Santos, Blanco, & Oliveira Junior, 2019). In this context, the rainfall of a given location can be estimated, among other ways, in probabilistic terms, using theoretical distribution models fitted to a historical series (Lyra, Garcia, Piedade, Sediyama, & Sentelhas, 2006; Teodoro et al., 2017).
The Pantanal region has physical peculiarities, such as terrain and vegetation, that directly influence the spatial variability of rainfall, as does its geographic position, since rainfall there results from the coupling of several meteorological systems and is strongly controlled by the Intertropical Convergence Zone (ITCZ), which produces rainfall in the austral summer (Figure 2). The rainfall, on average, is 1200 mm per year. From October to April, the low convergence in southern SA associated with the ITCZ brings in air masses moistened by northwesterly winds from the Amazon Basin (Carvalho, Jones, & Liebmann, 2004; Vieira, Satyamurty, & Andreoli, 2013; Bergier et al., 2018). Rainfall variability is thought to have strong links with the El Niño-Southern Oscillation (ENSO); however, recent studies have also pointed to the occurrence of almost periodic events of heavy rains associated with the SACZ, which are driven by the South Atlantic Convergence and the MJO (Carvalho et al., 2004; Carvalho et al., 2011; Novello et al., 2018), as well as with the Convective System, the South Atlantic Convergence, BH and FS (Gan et al., 2004; Teodoro et al., 2016; Teodoro et al., 2015; Teodoro et al., 2017). The influence of the highlands, located in the center-west and east, is perceived in the spatial distribution of rainfall. Relief-induced rains can potentially occur in these regions, making them rainier than locations at the same latitudes without the influence of altitude.

Seasonality and annual cycles
The areas where the river springs are located have an erosive potential, but are protected by the natural plant cover. With the advent of deforestation, erosion has become more severe. In the north of the Pantanal, floods occur between March and April, while in the south they occur from July to August. Between November and March, there is intense water loss due to evapotranspiration (ET). The most intense rains occur from October to March. River flows (m³ s⁻¹) have their inflow from January to April, with peaks in March, and discharges occur from April to October, with peaks from June to July, as measured at Porto Esperança on the Paraguay River (Hamilton, Sippel, Calheiros, & Melack, 1997).
The maximum seasonal flooding in the region occurs between February and April and decreases, with peak dry weather between October and December. In addition to annual floods, there are variations over longer periods, with no defined pattern. Such floods are influenced by several factors, both on the macro and on the micro scale. The variations in the level of the river depend fundamentally on the characteristics of the rainfall each year.
In some years there were strong floods, and the different rainfall trends show the climatic diversity of the Pantanal. However, climatic variability tends to be homogeneous across the biome. The years with the highest rainfall were 1974, 1976, 1982 and 1998, while the greatest droughts occurred in 1978 and 2002, with 45.6% of monthly values above average and 54.4% below average. This hydrological seasonality, with an annual hydraulic pattern, is ecologically decisive for the survival of wildlife in the Pantanal.
According to a bulletin from the ANA (2008), in early 2008 the rainfall was above the historical average in practically the entire Upper Paraguay Basin (BAP). Total rainfall in January and February exceeded the average by 100% and 50%, respectively. The increase in local rainfall contributed to a rapid increase in water levels in rivers at the beginning of the flood period.
The climatic conditions alone are not sufficient to explain the differences observed in the regime of the Paraguay River and of some of its tributaries. The complexity of the hydrological regime of the Paraguay River is related to the gentle slope of the terrain formed by the plains and swamps of MT (between 50 and 30 cm km⁻¹ in the east-west (E-W) direction and 3 to 1.5 cm km⁻¹ from N to S). It is also due to the extent of the area, which periodically remains flooded with a large volume of water. The winding course of the river and the countless geographical features of the flooded plains contribute to the slow flow of water.

Conclusion
In evaluating the performance of different probability distributions and identifying the most appropriate one for rainfall data in the Pantanal region, it was observed that the 3-parameter Lognormal distribution (LN3P) gave the best fit for spring and summer; for winter and autumn, the 2-parameter Gamma (G2P) and 3-parameter Gamma (G3P) distributions, respectively, achieved better performance. For annual observations, the function that best fits the rainfall data is the 3-parameter Weibull distribution (W3P). The 3-parameter Lognormal distribution (LN3P) provides the best fit for monthly precipitation data.