Seasonal variability of maximum daily rainfall in Campinas, State of São Paulo, Brazil: trends, periodicities, and associated probabilities

Extreme rainfall events have received special attention in the climate literature due to their potential for causing soil erosion, runoff, and soil water saturation. Thus, the aims of this study were (i) to evaluate the presence of trends, temporal persistence, and periodical components in the seasonal maximum daily rainfall values (Preabs) obtained from the weather station of Campinas, State of São Paulo, Brazil (1890-2012), and (ii) to verify the possibility of using the General Extreme Value distribution (GEV) for modeling the probability of occurrence of these extreme rainfall events. The spectral analysis carried out in the time-frequency domain showed no significant periodicity associated with the variance peaks of the time series under analysis. Based on parametric and non-parametric methods, and considering the significance levels usually adopted in the scientific literature (10 and 5%), the Preabs values showed no significant climate trend. The results obtained from qualitative and quantitative goodness-of-fit procedures indicated that a stationary-GEV model, with time-independent parameters, may be used to describe the probabilistic structure of this meteorological variable.


Introduction
Extreme rainfall events have received special attention in the climate literature due to their potential for causing major impacts on several human activities. As noted by Hay (2007), soil erosion, disruption to essential agricultural activities, soil moisture saturation, and runoff are some of the agricultural impacts of heavy rainfall. In addition, according to Wilks (2006), extreme-value statistics are of great interest because the physical processes generating extreme events are often unusual.
Based on the Extreme Value Theory, Coles (2001), Pujol et al. (2007), Nadarajah and Choi (2007), El Adlouni et al. (2007), Beijo et al. (2009), Furió and Meneu (2010), and Cannon (2010) used the General Extreme Value distribution (GEV) for modeling maximum daily rainfall values obtained from several parts of the world. According to the Extremal Types Theorem, if there are sequences of constants {a_n > 0} and {b_n} such that Pr{(M_n - b_n)/a_n ≤ z} → G(z) as n → ∞, where G is a non-degenerate distribution function, then G belongs to one of the extreme value distributions (Gumbel, type I; Fréchet, type II; and Weibull, type III). The M_n, assumed to be independent and identically distributed, correspond to the maxima of the process observed over t time units of record (COLES, 2001). In addition, the GEV has all the flexibility of these three particular types of extreme value distributions (I, II, or III) (COLES, 2001; EL ADLOUNI et al., 2007; NADARAJAH; CHOI, 2007; WILKS, 2006). Moreover, according to Coles (2001), the GEV function joins these three extreme value distributions into a single equation. Importantly, the dependence (or autocorrelation) among the M_n observations may be characterized by the persistence phenomenon (KHALIQ et al., 2006). As described in this last study, the presence of significant persistence means that the value of M_n at time t depends on its value at time t-i (i is an integer equal to or greater than 1).
Blain (2011a) stated that the stationary-GEV distribution can be used to assess the probability of occurrence of maximum daily rainfall totals obtained from the weather station of Campinas. However, that study is based on the block maxima approach, in which the blocks are set equal to one year. Consequently, although this procedure removes the influence of serial correlations (or autocorrelations) on the estimates and avoids the selection of an arbitrary threshold, it also leads to a large loss of information (or data). This loss of information is significant for climatic analyses and casts doubt on the statement of no trends and no periodical components indicated by Blain (2011a) for the extreme rainfall events of the aforementioned location.
Another difficulty associated with the application of the Extreme Value Theory is that the data within the blocks may not have been drawn from the same distribution (WILKS, 2006), because each of these data may be generated by a different physical process. However, according to Wilks (2006), this difficulty does not invalidate this function as a candidate distribution to describe the statistics of extremes, and empirically the GEV is 'often an excellent choice even when the assumptions of the Extreme Value Theory are not met'. Also, no information on the physical processes that generated the extreme rainfall events of Campinas is available (especially for the beginning of the series). Yet, the decision to take seasonal values makes (at least) more plausible the assumption that individual blocks have a common distribution within each seasonal series.
Finally, MacDonald and Phillips (2006) indicated that the analysis of historical rainfall records provides an opportunity for improving the knowledge of the processes that control this meteorological element. Also in agreement with MacDonald and Phillips (2006), this analysis places recent observations into a longer historical context. In addition, according to Hayhoe (2007), investigating climate trends on regional scales is essential for understanding the potential impacts of the current global climate change.
In this way, fitting a GEV distribution to the seasonal maximum daily rainfall values obtained from the weather station of Campinas (1890-2012) adds important information to the regional climate literature. Moreover, evaluating the presence of trends and other non-random components is essential for characterizing the probabilistic structure of a given variable (BLAIN, 2011b). Consequently, fitting a parametric model to these seasonal datasets may add important information related to the regional impacts of global climate change. This meteorological series can be considered one of the longest continuous data records in Brazil, which justifies the importance of evaluating the presence of trends and periodical components in this dataset.
This study aimed (i) to evaluate the presence of trends, temporal persistence, and periodical components in the seasonal maximum daily rainfall values (Campinas, State of São Paulo, Brazil; 1890-2012), and (ii) to verify the possibility of using the stationary-GEV function for modeling the probability of occurrence of these events.

Material and methods
Seasonal maximum daily rainfall data were obtained from the weather station of Campinas, State of São Paulo, Brazil, between March 1890 and February 2012. It was assumed that the months of (i) December-January-February belong to the Summer; (ii) March-April-May belong to the Fall; (iii) June-July-August belong to the Winter; and (iv) September-October-November belong to the Spring. As described by Vicente and Nunes (2004), this weather station is situated at the boundary of the Tropic of Capricorn (22º54'S; 47º05'W; 669 m). According to Blain (2009), the monthly rainfall series observed at this location can be considered as coming from a 2-parameter gamma distribution. The shape and scale parameters of these probability functions vary, respectively, from 0.9 and 35 (in July; a strongly skewed distribution similar to those observed in arid climates) to 6 and 40 (in January; a bell-shaped distribution similar to those observed in equatorial climates). All statistical tests were performed at the 5% significance level. The probability of occurrence of a type I error was described by the p-values.
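The seasonal grouping described above can be sketched as follows. This is a minimal illustration, not the procedure actually used in the study: the function names, the convention of labelling each summer by the year of its January and February, and the sample records are all assumptions introduced here.

```python
from collections import defaultdict
from datetime import date

# Month -> season, following the grouping stated in the text.
SEASON_OF_MONTH = {
    12: "Summer", 1: "Summer", 2: "Summer",
    3: "Fall", 4: "Fall", 5: "Fall",
    6: "Winter", 7: "Winter", 8: "Winter",
    9: "Spring", 10: "Spring", 11: "Spring",
}


def season_key(day):
    """Return (season_year, season) for a date.

    Assumption: December is grouped with the following January and
    February, so the summer of Dec 1890 to Feb 1891 is labelled 1891.
    """
    year = day.year + 1 if day.month == 12 else day.year
    return year, SEASON_OF_MONTH[day.month]


def seasonal_maxima(daily_rain):
    """Map (season_year, season) -> maximum daily rainfall (mm).

    daily_rain: iterable of (datetime.date, rainfall_mm) pairs,
    with rainfall_mm >= 0.
    """
    maxima = defaultdict(float)  # starts at 0.0 mm for each season
    for day, rain_mm in daily_rain:
        key = season_key(day)
        maxima[key] = max(maxima[key], rain_mm)
    return dict(maxima)


# Hypothetical records, for illustration only (not the Campinas data).
records = [
    (date(1890, 12, 30), 41.0),
    (date(1891, 1, 15), 87.5),
    (date(1891, 2, 2), 12.3),
    (date(1891, 3, 9), 55.0),
]
preabs = seasonal_maxima(records)
```

With this labelling, a record running from March 1890 to February 2012 yields complete blocks for every season, which is consistent with the period stated in the text.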
As pointed out by Maia et al. (2007), fitting a cumulative distribution function (cdf) is only appropriate if the time series is not significantly autocorrelated. The autocorrelation function (ACF) was used to verify whether the data sample can be considered free from temporal persistence. The coefficients of the ACF were estimated as described in Wilks (2006) for lags 1 to 12 (3 years; the ACF was applied to a single series comprising all Preabs values sorted in chronological order). The run test (BEIJO et al., 2009) was applied to each seasonal Preabs series. In this last case, the time span between two subsequent values is approximately one year (e.g. two consecutive summers). Thus, while the run test was used to check serial correlation within each seasonal series, the ACF was used to measure the strength of dependence among Preabs values sorted in chronological order. The Mann-Kendall (MK) test (KENDALL; STUART, 1967) was used to evaluate the presence of trends in the Preabs values. The null hypothesis (H0) associated with this widely used test assumes that the sample is free from trends (the absence of significant serial correlation is also assumed).
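The two diagnostics above can be illustrated with a short sketch. It is not the authors' code: the autocorrelation estimator below uses the overall sample mean, which may differ in detail from the estimator in Wilks (2006), and the Mann-Kendall variance omits the correction for tied values. All function names are introduced here for illustration.

```python
import math


def autocorrelation(series, lag):
    """Lag-k sample autocorrelation, using the overall mean (assumption)."""
    n = len(series)
    mean = sum(series) / n
    denom = sum((x - mean) ** 2 for x in series)
    numer = sum(
        (series[i] - mean) * (series[i + lag] - mean) for i in range(n - lag)
    )
    return numer / denom


def mann_kendall(series):
    """Return (S, Z, two-sided p-value) for the Mann-Kendall trend test.

    Assumption: no tied values, so the variance of S has no tie correction.
    """
    n = len(series)
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = series[j] - series[i]
            s += (diff > 0) - (diff < 0)  # sign of the difference
    var_s = n * (n - 1) * (2 * n + 5) / 18.0
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)  # continuity correction
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return s, z, p_value
```

A p-value above the adopted level (0.05 in this study) means that H0 (no trend) is not rejected. For the ACF, approximate 95% white-noise limits of ±1.96/√n are a common reference.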
The GEV is a three-parameter function in which the probability of occurrence of an extreme event [Pr(x)], observed at any time (t), is described as Pr{x ≤ z_t} = GEV(z_t; μ, σ, ξ). The Greek characters represent, respectively, the parameters of location, scale, and shape. As described in Delgado et al. (2010), while μ defines the position of the GEV function with regard to the origin, σ defines the spread of the GEV distribution. The type I class of extreme value distribution (Gumbel) corresponds to the case ξ = 0. The type II and type III classes correspond, respectively, to the cases ξ > 0 and ξ < 0 (COLES, 2001). The values of μ, σ, and ξ define the probability of occurrence associated with each Preabs value.
Since the parameters of this distribution are time-independent, the use of the GEV(μ, σ, ξ) model is frequently called the stationary approach. In this view, according to Coles (2001), Pujol et al. (2007), Nadarajah and Choi (2007), El Adlouni et al. (2007), and Cannon (2010), if a significant trend is detected in a meteorological dataset (composed of extreme values), the assumption that the probabilistic structure of this series does not change over time may no longer be valid. As highlighted by Coles (2001), an extreme value analysis aims to quantify the stochastic variability of a process at unusually large levels. The stationary-GEV function is described as:
$$f(x) = \frac{1}{\sigma}\left[1+\xi\left(\frac{x-\mu}{\sigma}\right)\right]^{-\frac{1}{\xi}-1}\exp\left\{-\left[1+\xi\left(\frac{x-\mu}{\sigma}\right)\right]^{-\frac{1}{\xi}}\right\}, \qquad 1+\xi\left(\frac{x-\mu}{\sigma}\right) > 0 \tag{1}$$
The cumulative GEV distribution is obtained by integrating equation 1.
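As a worked step, integrating the GEV density gives the closed forms below. They assume the sign convention of Coles (2001) adopted in the text (ξ > 0 for type II, ξ < 0 for type III). The symbols p (a non-exceedance probability) and z_p (the corresponding quantile) are introduced here for illustration.

```latex
% Cumulative GEV distribution, valid where 1 + \xi (z - \mu)/\sigma > 0
G(z) = \exp\left\{-\left[1+\xi\left(\frac{z-\mu}{\sigma}\right)\right]^{-1/\xi}\right\}

% Gumbel limit (\xi \to 0)
G(z) = \exp\left\{-\exp\left[-\left(\frac{z-\mu}{\sigma}\right)\right]\right\}

% Quantile function, obtained by solving G(z_p) = p for 0 < p < 1
z_p = \mu + \frac{\sigma}{\xi}\left[\left(-\ln p\right)^{-\xi} - 1\right], \quad \xi \neq 0
\qquad
z_p = \mu - \sigma \ln\left(-\ln p\right), \quad \xi = 0
```

The quantile function is what allows synthetic GEV samples to be generated from uniform random numbers, and it also gives the rainfall amount associated with a chosen cumulative probability.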
The parameters of equation 1 were estimated using the maximum likelihood method (MLE) as described in Coles (2001), Pujol et al. (2007), Nadarajah and Choi (2007), and Furió and Meneu (2010). Hereafter, these estimates are represented by the italic font of each parameter (μ, σ, ξ) and, for the sake of brevity, are referred to as parameters.
The results obtained from the (non-parametric) MK test were compared with those obtained from the likelihood ratio test (Λ*). This parametric test compares the likelihood associated with H0 with that associated with HA when the parameters under these two hypotheses have been estimated through the MLE (WILKS, 2006). Herein, the three parameters obtained under H0 were those estimated from the years between 1890 and 2011(2). The six parameters obtained under HA were those estimated from the sub-periods of 1890-1950(1) and 1951(2)-2011(2). By not rejecting H0, it is assumed that the two sub-periods were drawn from the same GEV distribution (WILKS, 2006). In practical applications, the acceptance of H0 validates the assumption that the probability structure of the meteorological series does not change over time (BLAIN, 2011b). The Λ* is calculated as follows:
$$\Lambda^{*} = 2\left[L(H_A) - L(H_0)\right]$$
where L(·) denotes the log-likelihood maximized under each hypothesis.
Following Torrence and Compo (1998), the wavelet analysis was used to decompose the Preabs time series into time-frequency space. This spectral analysis allowed (i) observing the variance peaks in the frequency domain and (ii) verifying how those peaks vary over time. A detailed explanation of the wavelet technique, including its statistical significance testing, is found in Torrence and Compo (1998). Following Blain (2009, 2011a), the wavelet function used in the present study was the Morlet wavelet. The computational algorithm used for this method is available at http://paos.colorado.edu/research/wavelets (accessed on 21st December, 2011).
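The likelihood ratio comparison between one GEV for the whole record (H0, three parameters) and separate GEVs for the two sub-periods (HA, six parameters) can be sketched as below. This is an illustration under stated assumptions rather than the authors' implementation: the function names are hypothetical, the parameter estimates are assumed to come from an MLE routine not shown here, and the statistic is compared with a chi-square distribution with 6 - 3 = 3 degrees of freedom.

```python
import math


def gev_loglik(data, mu, sigma, xi):
    """Log-likelihood of a stationary GEV (xi > 0 is the Frechet case)."""
    if sigma <= 0:
        return -math.inf
    total = 0.0
    for x in data:
        y = (x - mu) / sigma
        if abs(xi) < 1e-9:  # Gumbel limit
            total += -math.log(sigma) - y - math.exp(-y)
        else:
            t = 1.0 + xi * y
            if t <= 0:
                return -math.inf  # observation outside the support
            total += (
                -math.log(sigma)
                - (1.0 + 1.0 / xi) * math.log(t)
                - t ** (-1.0 / xi)
            )
    return total


def likelihood_ratio(loglik_h0, loglik_ha):
    """Lambda* = 2 [L(HA) - L(H0)], with maximized log-likelihoods."""
    return 2.0 * (loglik_ha - loglik_h0)


# 95th percentile of the chi-square distribution with 3 degrees of freedom.
CHI2_95_DF3 = 7.815


def rejects_h0(full, first, second, par_h0, par_a1, par_a2):
    """True if Lambda* exceeds the 5% critical value (df = 3).

    par_* are (mu, sigma, xi) tuples assumed to be MLE estimates for the
    full series and for each sub-period, obtained elsewhere.
    """
    l_h0 = gev_loglik(full, *par_h0)
    l_ha = gev_loglik(first, *par_a1) + gev_loglik(second, *par_a2)
    return likelihood_ratio(l_h0, l_ha) > CHI2_95_DF3
```

Because the HA log-likelihood is the sum over the two sub-periods, Λ* is non-negative whenever all three parameter sets are true maximum likelihood estimates.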
The chi-square test (χ2) and the Kolmogorov-Smirnov test (KS) are frequently used for verifying the fit of a given dataset to a parametric distribution. However, as pointed out by Wilks (2006), the χ2 test works more naturally for discrete random variables, because its calculation requires the range of the data to be divided into discrete classes. On the other hand, the KS test compares the empirical and the theoretical cumulative functions. Consequently, for continuous distributions, the KS test is often more powerful than the χ2 test (WILKS, 2006). In the present study, the H0 associated with the KS test assumes that the data were drawn from a stationary-GEV distribution. However, the original algorithm of the KS test is applicable if (and only if) the parameters of the theoretical distribution have not been estimated from the same data used to evaluate the fit of the parametric distribution (STEINSKOG et al., 2007; VLČEK; HUTH, 2009; WILKS, 2006). Given that the parameters of the GEV were fitted using all available data, the KS test had to be modified. This adapted method will be referred to as the Kolmogorov-Smirnov/Lilliefors test (KS-L). The statistical simulations required for calculating the KS-L test were based on the procedure called 'non-uniform random number generation by inversion'. A total of Ns = 10000 synthetic data samples were generated. The KS-L test was performed at the 5% significance level. Additional information on the KS-L can be found in Wilks (2006). According to Wilks (2006), although formal tests (such as the KS-L) may indicate an inadequate fit, they 'may not inform the analyst as to the specific nature of the problem'. Furthermore, according to Sansigolo (2008), the KS-L is only appropriate for evaluating the central part of the distributions. Since this study deals with extreme rainfall amounts, special attention should be given to the upper tail of the distributions. Thus, quantile-quantile (QQ) plots, as described by Wilks (2006), were used for comparing the observed data and the fitted distributions. The QQ plots made it possible to verify how and where the parametric summary was not suitable. In agreement with Wilks (2006), the QQ plots can be seen as a qualitative assessment of goodness-of-fit. Figure 1 depicts the fundamental steps of this section.
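The simulation behind the KS-L test described above can be sketched as follows: synthetic GEV samples are generated by inversion (uniform random numbers passed through the GEV quantile function), and the KS statistic of each refitted synthetic sample is compared with the observed one. This is a hedged illustration, not the study's code. In particular, `fit` is a hypothetical callable returning MLE estimates (mu, sigma, xi), which the user must supply.

```python
import math
import random


def gev_cdf(x, mu, sigma, xi):
    """Cumulative GEV distribution (xi > 0 is the Frechet case)."""
    y = (x - mu) / sigma
    if abs(xi) < 1e-9:  # Gumbel limit
        return math.exp(-math.exp(-y))
    t = 1.0 + xi * y
    if t <= 0:  # outside the support: below it if xi > 0, above if xi < 0
        return 0.0 if xi > 0 else 1.0
    return math.exp(-t ** (-1.0 / xi))


def gev_quantile(p, mu, sigma, xi):
    """Inverse of gev_cdf for 0 < p < 1."""
    if abs(xi) < 1e-9:
        return mu - sigma * math.log(-math.log(p))
    return mu + sigma / xi * ((-math.log(p)) ** (-xi) - 1.0)


def gev_sample(rng, n, mu, sigma, xi):
    """Generate n GEV values by inversion of uniform random numbers."""
    sample = []
    while len(sample) < n:
        u = rng.random()
        if u > 0.0:  # keep u strictly inside (0, 1)
            sample.append(gev_quantile(u, mu, sigma, xi))
    return sample


def ks_statistic(sample, mu, sigma, xi):
    """Largest distance between the empirical and the GEV cdf."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = gev_cdf(x, mu, sigma, xi)
        d = max(d, (i + 1) / n - f, f - i / n)
    return d


def ks_lilliefors_pvalue(data, fit, n_sim=10000, seed=0):
    """Monte Carlo p-value; `fit` (hypothetical) returns (mu, sigma, xi)."""
    rng = random.Random(seed)
    params = fit(data)
    d_obs = ks_statistic(data, *params)
    exceed = 0
    for _ in range(n_sim):
        synth = gev_sample(rng, len(data), *params)
        # Refit each synthetic sample, as the observed one was fitted.
        d_sim = ks_statistic(synth, *fit(synth))
        exceed += d_sim >= d_obs
    return exceed / n_sim
```

Refitting the parameters for every synthetic sample is what distinguishes this Lilliefors-type procedure from the original KS test, whose critical values assume parameters fixed in advance.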

Results and discussion
The results of the run test for each seasonal series [run = 1.54 (p-value = 0.12; Winter), run = 1.13 (p-value = 0.25; Fall), run = 0.78 (p-value = 0.77; Spring), and run = -0.96 (p-value = 0.33; Summer)], together with Figure 2, indicated that the Preabs series is free from temporal persistence. Thus, following Maia et al. (2007), we accepted that a cdf summary of the Preabs series will not result in a significant loss of information due to autocorrelation. The MK test indicated no significant trend in the Preabs series (Figure 3). The p-values associated with the results of this trend test were always greater than the adopted critical level (0.05).
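The lack of significant trends indicated by the MK results agrees with Blain and Moraes (2011) and Sansigolo (2008). These authors indicated the presence of non-significant trends in the annual maximum daily rainfall values obtained in Campinas (1948-2007) (BLAIN; MORAES, 2011) and in Piracicaba-SP (1917-2006) (SANSIGOLO, 2008). However, despite the lack of significant trends in these two locations of the State of São Paulo, Sugahara et al. (2009) indicated that the high quantiles of daily rainfall observed in the city of São Paulo, São Paulo State, increased in magnitude as well as in frequency over the period of 1933-2005. Also according to Sugahara et al. (2009), a possible explanation for this last result is that the city of São Paulo 'has experienced a huge increase in the urbanization over the decades'.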
The non-significant MK results (Figure 3) also agree with Dufek and Ambrizzi (2008). In general, the results obtained by these authors (from 59 rain gauges) suggested an upward trend in rainfall intensity over the State of São Paulo. However, the maps of trends presented by Dufek and Ambrizzi (2008) indicated no significant trend in the region of the weather station of Campinas. The dashed line in Figure 4b represents the 95% confidence level. A variance peak, at any frequency (f), that crosses this dashed line indicates statistical significance for the corresponding period (T = 1/f) (BEECHAM; CHOWDHURY, 2010). The lack of significant variance peaks in the spectral analysis carried out in the frequency domain (Figure 4b) supported the hypothesis of no (significant) periodic components in the Preabs series. Furthermore, the wavelet power spectrum (WPS; time-frequency domain) shows a concentration of energy only within restricted regions of Figure 4a. For instance, within the 2-8 year band there is considerable power from the beginning of the 1900s to the end of the 1930s. However, no other significant variance peak could be observed until the 1990s.
According to the Intergovernmental Panel on Climate Change (SOLOMON et al., 2007), the recent increases in global air temperature have perceptible impacts on many natural systems. In this context, Figures 4a and c indicated that the location of Campinas is experiencing a period of greater variance that started during the 1990s. Nevertheless, another period of greater variance was observed during the 1920s and 1930s. Thus, the wavelet analysis showed no (conclusive) long-term trend.
Considering the aims of this study, it is important to note that the results obtained from this spectral analysis agree with those of the MK test. No conclusive sign of trend was detected in the present study.
Based on the described results, it was assumed that the Preabs series is free from temporal persistence, trends, and periodical components. Thus, we are now able to evaluate the possibility of using the stationary-GEV function for assessing the probability of occurrence of Preabs values. The KS-L test indicates that the Preabs series obtained during the Summer can be considered as coming from a stationary-GEV distribution. The results obtained through the KS-L test agree with those obtained from the QQ plots (Figure 6). Based on this qualitative goodness-of-fit procedure, the stationary-GEV model had its best performance when the Summer and the Winter seasons were evaluated. As described by Wilks (2006), a QQ plot for a fitted distribution perfectly representing the data would have all Cartesian points falling on the 1:1 diagonal line. In this view, all Cartesian points obtained during the Summer and the Winter seasons were close to the 1:1 line. The greatest difference between the theoretical and the registered data was observed during the Spring [Cartesian point (145,118); Figure 6]. By using the aforementioned parameters, the cumulative probability (equation 2) can be estimated for a certain Preabs value during a given season. Given the QQ plot obtained for the Spring, future studies should verify the possibility of using another distribution (instead of the GEV) in order to achieve a better representation of this dataset. In this way, the procedures and the results described herein may be seen as a reference.

Conclusion
The seasonal maximum daily rainfall values obtained from the location of Campinas are free from temporal persistence, trends, and periodic components. The spectral analysis indicated an increasing variability in the extreme rainfall events over recent years (after the beginning of the 1990s). However, we cannot be sure that this feature represents a change in the climatic patterns of the location of Campinas. The same spectral analysis also showed a similar period of greater variance from 1920 to 1940. The General Extreme Value distribution, with time-independent parameters, may be used to describe the probabilistic structure of this meteorological variable.

Figure 1. Fundamental steps of this section.

Figure 2. Autocorrelation function (ACF) applied to seasonal maximum daily rainfall values (Preabs). Campinas, State of São Paulo, Brazil (1890-2012). The ACF was applied to all Preabs values sorted in chronological order. The horizontal lines represent the white noise limits.