Is the Conditional Density Network more suitable than the Maximum Likelihood method for fitting the Generalized Extreme Value distribution?

The Generalized Extreme Value distribution (GEV) has been widely used to assess the probability of extreme weather events, and the parameter estimation method is a key factor in improving its quantile estimates. Against this background, this study aimed to indicate under which conditions (sample size and tail behavior) the Conditional Density Network (CDN) leads to better GEV quantile estimates than the widely used Maximum Likelihood method (MLE). Using Monte Carlo simulations and rainfall series from several Brazilian regions, we highlight the following results: the return period and the tail behavior of the GEV (specified by the shape parameter) are two of the main factors affecting the quantile estimates. For -0.1 ≤ shape ≤ 0.1 and sample size ≤ 50, the CDN outperformed the MLE. For shape ≥ 0.20, the CDN outperformed the MLE for all sample sizes (30-90). The results also suggested that the CDN is more suitable than the MLE for fitting the GEV parameters to the Brazilian extreme rainfall series. We conclude that when the shape parameter is equal to or greater than -0.1, the CDN should be preferred over the MLE.


Introduction
The Extremal Types Theorem states that the distribution of the suitably normalized maxima of independent and identically distributed data converges to one of three types of extreme value distributions: Gumbel (type I), Fréchet (type II) or Weibull (type III). Given that the Generalized Extreme Value distribution (GEV) is capable of representing these three types in a single equation, several meteorological studies (MARTINS; STEDINGER, 2000; COLES, 2001; EL ADLOUNI et al., 2007; FELICI et al., 2007; BROWN et al., 2008; CANNON, 2010; DELGADO et al., 2010; ZWIERS et al., 2011; WILKS, 2011; ANDRADE et al., 2012) have used the GEV to assess the probability of extreme weather events in virtually all parts of the globe.
Different methods may be used to estimate the parameters of the GEV. Among them, Maximum Likelihood Estimation (MLE) is one of the most used (WILKS, 2011). However, in spite of this widespread use, the MLE may perform poorly for small sample sizes (MARTINS; STEDINGER, 2000; EL ADLOUNI et al., 2007; WILKS, 2011; among many others). As pointed out by several studies (e.g. COLES, 2001; EL ADLOUNI et al., 2007; WILKS, 2011), the maximization of the GEV likelihood function has no analytical solution; hence, numerical optimization methods must be used to calculate the MLE-GEV parameters. When applied to small sample sizes, these numerical optimizations may lead to absurd shape parameter values (MARTINS; STEDINGER, 2000; among many others).
In order to overcome this drawback, Martins and Stedinger (2000) recommended using a Bayesian prior distribution in the numerical optimization process. This procedure, referred to as the Generalized Maximum Likelihood method (GMLE), restricts the shape parameter estimates to physically reasonable values (MARTINS; STEDINGER, 2000). This feature is of particular interest given that the shape parameter defines the tail behavior of the GEV distribution. Therefore, the GMLE procedure allows the numerical optimizations to be applied to small sample sizes (MARTINS; STEDINGER, 2000; EL ADLOUNI et al., 2007).
After the studies of Martins and Stedinger (2000) and El Adlouni et al. (2007), Cannon (2010) proposed using a flexible nonlinear model to estimate the parameters of the GEV. As proposed in that study, to which the reader is referred for details, the parameters of the GEV may be calculated by means of a probabilistic extension of the multilayer perceptron neural network referred to as the Conditional Density Network (CDN). Within the CDN, the parameters of the GEV (GEV-CDN) are calculated following the GMLE method proposed by Martins and Stedinger (2000). In other words, under the CDN framework, the numerical optimization process also restricts the shape parameter to physically reasonable values. This feature allows one to suppose that the CDN is more suitable than the MLE for estimating the parameters of the GEV from small sample sizes.
Regardless of the advantages and drawbacks of each parameter estimation procedure, the idea behind studies such as Martins and Stedinger (2000), El Adlouni et al. (2007) and Cannon (2010) is to improve the quantile estimates obtained from the GEV. More specifically, the main idea is to decrease human vulnerability to extreme (hydro/agro) meteorological events by improving the GEV parameter estimates for each region or case of interest. However, to the best of the authors' knowledge, there is no study comparing the performance of the GEV-MLE to that of the GEV-CDN. Such studies would be of particular interest given that several parts of Brazil (including both rural and urban populations) are particularly exposed to weather extremes (IPCC, 2013; VÖRÖSMARTY et al., 2013).
Against this background, this study aimed to indicate under which conditions (sample size and tail behavior) the CDN leads to better quantile estimates than the MLE.

Material and methods
The GEV is a three-parameter distribution in which the probability of occurrence of an extreme event [Pr(x)] is described as Pr{x ≤ zt} = GEV(zt; μ, σ, ξ). The Greek characters are, respectively, the location, scale and shape parameters. While μ specifies the position of the GEV function with respect to the origin, σ specifies the spread of the GEV distribution (DELGADO et al., 2010). The shape parameter describes the tail behavior of the GEV, defining which type of extreme value distribution (I, II or III) the GEV represents. The type I class corresponds to the case ξ = 0. Types II and III correspond, respectively, to the cases ξ > 0 and ξ < 0 (COLES, 2001). The probability density function (pdf) of the GEV is described in Equation 1 (written here for ξ ≠ 0; the case ξ = 0 is obtained as the limit ξ → 0):

$$f(x; \mu, \sigma, \xi) = \frac{1}{\sigma}\left[1 + \xi\left(\frac{x-\mu}{\sigma}\right)\right]^{-1/\xi - 1} \exp\left\{-\left[1 + \xi\left(\frac{x-\mu}{\sigma}\right)\right]^{-1/\xi}\right\}, \qquad 1 + \xi\left(\frac{x-\mu}{\sigma}\right) > 0 \qquad (1)$$

Parameter estimation procedure
The principle of the MLE is to find the parameter values that assign the highest probability to the observed data. Thus, the MLE estimates for the GEV are obtained by finding the parameter values that maximize the following likelihood function (Equation 2), in which x_1, ..., x_n are the observed data.

$$L(\mu, \sigma, \xi) = \prod_{i=1}^{n} f(x_i; \mu, \sigma, \xi) \qquad (2)$$

As pointed out by several authors, it is frequently more convenient to work with the log-likelihood function (Equation 3), which attains its maximum at the same point as Equation 2.

$$\ln L(\mu, \sigma, \xi) = \sum_{i=1}^{n} \ln f(x_i; \mu, \sigma, \xi) \qquad (3)$$

As suggested by its name, the GMLE is based on the same principle as the MLE. However, the GMLE method includes an additional constraint that avoids physically invalid values of the shape parameter (MARTINS; STEDINGER, 2000; EL ADLOUNI et al., 2007). Based on practical considerations, Martins and Stedinger (2000) proposed using the Beta distribution as a constraint for the shape parameter values.
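To make the likelihood principle concrete, the following is a minimal Python sketch of the GEV log-likelihood. It assumes the sign convention adopted in this study (ξ > 0 for type II) and treats ξ ≈ 0 as the Gumbel limit. The function name `gev_loglik`, the tolerance `tol` and the sample values are illustrative assumptions; the study itself relied on R routines.

```python
import math

def gev_loglik(data, mu, sigma, xi, tol=1e-8):
    """Log-likelihood of a sample under GEV(mu, sigma, xi), with xi > 0 for type II."""
    if sigma <= 0:
        return -math.inf  # the scale parameter must be positive
    total = 0.0
    for x in data:
        s = (x - mu) / sigma
        if abs(xi) < tol:
            # Gumbel limit (type I): log f = -log(sigma) - s - exp(-s)
            total += -math.log(sigma) - s - math.exp(-s)
        else:
            t = 1.0 + xi * s
            if t <= 0:
                # Observation outside the support: the likelihood is zero
                return -math.inf
            total += (-math.log(sigma)
                      - (1.0 + 1.0 / xi) * math.log(t)
                      - t ** (-1.0 / xi))
    return total

# Hypothetical values, for illustration only
sample = [0.2, 1.1, -0.3, 2.4, 0.7]
print(gev_loglik(sample, mu=0.5, sigma=1.0, xi=0.1))
```

A numerical optimizer would then search for the (μ, σ, ξ) triple that maximizes this function; under the GMLE, the logarithm of the prior density of ξ would be added to the value returned.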
As previously described, the CDN estimates the GEV parameters by means of the GMLE approach using the conditional density network. However, Cannon (2010) recommends setting c1 to 2.0 and c2 to 3.3 (Equation 4), so that the shape parameter may vary over a wider interval, with approximately 90% of its probability mass situated between -0.4 and 0.2 (mode ≈ -0.2). Under the stationary approach, the CDN-GEV estimation procedure has three outputs corresponding to the μ, σ and ξ parameters of the GEV distribution (CANNON, 2010). As described in that study, the neural network architecture of the stationary GEV-CDN may be represented by Figure 1.
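The mode and probability mass quoted above can be checked numerically. The sketch below assumes that the prior is a Beta(c1, c2) density defined on ξ + 0.5, that is, restricted to the interval [-0.5, 0.5] as in Martins and Stedinger (2000); this interval and the helper names `shape_prior_pdf` and `prior_mass` are assumptions made for illustration.

```python
import math

def shape_prior_pdf(xi, c1=2.0, c2=3.3):
    """Beta(c1, c2) density for the shape parameter, shifted to [-0.5, 0.5]."""
    u = xi + 0.5  # map [-0.5, 0.5] onto [0, 1]
    if u <= 0.0 or u >= 1.0:
        return 0.0
    norm = math.gamma(c1 + c2) / (math.gamma(c1) * math.gamma(c2))
    return norm * u ** (c1 - 1.0) * (1.0 - u) ** (c2 - 1.0)

def prior_mass(lo, hi, steps=20000):
    """Midpoint-rule integral of the prior density between lo and hi."""
    width = (hi - lo) / steps
    return sum(shape_prior_pdf(lo + (i + 0.5) * width) for i in range(steps)) * width

c1, c2 = 2.0, 3.3
mode = (c1 - 1.0) / (c1 + c2 - 2.0) - 0.5  # mode of Beta(c1, c2), shifted by -0.5
mass = prior_mass(-0.4, 0.2)
print(f"mode = {mode:.3f}, mass in [-0.4, 0.2] = {mass:.3f}")
```

Under these assumptions the mode falls close to -0.2 and the mass between -0.4 and 0.2 is somewhat below 0.9, in line with the "approximately 90%" stated in the text.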

Monte Carlo simulation
The performance of each parameter estimation procedure for several sample sizes and shape parameter values (representing distinct tail behaviors) was first evaluated by means of a Monte Carlo experiment divided into three steps. The first step was based on Equation 4, extracted from the GEVcdn package, according to Equation 5 (R DEVELOPMENT CORE TEAM, 2009; CANNON, 2011).
The location and scale parameters were set to 0 and 1, respectively, without loss of generality (SHIN et al., 2011; HEO et al., 2013). The shape parameter was varied from -0.4 to 0.4 in steps of 0.1. The sample size (n) was varied from 30 to 90 in steps of 10. It is worth mentioning that n = 30 is the length of record required for obtaining climatological normal values. For each combination of parameters and sample size, 10,000 trials were generated from function 1.
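The simulation design described above can be sketched in Python as follows. This is not the GEVcdn routine used in the study: it assumes that samples are drawn by inverse-transform sampling through the GEV quantile function, and the names `gev_quantile` and `gev_sample`, the random seed and the reduced number of trials are illustrative choices.

```python
import math
import random

def gev_quantile(p, mu=0.0, sigma=1.0, xi=0.0, tol=1e-8):
    """GEV quantile for non-exceedance probability p (xi > 0: type II)."""
    y = -math.log(p)
    if abs(xi) < tol:
        return mu - sigma * math.log(y)  # Gumbel limit (type I)
    return mu + sigma * (y ** (-xi) - 1.0) / xi

def gev_sample(n, xi, rng, mu=0.0, sigma=1.0):
    """Draw n GEV variates by passing uniform numbers through the quantile function."""
    out = []
    while len(out) < n:
        u = rng.random()
        if 0.0 < u < 1.0:  # skip u == 0, for which log(u) is undefined
            out.append(gev_quantile(u, mu, sigma, xi))
    return out

rng = random.Random(2024)
shapes = [round(-0.4 + 0.1 * k, 1) for k in range(9)]  # -0.4, -0.3, ..., 0.4
sizes = list(range(30, 100, 10))                       # 30, 40, ..., 90
n_trials = 100  # the study used 10,000; reduced here to keep the sketch fast

# One list of trials per (shape, sample size) combination, with mu = 0 and sigma = 1
samples = {
    (xi, n): [gev_sample(n, xi, rng) for _ in range(n_trials)]
    for xi in shapes
    for n in sizes
}
print(len(samples), "shape/sample-size combinations")
```

Each entry of `samples` would then be passed to the two fitting procedures being compared.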
The third step consisted of fitting the GEV-MLE and GEV-CDN to each random sample generated in step 1 and calculating the quantile estimates for each fitted model by means of Equation 6. The root mean squared error (RMSE) of each quantile estimate was calculated with respect to the corresponding true quantile, as suggested by Martins and Stedinger (2000), El Adlouni et al. (2007) and Cannon (2010). The RMSE, described in several studies including Camparotto et al. (2013), can be regarded as a measure of the magnitude of the errors of each quantile estimate (WILKS, 2011).
As a case study, the parameter estimation procedures were applied to the annual maximum daily rainfall series of 26 Brazilian locations (Figure 2). The RMSE of each quantile estimate was calculated with respect to the corresponding observed (empirical) quantile and, as for the Monte Carlo simulations, it was used to compare the performance of the GEV-MLE to that of the GEV-CDN.
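For reference, the two quantities involved in this step can be written as below. The expressions assume the standard GEV quantile function under the sign convention adopted in this study and the usual definition of the RMSE; since the original equation is not reproduced in this text, the exact form used by the authors is an assumption. Here T is the return period in years, N is the number of Monte Carlo trials, and the hat denotes the estimate obtained from the i-th trial.

```latex
z_T = \mu + \frac{\sigma}{\xi}\left\{\left[-\ln\left(1 - \frac{1}{T}\right)\right]^{-\xi} - 1\right\},
\qquad \xi \neq 0

\mathrm{RMSE}(T) = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(\hat{z}_{T,i} - z_T\right)^{2}}
```

Because the non-exceedance probability 1 - 1/T approaches 1 as T grows, long return periods probe the far upper tail, where small errors in ξ translate into large errors in the quantile.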

Regardless of the parameter estimation procedure, the Monte Carlo simulations (Figure 3) indicated that the return period is one of the main factors affecting the quantile estimates obtained from the GEV distribution. For a particular ξ and n value, the RMSE increases as the return period increases. This result is consistent with those of Martins and Stedinger (2000), Coles (2001), El Adlouni et al. (2007), Blain (2014) and Blain and Meschiatti (2014), and indicates that the magnitude of the errors of the quantile estimates increases as the return period of an extreme event increases.
The results in Figure 3 also indicate that the ξ value is another key factor affecting the quantile estimates. For a particular sample size and return period, the performance of both methods (CDN and MLE) varies with ξ (the RMSE values tend to increase as ξ increases). Therefore, considering that the same feature was observed by Martins and Stedinger (2000) for the MLE, we may assume that, regardless of the parameter estimation procedure, the errors of the quantile estimates increase as ξ increases. These results are particularly important because, as previously described, the three extreme value distribution types (I, II or III) that the GEV is capable of representing are specified by the ξ value (COLES, 2001; WILKS, 2011; among many others). Under the notation adopted in this study, negative values of ξ lead to the type III (Weibull) distribution (a probability density function with a bounded upper tail). ξ = 0 leads to the type I (Gumbel) distribution (an unbounded upper tail that decreases exponentially). ξ > 0 leads to the type II (Fréchet) distribution (an unbounded upper tail that decreases as a polynomial, i.e. the cumulative distribution function converges slowly towards 1) (EL ADLOUNI et al., 2007; GILROY; MCCUEN, 2012; among many others). For this latter type, the probability of extreme values is higher than that obtained from a type I or III distribution. For instance, the 90th percentile obtained from a GEV distribution with μ = 0, σ = 1 and ξ = -0.4 (type III) is ≈ 1.5, while the same percentile obtained from a GEV distribution with μ = 0, σ = 1 and ξ = +0.4 (type II) is ≈ 3.65. Thus, the results in Figure 3 indicate that the errors of the quantile estimates are higher for the type II distribution than for types I and III. This last statement is of particular interest for meteorological studies since type II is the most common for hydrological data (GILROY; MCCUEN, 2012).
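The two percentile values quoted above can be reproduced with a few lines of Python. The sketch assumes the standard GEV quantile function for ξ ≠ 0 under the sign convention of this study; the function name `gev_quantile` is illustrative.

```python
import math

def gev_quantile(p, mu, sigma, xi):
    """GEV quantile for non-exceedance probability p, valid for xi != 0."""
    return mu + sigma * ((-math.log(p)) ** (-xi) - 1.0) / xi

q_type3 = gev_quantile(0.90, mu=0.0, sigma=1.0, xi=-0.4)  # bounded upper tail
q_type2 = gev_quantile(0.90, mu=0.0, sigma=1.0, xi=0.4)   # heavy upper tail
print(f"xi = -0.4: {q_type3:.2f}")  # about 1.48
print(f"xi = +0.4: {q_type2:.2f}")  # about 3.65
```

With the same location and scale, changing only the sign of ξ more than doubles the 90th percentile, which illustrates why quantile errors grow with ξ.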
The Monte Carlo simulations also indicated that the supposition that the CDN is more suitable than the MLE for estimating the parameters of the GEV from small sample sizes was confirmed when ξ ranged from -0.1 to 0.1 and n was equal to or lower than 50. For the same range of ξ values, both methods performed similarly when n was equal to or greater than 60. For ξ ≥ 0.20, the CDN outperformed the MLE for all sample sizes, and for ξ ≤ -0.20, the MLE outperformed the CDN for all sample sizes.

Case study
Before evaluating the results in Table 1, it is worth mentioning that no remarkable difference was found between the running times of the GEV-MLE and the GEV-CDN. Both methods were able to fit the GEV parameters in less than 5 seconds for each location. However, we are aware that these running times vary with the processing capability of each computer. Finally, it is also worth emphasizing that both the GEV-MLE and the GEV-CDN can be easily computed by means of R-software codes available at https://www.r-project.org.
As expected, the Anderson-Darling (AD) test indicated that both the GEV-MLE and the GEV-CDN can be used to assess the probability of extreme rainfall data. However, as can be observed from Table 1, the CDN outperformed the MLE for 23 of the 26 rainfall series. Considering that the majority of the rainfall series have sample sizes lower than 60 and that 24 of the 26 rainfall series present ξ greater than -0.15, this result is consistent with those obtained from the Monte Carlo simulations. In addition, the shape parameter estimates obtained from the CDN are more consistent than those obtained from the MLE with the statement that extreme rainfall series are more likely to produce positive shape parameters (~ 0.0 to 0.2; FOWLER et al., 2010; GILROY; MCCUEN, 2012). As can be noted (Table 1), while the CDN produced 14 positive values of the shape parameter, the MLE produced only seven. Therefore, we assume that the results of Table 1 suggest that the CDN is more suitable than the MLE for fitting the GEV distribution to the rainfall series obtained from the Brazilian locations. Finally, the results listed in Table 1, along with those of the Monte Carlo simulations, indicate that the CDN method may be used to evaluate the probability of extreme events at a regional scale.

Conclusion
When the shape parameter of the Generalized Extreme Value distribution is greater than -0.2, the Conditional Density Network should be preferred over the Maximum Likelihood method for quantile estimates. Otherwise, the Maximum Likelihood method should be adopted. These two statements hold for sample sizes ranging from 30 to 90.

Figure 2. Meteorological Weather Stations.

The Anderson-Darling test (ANDERSON; DARLING, 1952), described in several studies including Blain (2013), was used to assess the fit of the GEV-MLE and the GEV-CDN to the series because it is better than other criteria (e.g. the Akaike Information Criterion and the Bayesian Information Criterion) at recognizing a 3-parameter parent distribution such as the GEV (HADDAD; RAHMAN, 2011). Further details on this goodness-of-fit test can be found in Shin et al. (2011), Heo et al. (2013) and Blain (2013).

Figure 3. Root mean squared error (RMSE) of quantile estimates obtained from the Generalized Extreme Value distribution for different return periods (10, 50, 100 and 1000 years). The quantile estimates were obtained from the Conditional Density Network (CDN) and the Maximum Likelihood method (MLE).

Table 1. Root mean squared error (RMSE) and parameter estimates obtained from CDN and MLE for annual maximum rainfall series of 26 Brazilian locations.