Stochastic evaluation of robust portfolios based on hierarchical clustering and worst-case scenarios

The objective of this paper is to present a proposal for forming robust portfolios using a stochastic efficiency analysis of assets from companies listed on the Sao Paulo Stock Exchange, focusing on the worst market state. To this end, information about the market in all of its phases and information from low market periods were employed in a stochastic efficiency analysis using the Chance Constrained Data Envelopment Analysis method, along with a Hierarchical Clustering approach. The portfolios then underwent a capital allocation model to obtain the optimal weight of each share. The portfolios formed in both scenarios were analyzed and compared. The joint application of these approaches, supplied with information about the worst market state, formed robust portfolios that achieved a higher accumulated return in the validation period than portfolios optimized from information about the entire period, and that also had smaller beta values.


Introduction
The Markowitz model was developed in 1952. More than sixty years later, this pioneering mean-variance approach is still the main model used in practice for asset allocation and portfolio management, and it has inspired new academic proposals (Zopounidis, Doumpos, & Fabozzi, 2014). For investors and academics alike, selecting investments in risky assets remains a challenge for financial management. Charnes, Cooper, and Rhodes (1978) developed Data Envelopment Analysis (DEA), which is used to evaluate and compare organizational units that use multiple inputs to produce multiple outputs in a given period of time (Kao & Liu, 2014). DEA is a non-parametric method that stands out among the quantitative models that support decision making. It is used by managers in many areas, including finance (Kao, 2014, Azadi, Jafarian, Saen, & Mirhedayatian, 2015, Rotela Junior et al., 2017).
This concept has been widely discussed, and new variations of the classic DEA models continue to be created. Some of these variations can already handle uncertain and approximate reasoning, such as the DEA model with fuzzy coefficients proposed by Azadi et al. (2015), or the model proposed by Sengupta (1987), which combined Chance Constrained Programming (CCP), proposed by Charnes and Cooper (1963), with the DEA model (Jin, Zhou, & Zhou, 2014, Rotela Junior, Pamplona, Rocha, Valerio, & Paiva, 2015).
According to Kim, Kim, and Fabozzi (2014), and Kim, Kim, Mulvey, and Fabozzi (2015), classic portfolio optimization models such as those proposed by Markowitz (1952) and Sharpe (1963) cannot be considered robust, since they are very sensitive to small variations in their inputs. Researchers have therefore more recently started to incorporate estimation errors directly into the portfolio optimization process, using robust optimization techniques (Fabozzi, Huang, & Zhou, 2010, Kim et al., 2014, Kim et al., 2015).
Although some studies, such as Bertsimas and Sim (2004), have confirmed a relationship between an increase in the worst-case portfolio return and an increase in robustness, Kim et al. (2015) believe that robust portfolio models probably achieve their robustness by systematically betting on information about worst-case market periods. That is, in the formation of a robust portfolio, bear markets (periods of falling prices) are more relevant than bull markets (periods of rising prices). Information on asset returns on the days with the worst performance is therefore extremely important for creating a robust portfolio (Kim et al., 2015).
The aim of this paper is to present a proposal for the formation of robust portfolios using a stochastic efficiency analysis of assets from companies listed on the Sao Paulo Stock Exchange, focusing on the worst market state.
The specific objectives of this paper are to present the Chance Constrained Data Envelopment Analysis (CCDEA) model and use it to reduce the search space, considering randomness and uncertainty in the variables, and to test its applicability to portfolio optimization. The CCDEA model, together with Hierarchical Clustering, is then used to optimize a robust portfolio from input data for periods of poor market performance (a bear market), and the result is compared with the same model supplied with complete information. Finally, the paper shows the importance of worst-case market information for achieving robust performance.

Chance constrained DEA
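Most DEA models used in the literature are deterministic and do not consider random errors in the input and output variables. According to Jin et al. (2014), randomness in evaluation processes generally comes from errors in data collection. Sengupta (1987) proposed the chance constrained DEA model, with the objective function given in Equation 1 and the constraints in Equations 2 and 3.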


In these equations, u_1, …, u_b and v_1, …, v_a are the weights to be estimated by the model; u_q and v_p are the multiplier weights of the q-th output and the p-th input, respectively. Pr denotes probability, and the hat (^) indicates that x̂_ip and ŷ_iq are random variables. In the constraints, the model requires the efficiency of each DMU to be less than or equal to β_i with a given probability, where β_i is the expected efficiency level of the i-th DMU; according to Jin et al. (2014), β_i lies in [0,1] and is defined as an aspiration level. α_i is a risk criterion reflecting the decision maker's utility, and 1-α_i is the probability of meeting the requirement, which is regarded as a confidence level (Jin et al., 2014). Like β_i, the risk criterion α_i takes values in the interval between 0 and 1.
To obtain a computationally tractable model, the formulation must be rewritten according to the proposal by Charnes and Cooper (1963). Jin et al. (2014) argue that, since the stochastic disturbance stems partly from errors in data collection, it is natural to suppose that the random variable ξ follows a normal distribution, N(0, σ²).
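Under this normality assumption, a chance constraint has a closed-form deterministic equivalent, which is what the derivation in this section exploits. The sketch below is a minimal illustration, not the authors' code: it assumes the ratio constraint has already been linearized (for positive weighted inputs) as Pr(Z_i ≤ 0) ≥ 1-α_i, with Z_i = Σ_q u_q ŷ_iq − β_i Σ_p v_p x̂_ip normally distributed with mean E_i and variance V_i. The symbol Z_i and the function names are ours.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, variance 1


def deterministic_equivalent_holds(mean_z: float, var_z: float, alpha: float) -> bool:
    """Check E_i + Phi^{-1}(1 - alpha_i) * sqrt(V_i) <= 0.

    For Z_i ~ N(E_i, V_i) this is equivalent to the chance constraint
    Pr(Z_i <= 0) >= 1 - alpha_i (assumed linearized form, see lead-in).
    """
    z = STD_NORMAL.inv_cdf(1.0 - alpha)
    return mean_z + z * var_z ** 0.5 <= 0.0


def satisfaction_probability(mean_z: float, var_z: float) -> float:
    """Pr(Z_i <= 0) computed directly from the normal CDF."""
    return NormalDist(mean_z, var_z ** 0.5).cdf(0.0)


if __name__ == "__main__":
    # Hypothetical DMU with E_i = -1 and V_i = 1, so Pr(Z_i <= 0) = Phi(1)
    p = satisfaction_probability(-1.0, 1.0)
    for alpha in (0.10, 0.20, 0.50, 0.60):
        # Both columns should agree: closed form vs. direct probability check
        print(alpha, deterministic_equivalent_holds(-1.0, 1.0, alpha), p >= 1 - alpha)
```

The two checks agree for every α_i, which is the point of the deterministic equivalent: the probabilistic constraint can be replaced by an inequality in the mean and standard deviation alone.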
The equivalent deterministic formulation of the model must be derived to make the model solvable. The objective function, represented by Equation 1, can be rewritten as Equation 5, and the constraints represented in Equations 2 and 3, including the stochastic process, can be rewritten as Equations 6 and 7. Equation 7 can in turn be written in the equivalent form of Equation 8, in which E_i and V_i denote the mean and variance of each random variable, given by Equations 9 and 10. In this manner, the standardized random variable follows a normal distribution with mean zero and variance one, and Equation 8 can be presented as Equation 11:
$$\frac{-E_i}{\sqrt{V_i}} \geq \Phi^{-1}(1-\alpha_i), \quad i = 1, 2, \ldots, n \qquad (11)$$
The equivalent form is presented as Equation 12:
$$E_i + \Phi^{-1}(1-\alpha_i)\sqrt{V_i} \leq 0, \quad i = 1, 2, \ldots, n \qquad (12)$$
Here, Φ is the cumulative distribution function of the standard normal distribution, and Φ^{-1} is its inverse. Finally, the multiplier model can be addressed; it differs from the proposal of Jin et al. (2014), which deals with undesirable outputs in a stochastic DEA model. Thus, the original model can be reformulated as a linear programming model, whose equivalent is presented in Equations 13, 14 and 16 (Rotela Junior et al., 2015).

Worst market state approach
It has been reported that the correlation between the returns of financial assets increases in a bear market. The stock market is no exception and, to make matters worse, correlation within the stock market has been increasing recently (Kim et al., 2015).
Acta Scientiarum. Technology Maringá, v. 39, suppl., p. 623-631, 2017
Since share returns are more positively correlated during periods of crisis, models such as that of Markowitz (1952) may not be appropriate to protect investors.
In their study, Kim et al. (2015) assume that the market can be divided into several states, each with its own characteristics. This is not a new assumption; it was already made by Turner, Startz, and Nelson (1989) and Schaller and Van Norden (1997). However, the authors were not interested in detailing the behavior of each possible state, but in finding an optimal static portfolio.
The proposition suggested by Kim et al. (2015, p. 4) is that "[…] when several states exist in a market, the stochastic portfolio of ideal mean-variance for a risk-averse investor is a robust portfolio, in which its expected return is constant in all states". Kim et al. (2015) found evidence that an emphasis on extreme left-tail events (below-average results) leads to portfolios with more robust performance than portfolios built for best-case scenarios, as well as mean-variance portfolios without state information.
Beta is a measure of the volatility, or systematic risk, of a security or portfolio relative to the market as a whole (Ross, Westerfield, & Jaffe, 2012). In this context, another valuable contribution of Kim et al. (2015) is the finding that, during market crashes, low-beta assets reduce overall portfolio risk and offer better returns than high-beta assets.
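As a worked illustration of this definition, beta is usually estimated as the covariance between the returns of the security (or portfolio) and those of the market, divided by the variance of the market returns. The minimal sketch below uses hypothetical return series; using Ibovespa as the market proxy would be an assumption on our part.

```python
def mean(xs):
    return sum(xs) / len(xs)


def beta(asset_returns, market_returns):
    """Estimate beta = Cov(R_a, R_m) / Var(R_m) from paired return series."""
    if len(asset_returns) != len(market_returns) or len(market_returns) < 2:
        raise ValueError("need two paired series with at least two observations")
    ma, mm = mean(asset_returns), mean(market_returns)
    cov = sum((a - ma) * (m - mm) for a, m in zip(asset_returns, market_returns))
    var = sum((m - mm) ** 2 for m in market_returns)
    # The (n - 1) denominators of covariance and variance cancel out
    return cov / var


if __name__ == "__main__":
    market = [0.010, -0.020, 0.030, 0.000]  # hypothetical daily market returns
    defensive = [0.5 * r for r in market]   # moves half as much as the market
    print(beta(defensive, market))
```

An asset whose returns move half as much as the market has a beta close to 0.5, the kind of low-beta asset Kim et al. (2015) associate with lower portfolio risk in crashes.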

Materials and methods
The objective of this research is to analyze the behavior of portfolios formed through efficiency evaluation under risk and uncertainty, considering only information on worst-case market conditions, and to compare them with the results obtained through the same model when considering information for the entire period. To do this, the Chance Constrained DEA model is employed, supplied with information from the worst-case market, along with Hierarchical Clustering and the model proposed by Sharpe (1963), as shown in Figure 1.
To build the sample, the assets had to have data covering a long period of time. Accordingly, 61 assets were chosen that are traded on the Sao Paulo Stock Exchange and belong to the Bovespa Index (Ibovespa). Once the sample was selected, a set of input and output indicators for the efficiency analysis had to be chosen. For this step, which can be considered independent of the others, the variables used in the literature were reviewed, in particular the studies by Powers and McMullen (2000) and Rotela Junior, Pamplona, and Salomon (2014). The finding by Kim et al. (2015) that, in periods of crisis, low-beta assets reduce portfolio risk and lead to better returns than high-beta assets was also taken into consideration.
For this step of the research, the output variables chosen were return, asset profitability, and the earnings-price ratio (LP). The input variables adopted were beta, the price-earnings ratio (PL), and volatility. The data, collected with the Economática® software, are daily data from November 2009 to November 2014.
The strategy proposed by Kim et al. (2015) was adopted to better identify the crisis period. In this study, portfolios optimized with information in which market states were not defined were compared with portfolios optimized considering the worst-case market. To define when the market was in its worst state, the parameter n was set to four.
For each DMU (asset), the mean and variance of each variable adopted in the efficiency analysis were calculated for each of the two scenarios: complete market information and the worst market state.
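The two scenarios can be sketched as follows. This is a hedged illustration, not the authors' procedure: it assumes, as one reading of Kim et al. (2015), that the n market states are equal-sized groups of trading days ranked by the market index return, so that with n = 4 the worst state is the quarter of days with the lowest market returns. The function names, asset code and data layout are hypothetical.

```python
from statistics import mean, variance


def worst_state_days(market_returns, n=4):
    """Indices of the len//n days with the lowest market return (assumed state rule)."""
    k = len(market_returns) // n
    ranked = sorted(range(len(market_returns)), key=lambda t: market_returns[t])
    return sorted(ranked[:k])


def summarize(series_by_asset, days=None):
    """Mean and sample variance of one variable for each DMU (asset).

    days=None uses the whole period; otherwise only the given day indices.
    """
    summary = {}
    for asset, values in series_by_asset.items():
        sample = values if days is None else [values[t] for t in days]
        summary[asset] = (mean(sample), variance(sample))
    return summary


if __name__ == "__main__":
    market = [0.010, -0.020, 0.005, -0.030, 0.020, -0.010, 0.000, 0.015]
    variable = {"ASSET1": [1, 2, 3, 4, 5, 6, 7, 8]}  # hypothetical variable values
    worst = worst_state_days(market, n=4)
    print(worst)                         # days used for the worst market state
    print(summarize(variable))           # whole-period scenario
    print(summarize(variable, worst))    # worst-state scenario
```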
According to Johnson and Wichern (2012), grouping is performed based on similarities or dissimilarities (distances). The inputs required are similarity measures or data from which similarities can be computed. Hierarchical clustering techniques may proceed by a series of successive mergers. Agglomerative hierarchical methods start with the individual items: the most similar objects are grouped first, and these initial groups are then merged according to their similarities (Johnson & Wichern, 2012).
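The successive-merger procedure can be sketched in a few lines. The paper does not state the distance or linkage used, so this hedged sketch assumes Euclidean distance and average linkage, with each DMU represented by a feature vector (for instance, the means and variances of its variables, which we assume are already on comparable scales). Merging stops when k groups remain, which corresponds to cutting the dendrogram at k clusters.

```python
from itertools import combinations
from math import dist


def average_linkage(points, a, b):
    """Mean Euclidean distance between the members of clusters a and b."""
    return sum(dist(points[i], points[j]) for i in a for j in b) / (len(a) * len(b))


def agglomerative(points, k):
    """Merge the two closest clusters until only k remain; returns lists of indices."""
    clusters = [[i] for i in range(len(points))]  # start with individual items
    while len(clusters) > k:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: average_linkage(points, clusters[pair[0]], clusters[pair[1]]),
        )
        merged = clusters[i] + clusters[j]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)]
        clusters.append(merged)
    return clusters


if __name__ == "__main__":
    # Hypothetical DMUs described by (mean, variance) of a single variable
    dmus = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0), (0.5, 0.5)]
    print(agglomerative(dmus, k=2))
```

Library routines such as SciPy's hierarchical clustering module offer the same idea with several linkage options and dendrogram plotting; the plain version is shown only to make the merging steps explicit.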
Hierarchical Clustering was used to group the DMUs by degree of similarity, taking into account, in this research, the means and variances of all variables selected for the model. This step produced groups with a markedly higher degree of internal similarity, while still meeting the minimum number of DMUs required by the CCDEA model. The literature recommends that the number of DMUs be at least three times the total number of input and output variables (Rotela Junior et al., 2014). Figure 2 shows the grouping of the DMUs considering all information on the assets between 2009 and 2014, and Figure 3 shows the grouping when only information collected in the worst-case market is analyzed. Thus, two groups were formed for each of the proposed scenarios, the whole market and the worst market state. Within each group, the DMUs behaved more similarly than when considered as a single group, in accordance with Johnson and Wichern (2012).
Considering information from the whole market, Table 1 presents the descriptive statistics for the DMUs in groups 1 and 2; in this step, the codes of the assets involved in the efficiency analysis are also presented. Likewise, considering information from the worst-case market, Table 2 presents the descriptive statistics for the DMUs in groups 1 and 2. Negative data were transformed by adding, to each variable (column), a positive constant that turns the most negative value of the series into a positive number without altering the efficiency analysis. This strategy was adopted by Cook and Zhu (2008) and Rotela Junior et al. (2015).
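The translation of negative data described above can be sketched as a column-wise shift. This is an illustrative sketch; the margin added beyond making the most negative value positive (here, a hypothetical margin of 1.0) is our assumption, since the text does not state it.

```python
def shift_to_positive(column, margin=1.0):
    """Add one constant to every value so the most negative value becomes positive.

    Columns whose minimum is already positive are returned unchanged.
    `margin` is a hypothetical choice: the smallest value becomes exactly `margin`.
    """
    low = min(column)
    if low > 0:
        return list(column)
    shift = -low + margin
    return [value + shift for value in column]


def shift_table(columns, margin=1.0):
    """Apply the shift independently to each variable (column)."""
    return {name: shift_to_positive(values, margin) for name, values in columns.items()}


if __name__ == "__main__":
    data = {"return": [-0.03, 0.01, 0.02], "beta": [0.8, 1.1, 0.9]}  # hypothetical
    print(shift_table(data))
```

Each column is shifted by its own constant, so differences between DMUs within a variable are preserved.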
It is worth mentioning that in this step of the research, input and output variables were considered independent.
A value of 1 was used for the efficiency level (β_i). For the data analyzed, good discrimination between the units of analysis is obtained when the risk criterion (α_i) varies between 0.5 and 0.6. This range may vary according to the data evaluated by the CCDEA model. To determine the risk criterion range, the efficiency analysis was performed with the CCDEA model, which revealed that when the risk criterion is greater than 0.6, all assets are classified as efficient, and when it is smaller than 0.5, no asset is classified as efficient.
Varying α_i within this range makes it possible to represent investors with different degrees of risk aversion. For this step of the research, a step of 0.02 was adopted for the probability of satisfying the constraints (1-α_i); with this step, six portfolios are generated for each state.
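The six values of the risk criterion, and the corresponding standard normal quantiles Φ^{-1}(1-α_i) that enter the deterministic equivalent of the chance constraints, can be tabulated with a short sketch. Reading the quantiles this way is our own illustration: at α_i = 0.5 the quantile is zero, and for α_i > 0.5 it becomes negative, which relaxes the constraints and is consistent with more assets reaching efficiency as α_i grows.

```python
from statistics import NormalDist

STEP = 0.02
alphas = [round(0.50 + STEP * k, 2) for k in range(6)]  # 0.50, 0.52, ..., 0.60

for alpha in alphas:
    confidence = 1 - alpha                       # probability of satisfying the constraints
    quantile = NormalDist().inv_cdf(confidence)  # Phi^{-1}(1 - alpha_i)
    print(f"alpha={alpha:.2f}  1-alpha={confidence:.2f}  Phi^-1={quantile:+.4f}")
```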
To analyze the results, the Capital Asset Pricing Model (CAPM), presented by Sharpe (1964), was used to identify the existence of abnormal returns. In addition, the Sharpe Ratio (S_R), one of the most widely used performance metrics, was employed. The Sharpe ratio can be defined as the ratio between the mean and the standard deviation of the expected excess return of an investment opportunity (Schuster & Auer, 2012). As a reward-to-risk ratio, it has been adopted by many authors to evaluate portfolio performance (Auer & Schuhmacher, 2013).
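The two evaluation measures can be written compactly: the CAPM expected return E(R_p) = R_f + β_p [E(R_m) − R_f], whose gap to the realized return indicates an abnormal return, and the Sharpe ratio as the mean excess return divided by its standard deviation. A minimal sketch; the risk-free rate and all numbers are hypothetical inputs, since the text does not say which risk-free proxy was used.

```python
from statistics import mean, stdev


def capm_expected_return(risk_free, beta, market_return):
    """E(R_p) = R_f + beta * (E(R_m) - R_f)."""
    return risk_free + beta * (market_return - risk_free)


def abnormal_return(realized, risk_free, beta, market_return):
    """Realized return minus the CAPM expected return."""
    return realized - capm_expected_return(risk_free, beta, market_return)


def sharpe_ratio(returns, risk_free):
    """Mean excess return divided by the sample standard deviation of excess returns."""
    excess = [r - risk_free for r in returns]
    return mean(excess) / stdev(excess)


if __name__ == "__main__":
    # Hypothetical monthly figures
    print(capm_expected_return(0.005, 1.2, 0.010))
    print(abnormal_return(0.015, 0.005, 1.2, 0.010))
    print(sharpe_ratio([0.02, 0.00, 0.01], 0.0))
```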
Daily data for the validation period, from November 2014 to April 2015, were also obtained with the Economática® software. Using these data, the accumulated return over the validation period was calculated for each portfolio according to the weights defined by the optimization models.
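The accumulated return over the validation period can be computed from the daily returns and the portfolio weights. The text does not specify whether the weights were rebalanced, so this sketch assumes a buy-and-hold position that starts from the optimized weights; the asset codes and returns are hypothetical.

```python
from math import prod


def accumulated_return(weights, daily_returns):
    """Buy-and-hold accumulated return of a portfolio.

    weights: asset -> initial weight (summing to 1)
    daily_returns: asset -> list of daily simple returns in the validation period
    """
    growth = sum(w * prod(1 + r for r in daily_returns[asset]) for asset, w in weights.items())
    return growth - 1


if __name__ == "__main__":
    weights = {"AAAA3": 0.5, "BBBB4": 0.5}  # hypothetical tickers
    returns = {"AAAA3": [0.10, 0.10], "BBBB4": [0.00, -0.10]}
    print(accumulated_return(weights, returns))
```

Under daily rebalancing one would instead compound the weighted daily portfolio return, which generally gives a slightly different figure.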

Results and analysis
The efficiency of the proposed groups was analyzed considering the risk criteria (α_i) previously adopted in Equation 15. Table 3 presents the descriptive statistics of efficiency for groups 1 and 2 obtained by applying the CCDEA model represented by Equations 13, 14 and 16, with different probability levels (1-α_i) for satisfying the model constraints (see Equation 15), supplied with information from the whole market in the stipulated period. Likewise, Table 4 presents the descriptive statistics of efficiency for groups 1 and 2 obtained with the same probability levels (1-α_i), but with the CCDEA model (Equations 13, 14, 15 and 16) supplied with information from the worst-case market. It is worth noting that reducing the risk criterion (α_i) increases the probability with which the model constraints must be satisfied, making the model more rigorous; thus, fewer assets are classified as efficient. The hierarchical clustering analysis was performed to group assets with a certain degree of similarity so that the DMUs would be better discriminated: when similar assets are grouped, the divergence among the CCDEA model constraints decreases. Finally, the assets found to be efficient in the groups for each market state are gathered and optimized according to the proposal by Sharpe (1963).
Considering information from the whole market, for each adopted risk criterion (α_i), the assets classified as efficient in Table 3 were submitted to the proposal by Sharpe (1963); assets not classified as efficient by the CCDEA model did not undergo the Sharpe (1963) optimization. Six portfolios were obtained from the variation of the risk criterion, identified for ease of discussion as Portfolios A, B, C, D, E and F.
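The paper does not describe how the Sharpe (1963) single-index allocation was solved. One standard procedure for the single-index model without short sales is the cutoff-rate rule of Elton, Gruber and Padberg, which naturally assigns zero weight to some assets; the hedged sketch below uses it as a stand-in, with excess returns, betas, residual variances and market variance as hypothetical inputs for the assets already classified as efficient.

```python
def single_index_weights(assets, market_var):
    """Cutoff-rate allocation under the single-index model, no short sales.

    assets: name -> (excess_return, beta, residual_variance), with beta > 0
    market_var: variance of the market index
    Returns name -> weight; assets below the cutoff receive weight 0.
    """
    # Rank by excess return to beta (Treynor ratio), best first
    ranked = sorted(assets, key=lambda n: assets[n][0] / assets[n][1], reverse=True)
    num = den = 0.0
    cutoff = None
    included = []
    for name in ranked:
        er, beta, resid_var = assets[name]
        num += er * beta / resid_var
        den += beta ** 2 / resid_var
        c_k = market_var * num / (1 + market_var * den)
        if er / beta <= c_k:
            break  # this asset and all lower-ranked ones are excluded
        cutoff = c_k
        included.append(name)
    weights = {name: 0.0 for name in assets}
    if not included:
        return weights
    z = {n: assets[n][1] / assets[n][2] * (assets[n][0] / assets[n][1] - cutoff) for n in included}
    total = sum(z.values())
    for n in included:
        weights[n] = z[n] / total
    return weights


if __name__ == "__main__":
    efficient = {  # hypothetical (excess return, beta, residual variance)
        "ASSET1": (0.10, 1.0, 0.05),
        "ASSET2": (0.06, 1.0, 0.05),
        "ASSET3": (0.01, 1.0, 0.05),
    }
    print(single_index_weights(efficient, market_var=0.04))
```

In this toy example the third asset falls below the cutoff and gets zero weight, which mirrors how the allocation step can keep only part of the efficient assets.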
As discussed previously, the more rigorous the CCDEA model, the fewer the assets classified as efficient, because the probability with which the constraints must be satisfied increases. Smaller α_i values thus correspond to greater risk aversion, and as the risk criterion (α_i) is reduced, the portfolio tends to be made up of fewer assets.
Six other portfolios (U-Z) were obtained from the variation of the risk criterion (α_i) when the model was supplied only with information from the worst market state. As shown in Tables 5 and 6, it is interesting to observe that when the model was supplied with worst-case market information, a small number of assets were classified as efficient. Moreover, when these were submitted to the Sharpe proposal, few of them were selected for the portfolios, identified as Portfolios U, V, W, X, Y and Z. Table 5 presents, for each portfolio optimized from information about the whole market, the adopted risk criterion (α_i), portfolio beta, return, standard deviation, S_R and number of assets in the portfolio.
Table 6 shows the same information for optimized portfolios considering the worst case market.
The main purpose of this step of the research is to show the importance of worst-case market information for the robust optimization of portfolios through the CCDEA model combined with other techniques. Tables 5 and 6 allow the portfolios optimized from whole-market information (Portfolios A-F) to be compared with those optimized from bear market information (Portfolios U-Z). For this comparison, Portfolios A-F and U-Z are paired according to the adopted risk criterion (α_i) (see Equation 15).
Comparing the S_R results of Portfolios A and U in Tables 5 and 6 reveals that Portfolio A had an S_R of 0.024, while Portfolio U had an S_R of 0.218. In both cases, asset efficiency was evaluated with α_i equal to 0.60. After allocation by the Sharpe model, Portfolio A was made up of 57 assets, whereas only 11 assets from the initial sample were selected for Portfolio U at the end of the optimization process. The same comparison can be made for the other pairs of portfolios: portfolios optimized from bear market information perform better as measured by S_R and are composed of fewer assets.
It is worth noting that, for all adopted values of the risk criterion (α_i), portfolios optimized from historical bear market data achieved better Sharpe Ratio (S_R) results than those optimized from whole-market information.
Tables 5 and 6 also present the expected portfolio returns, calculated with the CAPM as described previously. This required the beta (β) of each portfolio, also presented in the tables. For portfolios optimized from information about the whole market (A-F), the expected return ranged from 1.03 to 1.07% per month. For portfolios optimized from bear market information (U-Z), the expected return was concentrated between 0.95 and 0.96% per month. The non-parametric Mann-Whitney test was applied, and all pairs of portfolios yielded p-values smaller than 0.05. Therefore, the abnormal accumulated return of the portfolios optimized with information from the worst market state is statistically higher than that of the portfolios optimized with information from the whole market.
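The Mann-Whitney comparison can be illustrated with a rank-sum sketch using the large-sample normal approximation (midranks for ties, no tie or continuity correction). What exactly was compared in each pair is not stated; applying the test to, for example, the daily returns of two paired portfolios is our assumption. In practice a library routine such as scipy.stats.mannwhitneyu, which can also compute exact p-values, would be preferable.

```python
from math import sqrt
from statistics import NormalDist


def midranks(values):
    """1-based ranks, averaging the ranks of tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def mann_whitney(x, y):
    """Return (U for sample x, two-sided p-value via the normal approximation)."""
    n1, n2 = len(x), len(y)
    ranks = midranks(list(x) + list(y))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mu = n1 * n2 / 2
    sigma = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u1 - mu) / sigma
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return u1, p


if __name__ == "__main__":
    portfolio_a = [0.001, -0.002, 0.000, 0.003, -0.001]  # hypothetical daily returns
    portfolio_u = [0.004, 0.002, 0.005, 0.006, 0.003]
    print(mann_whitney(portfolio_a, portfolio_u))
```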
Regarding effective profitability, values of 1.21, 1.30, 2.33, 3.81, 3.24 and 3.20% were found for Portfolios A to F, respectively. For the portfolios optimized from bear market periods, U to Z, the effective profitability values were 2.10, 1.25, 2.63, 3.70, 3.63 and 3.64%, respectively. Compared with the expected returns also presented in Tables 5 and 6, these values show the existence of abnormal returns.
The accumulated return of the portfolios was obtained from the data collected in the validation period. Portfolios were compared in pairs, one always optimized from information about the whole market and the other only with information from the worst market state. Figure 4 shows the accumulated return of the pairs formed according to the probability level (1-α_i) for satisfying the CCDEA model constraints. Interestingly, the portfolios optimized from bear market information (U-Z) had better S_R values. Moreover, regardless of the adopted risk criterion, their beta values were smaller than those of Portfolios A-F. Kim et al. (2015) affirmed that robust portfolios optimized through stochastic models reach such robustness because they concentrate specifically on information from crisis periods in the financial market. Kim et al. (2015) also believe that robust portfolios tend to be made up of low-beta assets, and that such assets tend to behave better than high-beta assets in any market period.

Conclusion
This proposal does not claim to replace established approaches such as those of Markowitz (1952) and Sharpe (1963), but to reduce the search space to assets considered efficient, using stochastic data from different variables.
Varying the probability level for satisfying the constraints (1-α_i) of the CCDEA model accommodates investors with different attitudes towards risk, from the most conservative investor to the risk taker.
The use of Hierarchical Clustering made it possible to group assets with similar behavior regarding the adopted variables, considering both their mean and variance values.
Another fact worth mentioning is that, for each risk criterion adopted, the portfolios optimized from information about a bear market state obtained smaller beta values than those supplied with information from the whole period. Portfolios resulting from robust optimization tend to be made up of low-beta assets, which perform well regardless of market conditions. Finally, the application of Hierarchical Clustering gave the CCDEA model better data discrimination, even with a smaller risk criterion, since grouping had already reduced the conflict among the model constraints generated by the data.

Figure 2. Dendrogram of the Hierarchical Clustering grouping, considering all market information.

Figure 3. Dendrogram of the Hierarchical Clustering grouping, considering only information from the worst market state.

Figure 4. Accumulated return of the portfolio pairs defined by risk criterion.

Table 2. Descriptive statistics for groups 1 and 2 from the worst-case market.

Table 3. Descriptive statistics of efficiency for groups 1 and 2 from the whole market.

Table 4. Descriptive statistics of efficiency for groups 1 and 2 from the worst-case market.

Table 5. Results of the optimized portfolios by risk criterion, considering the whole market.

Table 6. Results of the optimized portfolios by risk criterion, considering the worst-case market.