Chisquaremax rotation criterion in factor analysis: a Monte Carlo assessment of the effect of outliers

Recently, Knüsel (2008) proposed a new method of orthogonal rotation based on the chi-square statistic, the Chisquaremax criterion. However, its performance in the presence of outliers has not yet been evaluated. We therefore assessed the factorial model with the Chisquaremax criterion under the effect of outliers, using Monte Carlo simulation in different scenarios. The efficiency of the covariance matrix estimator provided by the factorial model using either the Chisquaremax or the Promax criterion was not affected by the presence of outliers. The orthogonal factorial model using the Chisquaremax criterion showed better goodness of fit than the model using the Promax criterion.


Introduction
The factorial model is frequently applied in many areas of research, since it is a multivariate technique that allows the prediction of latent variables, called factors or constructs, whose interpretation provides information on the behavioral and psychological characteristics of the individuals on whom these variables are measured.
Determining the constructs requires calculating factor loadings. For this purpose, the factorial model offers several alternatives in the form of different rotation criteria, such as the Varimax and Quartimax procedures.
The Varimax procedure finds the orthogonal transformation of the loading matrix that maximizes the sum, across all m rotated factors, of the variances of the squared loadings, while the Quartimax procedure applies the same principle across all p variables. However, because maximum likelihood is used to estimate the model parameters, it is plausible that violations of the multivariate normality assumption are related to the presence of outliers in the sample, which compromises or hinders the maximization of the likelihood function.
It is important to highlight that the Varimax and Quartimax rotation criteria produce uncorrelated factors. Promax, in contrast, is a non-orthogonal transformation that is not a rotation and can produce correlated factors.
Schmitt and Sass (2011) note that few articles have outlined, compared, and evaluated different rotation criteria. Rotation criteria continue to be selected based on the correlations between factors, or the lack of them, while ignoring factor structure complexity and other influences on the rotated loading matrix (HENSON; ROBERTS, 2006).
Another key feature of factor analysis is that new solutions can be obtained that reproduce the same covariance matrix Σ, and this property allows the search for new factor loadings using rotation criteria.
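The invariance property described above can be checked numerically. The following is a minimal sketch, assuming NumPy and a hypothetical 4 x 2 loading matrix with illustrative values (not taken from the study): an orthogonal rotation leaves the reproduced covariance matrix unchanged, while the value of a rotation criterion (here the raw Varimax criterion) does change, which is what makes the search over rotations meaningful.

```python
import numpy as np

# Hypothetical loading matrix (4 variables, 2 factors); values are illustrative only.
L = np.array([[0.8, 0.1],
              [0.7, 0.2],
              [0.2, 0.9],
              [0.1, 0.6]])
# Specific variances chosen so that the reproduced matrix has unit diagonal.
psi = np.diag(1.0 - np.sum(L**2, axis=1))

def rotation(theta):
    """Return the 2 x 2 orthogonal rotation matrix for angle theta (radians)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

def varimax_value(loadings):
    """Raw Varimax criterion: sum over factors of the variance of squared loadings."""
    return float(np.sum(np.var(loadings**2, axis=0)))

M = rotation(np.pi / 6)          # any orthogonal M would do
L_rot = L @ M                    # rotated loadings

sigma = L @ L.T + psi            # covariance reproduced by the original loadings
sigma_rot = L_rot @ L_rot.T + psi  # covariance reproduced by the rotated loadings

same_covariance = np.allclose(sigma, sigma_rot)
criterion_changed = not np.isclose(varimax_value(L), varimax_value(L_rot))
print(same_covariance, criterion_changed)
```

Because M M^t = I, the product (LM)(LM)^t collapses to L L^t, so every orthogonal rotation fits the covariance matrix equally well and the choice among them rests entirely on the criterion.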
Because of the focus on whether factors are orthogonal or non-orthogonal, researchers frequently overlook fundamental differences between rotation criteria, including how different rotation criteria influence the factor structure. Various correlation matrices have been used to illustrate how different factor loading structures are affected by the choice of rotation criterion (SASS; SCHMITT, 2010; SCHMITT; SASS, 2011). In this context, Knüsel (2008) proposed a new method of factorial rotation based on the chi-square statistic, the Chisquaremax criterion.
According to Knüsel (2008), Chisquaremax overcomes the major disadvantages of orthogonal criteria such as the commonly used Varimax and Quartimax procedures. However, since Chisquaremax is relatively new and has not yet been addressed in the literature, the performance of a factorial model rotated with this method should be assessed and compared with non-orthogonal criteria, considering samples with different degrees of correlation and samples contaminated with outliers. The motivation for this study comes from Cudeck and O'Dell (1994), who state that sample size, estimation method and the total amount of variance explained by the individual factor loadings influence the stability and accuracy of factor loadings; they also demonstrated that these assumptions are often violated. For this purpose, Liu and Zumbo (2007) recommend the Monte Carlo simulation technique, in which contaminated samples can be generated from mixtures of distributions. We therefore evaluated the factorial model using the Chisquaremax rotation criterion, compared with an orthogonal rotation criterion (Varimax) and a non-orthogonal one (Promax), under the effect of outliers using Monte Carlo simulation. To this end, we simulated different scenarios and measured statistics related to efficiency and goodness of fit.
This paper is organized as follows. Section 2 describes the methodology: Section 2.1 presents the Chisquaremax rotation criterion, and Section 2.2 the methodological procedure and the assumptions used in the Monte Carlo simulation. Section 3 discusses the results of the study. Finally, Section 4 gives some concluding remarks.

Chisquaremax rotation criterion
Following the notation proposed by Knüsel (2008), the theoretical foundation of this new criterion defines the matrix of factor loadings by (1), where f_ir, i = 1, …, p; r = 1, …, k, are the frequencies found in a contingency table with p rows and m columns. The chi-square statistic is then calculated by (2) with the quantities (2)-(6).
Based on this statistic, Knüsel (2008) presents a decomposition using concepts by Cramér (1961) and shows that maximizing (2) is equivalent to maximizing the quantity (7). The author then suggests an iterative procedure to find the maximum likelihood solution, where matrix (8) represents the unrotated loading matrix, (9) corresponds to the orthogonal rotation matrix, and (10) is the rotated loading matrix. In summary, the procedure consists of finding the maximum of X_C subject to the restriction that the matrix M is orthogonal, i.e., M M^t = M^t M = I_m. To this end, Knüsel (2008) presents an iterative procedure that maximizes X_C, redefined by (11), from which the Chisquaremax rotation criterion is obtained.
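To make the idea of a chi-square statistic on a table of loadings concrete, the sketch below treats the squared loadings as the "frequencies" of a p x k contingency table. This reading of f_ir is an assumption made for illustration, since equations (1)-(7) are not reproduced here, and the loading values are hypothetical. The code computes the Pearson chi-square statistic and the algebraically equivalent form N(Σ f_ir² / (f_i. f_.r) − 1), whose sum term is the kind of quantity a rotation would seek to maximize.

```python
import numpy as np

def chi_square(freq):
    """Pearson chi-square statistic of a two-way table of non-negative entries."""
    total = freq.sum()
    expected = np.outer(freq.sum(axis=1), freq.sum(axis=0)) / total
    return float(np.sum((freq - expected) ** 2 / expected))

def chi_square_alt(freq):
    """Equivalent form: total * (sum of f_ir^2 / (row_i * col_r) - 1)."""
    rows = freq.sum(axis=1, keepdims=True)
    cols = freq.sum(axis=0, keepdims=True)
    return float(freq.sum() * (np.sum(freq**2 / (rows * cols)) - 1.0))

# Hypothetical loadings with perfect simple structure (each variable on one factor).
simple = np.array([[0.8, 0.0],
                   [0.7, 0.0],
                   [0.0, 0.9],
                   [0.0, 0.6]])

# The same loadings rotated by 45 degrees: every variable loads equally on both factors.
c = s = np.sqrt(0.5)
mixed = simple @ np.array([[c, -s], [s, c]])

# Assumption: the table "frequencies" are the squared loadings.
chi_simple = chi_square(simple**2)
chi_mixed = chi_square(mixed**2)
print(chi_simple, chi_mixed)
```

Under this assumption the simple-structure solution gives a large statistic (strong row-by-column association), while the fully mixed solution gives a statistic of essentially zero, which matches the intuition that maximizing chi-square favors loadings concentrated on few factors per variable.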
In matrix form, the derivations, which include the method of Lagrange multipliers, are complex; more details can be found in Knüsel (2008). We note, however, that the solution is obtained through the convergence of the numerical resolution of the system MA = B. The steps required by the algorithm are as follows:
1. Assume the initial values defined in (12).
2. Compute (13), with the quantities c_λ and d_λ.
3. Compute the singular value decomposition B_1 = U Δ V^t, where U and V are orthogonal and Δ = diag(δ_1, …, δ_k), and set A_1 = V Δ V^t and M_2 = U V^t. M_2 is orthogonal and A_1 is symmetric and positive semidefinite, so that M_2 A_1 = B_1.
4. Repeat the procedure with M_2 in place of M_1 until convergence takes place.
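Step 3 above is a polar decomposition obtained from the singular value decomposition. The sketch below illustrates only that step, assuming NumPy and a random 3 x 3 matrix as a hypothetical stand-in for B_1 (the actual B_1 depends on equation (13), which is not reproduced here). It verifies that M = U V^t is orthogonal, that A = V Δ V^t is symmetric and positive semidefinite, and that M A = B.

```python
import numpy as np

rng = np.random.default_rng(0)
B = rng.normal(size=(3, 3))      # hypothetical stand-in for B_1

# Singular value decomposition B = U diag(delta) V^t.
U, delta, Vt = np.linalg.svd(B)

A = Vt.T @ np.diag(delta) @ Vt   # symmetric positive semidefinite factor
M = U @ Vt                       # orthogonal factor (next rotation matrix)

is_orthogonal = np.allclose(M @ M.T, np.eye(3))
is_symmetric = np.allclose(A, A.T)
is_psd = bool(np.all(np.linalg.eigvalsh(A) >= -1e-12))
reproduces_B = np.allclose(M @ A, B)
print(is_orthogonal, is_symmetric, is_psd, reproduces_B)
```

In the full algorithm this step would sit inside a loop: B is recomputed from the current M, the decomposition yields the next M, and iteration stops once successive rotation matrices agree to within a tolerance.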

Methodological procedure and description of the assumptions used in the Monte Carlo simulation
Given the objectives of the study, the orthogonal and oblique factor models using Chisquaremax and Promax, respectively, were considered. The maximum likelihood method was used to compute the factor loadings under the Varimax criterion (10), and these loadings were then used as the argument of the iterative procedure described in the previous section to obtain the factor loadings under the Chisquaremax criterion.
For the generation of multivariate samples via Monte Carlo, we assumed the correlation structure defined by (14) and a mean vector whose components were all equal to zero.
To evaluate the Chisquaremax rotation criterion with respect to the degree of correlation between variables, the correlation parameter was arbitrarily fixed at ρ = 0.2 and ρ = 0.8. The samples were generated under the different scenarios described in Table 1. Given the number of factors, the number of variables was fixed at p = 18 so that all cases satisfied the condition (p − f)^2 ≥ p + f given by Peña (2003), which relates the number of variables (p) to the number of factors (f). From the correlation matrix structure (14) and the parameters described in Table 1, it was possible to derive the covariance matrix Σ. Multivariate samples with outliers were then generated from a contaminated multivariate normal distribution, in which the outliers were drawn from a multivariate t-distribution with mixing probability δ equal to 0.05, 0.10 or 0.15. The multidimensional random variable G was thus generated according to equation (15), where ν is the number of degrees of freedom of the Student's t-distribution.
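The sampling scheme described above can be sketched as follows. This is an illustration under stated assumptions rather than the simulation code used in the study: the equicorrelated matrix stands in for structure (14), which is not reproduced here, and the values f = 4, n = 200 and ν = 5 are hypothetical choices. Each observation is drawn from the multivariate normal with probability 1 − δ and from a multivariate t-distribution with probability δ.

```python
import numpy as np

def equicorrelation(p, rho):
    """Assumed stand-in for structure (14): unit variances, common correlation rho."""
    return (1.0 - rho) * np.eye(p) + rho * np.ones((p, p))

def contaminated_sample(n, sigma, delta, nu, rng):
    """Draw n rows: N(0, sigma) with prob. 1 - delta, multivariate t_nu with prob. delta."""
    p = sigma.shape[0]
    zero = np.zeros(p)
    normal = rng.multivariate_normal(zero, sigma, size=n)
    # Multivariate t draw: normal vector divided by sqrt(chi-square_nu / nu).
    scale = np.sqrt(rng.chisquare(nu, size=n) / nu)
    t_draws = rng.multivariate_normal(zero, sigma, size=n) / scale[:, None]
    is_outlier = rng.random(n) < delta
    sample = np.where(is_outlier[:, None], t_draws, normal)
    return sample, is_outlier

p, f = 18, 4                                # f = 4 is a hypothetical number of factors
condition_ok = (p - f) ** 2 >= p + f        # condition relating p and f

rng = np.random.default_rng(2024)
sigma = equicorrelation(p, rho=0.2)
X, flags = contaminated_sample(n=200, sigma=sigma, delta=0.10, nu=5, rng=rng)
print(condition_ok, X.shape, flags.mean())
```

Small values of ν give heavier tails and therefore more extreme outliers, so ν controls the severity of contamination while δ controls its frequency.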
Given the rotation criteria of interest in our study, i.e., Chisquaremax and Promax, and the proportion of outliers in the generated sample, goodness of fit was evaluated in both models using the root mean square residual (RMSR) suggested by Sharma (1996), following equation (16). The two measures, (16) and (17), were computed for both the original data and data rescaled by subtracting the median of each variable from each observation and dividing by the median absolute deviation. For further details, see Filzmoser et al. (2008).
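The rescaling and the residual measure can be illustrated with a short sketch. The rescaling follows the description above (observation minus median, divided by the median absolute deviation, here without any consistency constant, which is an assumption). The RMSR function assumes that the residuals are averaged over the p(p − 1)/2 off-diagonal elements; the exact form of (16) is not reproduced in the text, so this should be read as one plausible version. The data and matrices are hypothetical.

```python
import numpy as np

def robust_rescale(X):
    """Column-wise (x - median) / MAD, with the unscaled median absolute deviation."""
    med = np.median(X, axis=0)
    mad = np.median(np.abs(X - med), axis=0)   # assumes no column has MAD equal to zero
    return (X - med) / mad

def rmsr(S, sigma_hat):
    """Root mean square residual over off-diagonal elements (assumed form of (16))."""
    p = S.shape[0]
    rows, cols = np.triu_indices(p, k=1)
    resid = S[rows, cols] - sigma_hat[rows, cols]
    return float(np.sqrt(np.sum(resid**2) / (p * (p - 1) / 2)))

# Hypothetical data with one gross outlier in the first row.
rng = np.random.default_rng(1)
X = rng.normal(size=(50, 3))
X[0] = 100.0
Z = robust_rescale(X)

# Hypothetical sample and model-implied matrices for a two-variable case.
S = np.array([[1.0, 0.5], [0.5, 1.0]])
sigma_hat = np.array([[1.0, 0.3], [0.3, 1.0]])
fit = rmsr(S, sigma_hat)
print(fit)
```

Because the median and the MAD are barely moved by a single extreme row, the rescaled columns keep median 0 and MAD 1 despite the outlier, which is the motivation for using them in place of the mean and standard deviation.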
All calculations and simulations in this study were performed using the R software (R DEVELOPMENT CORE TEAM, 2009). A program was developed based on the method, and 3000 Monte Carlo simulations were performed to generate empirical distributions of measures (16) and (17), incorporating the function that estimates the factor loadings using Chisquaremax available in Knüsel (2008). The expected values of each measure were then analyzed and the results interpreted.

Results and discussion
In line with the study objectives, our results are the expected values of the goodness-of-fit index RMSR defined in (16), computed over the 3000 Monte Carlo simulations in each of the scenarios (Table 1). The results in Table 2 show that, for the original data in the scenario of low correlation between variables (ρ = 0.2) with the Chisquaremax criterion, goodness of fit decreases as the proportion of outliers in the sample, represented by δ, increases. However, as the number of factors (f) and the sample size (n) increased, this deleterious effect diminished and the RMSR index moved closer to zero, which was taken as the criterion for high goodness of fit.
With the Promax criterion, the expected values of the RMSR index were higher than those obtained with the Chisquaremax criterion in all scenarios. With the rescaling procedure, goodness-of-fit results were more promising, since the values were closer to zero with Chisquaremax. Thus, the factorial model reproduces the sample covariance matrix more accurately with Chisquaremax than with Promax. When the degree of correlation between variables was increased to ρ = 0.8, the expected values of RMSR described in Table 3 decreased relative to the previous situation (Table 2) with Chisquaremax, for both original and rescaled data. The Promax criterion, however, produced more discrepant results, with values much further from zero. Promax may therefore yield important results, e.g. factor loadings and specific variances, that are subject to errors in the reproduction of the sample covariance matrix. This agrees with the review by Treiblmaier and Filzmoser (2010), who show that subtle differences between rotation criteria bear on the researcher's choice of criterion. The authors note that, depending on the chosen procedure, the complexity of the factor matrix, the interpretation of correlations between factors and the factor loadings can vary substantially in the presence of outliers or because of data heterogeneity.
In addition, Henson and Roberts (2006) report that the choice of criterion definitely affects the factor structure, since orthogonal rotations assume that the extracted factors are independent of one another, i.e. they are not inter-correlated. Oblique rotations, in contrast, allow factors to be correlated, which implies that communalities depend not only on factor loadings but also on the covariances of the common factor vector (FERREIRA, 2008). Oblique rotation is desirable in many practical situations, as it allows the extraction of factors that reveal groups of inter-correlated variables. Sass and Schmitt (2010) state that researchers often arbitrarily select either an oblique or an orthogonal rotation criterion, mostly choosing Varimax because it is the most cited in the literature. Regarding the choice of criterion, the results of our study show that using RMSR with rescaled data helps researchers make better decisions, as the simulations showed that RMSR on rescaled data tends to yield residuals near zero, thus indicating high goodness of fit.
Regarding the nature of outlying observations in terms of symmetry, Liu and Zumbo (2012) found that outliers generated by a symmetrical distribution did not affect the estimate of the correlation matrix. We therefore asked whether outliers would affect the efficiency of the covariance matrix estimators produced by the factorial model relative to the sample covariance matrix estimator. With a low correlation between variables (ρ = 0.2), the Chisquaremax criterion showed a quadratic trend with respect to sample size.
As the proportion of outliers increased, efficiency remained constant. This was observed in both the original (Figure 1a) and the rescaled data (Figure 1b). We found the same behavior for a degree of correlation between variables of ρ = 0.8, for the original data (Figure 1c) and the rescaled data (Figure 1d). Therefore, the efficiency of the covariance matrix estimator under Chisquaremax is robust to the effects of outliers, regardless of whether the variables have low or high correlation. As for Promax, the surfaces in Figure 2 show that this criterion is more sensitive to the degree of correlation between variables. For ρ = 0.2, the efficiency of the method remained constant as the sample size and the proportion of outliers increased, as seen in Figures 2a and 2b for the original and rescaled data, respectively. For ρ = 0.8, the efficiency of the covariance matrix estimator under the factorial model with the Promax criterion shows results similar to those of Chisquaremax for both the original (Figure 2c) and the rescaled data (Figure 2d).
According to the results, it is important to emphasize that a priori knowledge of the researcher about the nature of the study is a key factor for selecting the rotation criterion.This statement can be exemplified by Costello and Osborne (2005) who reported that, despite the widespread use of orthogonal criteria, especially Varimax, its results in psychology research tend to be inconsistent.

Conclusion
From the simulated scenarios we conclude that the efficiency of the covariance matrix estimator provided by the factorial model using either the Chisquaremax or the Promax criterion was not affected by the presence of outliers; that the orthogonal factor model using the Chisquaremax criterion showed better goodness of fit than the model using the Promax criterion; and that the rescaling procedure adequately improved the quality of the factorial model under both rotation criteria analyzed in the study.
In future work we intend to apply the Chisquaremax criterion to the analysis of multivariate data through exploratory factor analysis. On the methodological side, there is interest in developing other rotation criteria based on statistics used in the analysis of categorical data.
the covariances estimated by the factorial model and p is the number of variables. According to the author, values near zero indicate good fit of the factorial model. As proposed by Jhun and Choi (2009), the relative efficiency (RE) of the estimator of the matrix Σ was also calculated in each scenario according to (17), whose terms are the vector of factor loadings, the matrix of specific variances Ψ, the covariance matrix fitted by the factor model and the sample covariance matrix.

Figure 1. Mean values of the relative efficiency of the estimator obtained through the orthogonal factor model with the Chisquaremax rotation criterion, for different sample sizes (n) and mixing probabilities (δ), with correlation ρ = 0.2 (a and b) and ρ = 0.8 (c and d).

Figure 2. Mean values of the relative efficiency of the estimator obtained through the factor model with the oblique Promax rotation criterion, for different sample sizes (n) and mixing probabilities (δ), with correlation ρ = 0.2 (a and b) and ρ = 0.8 (c and d).

Table 1. Simulation scenarios used for the orthogonal and oblique factor models, based on the number of factors and sample size.

Table 2. Mean values of the root mean square residual obtained through simulation with ρ = 0.2, by number of factors (f), for the factorial model with the Chisquaremax and Promax rotation criteria.

Table 3. Mean values of the root mean square residual obtained through simulation with ρ = 0.8, by number of factors (f), for the factorial model with the Chisquaremax and Promax rotation criteria.