Mixed models in cerebral ischemia study

The data modeling from longitudinal studies stands out in the current scientific scenario, especially in the areas of health and biological sciences, which induces a correlation between measurements for the same observed unit. Thus, the modeling of the intra-individual dependency is required through the choice of a covariance structure that is able to receive and accommodate the sample variability. However, the lack of methodology for correlated data analysis may result in an increased occurrence of type I or type II errors and underestimate/overestimate the standard errors of the model estimates. In the present study, a Gaussian mixed model was adopted for the variable response latency of an experiment investigating the memory deficits in animals subjected to cerebral ischemia when treated with fish oil (FO). The model parameters estimation was based on maximum likelihood methods. Based on the restricted likelihood ratio test and information criteria, the autoregressive covariance matrix was adopted for errors. The diagnostic analyses for the model were satisfactory, since basic assumptions and results obtained corroborate with biological evidence; that is, the effectiveness of the FO treatment to alleviate the cognitive effects caused by cerebral ischemia was found.


Introduction
In the past few years, there have been an increasing number of experiments in the areas of health and biological sciences, where the response variables are analyzed over time for the same experimental unit in order to check, for example, the performance of alternative treatments that could come to be used in healing patients with some kind of disease.Pharmacological studies with these characteristics were done by Ferreira et al. (2014) and Bacarin et al. (2015) wherein the authors longitudinally evaluated (memory tests) the cognitive performance of animal groups submitted to cerebral ischemia and control groups with treatments of different durations, in the same way as when treated with different medications.Studies with such characteristics belong to the category of repeated measurements, in which one or more response variables are assessed repeatedly in the same unit.
A special case of repeated measures studies is the longitudinal studies.The measurements are performed under different evaluation conditions and they are arranged over time, respecting an analysis order.When thus characterized, the data presents a hierarchical structure, where the observations for same individual have a dependency structure.This requires the modeling of a covariance structure to capture and accommodate the variability of the longitudinal data.One way to consider this dependence on modeling data is to use linear models with the assumption of correlated errors when using a correlation matrix to accommodate individual variation.Alternatively, the data can be modeled including a random effect in the model.In this case, there is a model with fixed and random effects, which is called mixed model, whose focus is to accommodate the correlation between repeated measurements and evaluate individual and collective behavior (of groups) over time.This is a class of quite flexible models, since they offer greater versatility in modeling the data covariance structure to settle variations within and between individuals.In general, they investigate and simultaneously accommodate the random effect and the correlation structure of repeated measurements.
For these models it is common to consider the response variable following a normal distribution (when it is the best choice to model the biological characteristic), since this approach is theoretically consolidated from the formulation, adjustment methods and diagnostic analysis (residual analysis and influence), as observed in Diggle, Heagerty, Liang, and Zeger (2002), Fitzmaurice, Laird, and Ware (2004), Nobre and Singer (2007), Nobre and Singer (2011) with availability of use in applications such as R and SAS as verified in Pinheiro and Bates (2000) and Littel, Stroup, Wolfinger, and Schabenberger (2006).
The main scope of this article is to use the methodology of Gaussian mixed models (GMM) to model the response variable latency (lat -time taken by the animal to find the real hiding place) of a pharmacological experiment that aims at evaluating the effect of fish oil (FO) to the treatment on animals submitted to cerebral ischemia, checking if treated animals showed improvement in their clinical condition (amnesia) in relation to the animals that were not treated with FO.

Material
The data used in this article are a result of an experiment performed in the Cerebral Ischemia and Neuroprotection Laboratory of Maringá State University during the second semester of 2014.The aim of this study is to evaluate how the antiamnesic effectiveness of FO (300 mg kg -1 ) varies over time according to the time that the treatment started after ischemia (known as the therapeutic window, TW), i.e. 4, 8, or 12 hours.
The experiments were carried out following an entirely randomized design, with seven treatments (groups) and five repeated longitudinal measures (testing days).According to treatment regimen, the following groups were tested: Sham-operated (n = 15), Vehicle 4 hours TW (Veh , n = 10), FO 4 hours TW (FO , n = 16), Vehicle 8 hours TW (Veh , n = 10), FO 8 hours TW (FO , n = 13), Vehicle 12 hours TW (Veh , n = 10), and FO 12 hours TW (FO , n = 13).The shamoperated animals were subjected to all experimental manipulations, including surgery, except that they did not have cerebral ischemia nor received vehicle or FO administration.The vehicle and FO groups underwent cerebral ischemia and were treated with olive oil or FO, respectively, according the different therapeutic time window (4, 8 or 12 hours).Intact rats were trained in the aversive radial maze task (ARM) (Paganelli et al., 2004) for 10 consecutive days, and allocated to different groups.One or two days after training, they were submitted to cerebral ischemia; then the treatment with FO or Veh started at 4, 8 or 12 hours postischemia and continued for 7 days consecutively.On the 15 th day of ischemia, the animals begun to be evaluated for their ability to remember the task that was learned preoperatively, i.e, retrograde memory.The memory tests (MT) were applied once a week, for 5 weeks.The ability of rats to learn the shelter location across sessions (days) and trials was measured by response variable Latency, as well by other variables not discussed.This response expresses the time spent (seconds) by the animal to find the hiding place during each attempt.For each animal, and in each testing day (session), the recorded latency was the arithmetic mean of three attempts (trials).The operational procedures for ischemia induction followed the 'Basic principles for animal use', as approved by the Ethics Committee on Animal Experiments of the Maringá State University.

Gaussian mixed models (GMM)
The GMM are used in the context of health when it is desired to evaluate the behavior of experimental units and treatment groups observed over time.In the field of brain ischemia, many studies have utilized mixed models to evaluate data.2014) a mixed model is a parametric linear model for data in clusters, repeated measurements and longitudinal data, which quantifies the relationship between a dependent variable and several linear predictors, containing fixed and random effects.Using these models, it is possible to correct the distortions that appears when regression models with independent responses are used to analyze correlated data.
The general form of a mixed model is given by Equation 1: where: Y is the response vector for the ith subject; X is a design matrix of fixed effects, which represents known values of p covariates; β is a vector of unknown parameters, linked to the fixed effects; is a design matrix of random effects, which represents known values of q covariates; b is a vector of random effects and ε is a vector of assumed independent random errors between them.
It is assumed that b ∼ (0, ) and ϵ ∼ (0, R ), which follow a multivariate normal distribution with mean vector 0 and matrixes (symmetric) of variance and covariance G and R respectively, with b and ϵ independent of each other.
According to Laird and Waire (1982), marginally the Y is independent and follows a multivariate normal distribution with mean X β and variance and covariance matrixes, according as Equation 2: where: V is the ith block of V diagonal matrix, whose elements are called variance components, which compose the vector θ.Alternatively it is written as Equation 3:

~N( )
To analyze the experimental data presented initially, a GMM with a random effect on the intercept and inclination is specified, on the assumption that the probability distribution of the latency variable response is approximately normal.
The mixed model initially proposed is given by Equation 4: where: y is the latency to the ith animal, i = 1, ..., 85, of the jth, j = 1, 2, 3, 4, 5, 6, 7 treatment observed in the kth MT, k = 1, 2, 3, 4, 5; β is the parameter associated with the intercept (general average); α is the effect of the jth treatment; t = 1, 2, 3, 4, 5 is the kth memory test measured as a continuous variable; is the interaction parameter for the jth treatment with the kth day of MT; b and b are the random effects in the intercept and individual inclination; ϵ is the random error.In the above model, it is assumed that G is a second-order matrix where the main diagonal is the variances associated with random effects ( and ) and the secondary diagonal is the covariance ( , ), is a fifth-order, symmetric, positive-definite matrix.

Covariance structures
Several covariance structures are found in the literature, where the G and R matrices can be assumed.Accordingly, the Variance Components (VC), Compound Symmetry (CS), First-Order Autoregressive [AR (1)], and Unstructured (UN) matrices are most often used.In the VC structure, it is assumed that the variance is constant at separate occasions.The CS structure assumes that variances and covariances are homogeneous.For the AR (1) structure, the variances are homogeneous and the covariances decrease exponentially as the interval of time increases.For the UN structure, all variances and covariances are uneven.Other covariance structures are presented by Fitzmaurice et al. (2004) among others.

Parameters estimation
Generally, the methods used to estimate the parameters of the mixed models are based on the theory of Maximum Likelihood, i.e., the methods of Maximum Likelihood (ML) and Restricted Maximum Likelihood (REML).The ML method tends to be biased for the components variance estimates.A way to solve this problem is to use the REML method, which corrects the bias of these estimates, as it takes into account the loss of the degrees of freedom of the fixed effect parameters estimation (Perri & Iemma, 1999).

Selection models
The most common ways of model selection rely on the Likelihood Ratio Test (LRT), the Akaike Information Criterion (AIC) and Bayesian Schwarz Criterion (BIC), which will be used in this study because they are consolidated theoretically and computationally.To compare models with different nested covariance structures, the Restricted Likelihood Ratio Test (RLRT) is used, as well as the BIC Criterion, since it penalizes models with a larger number of parameters.In this case, the RLRT statistics follow a distribution asymptotically, where the degrees of freedom (df) are equal to the difference between the number of parameters of the covariance structures.When it is desired to test the inclusion of random effects in the model, the asymptotic distribution of RLRT will be a mixture of the distributions with q and q + 1 df, each one weighting 0.5 with q and q + 1 representing the number of random effects of each model (Self & Liang, 1987;Verbeke & Molenberghs, 2000).
The test F-Snedecor and contrasts was used to check which fixed effects were significant and comparisons between treatments, respectively (Littel, Stroup, Wolfinger, & Schabenberger, 2006).All hypotheses were tested with a significance level of 5%.

Diagnostic analysis
Through diagnostic analysis, it is possible to verify the suitability of the adjusted model.This is an important step, as it helps to verify the adequacy of assumptions and allows the detection of extreme observations that may affect results.It is possible to divide this stage in residual and influence analysis.
By residual analysis, it is verified if the adjusted model is adequate, that is, it checks if the independence assumptions, homoscedasticity and normality are satisfied.According to Nobre and Singer (2007), for mixed models, the most common residuals are marginal, conditional and from random effects.
With marginal residuals it is possible to detect to which units the matrix adopted variance and covariance within-subject is adequate.In this case, a graph of the observed units versus a standardized measure of Lesaffre -Verbeke is built (Lesaffre & Verbeke, 1998;Nobre & Singer, 2007).Units outside the established limits imply that the covariance structure was inadequate.
To verify the homoscedasticity assumption, a graph of the predicted values versus standardized conditional residuals (SCR) must be used where a random behavior around zero is expected.By a QQ plot graph verifies the normality assumption for minimal residual confounding, which minimizes SCR confounding in verification of the assumption of normality (Nobre & Singer, 2007).The presence of outlier units is checked through the random effects residuals.According to Nobre and Singer (2007), a graph of observed units versus Standardized Mahalanobis Distance should be used.Measurements falling outside the established limits are considered outliers.
Influential points are interpreted as those that, once present, can change important aspects of the adjustment of the models.The basis of influence analysis is associated with the diagnosis of a particular observation (or subset of observations) for quantifying the effect of these data set observation omissions on the analysis results of the entire data set.Among the most common ways to evaluate the influence of observations that affect the model parameters estimates disproportionately, it is possible to highlight the omission/removal of observations/measurements. Omitting observations with these of data set characteristics, the positive and negative effects caused by their absence are evaluated.This technique was first carried out by Cook (1977) and explored by several authors since then.
According to Nobre and Singer (2007), for mixed models, Cook's distance is given by Equation 5: where: For the components of variance, the variance matrix used here is the inverse matrix of second derivatives, known as the Hessian matrix.A ratio higher than one indicates greater precision in estimates with complete data, while a CovRatio less than one implies greater precision considering the reduced data.To investigate the influence, a graph of experimental units versus their respective CovRatio is visually built.
More details about diagnostic analysis for mixed models are presented by Nobre and Singer (2007), Nobre and Singer (2011).
Results were generated using the software R (R Core Team, 2015) with use of the nlme package (Pinheiro, Bates, Debroy, & Sarkar, 2014) and SAS version 9.4 (Statistical Analysis System [SAS], 2015) with use of the mixed procedure.

Results and discussion
Figure 1 shows the average latency of the groups subjected to sham-operation (Sham), ischemia + FO (FO), or ischemia + vehicle (Veh), when the FO and vehicle administration started at 4 (left, FO 4 ), 8 (middle, FO 8 ), or 12 hours (right, FO 12 ) after ischemia.It is observed that as the MT are applied, the animals of the FO group, regardless of the time window of treatment, take less time to find the actual hiding place as compared with the Veh (4, 8 and 12 hours) groups.This memory-protective effect of FO was apparently robust in the 4 hours TW group, given that latency was reduced to the level of the sham group throughout the various test days.When the treatment started at 8 or 12 hours post-ischemia, the latency was also reduced, but not so robustly as in the 4-h TW group, suggesting that the antiamnesic effect of FO was more effective the earlier the treatment started, which suggests the occurrence of a time effect (days of test) for the FO group.Taking into account the general model (4), model 1, two other models were evaluated: model 2, which only has an effect on the random intercept, and model 3 with a random effect only in the inclination, to verify the need to keep these random effects in the model.Through the RLRT it is concluded that there is sampling evidence to reject the null hypothesis in both cases (p < 0.05), meaning that both the random effect in the intercept and the individual inclination should be maintained.A model without interaction between treatments and time, M , was compared with the model M .The results of the comparisons (AIC = 4289.85and AIC = 4287.25)indicate it should maintain interaction in the model.When observing both settings, the more coherent biological results were obtained with the model without the interaction, and thus, it was adopted.
To select the R covariance structure, given the mixed model with the restrictions already obtained, models were adjusted with covariance structures VC (M .), CS (M .), AR (1) (M .) and UN (M .) to evaluate which would capture and appropriately accommodate the variability inherent to the data.Considering the proposed models' sets of adjustments with different structures evaluated, models with CS (BIC .= 4300.42)and UN structures (BIC .= 4303.00)did not accommodate adequate variability, because the parameter estimates showed elevated se.After evaluating the information criteria, the value of BIC indicated favorability in choosing the AR (1) structure (BIC .= 4290.90),although the value of BIC for the VC structure was very close to the prior one (BIC .= 4291.06).The RLRT was used to decide which of these two structures should be adopted.There was sample evidence to reject the null hypothesis, i.e., that the AR (1) structure for errors (χ = 6.19, p < 0.05) should be adopted.Thus, considering the restrictions obtained, the model was adjusted.
The fixed effects test revealed a difference between treatments (F , = 10.84,p < 0.05), as well as between the performance of the rats over the course of the various memory tests ( , = 8.57, p < 0.05).Table 1 presents estimates of REML, standard error (se), p-value and confidence intervals (CI) of 95% for the fixed effects model and variance components, considering the Sham group as a reference.Where ² and are parameters of the matrix R = AR (1).
The adjusted marginal models for the FO , FO and FO treatment are given by Equations 7 at 9: The results suggest that latency tends to decrease over the course of the by 4.11 units in comparison to the initial value.Based on parameter estimates, the FO treatment regimen has a value lower than its standard error.Such occurrence might be associated with the presence of influential points.
The Figure 2a shows that there are observations located in the left area of the graph, some of them concentrated around zero, while others are more dispersed in the right area of the graph.In addition, observations were present even outside of the limits, some closer, others further away, mainly the experimental units 39, 43, 51, 57, 80.However, there is no evidence that violates the assumption of homoscedasticity for SCR.The Figure 2b shows that the normality assumption is met.Experimental outlier units were identified, as well as those whose covariance structure was not adequate (standardized Mahalanobis Distance and standardized measure of Lesaffre -Verbeke).However, the majority of these units were treated with Veh (4, 8 or 12 hours post-ischemia) or with FO 8 or FO 12 , indicating a coherent behavior.
Considering the influence analysis, units that are candidates to have this characteristic (Figure 3a) also change the accuracy of the fixed effects estimates (Figure 3b).In general, units identified in Figure 3b also alter the accuracy in the variance components.Based on the observations with identified influential characteristics, about 46.87, 10.9, 12.5, 9.4, and 20.3% occurred during the 1 st , 2 nd , 3 rd , 4 th and 5 th day of MT, respectively.Since most of the observations occurred on the first MT, when the rats returned to the ARM task 15 days after ischemia, those observations might explain the possible influence.
When these observations were suppressed of data set, the main influences were now closer to the parameter estimates related to FO 4 and FO 8 treatment regimens.Such exclusion implied a decrease of approximately 34% of each estimation.On the other hand, these were not changed significantly when compared to the original estimates.For the remaining estimates, the reduction was in a smaller proportion.For the influence over variance components, the highest reduction was in relation to the estimate , approximately 34.71%.Based on these results, the influence of the detected observations is evident.Considering that such observations are influential points, the decisions obtained from the complete data set were not changed after adjusting the model for the suppression of those observations.Since the performance of each animal is intrinsic to their biological nature, these observations must stay in the data set.
The GMM met the experiment goals, since the statistical results corroborate the biological evidence.It was observed that the group given vehicle at 4, 8 and 12 hours post-ischemia showed higher latencies when compared to Sham (Table 1), indicating that the animals in this group forgot the task performed during the pre-ischemia training phase.The memory deficit was significantly reduced by treatment with FO, an effect that was robust when the treatment started at 4 hours post-ischemia (FO vs Veh t = -5.04p < 0.05).Significant but less robust antiamnesic effect also occurred, however, in the FO 8 and FO 12 groups (FO vs Veh t=-2.88;FO vs Veh t=-2.97;p < 0.05) implying that the memory was only partially restored as the treatment initiation delayed from 4 to 12 hours post-ischemia.Likewise, animals treated with FO and FO were able to remember the tasks performed before the intervention (FO vs Sham t = 0.59; FO vs Sham t = 1.60; p > 0.05).Although the performance of animals treated with FO was better than that of that of the Veh group, a moderate memory deficit persisted throughout the experiment (FO vs Sham t = 2.84; p < 0.05).

Conclusion
The modeling of the data by a Gaussian mixed model proved to be appropriate to evaluate the latency response variable.The diagnostic analyzes for the model were satisfactory, since basic assumptions were checked.It was found that the UN and AR (1) covariance structures accommodated the dependence and the variability of the data appropriately, both within and between individuals.Using this methodology, we demonstrated that FO was effective in alleviating the memory deficit when the treatment initiated at 4 or 8 hours post-ischemia.However, the effectiveness was reduced when treatment initiated at 12 hours post-ischemia, indicating that the faster the treatment begins, the better the antiamnesic effectiveness of FO.
set of observations omitted from the data set, Y and Y indicate that the estimates of the model were obtained from the complete data set, and from the reduced data set without the observations indexed by I, respectively; V is the inverse of the variance and covariance matrix and c is the scale parameter.Clearly I can be composed of a set of Acta Scientiarum.Technology Maringá, v. 38, n. 3, p. 345-352, July-Sept., 2016influential observations for each experimental unit, as well as all observations for each identified sampling unit.To identify influential observations, a graph of observed units versus Cook's distance should be used.Points identified outside of the established limits are influential candidates.By omitting one observation or set of observations of data set, it is possible to verify the accuracy of the parameter estimates of β and θ using the covariance ratio (CovRatio).In this case, the determinants of the covariance matrices of the reduced data set are compared with the covariance matrices of the full data set.The expression to calculate this measure for β and θ is given by Equation6:

Figure 1 .
Figure 1.Average profiles for latency variable response in treatment started 4, 8 and 12 hours post-ischemia regarding different treatments.Considering descriptive analysis of the data, average and standard deviation respectively, the latency of FO group (90.92 -72.44 s; 72.44 - 33.90 s, 61.11 -26.29 s; 74.32 -28.80 s; 73.73 - 31.77 s)  was inferior to Veh group(148.36- 50.40 s; 151.70 -49.95 s; 146.89 -53.55 s; 146.67 - 54.03 s; 131.23 -48.26 s), independent of the memory testing day.This suggests that FO-treated animals took less time to learn the shelter location.The FO and FO groups exhibited similar, but less intense performance.Taking into account the general model (4), model 1, two other models were evaluated: model 2, which only has an effect on the random intercept, and model 3 with a random effect only in the inclination, to verify the need to keep these random effects in the model.Through the RLRT it is concluded that there is sampling evidence to reject the null hypothesis in both cases (p < 0.05), meaning that both the random effect in the intercept and the individual inclination should be maintained.A model without interaction between treatments and time, M , was compared with the model M .The results of the comparisons (AIC = 4289.85and AIC = 4287.25)indicate it should maintain interaction in the model.When observing both settings, the more coherent biological results were obtained with the model without the interaction, and thus, it was adopted.

Figure 2 .
Figure 2. SRC and simulated envelope with CI of 95% for the residual minimal confounding.

Table 1 .
Estimates, se, p-value and CI of 95% for the model parameters.