Longitudinal modeling using log-gamma mixed model: case of memory deterioration after chronic cerebral hypoperfusion associated with diabetes in rats

Abstract: In recent years, several longitudinal studies have been conducted in the field of pharmacology. Continuous response variables are frequent in these settings and tend to be asymmetric, as well as restricted to the set of positive real numbers, so the normal model would be inappropriate. In this context, generalized linear mixed models (GLMM) are used to analyze data characterized in this way, accommodating inter- and intra-individual variation. Thus, we propose a log-gamma mixed model (LGMM), i.e., a gamma mixed model with log link function and normally distributed random effects, to evaluate data from a longitudinal experiment in which the effects of cerebral ischemia associated with diabetes on long-term retrograde memory performance were evaluated in rats. Based on the results obtained, the random-intercept model presented a good fit and accommodated the correlation inherent to the data. Normoglycemic animals, when compared to hyperglycemic animals, whether submitted to ischemia or not, had their cognitive capacity partially preserved, indicating that hyperglycemia ('diabetes') aggravates the cognitive deficit caused by chronic cerebral hypoperfusion.


Introduction
Recently, longitudinal studies have gained prominence in the health and biological sciences, especially in pharmacology. This is due to the interest in repeatedly evaluating one or more experimental units, in order to verify the effectiveness of a treatment or the evolution of a patient through an intervention. Longitudinal studies are versatile designs, since they allow the evolution of treatment over time to be verified, either for a specific group or for the individuals studied. In particular, the objectives of studies in this area relate to understanding which treatments can be used, both in prevention and in rehabilitation, as for example after cerebral ischemia (stroke).
In the literature, it is possible to identify several studies on this theme, among which we highlight Bacarin et al. (2015), Bacarin et al. (2016) and Zaghi et al. (2016). Commonly, these studies longitudinally analyze the cognitive behavior of animals treated with different drugs over different time periods, as well as when the ischemic induction method is altered. Several statistical models can be used to model such data. Considering that evaluation over time of the same experimental unit induces a correlation among the measurements of the same individual, modeling through mixed models becomes attractive: it allows this correlation to be modeled through the inclusion of random effects in the model, as well as through the use of a correlation structure. However, this is little explored by authors in the health area. Even though the number of studies addressing this technique is limited, the predominant methodology is the one that considers a normal distribution for the response variable and random effects, for example in Ribeiro, Milani, and Prevideli (2016) and Ferreira et al. (2014). Nevertheless, in certain situations the data are asymmetric and restricted to the set of positive real numbers. Thus, a distribution appropriate for such a response variable should be used, such as the gamma distribution (Sassoon et al., 2016).
Regarding longitudinal designs, one possibility is the use of generalized linear mixed models (GLMM) (Breslow & Clayton, 1993). These are so called because random effects are included in the linear predictor of the standard generalized linear model (GLM) (Nelder & Wedderburn, 1972), so that it is possible to accommodate intra-individual variation resulting from the longitudinal character of the study. A GLMM allows the temporal dependence structure, caused by the correlation of the data, to be accommodated by adding random effects to the linear predictor, which induces the correlation between observations of the same experimental unit (Fitzmaurice, Laird, & Ware, 2004).
Thus, the objective of this study was to adopt the GLMM methodology with gamma distribution to model the response variable latency, i.e., the time spent by the animal to perform the behavioral task after a condition of chronic cerebral hypoperfusion (CCH) combined with diabetes. More specifically, we want to verify whether diabetes aggravates the outcome of CCH. To this end, we measured the ability of hyper- or normoglycemic rats to remember the task that was learned prior to the induction of CCH.

Data set
The data used in this article result from an experiment performed in the Cerebral Ischemia and Neuroprotection Laboratory of Maringá State University during 2015-2016. The aim of this study was to evaluate whether diabetes (viewed as a risk factor for vascular dementia) aggravates the cognitive sequelae caused by chronic cerebral hypoperfusion (CCH) in middle-aged rats (12-15 months of age). The experiments were carried out following a completely randomized design, with four treatments (groups): sham/normoglycemic (SN, 12 animals, 1 to 12), sham/hyperglycemic (SH, 12 animals, 13 to 24), CCH/normoglycemic (CN, 12 animals, 25 to 36) and CCH/hyperglycemic (CH, 11 animals, 37 to 47). Hyperglycemia (diabetes) was induced by a single intravenous injection of streptozotocin (STZ, 35 mg kg-1, Sigma, St. Louis, MO, USA) dissolved in 10 mM citrate buffer (pH = 4.5). Control animals received the buffer solution alone. Blood glucose levels were measured 2 days after injection, and animals with glycemia higher than 250 mg dl-1 were classified as overtly diabetic. CCH was induced according to the four-vessel occlusion/internal carotid artery (4VO/ICA) model, as described in detail elsewhere (Pereira, Ferreira, Oliveira, & Milani, 2012).
Fieen days aer STZ or vehicle injections, the diabetic or normal rats were trained for 15 days to learn a radial maze task, as described previously (Pereira et al., 2012).At the end of training, the rats were subjected to 4VO/ICA or sham surgery.ey were le to recover from surgery stress for 7 days, and then they were tested for their ability to remember the task that was learned prior to surgery, i.e., retrograde memory performance.e retrograde memory tests (MT) were applied once a week, for 3 weeks and once a time before ischemia procedures.Retrograde memory performance was measured by response variable latency, as well by other variables not discussed (e.g., number of reference and working memory errors).e latency response expresses the time spent (s) by the animal to find the destination (i.e., the goal box or the safe place within the aversive radial maze) during each attempt.
For each animal, and on each testing day (session), the recorded latency was the arithmetic average of three attempts (trials). The operational procedures for diabetes and ischemia induction followed the 'Basic principles for animal use', as approved by the Ethics Committee on Animal Experiments of Maringá State University.

Generalized linear mixed models (GLMM)
In studies whose data are modeled by a GLM, only fixed effects are considered. However, in longitudinal studies whose response variable has a probability distribution belonging to the exponential family (EF), random effects can be included in the linear predictor. In this sense, a class of models with this characteristic was developed, called generalized linear mixed models (GLMM) (Breslow & Clayton, 1993). In this context, the temporal dependence structure caused by the correlation of the data is accommodated by adding random effects to the linear predictor.
In general, GLMMs arise from the combination of exponential family models with normally distributed random effects (Molenberghs, Verbeke, Demétrio, & Vieira, 2010). However, the random effects need not be normally distributed, as can be seen in studies found in the literature, for example Molenberghs et al. (2010) and Fabio, Paula, and Castro (2012). Modeling via these models has several advantages. They allow the evaluation of data whose response-variable distribution belongs to the EF, with estimation based on approximations of the integrals involved (Stroup, 2012). In addition, they are versatile in modeling the covariance structure and the random effects for the longitudinal measurements of the same individual, in contrast to other methodologies such as repeated-measures ANOVA.
The GLMM has been used to quantify the longitudinal benefits of different strategies after hypoxic/ischemic brain damage. These include investigations of the neuroprotective effects of the anti-neuroinflammatory lipid signaling molecule N-palmitoylethanolamine associated with the antioxidant agent luteolin in rats and humans (Caltagirone et al., 2016), the effects of prophylactic triple-H therapy on hemodynamic variables in human victims of subarachnoid hemorrhage (Tagami et al., 2014), and the impact of reducing the time window of treatment with intravenous tissue plasminogen activator (tPA) and the complicating effects of bleeding in patients with acute ischemic stroke (Ido et al., 2016). Overall, these studies indicate the importance of using a GLMM to evaluate experimental data related to the long-term outcomes of cerebral ischemia.
In order to determine the general form of a GLMM, let Y_ij be the jth observation of the ith sampling unit, i = 1, ..., m and j = 1, ..., n_i, whose distribution, conditional on a vector of random effects u_i = (u_i1, ..., u_iq)' of dimension q, belongs to the EF, so that it can be written as in Equation 1:

f(y_ij | u_i) = exp{ [y_ij θ_ij − b(θ_ij)] / a(φ) + c(y_ij, φ) },  (1)

with a(.), b(.) and c(.) known functions, φ the dispersion parameter of the distribution used, and θ_ij the canonical parameter. The expected value of the response variable conditional on the random effects, μ_ij = E(Y_ij | u_i), is related to a set of explanatory variables by means of Equation 2:

g(μ_ij) = η_ij = x_ij' β + z_ij' u_i.  (2)

In Equation 2, x_ij is a p-dimensional vector of explanatory variables, β is a p-dimensional vector of unknown parameters (related to the fixed effects), z_ij is a q-dimensional vector of variables that is, in general, a subset of x_ij, u_i is the vector of random effects, and g(.) is a link function (Molenberghs, Verbeke, & Demétrio, 2007; Mcculloch, Searle, & Neuhaus, 2008; Mcculloch & Neuhaus, 2011; Gad & Kholy, 2012).

Estimation of parameters
The maximum likelihood method is one of the most popular parameter estimation methods in the statistical literature; the maximum likelihood estimators are obtained by maximizing the likelihood function. It is possible to write the conditional distribution of the response vector y_i given the random effects u_i. The estimation of the vector of fixed parameters β, as well as of the components of the variance-covariance matrix ψ, is done by maximizing the marginal likelihood function obtained by integrating out the random effects u_i (Molenberghs et al., 2007). In this context, from a random sample (y_1, ..., y_m) of m pairwise-independent sampling units with repeated measures, the log-likelihood function is given by Equation 3:

ℓ(β, φ, ψ) = Σ_{i=1}^{m} log ∫ f(y_i | u_i; β, φ) f(u_i; ψ) du_i.  (3)

The distribution of the random effects for this problem is multivariate normal, with mean vector 0 and variance-covariance matrix ψ. In order to maximize the expression in Equation 3, it is necessary to solve the integrals involved. However, they have no analytical solution, that is, the solution cannot be obtained through the traditional methods of differential and integral calculus. It is therefore necessary to use numerical integration methods, which provide approximate solutions to the integrals.
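As an illustration of this numerical approach, the sketch below approximates the marginal log-likelihood of Equation 3 for a gamma model with log link and a single normal random intercept, replacing the integral by Gauss-Hermite quadrature. This is a minimal Python sketch for exposition only; the function names and parameterization are our own assumptions, not the code used in the study (the paper's analysis was done in R).

```python
import numpy as np
from scipy.special import gammaln

def gamma_logpdf(y, mu, alpha):
    # Gamma log-density in the mean/shape parameterization: E(Y) = mu, shape = alpha
    return (alpha * np.log(alpha / mu) + (alpha - 1) * np.log(y)
            - alpha * y / mu - gammaln(alpha))

def marginal_loglik(beta, sigma, alpha, y, X, groups, n_nodes=30):
    """Gauss-Hermite approximation of the marginal log-likelihood of a
    gamma GLMM with log link and one normal random intercept per unit."""
    nodes, weights = np.polynomial.hermite.hermgauss(n_nodes)
    total = 0.0
    for i in np.unique(groups):
        yi, Xi = y[groups == i], X[groups == i]
        # conditional log-likelihood of unit i evaluated at each quadrature node
        vals = np.array([
            gamma_logpdf(yi, np.exp(Xi @ beta + np.sqrt(2) * sigma * t), alpha).sum()
            for t in nodes])
        # integral over u_i ~ N(0, sigma^2) via the change of variable u = sqrt(2)*sigma*t
        total += np.log((weights * np.exp(vals)).sum() / np.sqrt(np.pi))
    return total
```

Maximizing this function over (β, σ, α) with a general-purpose optimizer yields approximate maximum likelihood estimates; increasing `n_nodes` refines the approximation.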

Log-gamma mixed model (LGMM)
For the problem addressed, it is assumed that the conditional distribution of y_ij given u_i belongs to the EF, with probability density function given by Equation 4:

f(y_ij | u_i) = exp{ [y_ij (−1/μ_ij) − log μ_ij] / φ + (1/φ − 1) log y_ij + (1/φ) log(1/φ) − log Γ(1/φ) },  (4)

that is, y_ij | u_i ~ Gamma(μ_ij, α), where μ_ij is the mean of the gamma distribution and α is the shape parameter, with dispersion parameter φ = 1/α, both positive; y_ij is the latency of the ith (i = 1, ..., 47) evaluated animal in the jth (j = 1, 2, 3, 4) memory test (MT). The functions a(.), b(.) and c(.), matching Equations 4 and 1, are given by Equation 5:

a(φ) = φ,  b(θ_ij) = −log(−θ_ij) = log μ_ij with θ_ij = −1/μ_ij,  c(y_ij, φ) = (1/φ − 1) log y_ij + (1/φ) log(1/φ) − log Γ(1/φ).  (5)

The systematic component of the LGMM adopted is given by Equation 6:

log(μ_ij) = β_0 + β_1 G_1i + β_2 G_2i + β_3 G_3i + β_4 t_ij + u_0i,  (6)

with β_0 to β_4 the unknown regression parameters. G_li indicates the group to which the ith animal was allocated: it assumes the value 1 when the animal belongs to group l = 1, 2, 3, representing the CH, CN and SH groups, respectively, and 0 otherwise, with the SN group as reference. Further, t_ij is the time at which the ith animal was evaluated in the jth MT, and log(.) is the link function. The random effect u_i = (u_0i)' is the random intercept of the ith animal, and it is assumed that u_0i ~ N(0, σ²_u). According to the LGMM presented in Equation 6, the log-likelihood function is given by Equation 7:

ℓ(β, φ, σ²_u) = Σ_{i=1}^{47} log ∫ [ Π_j f(y_ij | u_0i) ] (2π σ²_u)^{−1/2} exp{−u_0i² / (2σ²_u)} du_0i.  (7)

As noted in the section on parameter estimation, especially regarding Equation 3, the expression in Equation 7 has no analytical solution. Thus, the maximization of the log-likelihood function, to obtain the estimates of the fixed effects, is done through the Laplace approximation (Tierney & Kadane, 1986).
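To make the systematic component concrete, the following sketch simulates latency-like data from the structure of Equation 6: a log link, three group dummies with SN as reference, a linear time effect, and a normal random intercept per animal. All parameter values, and the use of 12 animals in every group, are hypothetical choices for illustration, not estimates from the experiment.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical parameter values, chosen only for illustration
beta0, beta_CH, beta_CN, beta_SH, beta_t = 4.0, 0.9, 0.4, 0.2, -0.05
sigma_u, alpha = 0.25, 5.0          # random-intercept SD and gamma shape

# Dummy coding (G1, G2, G3) per Equation 6, with SN as the reference group
groups = {"SN": (0, 0, 0), "SH": (0, 0, 1), "CN": (0, 1, 0), "CH": (1, 0, 0)}
records = []
animal_id = 0
for g, (g1, g2, g3) in groups.items():
    for _ in range(12):                       # 12 animals per group (simplified)
        animal_id += 1
        u0 = rng.normal(0.0, sigma_u)         # animal-level random intercept u_0i
        for t in range(1, 5):                 # four memory tests (MT)
            # systematic component of Equation 6 on the log scale
            mu = np.exp(beta0 + beta_CH * g1 + beta_CN * g2 + beta_SH * g3
                        + beta_t * t + u0)
            # gamma response with mean mu and shape alpha
            latency = rng.gamma(alpha, mu / alpha)
            records.append((animal_id, g, t, latency))
```

Because the link is the logarithm, each coefficient acts multiplicatively on the mean latency: e.g., a hypothetical β_CH = 0.9 scales the CH group mean by exp(0.9) ≈ 2.5 relative to SN at the same time point.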

Diagnostic analysis
Aer the adjustment of a regression model, whether it is only fixed effects or fixed and random effects, the validation of the adjusted model is necessary.is phase is comprised of the verification that the model is satisfactorily adjusted (residual analysis) to the identification of possible influential observations (influence analysis).e exclusion of this phase of the analysis may imply the uncertainty of the conclusions obtained from the adjusted model, thus compromising the developed study.
To verify whether the model fits properly, an easy-to-interpret method is the half-normal plot (Atkinson, 1985). These plots are obtained by plotting the ordered absolute values of a model diagnostic against the expected order statistics of a half-normal distribution, according to Equation 8:

Φ^{-1}( (i + n − 1/8) / (2n + 1/2) ),  i = 1, ..., n,  (8)

where n is the sample size, adding to this graph a simulated envelope based on the estimates of the fitted model (Moral, Hinde, & Demétrio, 2016). If the fit of the model is adequate, the points are expected to lie within the simulated envelope (Vieira, Hinde, & Demétrio, 2000). To check whether the residuals of the fitted model show a random pattern, a graph of the fitted values versus the Pearson residuals is used; random behavior around zero is expected.
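The scores in Equation 8 are straightforward to compute. The sketch below pairs sorted absolute residuals with the expected half-normal order statistics, producing the coordinates of the half-normal plot; it assumes the Pearson residuals are already available, and the function names are ours.

```python
import numpy as np
from scipy.stats import norm

def half_normal_scores(n):
    """Expected order statistics of a half-normal distribution (Equation 8),
    used as the horizontal axis of the half-normal plot."""
    i = np.arange(1, n + 1)
    return norm.ppf((i + n - 1/8) / (2 * n + 1/2))

def half_normal_plot_points(residuals):
    """Pair sorted absolute residuals with the half-normal scores."""
    r = np.sort(np.abs(np.asarray(residuals, dtype=float)))
    return half_normal_scores(len(r)), r
```

In practice one would overlay a simulated envelope (as the hnp package does in R) by refitting the model to data simulated from the estimates and repeating this computation for each simulated sample.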
Over time, different measures of influence have been proposed in the literature, among which Cook's distance (CD) stands out (Cook, 1977). However, when working with a GLMM, one must take into account that there are estimated parameters associated with the fixed effects as well as predictions of the random effects, so a measure useful for such cases is needed. In this sense, Pinho, Nobre, and Singer (2015) proposed an extension of the influence measure of Xiang, Tse, and Lee (2002) that takes these aspects into account. This measure is obtained by means of Equation 9:

CD_ij = (η̂ − η̂_(ij))' [Var(Y | u)]^{-1} (η̂ − η̂_(ij)) / (p + q),  (9)

which can be decomposed into three components, according to Equation 10:

CD_ij = CD1_ij + CD2_ij + CD3_ij,  (10)

where the subscripts ij indicate the ith animal and its jth MT, Var(Y | u) is the conditional variance of Y given the vector of random effects u, involving a(.), the known function of the dispersion parameter of the gamma distribution reported in Equation 5; η̂ is the estimate of η, η̂_(ij) is the estimate of η obtained without the jth MT of the ith animal, and p and q are the dimensions of the vector of fixed effects and of Z_ij, respectively. The terms CD1_ij, CD2_ij and CD3_ij are the components of the distance that identify the influence of the jth MT of the ith animal on the parameter estimates of the fixed effects, on the prediction of the random effects, and on the covariance between β̂ and û, the last of which is expected to tend to zero (Pinho et al., 2015). To identify influential observations, a graph of the observed values/units versus the proposed CD_ij is used; points outside the established limits are candidate influential observations.
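The case-deletion idea behind Equations 9-10 can be sketched for the simpler fixed-effects-only case: refit the model without each observation and measure the shift in the linear predictor. The sketch below does this for a gamma regression with log link (no random effects, so only a CD1-type component appears); it is an illustrative simplification under our own assumptions, not the full procedure of Pinho et al. (2015).

```python
import numpy as np
from scipy.optimize import minimize

def fit_gamma_loglink(y, X):
    """Fit a gamma regression with log link by maximum likelihood.
    The beta estimates do not depend on the gamma shape parameter,
    so it is enough to minimize sum(y/mu + log(mu))."""
    def nll(beta):
        mu = np.exp(X @ beta)
        return np.sum(y / mu + np.log(mu))
    return minimize(nll, np.zeros(X.shape[1]), method="BFGS").x

def cook_distances(y, X):
    """Case-deletion Cook-type distances on the linear-predictor scale:
    squared shift of eta when each observation is left out, scaled by p."""
    beta_full = fit_gamma_loglink(y, X)
    eta_full = X @ beta_full
    p, cd = X.shape[1], []
    for k in range(len(y)):
        keep = np.arange(len(y)) != k
        beta_k = fit_gamma_loglink(y[keep], X[keep])   # refit without case k
        diff = eta_full - X @ beta_k
        cd.append(float(diff @ diff) / p)
    return np.array(cd)
```

Observations whose deletion moves the fitted linear predictor the most receive the largest distances, which is exactly the behavior the plot of CD_ij against observation index is meant to reveal.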
The results were obtained using the software R (R Core Team, 2015), with the lme4 (Bates, Machler, Bolker, & Walker, 2015) and hnp (Moral et al., 2016) packages used for fitting and validating the proposed model, respectively. For the construction of the influence analysis graphs, R code for GLMM diagnostics available at http://www.ime.usp.br/~jmsinger/GLMMdiagnostics.zip was used.

Results and discussions
Figure 1a shows the total average latency of the animals for the four groups CH (hypoperfusion hyperglycemic), CN (hypoperfusion normoglycemic), SH (sham hyperglycemic) and SN (sham normoglycemic). The groups that underwent CCH under hyperglycemia (CH) or normoglycemia (CN) had a higher latency (CH: 116.64 ± 13.69 and CN: 85.82 ± 6.98) than the groups not submitted to ischemic induction, i.e., sham surgery (SH: 61.97 ± 5.97 and SN: 52.32 ± 3.54), indicating that diabetes aggravated the cognitive deficit caused by CCH. Regarding the latency of normoglycemic and hyperglycemic sham-operated animals, the first group performed slightly better than the second. This suggests that hyperglycemia alone did not compromise memory performance. Figure 1b shows the average observed latency profiles, where measurement 1 was made before the animals underwent ischemia or only the surgical procedures. For the groups SN (52.71 ± 3.33; 52.10 ± 2.49; 45.17 ± 3.19) and SH (60.60 ± 6.39; 65.03 ± 6.93; 58.75 ± 4.85), across the post-ischemia memory tests the animals' ability to perform the task remains approximately constant and is slightly better at the end of the tests. The same does not occur for the CH group (142.34 ± 11.17; 126.70 ± 10.22; 137.83 ± 9.45), which indicates the occurrence of severe cognitive impairment. For the CN group, behavior similar to the SN and SH groups is observed, although with less intensity.

Figure 1. Total average latency and average profiles (vertical bars represent the 95% confidence interval of the average) for the four groups CH (hypoperfusion hyperglycemic), CN (hypoperfusion normoglycemic), SH (sham hyperglycemic) and SN (sham normoglycemic).
From the fit of the model presented in Equation 6, it was possible to identify differences in the response variable among the groups, as well as a memory (time) effect (χ² statistics between 4.07 and 40.94, p < 0.05). Table 1 presents the parameter estimates of the model, with standard errors (SE) and p-values.
With the results presented in Table 1, it is verified, without exception, that animals of the CN or CH groups had inferior performance in the retrograde memory tests when compared with the SN group. In addition, when comparing group SH with groups CH and CN, those that underwent ischemic induction via CCH had inferior performance, that is, they took longer to perform the MT activities. A similar pattern was observed when comparing groups SH and SN. This indicates that even without ischemic induction, hyperglycemic animals have their cognitive performance compromised, although with less intensity. Animals in the CH group had their cognitive function compromised by CCH aggravated by diabetes, as did the animals of the CN group, although the latter with less intensity. Growing evidence supports the hypothesis that CCH represents a possible etiological factor in the development of age-related dementias, including that of the Alzheimer's disease (AD) type (Zhao & Gong, 2014), and the occurrence of diabetes may aggravate this scenario (Saedi, Gheini, Faiz, & Arami, 2016). Therefore, in the pre-clinical investigation of the physiopathology of age-related dementias, the clinical relevance of animal models of neurodegenerative diseases associated with CCH may be improved if certain preexisting risk factors (e.g., diabetes, hypertension) are combined with CCH. In these studies, the reliability of the experimental data can be further improved if an appropriate statistical method is used to quantify the behavioral data.

Table 1. Estimates, comparisons between groups, SE, t-value, z-value and p-value for the LGMM.
By means of Figure 2a, it is observed that all points are located around zero, randomly distributed without a systematic pattern, and within the established limits. This implies no violation of the homoscedasticity assumption for the Pearson residuals of the fitted model. Figure 2b shows a simulated half-normal envelope for the Pearson residuals, with 99% confidence, for the fitted model. Without exception, all points are within the established limits or at the border. This indicates that the model is well fitted and that the assumption of a gamma distribution for the response variable is adequate.

Figure 2. Graphs for analysis of the residuals of the fitted model. Figure 3. Graph for the influence analysis.

Conclusion
From the aforementioned results, the fitted LGMM proved effective at modeling the behavioral data, and the diagnostic analysis for the fitted model was satisfactory. The modeling by a GLMM was able to capture the variability inherent in the longitudinal data. It thus becomes an alternative to frequently used traditional methods, which disregard the effect of longitudinal dependence and may render the resulting inferences inadequate. Through the GLMM methodology, we demonstrated that the statistical model developed here fits the behavioral data very well and, from this, that the outcome of the ischemic condition is aggravated by diabetes.