Image analysis of coffee seeds submitted to the LERCAFE test

The aim of this experiment was to quantify the stained areas of coffee seeds submitted to the LERCAFE test using image analysis. Seeds of the Catuaí Vermelho IAC 99 (Red Catuaí) and Paradise cultivars were used. The physiological quality of the lots was assessed by the germination test, moisture content and germination speed index. The LERCAFE test was conducted on seeds without parchment, immersed in a 3% sodium hypochlorite solution for 3 hours. A color (RGB) photograph with a resolution of 5 MPx was taken of each seed. The seeds were evaluated visually, and the stained and non-stained regions were quantified with functions generated in Matlab R2009b. Classification models were developed based on Fisher's linear discriminant function, and their adequacy was evaluated with confusion matrices between the visual references and the classification generated by the linear functions. Image analysis based on Fisher's linear discriminant function is potentially efficient for developing classifiers for coffee seeds submitted to the LERCAFE test; however, other discriminant functions and quantification methodologies still need to be tested.


Introduction
Brazil is the world's biggest coffee producer. In 2012, Brazil produced approximately 51 million bags of coffee, which corresponds to 35% of the total coffee produced in the world (Organização Internacional do Café [OIC], 2013). Minas Gerais State alone is responsible for approximately 50% of the total production of coffee in Brazil (Companhia Nacional de Abastecimento [CONAB], 2013).
Coffee is propagated through seedlings, which requires the use of high-quality seeds. In addition, the evaluation of physiological quality, which involves germination and vigor tests, is essential (Guimarães, Rosa, Coelho, Veiga, & Clemente, 2013). Reducing the time required for the germination test in laboratories would benefit the production, commercialization, utilization and inspection of seeds (Guimarães et al., 2013). Tests such as the LERCAFE, which provide results in a relatively short period of time, are in high demand because they facilitate decision making at the different steps of the production process (Reis, Araújo, Dias, Sediyamam, & Meireles, 2010; Zonta, Araújo, Araújo, Reis, & Zonta, 2010). The test consists of immersing coffee seeds in a sodium hypochlorite solution for a set period and at a set temperature. However, it still depends on visual evaluation to determine the viability of the coffee seeds. Computational image analysis is therefore an alternative, because it reduces the possibility of human error and makes the results more consistent (Marcos Filho, Kikuti, & Lima, 2009).
According to Hoffmaster, Fujimura, McDonald, and Bennett (2003), seed quality evaluation through image analysis is a technique used for many crops; it has shown promise for the evaluation of viability and vigor and has the potential to reduce the time required to generate results. The analysis of digital images consists of recognizing the scene to generate dimensional characteristics, such as the length or area of an object, and attributes such as patterns of color and texture, and of measuring them through counting methods or by determining the frequency of the elements that form the images, named pixels (Teixeira, Cicero, & Dourado Neto, 2006).
Several studies have sought to improve the results obtained through image analysis in assessing the viability and vigor of seeds. Teixeira, Dourado Neto, Cícero, and Martin (2005) evaluated the uniformity of corn seeds using digital images obtained with a professional surface scanner and obtained satisfactory results. Amaral, Martins, Forti, Cícero, and Marcos-Filho (2011) used X-ray images to evaluate the physiological potential of ipê-roxo (Tabebuia impetiginosa) seeds and concluded that the method is efficient. X-ray image analysis has also been used to evaluate mechanical damage, for example, in soybean (Flor, Cicero, França Neto, & Krzyzanowski, 2004; Ferreira Pinto, Vaz Mondo, Gomes Júnior, & Cicero, 2012), bean (Forti, Cicero, & Pinto, 2008) and corn (Vaz Mondo & Cícero, 2005). Another application was studied by Moreira et al. (2002), who used lasers to identify, analogously to the tetrazolium test, areas of bean seeds with different levels of metabolic activity. The images were collected with a CCD (charge-coupled device) camera and analyzed in image-processing software based on the luminescence of the seeds.
The development of a technique that confers a higher level of accuracy to the evaluation of the LERCAFE test can benefit future research by giving the test higher reliability and reproducibility. Thus, the objective of this work was to use image analysis to quantify the stained areas of coffee seeds submitted to the LERCAFE test.

Material and methods
The work was carried out at the Seed Laboratory and the Wood Technology Laboratory of the Universidade Federal dos Vales do Jequitinhonha e Mucuri (UFVJM), in Diamantina, Minas Gerais. Seeds of two coffee (Coffea arabica L.) cultivars were used, Red Catuaí IAC 99 and Paradise; they were obtained from the 2011/2012 crop and provided by the Empresa de Pesquisa Agropecuária de Minas Gerais.
To characterize the lots, the following determinations and tests were performed. A germination test was carried out using seeds without parchment, which was removed manually. The seeds were placed to germinate on a paper substrate moistened with water in a quantity of 2.5 times the weight of the paper. The paper rolls were transferred to a germination chamber at 30 °C with constant light. The results of the germination test were expressed as the percentage of normal seedlings, computed at 15 days (first count) and at 30 days (final count) (MAPA, 2009).
The germination speed index (GSI) was determined according to Maguire (1962), and the moisture content was determined by the air oven method at 105 ± 3°C for 24 hours.
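As an illustration of the two determinations above, the following minimal Python sketch computes the germination speed index as the sum, over evaluation dates, of the number of seedlings counted divided by the days elapsed since sowing, which is the usual statement of the Maguire (1962) index. The moisture calculation assumes a wet-basis percentage, which is an assumption here and not stated in the text. The function names and all numbers are hypothetical and are not data from this study.

```python
def germination_speed_index(new_seedlings, days):
    """Sum of seedlings counted at each evaluation divided by the
    number of days from sowing to that evaluation."""
    if len(new_seedlings) != len(days):
        raise ValueError("counts and days must have the same length")
    if any(d <= 0 for d in days):
        raise ValueError("days must be positive")
    return sum(n / d for n, d in zip(new_seedlings, days))


def moisture_content_wet_basis(fresh_mass, dry_mass):
    """Moisture content (%) on a wet basis (assumed convention)."""
    if fresh_mass <= 0 or dry_mass > fresh_mass:
        raise ValueError("expected 0 < dry_mass <= fresh_mass")
    return 100.0 * (fresh_mass - dry_mass) / fresh_mass


# Hypothetical example values, for illustration only.
counts = [10, 20, 15]   # normal seedlings newly counted at each evaluation
days = [10, 15, 30]     # days after sowing for each evaluation
gsi = germination_speed_index(counts, days)
moisture = moisture_content_wet_basis(fresh_mass=20.0, dry_mass=16.0)
print(f"GSI = {gsi:.2f}, moisture = {moisture:.1f}%")
```

With these illustrative inputs, later counts contribute less to the index than early ones, which is why a faster-germinating lot scores higher.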
The seeds were submitted to the LERCAFE test following the methodology proposed by Reis et al. (2010). Coffee seeds without parchment (manually removed) were immersed in a sodium hypochlorite solution with 3% active chlorine for 3 hours.
After the LERCAFE test, the seeds were placed on a duly cleaned and disinfected laboratory bench for image acquisition. A checkered and numbered A4 sheet of paper (Figure 1) was used to separate the seeds. The images were acquired using a support developed exclusively for this purpose, which illuminates only the seeds using LEDs and standardizes the distance from the camera to the seed (Figure 2).
Color (RGB) photographs were taken. The images were cropped to isolate the area of interest, and the background was colored black to facilitate processing (Figure 3). Paint Shop Pro 7 was used for image editing.
For image processing, classifiers were developed in Matlab R2009b. One classifier was built for the Catuaí cultivar (Catuaí classifier), another for the Paradise cultivar (Paradise classifier) and a third for both cultivars together (Set classifier), with the objective of separating the stained and non-stained areas of the coffee seeds submitted to the LERCAFE test. The classifiers were developed from a selection of twenty positive points (colored by the test) and twenty negative points (not colored) for each seed, totaling 200 points. The classification models were fitted using Fisher's linear discriminant function (discriminant analysis), with two variables: the average of the values of the green and blue bands (GB, $x_1$) and the average of the values of the red and blue bands (RB, $x_2$).
Fisher's linear discriminant function is a multivariate analysis technique used to discriminate between populations and to classify individuals into pre-defined populations. For discrimination, functions are constructed from the observed variables that account for the differences between the populations. For allocation or classification, functions are determined that separate the populations and assign a new individual to exactly one population (Ferreira, 2011; Hair Júnior, Black, Babin, Anderson, & Tatham, 2009). The function that discriminates between the populations of positive pixels (regions of the seed that reacted with the active chlorine present in the sodium hypochlorite solution) and negative pixels (unreacted regions) is given by:

$$D(X) = L'X = (\bar{x}_P - \bar{x}_N)' S^{-1} X$$

where $L$ is the discriminant vector, $X$ is the vector of characteristics $(x_1, x_2)$ of the point to be classified, $\bar{x}_P$ and $\bar{x}_N$ are the mean vectors of the positive and negative populations, and $S$ is their common covariance matrix. The midpoint between the two population averages $\bar{x}_P$ and $\bar{x}_N$ is:

$$m = \frac{1}{2} (\bar{x}_P - \bar{x}_N)' S^{-1} (\bar{x}_P + \bar{x}_N)$$

The classification rule based on Fisher's discriminant function is: allocate $X$ to the positive population if $D(X) \geq m$, and to the negative population otherwise.
The accuracy of the classifiers for each cultivar, and for both cultivars combined, was determined using the confusion matrix. This statistical tool was constructed from the reference information of the test samples; that is, all pixels of the test samples are assumed to truly belong to the class to which they were assigned.
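The fitting and allocation steps of a two-class Fisher linear discriminant, as described above, can be sketched in Python with NumPy. This is a minimal illustration and not the authors' Matlab code: the function names are hypothetical, the pooled covariance estimator is the standard textbook choice and is assumed here, and the training points are synthetic RGB values invented for the example.

```python
import numpy as np


def band_features(rgb):
    """Map RGB values of shape (n, 3) to the two descriptors in the text:
    x1 = mean of the G and B bands, x2 = mean of the R and B bands."""
    rgb = np.asarray(rgb, dtype=float)
    r, g, b = rgb[:, 0], rgb[:, 1], rgb[:, 2]
    return np.column_stack(((g + b) / 2.0, (r + b) / 2.0))


def fit_fisher(x_pos, x_neg):
    """Return the discriminant vector L and the midpoint m."""
    mean_p, mean_n = x_pos.mean(axis=0), x_neg.mean(axis=0)
    n_p, n_n = len(x_pos), len(x_neg)
    # Pooled (common) covariance matrix of the two populations.
    s_pooled = ((n_p - 1) * np.cov(x_pos, rowvar=False)
                + (n_n - 1) * np.cov(x_neg, rowvar=False)) / (n_p + n_n - 2)
    L = np.linalg.solve(s_pooled, mean_p - mean_n)  # S^-1 (mean_p - mean_n)
    m = 0.5 * L @ (mean_p + mean_n)                 # midpoint of projected means
    return L, m


def classify(x, L, m):
    """True where a point is allocated to the positive (stained) class."""
    return x @ L >= m


# Synthetic training points (hypothetical values, not data from the study).
rng = np.random.default_rng(0)
pos_rgb = rng.normal([90, 140, 80], 8, size=(100, 3))    # "stained" points
neg_rgb = rng.normal([170, 150, 110], 8, size=(100, 3))  # "unstained" points
x_pos, x_neg = band_features(pos_rgb), band_features(neg_rgb)

L, m = fit_fisher(x_pos, x_neg)
pos_hit_rate = classify(x_pos, L, m).mean()     # share of positives kept
neg_false_rate = classify(x_neg, L, m).mean()   # share of negatives misallocated
print("L =", L, "m =", m)
```

Solving the linear system instead of inverting the covariance matrix explicitly is a numerical convenience; both give the same discriminant vector when the matrix is well conditioned.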
The exclusion and inclusion errors were determined for each matrix. An exclusion (omission) error occurs when a pixel is not attributed to the class to which it belongs, and an inclusion error occurs when a pixel is attributed to a class to which it does not belong. Each exclusion error therefore represents an omission from the correct class, and each inclusion error represents an assignment to an incorrect class.
In the evaluation of the classification using the confusion matrix, the proportion of samples correctly classified, called global accuracy (E_G), was computed; it corresponds to the ratio between the sum of the diagonal of the confusion matrix (samples correctly classified) and the sum of all elements of the matrix (total number of samples). The kappa coefficient was also calculated; it is based on the difference between the observed agreement and the agreement expected by chance between the reference data and a random classification (product of the marginal totals of the matrix). Landis and Koch (1977) proposed a scale for rating classification quality based on the kappa value (Table 1).
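The accuracy measures described above can be computed from a confusion matrix as in the following Python sketch. The row and column orientation (rows as visual reference, columns as classifier output) is an assumption, since the layout of Table 2 is not reproduced here, and the matrix values are hypothetical. Function names are illustrative only.

```python
import numpy as np


def confusion_matrix(reference, predicted, n_classes=2):
    """Rows: reference (visual) class; columns: class given by the classifier."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for r, p in zip(reference, predicted):
        cm[r, p] += 1
    return cm


def accuracy_metrics(cm):
    """Global accuracy, kappa, and per-class exclusion and inclusion errors."""
    cm = np.asarray(cm, dtype=float)
    total = cm.sum()
    diag = np.diag(cm)
    row_tot, col_tot = cm.sum(axis=1), cm.sum(axis=0)
    overall = diag.sum() / total                        # E_G
    expected = (row_tot * col_tot).sum() / total ** 2   # chance agreement
    kappa = (overall - expected) / (1.0 - expected)
    exclusion = 1.0 - diag / row_tot   # omitted from the correct class
    inclusion = 1.0 - diag / col_tot   # wrongly included in a class
    return overall, kappa, exclusion, inclusion


# Small check of the matrix construction (0 = negative, 1 = positive).
small = confusion_matrix([0, 0, 1, 1, 1], [0, 1, 1, 1, 0])

# Hypothetical confusion matrix, not taken from Table 2.
cm = np.array([[90, 10],
               [5, 95]])
e_g, kappa, exclusion, inclusion = accuracy_metrics(cm)
print(f"E_G = {e_g:.3f}, kappa = {kappa:.3f}")
```

Because kappa subtracts the agreement expected from the marginal totals alone, it is lower than the global accuracy whenever some agreement could arise by chance.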

Results and discussion
The moisture contents of the coffee seeds of the Red Catuaí and Paradise cultivars were 19 and 20%, respectively. In the germination test, the Red Catuaí cultivar presented 91% germination compared to 81% for the Paradise cultivar. At the first count, Red Catuaí presented 76% germination, and Paradise presented 61%. The GSI was 5.3 for Red Catuaí and 4.2 for Paradise.
The seed lot profiles were determined to detect physiological differences between the lots. The estimated values of the discriminant vector L and of the midpoint m between the population averages, for Fisher's linear discriminant function fitted to the sample points that reacted positively or negatively with the active chlorine present in the sodium hypochlorite, are presented in Figure 4.
Note that the values of the discriminant vector and of the midpoint between the population averages obtained when the discriminant function is fitted to the complete data set (Red Catuaí and Paradise cultivars together) lie in an intermediate position relative to those obtained when the function is fitted to each cultivar individually. The distribution of the points around the line that represents the boundary between the two populations (positive and negative) is also shown in this figure.
Table 2 presents the confusion matrices for the Catuaí, Paradise and Set classifiers, each applied to independent samples for evaluation. In all cases, the global accuracy (E_G) is quite high. However, for the segregation of the stained and non-stained regions of the seeds of the Catuaí cultivar, the Set classifier had a slightly higher accuracy. The pixels in the images of the Paradise cultivar seeds were best classified by the classifier developed on their own samples. The Paradise classifier was also superior in the quantification of the stained sample points of the joint sample (the two cultivars together). Still in Table 2, for the Red Catuaí cultivar the inclusion errors were higher for the positive population than for the negative population; this makes it more difficult to distinguish what was stained green by the LERCAFE test from the natural color of the seed. For the Paradise cultivar, the inclusion errors were higher for the negative population, which means that points that actually belonged to the positive class were included in the negative class. This made it difficult to detect what was not stained green, and the same result was observed when the two cultivars were analyzed together.
Exclusion errors are used to analyze the efficiency of classifiers. For the Red Catuaí cultivar, the Catuaí classifier misclassified 25 points that were actually negative as positive. The opposite occurred when the same classifier was applied to the Paradise cultivar and to the two cultivars together: in these cases, the classifier incorrectly allocated to the negative class points that were actually positive. For the Paradise classifier, when the cultivars were analyzed separately, the exclusion errors were similar, with a larger number of points classified as positive when in fact they were negative. The opposite result was found when the two cultivars were analyzed together with this classifier. Finally, for the Set classifier, the exclusion errors were similar for the Red Catuaí cultivar and for the two cultivars together; in other words, more negative points were mistakenly classified as positive, whereas the opposite occurred with the Paradise cultivar for this same classifier.
The kappa agreement coefficient for the three classifiers applied to the validation sample groups was considered excellent according to the quality scale defined by Landis and Koch (1977).
Figure 5 presents the results of the image classification of the coffee seeds of the Catuaí and Paradise cultivars submitted to the LERCAFE test, applying the three classifiers. Immediate visual analysis suggests that the Paradise classifier better identified the stained areas. However, the seed staining is subtle enough that the human eye cannot detect what the classifiers detect mathematically.
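Classifying a whole seed image pixel by pixel, as in the classified images discussed above, and quantifying the stained area could be sketched as follows. This is an illustrative Python sketch, not the procedure used to produce Figure 5: it assumes that background pixels are exactly black, that the image is an RGB array, and it uses hypothetical values of L and m rather than the estimates reported in Figure 4.

```python
import numpy as np


def stained_fraction(image, L, m):
    """Classify every seed pixel and return (stained mask, stained fraction).

    image: (H, W, 3) RGB array. Pixels that are exactly black are treated
    as background (assumption based on the black-background editing).
    """
    img = np.asarray(image, dtype=float)
    seed = img.sum(axis=2) > 0               # non-background pixels
    r, g, b = img[..., 0], img[..., 1], img[..., 2]
    x1 = (g + b) / 2.0                       # GB descriptor
    x2 = (r + b) / 2.0                       # RB descriptor
    score = L[0] * x1 + L[1] * x2            # D(X) = L'X for each pixel
    stained = (score >= m) & seed            # positive class, seed pixels only
    n_seed = int(seed.sum())
    fraction = stained.sum() / n_seed if n_seed else 0.0
    return stained, fraction


# Tiny 2 x 2 test image with hypothetical pixel values.
image = np.array([[[0, 0, 0], [90, 140, 80]],
                  [[170, 150, 110], [80, 130, 70]]])
L_demo = np.array([0.0, -1.0])   # hypothetical discriminant vector
m_demo = -100.0                  # hypothetical midpoint
mask, fraction = stained_fraction(image, L_demo, m_demo)
print(f"stained fraction of seed area: {fraction:.2f}")
```

Masking the background before counting matters here: a black pixel would otherwise satisfy the rule and inflate the stained area.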
Thus, only through the repeated collection of samples for validating the classifiers would it be possible to compare statistically the accuracies obtained for each cultivar-classifier combination. To achieve an ideal methodology for the development of classifiers for the evaluation of coffee seeds submitted to the LERCAFE test, the discriminant function and the experimental setup must be refined. These refinements should address variations in the intensity and spectral quality of the light; the use of hyperspectral cameras, including operation outside the visible spectrum; and adjustments to the concentration of the sodium hypochlorite solution.

Conclusion
The Fisher linear discriminant function for use in the development of classifiers for coffee seeds submitted to the LERCAFE test is efficient; however, additional tests are still required with other discriminant functions and other methods for quantification.

Figure 1. A4 paper used to mark coffee seeds (a: circles used to mark the seeds; b: coffee seeds submitted to the LERCAFE test).

Figure 2. Support used for lighting coffee seeds and for image collection.

Figure 3. Coffee seed submitted to the LERCAFE test after editing to isolate the seed.
Notation for the Fisher discriminant function: $X$ = vector of characteristics of the points to be classified; $x_1$ = average of the brightness values of the G and B bands; $x_2$ = average of the brightness values of the R and B bands; $\bar{x}_P$ and $\bar{x}_N$ = averages of the variables $x_1$ and $x_2$ in the positive and negative reaction points, respectively; $\bar{x}_{1P}$ = average of the variable $x_1$ in the positive reaction points; $\bar{x}_{2P}$ = average of the variable $x_2$ in the positive reaction points; $\bar{x}_{1N}$ = average of the variable $x_1$ in the negative reaction points; $\bar{x}_{2N}$ = average of the variable $x_2$ in the negative reaction points; $S$ = common covariance matrix of the populations $x_P$ and $x_N$.

Figure 4. Distribution of the descriptor variable values around the boundary between the populations (positive and negative). L = discriminant vector; m = midpoint between populations.

Figure 5. Original images of coffee seeds of the Red Catuaí and Paradise cultivars submitted to the LERCAFE test and the images resulting from the application of the three classifiers.

Table 1. Quality of classification according to the intervals of agreement of the kappa coefficient.

Table 2. Confusion matrices between the visual references and the classification generated by Fisher's linear discriminant function.