DNA barcode regions for differentiating Cattleya walkeriana and C. loddigesii

Growers prize Cattleya walkeriana and C. loddigesii for their striking flower shape and rarity. This study evaluated the feasibility of the DNA barcode regions ITS1, ITS2 and rpoC1 for discriminating between C. walkeriana and C. loddigesii. The DNA barcode regions were successfully amplified using primers designed for plants. We also included sequences from public databases to test whether these regions could discriminate C. walkeriana and C. loddigesii from other Cattleya species. Among the regions and their combinations, ITS1+ITS2 had the highest average interspecific distance (11.1%), followed by rpoC1 (1.06%). For species discrimination, ITS1+ITS2 provided the best results. The combined data set ITS1+ITS2+rpoC1 also discriminated both species, but did not yield higher rates of discrimination. These results indicate that the ITS region is the best option for molecular identification of these two species and for distinguishing them from some other species of the genus.


Introduction
Brazil has a great biodiversity of orchid species. Some of them, especially the epiphytes, are endangered. Knowledge about genetic diversity is therefore extremely valuable for the preservation of species at risk. Loss of genetic variability may reduce the chances of survival and evolution in the wild. Conserving this hereditary legacy is crucial for long-term species survival (Muñoz, Warner, & Albertazzi, 2010). Cattleya walkeriana Gardner is now a threatened species due to forest fragmentation and predatory collection (Tambarussi et al., 2017). Growers appreciate C. walkeriana for its diversity of forms, which yields beautiful and valuable flowers (Faria, Santiago, Saridakis, Albino, & Araújo, 2002). In recent years, collectors have been looking for genetically improved plants (Menezes, 2011). Individuals with improved traits (rare color and good flower shape) are highly valued (Tambarussi et al., 2017). Cattleya loddigesii Lindl. occurs in the states of Minas Gerais, Paraná and São Paulo in Brazil, and also in northeastern Argentina (Barbosa Rodrigues, 1996). Both species are in the background of modern Cattleya Alliance hybrids (American Orchid Society [AOS], 2016), and growers have used them to produce hybrids. Orchid growers accept this process when the aim is to produce pure plants (crosses within the same species).
Currently, biotechnological developments, including several DNA-based techniques for differentiation, are being applied to the maintenance of genetic features, breeding programs, the characterization of germplasm banks, and the discrimination of hybrids (Cruz, Selbach-Schnadelbach, Lambert, Ribeiro, & Borba, 2011). Many molecular techniques help to generate information and to assess polymorphism among individuals and populations (Qian, Wang, & Tian, 2013). Biotechnological techniques, such as in vitro procedures (Faria et al., 2002), differentiation of natural populations, species delimitation in rare plants (Qian et al., 2013), and phylogeographical studies (Monteiro, Selbach-Schnadelbach, Oliveira, & van den Berg, 2010), have contributed extensively to understanding and saving orchid species. Molecular markers have been used for genetic analysis of the genera Calanthe (Qian et al., 2013), Cattleya (Almeida et al., 2013; Rodrigues et al., 2015), and many others.
Molecular identification of species has been used extensively in many organisms, such as animals, fungi, bacteria and plants. Hebert, Cywinska, Ball, and deWaard (2003) proposed a biological identification system based on DNA sequences (DNA barcoding). In this context, they proposed that a small but standardized region of the genome could discriminate species. In animals, this region is cytochrome oxidase subunit 1 (COX1; Hebert et al., 2003), and in fungi it is the ITS (internal transcribed spacer) region (Schoch et al., 2012). In plants, however, several systems have been discussed, involving the sequencing of one or more standard genomic regions for species identification (Hollingsworth et al., 2009). The DNA barcodes rpoC1+rpoB+matK or rpoC1+matK+trnH-psbA (Kress & Erickson, 2009; Hollingsworth, Graham, & Little, 2011) and the internal transcribed spacer (ITS1+ITS2) (Chen et al., 2010; Selvaraj et al., 2012) have been suggested by different researchers for plant species identification. In the Orchidaceae, the loci rbcL, matK, atpF-atpH, psbK-psbI, trnH-psbA and ITS have been recommended as plant barcoding loci to discriminate among species within the genus Holcoglossum (Orchidaceae: Aeridinae) (Xiang, Hu, Wang, & Jin, 2011). However, DNA barcoding in Orchidaceae is very recent. Consequently, several studies have proposed new systems to discriminate between species based on infrageneric taxonomy, within a few genera (Kim, Oh, Bhandari, Kim, & Park, 2014; van den Berg, 2014).
Thus, this research examined the use of DNA barcodes to differentiate C. walkeriana from C. loddigesii, and to differentiate these two species from other Cattleya species.

Material and methods
DNA was extracted from 100 mg samples per plant by the Doyle and Doyle (1990) method. Four C. loddigesii and three C. walkeriana individuals were genotyped. DNA barcodes from these species were tested and separated in order to use their sequences in further studies. Although the aim of this study was to test DNA barcode discrimination between C. loddigesii x C. walkeriana, we also included GenBank sequences from other species of Cattleya and tested the discrimination of C. loddigesii and C. walkeriana against those species. Unfortunately, several species were represented by only one specimen, so it was not possible to evaluate their intraspecific variation or to concatenate the regions in order to test species discrimination. Sequences of the universal primers for evaluating DNA barcodes, including those for ITS1, ITS2 (ITS1+ITS2, herein named the ITS region) and rpoC1, and general PCR reaction conditions, were obtained from previous studies (Tokuoka & Tobe, 2006; Chen et al., 2010; Sharma, Folch, Cardoso-Taketa, Lorence, & Villarreal, 2012). All PCRs were performed in 25 μL reaction volumes with 12.5 μL of PCR Master Mix (Promega Corp., Madison, Wisconsin), 1.25 μL each of 10 μM primers (upstream and downstream), and 10 μL of diluted (10- to 100-fold) DNA template. PCR products were checked on a 1.0% agarose gel. The sequencing amplification protocol consisted of one cycle of 1 min at 96°C, followed by 30 cycles of 10 sec at 96°C, 5 sec at 55°C, and 4 min at 60°C, using the ABI Prism BigDye terminator v3.
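The reaction composition given above can be checked with simple arithmetic. The following is a minimal sketch in Python, assuming only the volumes and the primer stock concentration stated in the text; the variable and function names are illustrative and not part of the original protocol.

```python
# Reaction components in microlitres, as stated in the text.
components_ul = {
    "PCR Master Mix": 12.5,
    "upstream primer (10 uM stock)": 1.25,
    "downstream primer (10 uM stock)": 1.25,
    "diluted DNA template": 10.0,
}

# The component volumes should add up to the 25 uL reaction volume.
total_ul = sum(components_ul.values())


def final_concentration(stock_um, volume_ul, total_volume_ul):
    """Dilution rule C1 * V1 = C2 * V2, solved for C2 (micromolar)."""
    return stock_um * volume_ul / total_volume_ul


# Final concentration of each primer in the reaction.
primer_final_um = final_concentration(10.0, 1.25, total_ul)

print(f"total volume: {total_ul} uL")
print(f"final primer concentration: {primer_final_um} uM each")
```

Under these figures the listed components account for the full reaction volume, and each primer is diluted 20-fold from its stock.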

Statistical analysis of DNA barcodes
DNA barcode candidates were edited with BioEdit, version 7.0.9.0 (Hall, 1999). Informative polymorphic characters were identified with MEGA6 (Tamura, Stecher, Peterson, Filipski, & Kumar, 2013). Sequences were aligned with MUSCLE (Edgar, 2004). Where necessary, manual adjustments were made in BioEdit, version 7.0.9.0 (Hall, 1999). The different locus combinations were partitioned for independent model assessment at each marker. Diagnostic character analysis was conducted for the rpoC1 gene according to BOLD Systems (Barcode of Life Data Systems). Pairwise genetic distances for each individual sequence data set, and for all possible combinations of the three sequence data sets, were determined by the Kimura 2-parameter (K2P) method (Kimura, 1980) using MEGA6 (Tamura et al., 2013). Neighbor-joining (NJ) analysis, in MEGA6, was employed to assess whether the resulting sequence data sets in various combinations formed species-specific clusters. For NJ analyses, K2P distance matrices were used. A bootstrap (BS) analysis (Felsenstein, 1985) of 1000 replicates was conducted to evaluate support for clades using the same search parameters as in the previous NJ analysis. We tested two approaches for assessing discrimination levels, using the entire dataset for both regions and using only the pair C. loddigesii x C. walkeriana. In the first, discrimination was considered successful when the minimum interspecific distance was larger than the maximum intraspecific distance, an approach similar to that proposed by Hollingsworth et al. (2009), except that here we used the K2P distance. Unfortunately, many species from GenBank were represented by a single sequence, which does not capture intraspecific variation. In the second, the tree-based method (NJ), discrimination was considered successful only for species with at least two sequenced specimens that formed species-specific monophyletic groups with bootstrap values ≥70% (as used by Zhang, Fan, Zhu, Zhao, & Fu, 2013 and Vivas et al., 2014). Cladograms were analyzed and edited with MEGA6 (Tamura et al., 2013).
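The distance-based criterion described above can be illustrated with a short Python sketch. It is not the MEGA6 implementation: it assumes the standard K2P formula, d = -1/2 ln[(1 - 2P - Q) sqrt(1 - 2Q)], where P and Q are the proportions of transitions and transversions, and it assumes that sites with gaps or ambiguous bases are skipped pair by pair. The function names and the toy sequences are hypothetical and do not come from the study.

```python
import math
from itertools import combinations

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq1, seq2):
    """Kimura 2-parameter distance between two aligned sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = compared = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        # Assumption: skip gaps and ambiguity codes for this pair only.
        if a not in "ACGT" or b not in "ACGT":
            continue
        compared += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared
    q = transversions / compared
    term = (1 - 2 * p - q) * math.sqrt(1 - 2 * q)
    if term <= 0:
        raise ValueError("sequences too divergent for the K2P model")
    return -0.5 * math.log(term)


def barcode_gap(sequences, target):
    """True if the minimum interspecific K2P distance of `target` exceeds
    its maximum intraspecific distance; None if either cannot be computed
    (for example, a species represented by a single sequence)."""
    intra, inter = [], []
    for (sp1, s1), (sp2, s2) in combinations(sequences, 2):
        if sp1 == target and sp2 == target:
            intra.append(k2p_distance(s1, s2))
        elif sp1 == target or sp2 == target:
            inter.append(k2p_distance(s1, s2))
    if not intra or not inter:
        return None
    return min(inter) > max(intra)


# Toy alignment with invented sequences, for illustration only.
toy = [
    ("sp_A", "ACGTACGTAC"),
    ("sp_A", "ACGTACGTAT"),
    ("sp_B", "ACGAACGAAC"),
    ("sp_B", "ACGAACGAAC"),
]
print(barcode_gap(toy, "sp_A"))
```

Returning None for singletons mirrors the limitation noted in the text: a species with only one sequence has no intraspecific distance, so the criterion cannot be evaluated for it.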

Results and discussion
For the specimens used in this study, the assessed DNA barcode regions were successfully amplified using primers designed for plants (Tokuoka & Tobe, 2006; Chen et al., 2010; Sharma et al., 2012). Most samples were successfully sequenced directly from the PCR products using the same primer pairs, which generated high-quality bidirectional sequences. This indicates that the primers used for each DNA barcode region in this study are universally applicable to the genotypes of C. walkeriana and C. loddigesii. The lengths of the aligned DNA fragments of rpoC1 and the ITS region were 518 and 678 bp, respectively. The ITS region provided a greater number of variable sites (29.8%) than rpoC1 (2.3%) and also a higher interspecific mean distance (5.4%) (Table 2). Both regions were analyzed separately for all samples, and combined only for the pair C. loddigesii x C. walkeriana, demonstrating strong discrimination of this pair of species (Table 3). For all others, when the analysis was conducted on separate regions, rpoC1 showed low variability (overall mean 0.3%). Despite this lower variability, it is possible to distinguish C. walkeriana from C. loddigesii using SNPs at positions 30 (G/C) and 66 (G/C) of the final alignment (Figure 1).
When compared with all public sequences, position 30 can be classified as partially diagnostic, since it distinguishes C. walkeriana, but not C. loddigesii, from all the other species except C. violacea and C. nobilior. Position 66 is considered diagnostic, since it discriminates C. walkeriana from all the other species. On the other hand, the ITS region proved more efficient at discriminating species, with higher pairwise distances between them. Similar results have been reported in the literature, with lower variability for rpoC1 (Hollingsworth et al., 2009) and greater variability for the ITS region (overall mean distance 7.05% for the ITS region and 0.35% for rpoC1; Chen et al., 2010). In our study, although little variation was found in the rpoC1 region, this region can indeed discriminate at least C. walkeriana from all other species. The NJ tree for the ITS region in Figure 2 clearly shows that C. walkeriana is distinguished from all other species, but C. loddigesii cannot be separated from 35.7% of the species tested (C. bicolor, C. granulosa, C. leopoldii, C. forbesii, C. porphyroglossa, C. elongata, C. tenuis, C. velutina, C. harrisoniana, C. schilleriana, C. kerrii, C. amethystoglossa, C. guttata, C. dormaniana and C. intermedia).
Furthermore, it is also possible to discriminate C. elongata, C. lueddemanniana, C. maxima, C. porphyroglossa and C. trichopiliochila from all other species. Similar results were found in Gossypium, where the ITS region is a more suitable candidate DNA barcode for identification than the plastid regions, even between organisms of the same species (Ashfaq, Asif, Anjum, & Zafar, 2013). The plastid region rpoC1 discriminates more than 60% of species in land plants, while the ITS region discriminates more than 90% (Kress & Erickson, 2007). Several studies have tried to identify combinations of universal genes that allow the separation of plant species. According to Kress et al. (2009), the multilocus combination rbcL + trnH-psbA + matK is useful for identification in plants. The combination of the genes rpoA, rpoB, rpoC1 and rpoC2 has been proposed as a phylogenetic marker in the systematics and molecular phylogeny of flowering plants (Logacheva, Penin, Samigullin, Vallejo-Roman, & Antonov, 2007). The concatenated regions (rpoC1+ITS), tested only for the C. walkeriana x C. loddigesii pair, discriminated both species with 100% resolution (Figure 3), with a mean distance of 4.4% in our dataset (Table 3). The species resolution abilities of the DNA barcode regions and their combinations were demonstrated by the methods tested, but the method used by Hollingsworth et al. (2009) returned the best results, reaching 82% based on the ITS region alone. High levels of species resolution for ITS and ITS2 have been reported in several previous plant barcode studies, but lower levels for rpoC1 (Chen et al., 2010; Selvaraj et al., 2012; Little, 2014). In Orchidaceae, Xiang et al. (2011) reported the use of the regions rbcL, matK, trnH-psbA, and ITS to discriminate among species of the genus Holcoglossum (Orchidaceae: Aeridinae). These were successfully implemented in barcoding species of the orchid genus Dendrobium. For other genera of Orchidaceae from Korea, Kim et al. (2014) used another combination of DNA barcodes, based on four regions combined, and reached a 98.8% species resolution.
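The notion of diagnostic and partially diagnostic positions used for rpoC1 can be sketched in Python as follows. This is an illustration under a simplified working definition, not the BOLD Systems procedure: a column is called diagnostic when all sequences of the target species share one base that no other species carries, and partial when that base is shared with some, but not all, of the other species. The function name, labels and toy alignment are hypothetical.

```python
def classify_sites(alignment, target):
    """Classify alignment columns for `target`.

    alignment: list of (species, aligned_sequence) pairs.
    Returns {position (1-based): (label, species sharing the target base)}.
    """
    target_seqs = [seq for sp, seq in alignment if sp == target]
    others = [(sp, seq) for sp, seq in alignment if sp != target]
    if not target_seqs or not others:
        raise ValueError("need target and non-target sequences")
    length = len(target_seqs[0])
    if any(len(seq) != length for _, seq in alignment):
        raise ValueError("sequences must be aligned to the same length")

    other_species = {sp for sp, _ in others}
    results = {}
    for i in range(length):
        bases = {seq[i] for seq in target_seqs}
        # The target must be fixed for a single unambiguous base.
        if len(bases) != 1:
            continue
        base = next(iter(bases))
        if base not in "ACGT":
            continue
        sharing = sorted({sp for sp, seq in others if seq[i] == base})
        if len(sharing) == len(other_species):
            continue  # every other species has the same base: uninformative
        label = "diagnostic" if not sharing else "partial"
        results[i + 1] = (label, sharing)
    return results


# Toy alignment with invented sequences, for illustration only.
toy_alignment = [
    ("target", "ACGTA"),
    ("target", "ACGTA"),
    ("sp_B", "ATGCA"),
    ("sp_C", "ACGCA"),
    ("sp_D", "ATGCA"),
]
print(classify_sites(toy_alignment, "target"))
```

In the toy alignment, one column separates the target from every other species, while another is shared with a single other species, which parallels the distinction drawn in the text between positions 66 and 30.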

Conclusion
Although some methods have been proposed for the Orchidaceae, our work showed that it is possible to discriminate between C. walkeriana and C. loddigesii based on the ITS region alone, although the inclusion of other highly variable markers could be valuable for discriminating all of the other species, as found by Kim et al. (2014). Therefore, given the current economic importance and conservation status of both species, this region provides a rapid identification method to differentiate C. walkeriana from C. loddigesii, with high discriminatory power and precise identification of these two orchid species.
Plant material and genetic analysis of DNA barcode candidates
Growers from the States of Minas Gerais (MG) and São Paulo (SP) supplied three C. walkeriana individuals and one C. loddigesii individual. Three other C. loddigesii individuals were sampled from the "Professor Paulo Sodero Martins" Orchid Collection of the Genetics Department (ESALQ/USP), Universidade de São Paulo, Piracicaba, São Paulo, Brazil (Table 1).

Figure 1. Alignment detail of the rpoC1 gene. This alignment comprises 35 species, including public sequences. C. walkeriana can be distinguished from all other species by the SNP (C/T) at position 66 of the 518 bp alignment (classified as a diagnostic character). The SNP (G/C) at position 30 can be classified as partially diagnostic for C. walkeriana, since C. violacea and C. nobilior share the same SNP.

Figure 2. NJ tree including public records for the ITS region. Numbers above the branches represent bootstrap values (≥70%). GenBank accession numbers are listed with the samples.

Figure 3.

Table 1. List of Cattleya walkeriana and C. loddigesii individuals with their respective varieties, source/origin and GenBank accession numbers.

Table 2. Sequence characteristics of the regions tested.

Table 3. Summary of the analysis indicating the resolution of the regions tested for the genus Cattleya.
* See text for details, diagnostic character analysis incomplete sampling, only species with two or more sequences. Due to the low variability of the rpoC1 gene, this item was not assessed for this region.