Journal of Natural Resources and Environmental Management 12. : 112-122. http://dx. org/10. 29244/jpsl. E-ISSN: 2460-5824 http://journal. id/index. php/jpsl

Molecular taxonomy via DNA barcodes for species identification in selected genera of Fabaceae

I Gusti Ayu Kusuma Wardani [a], Fitri Yola Amandita [b], Carina Carneiro de Melo Moura [c], Oliver Gailing [c], Iskandar Z Siregar [a]

[a] Tropical Silviculture Study Program, Department of Silviculture, Faculty of Forestry and Environment, IPB University, IPB Dramaga Campus, Bogor, 16680, Indonesia
[b] Center for Standardization of Environmental Quality Instrument, Ministry of Environment and Forestry, Building 210 Center for Science and Technology Research Area, South Tangerang, 15314, Indonesia
[c] Forest Genetics and Tree Breeding, University of Göttingen, Germany

Article Info: Received: 21 - 07 - 2021 Accepted: 10 - 02 - 2022
Keywords: Biodiversity assessment, DNA barcode, Fabaceae, matK, rbcL
Corresponding Author: Iskandar Z Siregar, Department of Silviculture, Faculty of Forestry and Environment, IPB University. Tel. 62-251-8626806, 8626886 Email: siregar@apps.

Abstract. Fabaceae is a plant family of considerable ecological and economic importance, for example as a source of food, bio-fertilizer, and medicines. However, several members of this family have been overexploited in Indonesia, and some of its species are now critically endangered. It is therefore essential to support conservation efforts that ensure the survival of this plant family. This study provided a molecular survey of Fabaceae in converted landscapes of Indonesia through DNA barcoding and aimed to evaluate the effectiveness of the core chloroplast barcoding markers matK, rbcL, and their combination (matK+rbcL) as DNA barcodes for species identification in Fabaceae. DNA barcodes of the matK and rbcL regions were generated from 51 species belonging to 28 genera and 47 species belonging to 31 genera, respectively.
The results showed that the highest accuracy for species identification was 90% with matK+rbcL and 82.05% with matK. Additionally, matK had the highest mean interspecific and intraspecific distances at 0.134 and 0. Furthermore, the most highly resolved phylogenetic tree was generated using the Neighbor-Joining method. Based on the overall performance, matK is superior to rbcL, and the use of combined matK+rbcL barcodes is highly recommended for the selected genera in this study.

How to cite (CSE Style 8th Edition): Wardani IGAK, Amandita FY, Moura CCDM, Gailing O, Siregar IZ. Molecular taxonomy via DNA barcodes for species identification in selected genera of Fabaceae. JPSL 12. : 112-122. http://dx. org/10. 29244/jpsl.

INTRODUCTION

Fabaceae is the third-largest family of flowering plants after Orchidaceae and Asteraceae, with 19 500 species and 750 genera (Christenhusz and Byng, 2016; Willis, 2. Botanists have divided it into 6 subfamilies, namely Caesalpinioideae (. genera and 4 400 species), Cercidoideae (. genera and 335 species), Detarioideae (. genera and 760 species), Dialioideae (. genera and 85 species), Duparquetioideae (1 genus and 1 species), and Faboideae or Papilionoideae (. genera and 14 000 species) (Gomes et al. Fabaceae include a large number of cultivated plants with high economic value, such as food crops, animal feed, ornamental plants, medicinal plants, timber and wood products (Graham and Vance, 2.

Globally, Fabaceae is highly diverse in tropical regions (Yahara et al., 2. In Indonesia, despite its wide utilization (Purwanto, 2012; Suherman and Herdiawan, 2015; Widodo et al., 2019; Suwardi et al., 2. , reports on Fabaceae distribution and diversity are so far very limited (Hariyati et al., 2018; Putri and Dharmono, 2018; Mountara et al., 2021; Suarna and Wijaya, 2. Five plants of Fabaceae in Indonesia, namely Pseudosindora palustris Symington,
Archidendron royenii Kosterm., Sophora rubriflora P. Tsoong, Pterocarpus indicus Willd., and Albizia rufa (Hassk.) Benth., are listed as endangered species in the IUCN Red List . ttps://w. According to Budiharja et al., habitat loss is the major cause of plant endangerment in Indonesia.

Conservation efforts are urgently needed to prevent a further decrease of species diversity within Fabaceae. Their success depends on accurate species identification, which can be carried out using conventional taxonomic methods and molecular techniques. However, many species are similar in morphological appearance, which makes them difficult to distinguish. According to Elansary et al., morphological identification is not effective, especially for complex taxonomic groups, such as Argyreia (Convolvulaceae) (Traiperm et al., 2. Cuscuta (Convolvulaceae) (Park et al., 2. Pulsatilla (Ranunculaceae) (Li et al., 2. , and Vicia (Fabaceae) (Han et al., 2. Moreover, morphological characters are influenced by the environment, and some reproductive traits are only seasonally available, which makes morphological identification less reliable in the absence of reproductive structures (Hikmah et al., 2. Therefore, the potential of molecular techniques needs to be explored for the proper identification of specimens belonging to Fabaceae.

DNA barcoding is a molecular technique used to identify species using DNA code-based similarity in combination with morphological characters, which minimizes errors from conventional identification (Liu et al., 2. Its basic principle is identification using a short DNA sequence ("barcode") from a standardized part of the genome of the specimen being studied (Hebert et al., 2.
The unknown barcode sequence is compared with known reference barcode sequences, and the specimen is identified when the query sequence matches a target sequence with a high percentage of identity and similarity (Lis et al., 2. DNA barcoding may also reveal morphological misidentifications or even allow the identification of cryptic species (Hajibabaei et al., 2.

The Consortium for the Barcode of Life (CBOL, 2. stated that plant identification generally uses the chloroplast genes maturase K (matK) and ribulose-1,5-bisphosphate carboxylase/oxygenase (rbcL), as well as the combination matK+rbcL (Hollingsworth et al., 2. Amandita et al. reported that the use of two plastid markers, matK and rbcL, is efficient in identifying flowering plants from the lowland rainforest of Sumatra to the genus level. Meanwhile, Gao et al. reported that the matK marker correctly identified approximately 80% and 96% of Fabaceae specimens at the species and genus level, respectively. Saadullah et al. stated that the combination matK+rbcL was the best approach for identifying 62 specimens of Fabaceae originating from Pakistan.

In addition to species identification, DNA barcoding is also useful for determining the genetic relatedness of species by constructing a phylogenetic tree, a representation of evolutionary relationships in a group of organisms with a common ancestor (Ochieng et al., 2007; Patwardhan et al., 2. Hartvig et al. stated that the maximum parsimony and neighbor-joining methods were the best approaches for the genus Dalbergia. According to Saadullah et al., neighbor-joining is an appropriate approach to identify specimens at the family level in Fabaceae. This study aimed to investigate the ability of the DNA barcodes matK and rbcL to identify Fabaceae plant species, and to evaluate their accuracy in reconstructing the phylogenetic relationships among the sampled species.
METHODS

DNA Barcode Sequences

A total of 43 matK sequences and 106 rbcL sequences were derived from the CRC990-EFForTS project in cooperation with IPB University (Bogor, Indonesia), Jambi University (Jambi, Indonesia), Tadulako University (Palu, Indonesia), and University of Göttingen (Göttingen, Germany), as summarized in Table 1 (Amandita et al., 2. Furthermore, 156 sequences of matK and 112 sequences of rbcL were obtained from the Barcode of Life Data System (BOLD) database to increase the sample size and enhance species representation (Ratnasingham and Hebert, 2. Sequences of matK and rbcL originating from the same sample, as indicated by the sample ID, were concatenated (Vaidya et al., 2. to form matK+rbcL, resulting in a total of 35 sequences. The overall data consisted of 123 species from 48 different genera of Fabaceae. Two species of Malvaceae, namely Ceiba speciosa and Adansonia digitata, were added to each of the matK, rbcL, and matK+rbcL datasets as an outgroup. In addition, two species of Polygalaceae, namely Monnina aestuans and Polygala chamaebuxus, were added as a sister group (Doyle et al., 2.

Table 1 DNA sequences of matK, rbcL, and matK+rbcL used in the study
Columns: Marker; No. of sequences (This Study, BOLD); No. of species; No. of genera
Rows: matK; rbcL

Editing and Alignment

Each sample's forward and reverse sequences were aligned using CodonCode Aligner software . ttp://w. com/) and combined into a consensus sequence. Multiple alignments were performed using MEGA7 software (Tamura et al., 2. to determine the similarity level and align the bases among the contigs. Gaps (the sign "-") were added when necessary to align the bases and were interpreted as deletions (missing nucleotide bases in the DNA sequence) (Christinawati et al., 2.
Bases were corrected when paired sequences from the same specimen differed, by checking the chromatogram of the respective sequence in CodonCode Aligner and comparing it with reference sequences of similar species from BOLD.

Data Analysis

The multiple alignment results were used for three further analyses: identification suitability analysis, barcoding gap analysis, and phylogenetic analysis. The identification suitability analysis used only the sequences obtained from the CRC990-EFForTS project and compared the morphological identification by the affiliated taxonomist with the molecular identification obtained using the Basic Local Alignment Search Tool (BLAST) at the National Center for Biotechnology Information (NCBI) . ttp://w. Porter and Hajibabaei, 2. The top BLAST hit was taken as the best match for specimen identification when the similarity percentage was at least 80%. The identification suitability percentage was calculated at the species, genus, and family level.

A barcoding gap analysis was carried out after obtaining intraspecific and interspecific genetic distances using MEGA7 software with Kimura 3 Parameter (Tamura et al., 2. and ExcaliBAR (Aliabadian et al., 2. Barcoding gaps for each marker were visualized by generating distribution bar charts of intraspecific and interspecific distances using Microsoft Excel. ANOVA and t-tests were also carried out using SPSS software (Brady et al., 2. to determine significant differences between intraspecific and interspecific distances.

The last analysis evaluated the resolution of each phylogenetic tree reconstructed with the Maximum Parsimony (MP), Neighbor Joining (NJ), and Maximum Likelihood (ML) algorithms using MEGA7 software with 1 000 bootstrap replicates. The bootstrap values were categorized as high . %), moderate . -85%), weak .
-69%), or very weak (<50%) following Kress et al. The percentage of monophyletic clade formation of each tree was calculated at the species and genus level.

RESULTS AND DISCUSSION

Comparison of Morphological and Molecular Identification

The percentage of corresponding molecular and morphological identifications (identification suitability) of the samples obtained from the CRC990-EFForTS project is shown in Table 2 for individual and combined markers. These samples were morphologically identified by comparing their herbarium specimens with the LIPI herbarium collection.

Table 2 Identification suitability percentage of each marker used for CRC990-EFForTS project samples
Columns: Identification Suitability (%) for matK; rbcL
Rows: Up to species level; Up to genus level; Up to family level; Mislabelling

The highest percentage was obtained at the species level for all markers, in contrast to Gao et al., who reported higher identification suitability at the genus level for Fabaceae. Molecular identification of Fabaceae species in this study performed better with matK than with rbcL, and the use of multilocus matK+rbcL improved the identification performance. Other similar studies reported the superiority of matK over rbcL for plant identification (Kolondam et al., 2012; Amandita et al., 2019; Alasmari, 2. Meanwhile, 3.96% of the molecular identifications did not match the morphological identification at all and were thus classified as mislabelling, meaning that the sample was probably mislabelled during field collection or laboratory analysis.

Barcoding Gap Analysis

A barcoding gap analysis was performed to evaluate whether the investigated markers were sufficiently variable to discriminate between different species. Table 3 shows that the average interspecific genetic distances of matK and rbcL are 0.134 and 0.047, respectively, which are significantly higher than the intraspecific genetic distances (0.003 and 0. These figures are in accordance with Saadullah et al.
, who reported the discriminatory power of matK and rbcL on 22 species of Fabaceae, as well as with reports for other families, such as Myristicaceae (Newmaster et al., 2. and Rosaceae (Pang et al., 2. Moreover, the lower resolution of rbcL compared to matK might be due to the low mutation rate of this gene, as reported by Frascaria-Lacoste et al. and Stenøien .

Table 3 Average values of intraspecific and interspecific distances of each marker
Columns: Marker; Intraspecific Distance (Range, Mean (SD)); Interspecific Distance (Range, Mean (SD)); T-test P Value
Rows: matK; rbcL; matK+rbcL
T-test P value: <0.0001*** for each marker
***: significant

The one-way ANOVA shown in Table 4 indicates that the interspecific genetic distances were significantly different among the three markers tested, but this was not the case for the intraspecific genetic distances, except for the matK and rbcL comparison. Furthermore, the intra- and interspecific genetic distances of matK+rbcL were intermediate, because the combined dataset balances the higher distances of matK against the lower distances of rbcL.

Table 4 One-way ANOVA results for each marker
Columns: Marker Comparison; Intraspecific Distance (Mean Difference, P value); Interspecific Distance (Mean Difference, P value)
Rows: matK x rbcL; matK x matK+rbcL; rbcL x matK+rbcL
ns: not significant

Despite the significant differences between the intra- and interspecific genetic distances of the investigated markers, Figure 1 shows that none of the markers used in this study revealed a clear barcoding gap. The absence of a barcoding gap due to the overlap of intra- and interspecific genetic distances might indicate that the marker is not a suitable DNA barcode for the taxa in question. However, other factors such as sample size and taxonomic representation also influence the distribution of the intra- and interspecific variation within the dataset (Meyer and Paulay, 2.
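The barcoding gap logic described above can be illustrated with a minimal sketch. It is not the MEGA7/ExcaliBAR workflow used in the study: as a simplifying assumption it uses the uncorrected p-distance instead of a Kimura model, and the function names (`p_distance`, `barcoding_gap`) and the toy sequences are hypothetical. Pairwise distances are split into intraspecific and interspecific sets, and the gap is taken as the smallest interspecific distance minus the largest intraspecific distance, so a value of zero or below indicates overlap.

```python
from itertools import combinations
from statistics import mean


def p_distance(a, b):
    """Proportion of differing sites between two aligned sequences.

    Sites where either sequence has an alignment gap ('-') are ignored.
    Assumption: uncorrected p-distance, not the Kimura model of the study.
    """
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    return sum(x != y for x, y in pairs) / len(pairs)


def barcoding_gap(records):
    """records: list of (species_name, aligned_sequence) tuples."""
    intra, inter = [], []
    for (sp1, s1), (sp2, s2) in combinations(records, 2):
        # Same species name -> intraspecific pair, otherwise interspecific.
        (intra if sp1 == sp2 else inter).append(p_distance(s1, s2))
    return {
        "mean_intra": mean(intra),
        "mean_inter": mean(inter),
        # Positive: distributions are separated. Zero or negative: overlap.
        "gap": min(inter) - max(intra),
    }


# Hypothetical toy alignment, for illustration only.
toy = [
    ("Species_A", "ACGTACGTAC"),
    ("Species_A", "ACGTACGTAT"),
    ("Species_B", "ACGAACGAAC"),
    ("Species_B", "ACGAACGAAC"),
]
result = barcoding_gap(toy)
print(result)
```

In this toy alignment the two distributions do not overlap, so the gap is positive; in the study's data the distributions overlapped for all three markers.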
Figure 1 Distribution of interspecific and intraspecific distances for markers (A) matK, (B) rbcL, and (C) matK+rbcL (x-axis: genetic distances; y-axis: frequency of occurrence)

Species-Tree Inferences

Phylogenetic Tree Reconstruction

Phylogenetic trees are important tools for acquiring information on biodiversity and genetic classification and for studying evolutionary relationships. In this study, nine phylogenetic trees were reconstructed based on the aligned sequences of matK, rbcL, and matK+rbcL using the Neighbor Joining, Maximum Parsimony, and Maximum Likelihood algorithms. Figures 2-4 show the phylogenetic trees constructed using Neighbor Joining, which was the best algorithm for providing highly resolved phylogenetic relationships in Fabaceae; the trees reconstructed using Maximum Parsimony and Maximum Likelihood are presented in the Supplementary Material (Figures S1-S.

Figure 2 Neighbor Joining tree of selected Fabaceae species based on the matK data set; the highlighted clades represent the subfamilies Cercidoideae, Detarioideae, Dialioideae, Caesalpinioideae, Mimosoideae, and Faboideae (node values represent bootstrap support)

Figure 3 Neighbor Joining tree of selected Fabaceae species based on the rbcL data set; the highlighted clades represent the subfamilies Cercidoideae, Detarioideae, Dialioideae, Caesalpinioideae, Mimosoideae, and Faboideae (node values represent bootstrap support)

Figure 4 Neighbor Joining tree of selected Fabaceae species based on the matK+rbcL data set; the highlighted clades represent the subfamilies Cercidoideae, Detarioideae, Dialioideae, Mimosoideae,
and Faboideae (subfamily Caesalpinioideae is not represented in the dataset; node values represent bootstrap support)

A "good" phylogenetic tree in biosystematics should be monophyletic, dichotomous, and consistent, with high bootstrap values, no polytomies, and well-resolved clades. A monophyletic group originates from a single ancestor; therefore, its members have similar traits, genetic patterns, and biochemistry (Rahayu and Jannah, 2. The topologies of the phylogenetic trees reconstructed based on matK and rbcL in this study were generally congruent, but there were some differences in the clade positions and bootstrap values. The resolution of the trees was evaluated based on the percentage of monophyletic clades at the species and genus level, as shown in Table 5.

Table 5 Percentage of monophyletic clades in the phylogenetic trees
Columns: Algorithm; Species level (%); Genus level (%)
Rows: Neighbor Joining; Maximum Parsimony; Maximum Likelihood

Monophyletic clades with bootstrap values less than 0.7 were excluded from the estimation, as they were considered unreliable (Hillis and Bull, 1. Both matK and rbcL show high species-level resolution . -95%), meaning that most of the species included in the dataset were resolved as monophyletic clades with bootstrap values of at least 0.7. The percentage of monophyletic clades in the matK+rbcL phylogenetic trees was not calculated, as that data set is relatively limited compared to matK and rbcL. However, the phylogenetic visualization of this combined marker confirmed the results based on the single markers.

As an overview of the effectiveness of matK and rbcL as plant barcodes, this study showed that these two plastid markers worked well in identifying plant species of Fabaceae, at least for the selected genera included, which are particularly important for expanding the knowledge of Indonesian floral composition.
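The percentage of supported monophyletic clades discussed above can be sketched as follows. This is only an illustration under stated assumptions, not the procedure used in the study: the tree encoding (nested `(support, children)` tuples), the label convention `Genus_species_sampleID`, the sample IDs, and the function names are all hypothetical. The sketch also assumes that species with a single sample are left out of the denominator and that a clade below the support threshold simply does not count as a hit.

```python
def leaves(node):
    """Return all leaf labels below a node (a leaf is a plain string)."""
    if isinstance(node, str):
        return [node]
    _support, children = node
    return [leaf for child in children for leaf in leaves(child)]


def clades(node):
    """Yield (support, frozenset_of_leaf_labels) for every internal node."""
    if isinstance(node, str):
        return
    support, children = node
    yield support, frozenset(leaves(node))
    for child in children:
        yield from clades(child)


def species_of(label):
    """Assumed label format: 'Genus_species_sampleID'."""
    return label.rsplit("_", 1)[0]


def percent_monophyletic(tree, min_support=0.7):
    """Percentage of multi-sample species forming a supported exclusive clade."""
    by_species = {}
    for label in leaves(tree):
        by_species.setdefault(species_of(label), set()).add(label)
    # A species with one sample cannot form a clade, so it is not testable.
    testable = [frozenset(m) for m in by_species.values() if len(m) > 1]
    supported = {members for support, members in clades(tree)
                 if support >= min_support}
    hits = sum(members in supported for members in testable)
    return 100.0 * hits / len(testable)


# Hypothetical toy tree: (bootstrap support, [children]).
tree = (1.0, [
    (0.95, ["Albizia_rufa_1", "Albizia_rufa_2"]),                # supported
    (0.60, ["Pterocarpus_indicus_1", "Pterocarpus_indicus_2"]),  # below 0.7
    (0.88, ["Sophora_rubriflora_1",                              # not exclusive
            (0.90, ["Sophora_rubriflora_2", "Archidendron_royenii_1"])]),
])
print(round(percent_monophyletic(tree), 2))
print(round(percent_monophyletic(tree, min_support=0.0), 2))
```

Lowering `min_support` to 0 counts every exclusive clade regardless of bootstrap value, which shows how much of the resolution depends on the support threshold.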
CONCLUSION

Molecular identification with DNA barcodes can be applied effectively to Fabaceae species, with higher accuracy for matK and matK+rbcL than for rbcL. Neighbor Joining is recommended for phylogenetic tree reconstruction in Fabaceae, as it was the most informative approach. Future studies should include supplementary markers, such as psbA-trnH or ITS/ITS2, in combination with matK and rbcL.

ACKNOWLEDGMENT

The authors are grateful to the CRC990-EFForTS Program for supporting this study (Ecological and Socioeconomic Functions of Tropical Lowland Rainforest Transformation Systems, https://w. de/effort. including the ABS Research Fund 2021. This research was supported by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation), project ID 192626868-SFB 990, and the Ministry of Research, Technology and Higher Education (Ristekdikti) in the framework of the collaborative German-Indonesian research project CRC990. The authors also thank Alexandra Dolynska and Larissa Kunz for technical support in the laboratory experiments.

REFERENCES