GAEP: a comprehensive genome assembly evaluating pipeline

doi: 10.1016/j.jgg.2023.05.009
Funds: We would like to thank all the people who contributed ideas and assistance to this study. This study was supported by the National Key Research and Development Project Program of China and the China Postdoctoral Science Foundation.
  • Received Date: 2023-05-09
  • Revised Date: 2023-05-19
  • Accepted Date: 2023-05-23
  • Available Online: 2023-05-26
  • With the rapid development of sequencing technologies, especially the maturation of third-generation sequencing, the number and quality of published genome assemblies have increased significantly. These high-quality genomes place higher demands on genome evaluation. Although numerous computational methods have been developed to evaluate assembly quality from various perspectives, using them selectively can be arbitrary and makes it inconvenient to compare assembly quality fairly. To address this issue, we have developed the Genome Assembly Evaluating Pipeline (GAEP), a comprehensive pipeline that assesses genome quality from multiple perspectives, including continuity, completeness, and correctness. GAEP also includes new functions for detecting misassemblies and evaluating assembly redundancy, which perform well in our testing. GAEP is publicly available at https://github.com/zy-optimistic/GAEP under the GPL-3.0 License. With GAEP, users can quickly obtain accurate and reliable evaluation results, facilitating the comparison and selection of high-quality genome assemblies.