BUSCO: Assessing Genomic Data Quality and Beyond

Keywords: Genome, Computer science, Workflow, Computational biology, Genomics, Biology, Gene, Genetics, Database
Authors
Mosè Manni,Matthew Berkeley,Mathieu Seppey,Evgeny M. Zdobnov
Source
Journal: Current Protocols [Wiley]
Volume (issue): 1 (12) Citations: 716
Identifier
DOI:10.1002/cpz1.323
Abstract

Evaluation of the quality of genomic "data products" such as genome assemblies or gene sets is of critical importance in order to recognize possible issues and correct them during the generation of new data. It is equally essential to guide subsequent or comparative analyses with existing data, as the correct interpretation of the results necessarily requires knowledge about the quality level and reliability of the inputs. Using datasets of near universal single-copy orthologs derived from OrthoDB, BUSCO can estimate the completeness and redundancy of genomic data by providing biologically meaningful metrics based on expected gene content. These can complement technical metrics such as contiguity measures (e.g., number of contigs/scaffolds, and N50 values). Here, we describe the use of the BUSCO tool suite to assess different data types that can range from genome assemblies of single isolates and assembled transcriptomes and annotated gene sets to metagenome-assembled genomes where the taxonomic origin of the species is unknown. BUSCO is the only tool capable of assessing all these types of sequences from both eukaryotic and prokaryotic species. The protocols detail the various BUSCO running modes and the novel workflows introduced in versions 4 and 5, including the batch analysis on multiple inputs, the auto-lineage workflow to run assessments without specifying a dataset, and a workflow for the evaluation of (large) eukaryotic genomes. The protocols further cover the BUSCO setup, guidelines to interpret the results, and BUSCO "plugin" workflows for performing common operations in genomics using BUSCO results, such as building phylogenomic trees and visualizing syntenies. © 2021 The Authors. Current Protocols published by Wiley Periodicals LLC. 
Basic Protocol 1: Assessing an input sequence with a BUSCO dataset specified manually
Basic Protocol 2: Assessing an input sequence with a dataset automatically selected by BUSCO
Basic Protocol 3: Assessing multiple inputs
Alternate Protocol: Decreasing analysis runtime when assessing a large number of small genomes with BUSCO auto-lineage workflow and Snakemake
Support Protocol 1: BUSCO setup
Support Protocol 2: Visualizing BUSCO results
Support Protocol 3: Building phylogenomic trees
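
The three basic protocols differ mainly in how the lineage dataset and the input are given on the BUSCO command line. The following is a minimal sketch, not taken from the protocols themselves, that only builds the argument lists and does not run BUSCO. It assumes the BUSCO v5 flags `-i`, `-o`, `-m`, `-l`, `-c`, and `--auto-lineage`. The helper `busco_command`, the file and directory names, and the choice of `bacteria_odb10` as the dataset are hypothetical and chosen only for illustration.

```python
import shlex
from typing import List, Optional

# Sequence types BUSCO can assess, as named by its -m option (assumed v5 spelling).
VALID_MODES = {"genome", "proteins", "transcriptome"}


def busco_command(input_path: str, out_name: str, mode: str,
                  lineage: Optional[str] = None, cpus: int = 1) -> List[str]:
    """Hypothetical helper: build a BUSCO argument list without executing it."""
    if mode not in VALID_MODES:
        raise ValueError(f"unknown mode: {mode}")
    cmd = ["busco", "-i", input_path, "-o", out_name, "-m", mode, "-c", str(cpus)]
    if lineage is None:
        # No dataset given: let BUSCO select one (auto-lineage workflow).
        cmd.append("--auto-lineage")
    else:
        # Dataset specified manually.
        cmd += ["-l", lineage]
    return cmd


# Dataset specified manually (input file name is an assumption).
manual = busco_command("assembly.fna", "run_manual", "genome",
                       lineage="bacteria_odb10", cpus=4)
# Dataset selected automatically by BUSCO.
auto = busco_command("assembly.fna", "run_auto", "genome")
# Multiple inputs: here the input is assumed to be a directory of FASTA files.
batch = busco_command("genomes_dir", "run_batch", "genome")

for cmd in (manual, auto, batch):
    print(shlex.join(cmd))
```

Keeping the commands as lists rather than strings makes them straightforward to pass to `subprocess.run` once BUSCO is installed, and avoids shell quoting problems with unusual file names.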
