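Speedup
Computer science
Cluster analysis
Redundancy (engineering)
Data mining
Sequence (biology)
Parallel computing
Machine learning
Biology
Operating system
Genetics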
Authors
LiMin Fu,Beifang Niu,Zhengwei Zhu,Sitao Wu,Weizhong Li
Source
Journal: Bioinformatics
[Oxford University Press]
Date: 2012-10-11
Volume (issue): pages: 28 (23): 3150-3152
Citations: 11609
Identifier
DOI:10.1093/bioinformatics/bts565
Abstract
SUMMARY: CD-HIT is a widely used program for clustering biological sequences to reduce sequence redundancy and improve the performance of other sequence analyses. In response to the rapid increase in the amount of sequencing data produced by the next-generation sequencing technologies, we have developed a new CD-HIT program accelerated with a novel parallelization strategy and some other techniques to allow efficient clustering of such datasets. Our tests demonstrated very good speedup derived from the parallelization for up to ∼24 cores and a quasi-linear speedup for up to ∼8 cores. The enhanced CD-HIT is capable of handling very large datasets in much shorter time than previous versions.
AVAILABILITY: http://cd-hit.org.
CONTACT: liwz@sdsc.edu
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.