Returning non-duplicate results from a BLAST hit table


Founder

2024-12-20 09:30:42


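In a BLAST query script, the hits for each query can be stored in an OrderedDict from Python's collections module, keyed by hit ID, so that each hit is reported only once: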

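from collections import OrderedDict

from Bio.Blast import NCBIXML
from Bio.Blast.Applications import NcbiblastpCommandline

blout_file = "output.blast"
fasta_file = "sequences.fasta"

# Run blastp against the "nr" database and write XML output (outfmt=5).
# Note: Bio.Blast.Applications is deprecated in recent Biopython releases;
# calling the blastp executable through subprocess is the usual replacement.
blastp_cline = NcbiblastpCommandline(query=fasta_file, db="nr", outfmt=5, out=blout_file)
stdout, stderr = blastp_cline()

# unique_hits maps query ID -> OrderedDict of hit ID -> hit details.
unique_hits = {}

# Parse inside a with-block so the XML file is closed afterwards.
with open(blout_file) as handle:
    for blast_record in NCBIXML.parse(handle):
        seq_id = blast_record.query_id
        # setdefault keeps earlier hits if the same query ID appears again.
        hits = unique_hits.setdefault(seq_id, OrderedDict())
        for alignment in blast_record.alignments:
            hit_id = alignment.hit_id
            if hit_id not in hits:
                hits[hit_id] = {"length": alignment.length, "evalue": []}
            # Record the e-value of every HSP, not only the first one.
            hits[hit_id]["evalue"].extend(hsp.expect for hsp in alignment.hsps)

# Print one line per unique (query, hit) pair with its lowest e-value.
for query_id, hit_dict in unique_hits.items():
    for hit_id, values in hit_dict.items():
        print(query_id, hit_id, values["length"], min(values["evalue"]))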

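This example uses the BLAST+ package, called through Biopython, to search the "nr" database with one or more protein sequences. Note that blastp expects protein queries, so DNA queries would need a different program such as blastn or blastx. The results are stored in ordinary Python dictionaries. Uniqueness comes from using the hit ID as a dictionary key, while OrderedDict keeps the hits in the order BLAST reported them (plain dicts behave the same way since Python 3.7). Finally, the code prints one line per unique query/hit pair: query ID, hit ID, hit length, and the lowest e-value.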
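The same keyed-dictionary idea can also be applied directly to a tabular hit table, where one hit may span several rows (one row per HSP). The following is only a minimal sketch, assuming the table was written with -outfmt 6 and the default twelve columns. The names best_hits and OUTFMT6_FIELDS and the three sample rows are made up for illustration and do not come from the original script.

```python
import csv
import io

# Default column order of BLAST+ tabular output (-outfmt 6).
# Assumption: no custom column list was passed to -outfmt.
OUTFMT6_FIELDS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def best_hits(handle):
    """Keep one row per (query, subject) pair: the one with the lowest e-value.

    Rows come back in the order each pair was first seen, which relies on
    plain dicts preserving insertion order (Python 3.7+).
    """
    best = {}
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and "#" comment lines (as in -outfmt 7)
        record = dict(zip(OUTFMT6_FIELDS, row))
        record["evalue"] = float(record["evalue"])
        record["bitscore"] = float(record["bitscore"])
        key = (record["qseqid"], record["sseqid"])
        # Replacing the value of an existing key does not change its position.
        if key not in best or record["evalue"] < best[key]["evalue"]:
            best[key] = record
    return list(best.values())


# Hypothetical table: the first two rows are two HSPs of the same hit.
example = io.StringIO(
    "q1\ts1\t98.0\t100\t2\t0\t1\t100\t1\t100\t1e-50\t200\n"
    "q1\ts1\t80.0\t40\t8\t0\t120\t160\t130\t170\t1e-5\t45.1\n"
    "q1\ts2\t90.0\t90\t9\t0\t1\t90\t5\t95\t1e-30\t150\n"
)
for hit in best_hits(example):
    print(hit["qseqid"], hit["sseqid"], hit["evalue"])
```

For a real file, the StringIO object would be replaced by an open file handle. Keying on the (query, subject) pair instead of the subject alone keeps hits from different queries separate without needing a nested dictionary.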
