antiSMASH datasets

class igua.dataset.antismash.AntiSMASHGenBankDataset(path, *, mode='region')

A dataset composed of antiSMASH regions in a GenBank file.

antiSMASH reports regions in a GenBank file, but a record does not necessarily contain exactly one region. This class extracts each region independently and recovers the correct cluster ID for each one.
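
To illustrate why per-region IDs are needed, the sketch below scans the feature table of GenBank text for `region` features and derives one ID per region. It is a simplified stand-in written with the standard library only, not IGUA's parser. The helper name `region_ids`, the reliance on a `/region_number` qualifier, and the `<record>.regionNNN` ID pattern (borrowed from the file names antiSMASH gives to per-region GenBank files) are assumptions made for this example.

```python
import re

# Start of a feature-table line such as "     region          1..25000"
# (five spaces, feature key, location).
_FEATURE = re.compile(r"^ {5}(\S+)\s+\S+")
# Qualifier line such as '/region_number="2"' (assumed to be present).
_REGION_NUMBER = re.compile(r'^\s+/region_number="(\d+)"')


def region_ids(genbank_text):
    """Return one hypothetical ID per antiSMASH region found in the text."""
    ids = []
    record_id = None
    in_region = False
    for line in genbank_text.splitlines():
        if line.startswith("LOCUS"):
            # The record name is the second field of the LOCUS line.
            record_id = line.split()[1]
            in_region = False
            continue
        feature = _FEATURE.match(line)
        if feature:
            # A new feature starts; remember whether it is a region.
            in_region = feature.group(1) == "region"
            continue
        number = _REGION_NUMBER.match(line)
        if in_region and number:
            # Hypothetical ID scheme: <record>.region<NNN>.
            ids.append(f"{record_id}.region{int(number.group(1)):03d}")
    return ids
```

A record holding two regions therefore produces two distinct IDs, which is the situation the class is designed to handle.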

extract_clusters(progress)

Extract the clusters from the dataset.

Parameters:

progress (rich.progress.Progress) – A Progress instance that can be used for tracking progress.

Yields:

Cluster – A cluster object for each gene cluster to be processed in the dataset.

extract_proteins(progress, cluster_ids)

Extract protein sequences from GenBank files.

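Parameters:

progress (rich.progress.Progress) – A Progress instance that can be used for tracking progress.

cluster_ids – The identifiers of the gene clusters whose proteins should be extracted.

Yields: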

Protein – A protein object for each protein of the gene clusters to be processed in the dataset.
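
The two signatures suggest a two-pass workflow: collect the clusters first, then request proteins for those clusters only. The helper below sketches that pattern against any object exposing the two documented methods. It is an illustration under assumptions: `collect` is a hypothetical name, each cluster is assumed to expose an `id` attribute, and `cluster_ids` is assumed to accept a collection of those IDs. Check the `Cluster` and `Protein` definitions before relying on either assumption.

```python
def collect(dataset, progress):
    """Run both extraction passes and return (clusters, proteins) lists."""
    # First pass: materialize the generator of clusters.
    clusters = list(dataset.extract_clusters(progress))
    # Assumption: clusters expose an `id` attribute.
    cluster_ids = {cluster.id for cluster in clusters}
    # Second pass: request proteins for the collected cluster IDs only.
    proteins = list(dataset.extract_proteins(progress, cluster_ids))
    return clusters, proteins
```

With IGUA installed, `dataset` would be an `AntiSMASHGenBankDataset(path)` and `progress` a `rich.progress.Progress` instance.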

class igua.dataset.antismash.AntiSMASHZipDataset(path, *, mode='region')

A dataset composed of antiSMASH results in a Zip file.

antiSMASH can be configured to report all results inside a Zip file. This class supports reading the antiSMASH-predicted regions from a Zip archive, including handling of region IDs, without requiring the archive to be decompressed.
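
Reading from the archive in place can be done with Python's `zipfile` module, which opens individual members as streams. The sketch below shows the general technique, not IGUA's implementation. The function name `iter_region_genbanks` and the `*.region*.gbk` member pattern (the naming antiSMASH typically uses for per-region GenBank files) are assumptions.

```python
import fnmatch
import zipfile


def iter_region_genbanks(archive):
    """Yield (member name, text) for each per-region GenBank file.

    `archive` may be a path or a binary file-like object.
    """
    with zipfile.ZipFile(archive) as zf:
        for name in zf.namelist():
            # Assumed naming pattern for per-region GenBank members.
            if not fnmatch.fnmatch(name, "*.region*.gbk"):
                continue
            # Each member is read straight from the archive;
            # nothing is extracted to disk.
            with zf.open(name) as member:
                yield name, member.read().decode("utf-8")
```

Filtering on member names keeps the full-genome GenBank file and the HTML report out of the results, so only region-level records are parsed.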

extract_clusters(progress)

Extract the clusters from the dataset.

Parameters:

progress (rich.progress.Progress) – A Progress instance that can be used for tracking progress.

Yields:

Cluster – A cluster object for each gene cluster to be processed in the dataset.

extract_proteins(progress, cluster_ids)

Extract protein sequences from the GenBank files in the archive.

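Parameters:

progress (rich.progress.Progress) – A Progress instance that can be used for tracking progress.

cluster_ids – The identifiers of the gene clusters whose proteins should be extracted.

Yields: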

Protein – A protein object for each protein of the gene clusters to be processed in the dataset.