antiSMASH datasets
- class igua.dataset.antismash.AntiSMASHGenBankDataset(path, *, mode='region')
A dataset composed of antiSMASH regions in a GenBank file.
antiSMASH reports its predicted regions in a GenBank file, but a record does not necessarily contain exactly one region. This class extracts each region independently and recovers the correct cluster ID for it.
- extract_clusters(progress)
Extract the clusters from the dataset.
- Parameters:
progress (rich.progress.Progress) – A Progress instance that can be used for tracking progress.
- Yields:
Cluster – A cluster object for each gene cluster to be processed in the dataset.
- extract_proteins(progress, cluster_ids)
Extracts protein sequences from GenBank files.
- Parameters:
progress (rich.progress.Progress) – A Progress instance that can be used for tracking progress.
cluster_ids (collections.abc.Collection of str) – A collection of cluster IDs from which to extract proteins.
- Yields:
Protein – A protein object for each protein of the gene clusters to be processed in the dataset.
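The idea of recovering one ID per region from a record that holds several regions can be illustrated with a minimal, standard-library-only sketch. This is not IGUA's implementation: the helper `iter_regions`, the sample text, and the `<record>.region<NNN>` ID scheme (modelled on how antiSMASH names its per-region files) are assumptions made for illustration only.

```python
import re

# Feature-table line opening a "region" feature, e.g. "     region          1..20000"
REGION_LINE = re.compile(r"^\s{5}region\s+(?:complement\()?<?(\d+)\.\.>?(\d+)\)?\s*$")
# First line of every GenBank record; the first token after LOCUS is the record name.
LOCUS_LINE = re.compile(r"^LOCUS\s+(\S+)")
# Qualifier carrying the region number inside a region feature.
NUMBER_LINE = re.compile(r'^\s+/region_number="?(\d+)"?')


def iter_regions(genbank_text):
    """Yield (cluster_id, start, end) for every region feature, record by record."""
    record_id = None
    pending = None  # coordinates of a region still waiting for its number
    for line in genbank_text.splitlines():
        locus = LOCUS_LINE.match(line)
        if locus:
            record_id = locus.group(1)
            pending = None
            continue
        region = REGION_LINE.match(line)
        if region:
            pending = (int(region.group(1)), int(region.group(2)))
            continue
        number = NUMBER_LINE.match(line)
        if number and pending is not None:
            start, end = pending
            # Assumed ID scheme: record name plus zero-padded region number.
            yield f"{record_id}.region{int(number.group(1)):03d}", start, end
            pending = None


# Hypothetical two-record input: the first record holds two regions.
SAMPLE = "\n".join([
    "LOCUS       contig_1   50000 bp    DNA     linear   BCT 01-JAN-2000",
    "FEATURES             Location/Qualifiers",
    "     region          1..20000",
    '                     /product="NRPS"',
    '                     /region_number="1"',
    "     region          30001..50000",
    '                     /region_number="2"',
    "//",
    "LOCUS       contig_2   9000 bp    DNA     linear   BCT 01-JAN-2000",
    "FEATURES             Location/Qualifiers",
    "     region          101..9000",
    '                     /region_number="1"',
    "//",
])

regions = list(iter_regions(SAMPLE))
for cluster_id, start, end in regions:
    print(cluster_id, start, end)
```

Keying each region on both the record name and the region number keeps IDs unique even when several records each contain a region numbered 1.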
- class igua.dataset.antismash.AntiSMASHZipDataset(path, *, mode='region')
A dataset composed of antiSMASH results in a Zip file.
antiSMASH can be configured to report all results inside a Zip file. This class supports reading the antiSMASH-predicted regions from a Zip archive, including handling of region IDs, without requiring the archive to be decompressed.
- extract_clusters(progress)
Extract the clusters from the dataset.
- Parameters:
progress (rich.progress.Progress) – A Progress instance that can be used for tracking progress.
- Yields:
Cluster – A cluster object for each gene cluster to be processed in the dataset.
- extract_proteins(progress, cluster_ids)
Extracts protein sequences from GenBank files.
- Parameters:
progress (rich.progress.Progress) – A Progress instance that can be used for tracking progress.
cluster_ids (collections.abc.Collection of str) – A collection of cluster IDs from which to extract proteins.
- Yields:
Protein – A protein object for each protein of the gene clusters to be processed in the dataset.
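Reading region files from an archive without decompressing it to disk can be sketched with Python's standard `zipfile` module. This is only an illustration of the general approach, not the code used by `AntiSMASHZipDataset`: the helper `iter_region_members`, the in-memory demo archive, and the assumption that region files are named `<record>.region<NNN>.gbk` are all hypothetical.

```python
import io
import posixpath
import re
import zipfile

# Assumed naming convention for per-region GenBank files inside the archive.
REGION_FILE = re.compile(r"^(?P<cluster_id>.+\.region\d+)\.gbk$")


def iter_region_members(archive):
    """Yield (cluster_id, text) for each region GenBank member of a Zip archive.

    `archive` may be a path or a binary file-like object; members are read
    straight from the archive, so nothing is extracted to disk.
    """
    with zipfile.ZipFile(archive) as zf:
        for info in zf.infolist():
            name = posixpath.basename(info.filename)
            match = REGION_FILE.match(name)
            if match is None:
                continue  # skip whole-genome GenBank files, HTML reports, etc.
            with zf.open(info) as handle:
                text = handle.read().decode("utf-8")
            yield match.group("cluster_id"), text


# Build a small in-memory archive so the sketch runs without any real data.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("results/contig_1.region001.gbk", "LOCUS       contig_1\n//\n")
    zf.writestr("results/contig_1.gbk", "LOCUS       contig_1\n//\n")
    zf.writestr("results/index.html", "<html></html>")
buffer.seek(0)

members = dict(iter_region_members(buffer))
print(sorted(members))
```

Deriving the ID from the member name rather than from the record contents is one simple option; it only works if the archive follows the assumed naming convention.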