IGUA
Iterative Gene clUster Analysis, a fast and flexible gene cluster family delineation method.
Overview
IGUA is a method for high-throughput, content-agnostic identification of Gene Cluster Families (GCFs) from gene clusters of genomic and metagenomic origin. It assigns GCFs in three clustering iterations:
1. Reduce the input sequence space by identifying gene clusters that are almost identical fragments of each other. This is useful when combining datasets of metagenomic origin, or when combining results from several genome mining tools that may report overlapping gene clusters.
2. Find similar gene clusters in genomic space, using linear clustering with lower sequence identity and reciprocal coverage. This combines highly conserved gene clusters while accounting for possible evolutionary events such as gene deletion or fusion.
3. Compute a numerical representation of gene clusters in terms of protein composition, using representatives from a protein sequence clustering, and perform hierarchical clustering. This identifies more distant gene clusters that share the same proteins without necessarily maintaining the same gene architecture.
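The third iteration can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not IGUA's implementation: the gene cluster names and protein representative identifiers are made up, the representation is a plain count vector, the distance is cosine, and the clustering is a naive average linkage with a hypothetical cut-off of 0.5. It uses only the standard library so that it runs as-is; a real pipeline would more likely rely on scipy.

```python
import math
from itertools import combinations

# Hypothetical input: each gene cluster is described by the protein
# representatives its genes were assigned to (identifiers are made up).
clusters = {
    "bgc_a": ["rep1", "rep2", "rep3"],
    "bgc_b": ["rep3", "rep1", "rep2"],  # same proteins, different gene order
    "bgc_c": ["rep7", "rep8"],
}


def composition(members, vocabulary):
    """Count how often each protein representative occurs in a gene cluster."""
    return [members.count(rep) for rep in vocabulary]


def cosine_distance(u, v):
    """1 - cosine similarity (an assumed metric, chosen for illustration)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm if norm else 1.0


def average_linkage(vectors, threshold):
    """Naive agglomerative clustering: repeatedly merge the two closest groups
    until the closest pair is farther apart than `threshold`."""
    groups = [[name] for name in vectors]

    def dist(g, h):
        # Average pairwise distance between the members of two groups.
        total = sum(cosine_distance(vectors[a], vectors[b]) for a in g for b in h)
        return total / (len(g) * len(h))

    while len(groups) > 1:
        i, j = min(
            combinations(range(len(groups)), 2),
            key=lambda pair: dist(groups[pair[0]], groups[pair[1]]),
        )
        if dist(groups[i], groups[j]) > threshold:
            break
        groups[i] = groups[i] + groups[j]
        del groups[j]  # i < j, so index i is unaffected
    return [sorted(group) for group in groups]


vocabulary = sorted({rep for members in clusters.values() for rep in members})
vectors = {name: composition(members, vocabulary) for name, members in clusters.items()}
families = average_linkage(vectors, threshold=0.5)
print(families)
```

Here `bgc_a` and `bgc_b` end up in the same family even though their genes are in a different order, because only protein composition is compared, while `bgc_c` shares no protein with them and stays on its own.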
Compared to similar methods such as BiG-SLiCE or BiG-SCAPE, IGUA does not use Pfam domains to represent gene cluster composition, using instead representatives from a fast unsupervised clustering computed with MMseqs2. This allows IGUA to accurately account for proteins that may not be covered by Pfam, and avoids performing a costly annotation step with HMMER. The resulting protein representatives can later be annotated independently to transfer annotations to the GCFs.
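As a rough illustration of what transferring annotations to the GCFs could look like, the sketch below tallies the annotations of the protein representatives found in each GCF. Everything in it is hypothetical: the identifiers, the annotation strings, and the `annotate_gcfs` helper are made up for the example and do not reflect IGUA's output format or API.

```python
from collections import Counter

# Hypothetical inputs (all identifiers and annotations are made up):
# the protein representatives occurring in each GCF, and annotations
# obtained independently for some of those representatives.
gcf_representatives = {
    "GCF0001": ["rep1", "rep2", "rep3"],
    "GCF0002": ["rep7", "rep8"],
}
representative_annotations = {
    "rep1": "polyketide synthase",
    "rep2": "acyltransferase",
    "rep7": "polyketide synthase",
    # rep3 and rep8 are left unannotated, e.g. no database hit
}


def annotate_gcfs(gcf_reps, annotations):
    """Tally the annotations of the representatives found in each GCF."""
    return {
        gcf: Counter(annotations[rep] for rep in reps if rep in annotations)
        for gcf, reps in gcf_reps.items()
    }


gcf_annotations = annotate_gcfs(gcf_representatives, representative_annotations)
for gcf, counts in gcf_annotations.items():
    print(gcf, dict(counts))
```

Because only the representatives need to be annotated, the cost of this step scales with the number of protein clusters rather than with the number of proteins in the input.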
Setup
See the Installing page for the different ways to install IGUA.
IGUA depends on some common scientific Python packages
(scipy, anndata, pandas) as well as a working MMseqs2 installation.
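A quick way to check whether these dependencies are visible from the current environment is sketched below. It is not part of IGUA: `check_dependencies` is a hypothetical helper, and it assumes the MMseqs2 executable is available on the `PATH` under its usual name, `mmseqs`.

```python
import importlib.util
import shutil

# Python packages named above; `mmseqs` is the executable name that an
# MMseqs2 installation normally provides.
PYTHON_PACKAGES = ("scipy", "anndata", "pandas")
MMSEQS_BINARY = "mmseqs"


def check_dependencies():
    """Return a mapping of dependency name -> whether it was found."""
    found = {
        name: importlib.util.find_spec(name) is not None
        for name in PYTHON_PACKAGES
    }
    found[MMSEQS_BINARY] = shutil.which(MMSEQS_BINARY) is not None
    return found


status = check_dependencies()
for name, ok in status.items():
    print(f"{name}: {'found' if ok else 'missing'}")
```

The check only confirms that the packages and the binary can be located; it does not verify versions.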
Citation
IGUA is scientific software, with a preprint currently available on bioRxiv.
Library
License
This library is provided under the GNU General Public License 3.0 or later. See the Copyright Notice section for more information including the complete license text.
This project was co-developed by: