IGUA

Iterative Gene clUster Analysis, a fast and flexible gene cluster family delineation method.

Overview

IGUA is a method for high-throughput, content-agnostic identification of Gene Cluster Families (GCFs) from gene clusters of genomic and metagenomic origin. It assigns GCFs through three successive clustering iterations:

1️⃣ Fragment mapping

Reduce the input sequence space by identifying gene clusters that are near-identical fragments of one another. This is useful when combining datasets of metagenomic origin, or when combining results from several genome mining tools that may report overlapping gene clusters.
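As a rough illustration of fragment mapping, the sketch treats a gene cluster as a fragment when nearly all of its k-mers also occur in a longer cluster that has already been kept. Exact k-mer containment is a technique chosen only for this illustration. It is not taken from IGUA's code. The function names, the value of `k`, and the 0.95 threshold are hypothetical, and reverse complements are ignored for brevity.

```python
from __future__ import annotations


def kmers(seq: str, k: int = 15) -> set[str]:
    """Return the set of k-mers of a nucleotide sequence (case-insensitive)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def containment(small: str, large: str, k: int = 15) -> float:
    """Fraction of the k-mers of `small` that also occur in `large`."""
    ks = kmers(small, k)
    if not ks:
        return 0.0
    return len(ks & kmers(large, k)) / len(ks)


def map_fragments(clusters: dict[str, str], threshold: float = 0.95, k: int = 15) -> dict[str, str]:
    """Map each gene cluster to the kept cluster it is a near-identical fragment of.

    Hypothetical helper: clusters are visited from longest to shortest, so a
    fragment is always compared against longer, already-kept clusters.
    NOTE: reverse complements are not considered in this sketch.
    """
    order = sorted(clusters, key=lambda name: len(clusters[name]), reverse=True)
    kept: list[str] = []
    mapping: dict[str, str] = {}
    for name in order:
        for rep in kept:
            if containment(clusters[name], clusters[rep], k) >= threshold:
                mapping[name] = rep  # fragment: collapse onto the longer cluster
                break
        else:
            kept.append(name)  # no containing cluster found: keep as representative
            mapping[name] = name
    return mapping
```

Processing clusters by decreasing length means each fragment is only compared with longer clusters. A real implementation would replace the pairwise loop with an indexed search.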

2️⃣ Nucleotide deduplication

Find similar gene clusters in nucleotide space using linear clustering, with a lower sequence identity threshold than fragment mapping and a reciprocal coverage requirement. This effectively merges highly conserved gene clusters while accounting for possible evolutionary events such as gene deletion or fusion.
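Reciprocal coverage is the key criterion in this step. A hit links two gene clusters only when the alignment covers a sufficient fraction of both of them, so a short cluster is not absorbed into a much longer one. The sketch assumes pairwise hits have already been computed, for example by an aligner such as MMseqs2. The `Hit` record and its thresholds are hypothetical placeholders, not IGUA's defaults. It groups clusters with union-find, which yields connected components (single linkage). MMseqs2's linear clustering instead assigns sequences to representative sequences, so its results can differ.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """One pairwise nucleotide alignment between two gene clusters (hypothetical record)."""
    query: str
    target: str
    identity: float    # fraction of identical aligned positions, 0..1
    query_cov: float   # fraction of the query covered by the alignment, 0..1
    target_cov: float  # fraction of the target covered by the alignment, 0..1


def passes(hit: Hit, min_identity: float, min_cov: float) -> bool:
    # Reciprocal coverage: BOTH sequences must be covered, not just the shorter one.
    return hit.identity >= min_identity and min(hit.query_cov, hit.target_cov) >= min_cov


def deduplicate(names: list[str], hits: list[Hit],
                min_identity: float = 0.8, min_cov: float = 0.5) -> dict[str, str]:
    """Group gene clusters linked by qualifying hits and map each to a group root."""
    parent = {name: name for name in names}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for hit in hits:
        if passes(hit, min_identity, min_cov):
            a, b = find(hit.query), find(hit.target)
            if a != b:
                parent[max(a, b)] = min(a, b)  # root becomes the smallest name
    return {name: find(name) for name in names}
```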

3️⃣ Protein representation

Compute a numerical representation of each gene cluster in terms of its protein composition, using representatives from a protein sequence clustering, then perform hierarchical clustering on these representations. This identifies more distant gene clusters that share the same proteins without necessarily preserving the same gene architecture.
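To make the protein representation step concrete, the sketch counts how many proteins of each gene cluster fall into each protein representative. It then compares clusters by cosine distance and merges them with naive average-linkage agglomerative clustering. Merging stops once the closest pair of groups is farther apart than a cutoff. The count vectors, the cosine metric, average linkage, and the cutoff are all assumptions made for illustration, not IGUA's documented settings. In practice, `scipy.cluster.hierarchy` would replace the naive loop.

```python
from __future__ import annotations

import math
from collections import Counter


def composition(cluster_proteins: dict[str, list[str]],
                protein_to_rep: dict[str, str]) -> dict[str, Counter]:
    """Count how many proteins of each gene cluster belong to each protein representative."""
    return {
        cluster: Counter(protein_to_rep[p] for p in proteins)
        for cluster, proteins in cluster_proteins.items()
    }


def cosine_distance(u: Counter, v: Counter) -> float:
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(sum(x * x for x in v.values()))
    return 1.0 - dot / norm if norm else 1.0


def average_linkage(vectors: dict[str, Counter], max_distance: float) -> dict[str, int]:
    """Naive agglomerative clustering (average linkage), stopping above `max_distance`."""
    names = sorted(vectors)
    dist = {(a, b): cosine_distance(vectors[a], vectors[b]) for a in names for b in names}
    groups = [[name] for name in names]

    def linkage(g1: list[str], g2: list[str]) -> float:
        # Average-linkage distance: mean pairwise distance between the two groups.
        return sum(dist[a, b] for a in g1 for b in g2) / (len(g1) * len(g2))

    while len(groups) > 1:
        d, i, j = min(
            (linkage(groups[i], groups[j]), i, j)
            for i in range(len(groups))
            for j in range(i + 1, len(groups))
        )
        if d > max_distance:
            break
        groups[i].extend(groups.pop(j))  # j > i, so index i is unaffected
    return {name: label for label, group in enumerate(groups) for name in group}
```

Because the vectors ignore gene order, two clusters with the same protein composition but a different gene arrangement end up at distance zero. This is the architecture-agnostic behaviour the step is meant to capture.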

Unlike similar methods such as BiG-SLiCE or BiG-SCAPE, IGUA does not use Pfam domains to represent gene cluster composition. It instead uses representatives from a fast unsupervised clustering computed with MMseqs2. This allows IGUA to accurately account for proteins that may not be covered by Pfam, and avoids a costly annotation step with HMMER. The resulting protein representatives can later be annotated independently, and those annotations transferred to the GCFs.
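Because GCFs are built from protein representatives rather than Pfam domains, annotations are attached afterwards. The sketch below tallies the annotations found in each GCF. It assumes three hypothetical inputs: a GCF assignment for each gene cluster, a mapping from each protein to its representative, and annotations for some of the representatives.

```python
from __future__ import annotations

from collections import Counter, defaultdict


def transfer_annotations(
    gcf_of_cluster: dict[str, str],
    cluster_proteins: dict[str, list[str]],
    protein_to_rep: dict[str, str],
    rep_annotation: dict[str, str],
) -> dict[str, Counter]:
    """Count, for each GCF, how often each annotation occurs among its member proteins.

    Hypothetical helper: proteins whose representative has no annotation are
    skipped, so a GCF with no annotated proteins is absent from the result.
    """
    per_gcf: dict[str, Counter] = defaultdict(Counter)
    for cluster, gcf in gcf_of_cluster.items():
        for protein in cluster_proteins[cluster]:
            label = rep_annotation.get(protein_to_rep[protein])
            if label is not None:
                per_gcf[gcf][label] += 1
    return dict(per_gcf)
```

Only the representatives need to be annotated, which is usually far fewer sequences than the full protein set.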

Setup

Have a look at the Installing page to find the different ways to install IGUA. IGUA depends on some common scientific Python packages (scipy, anndata, pandas), as well as a working MMseqs2 installation.
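Before running IGUA, you can quickly confirm that these dependencies are present. Python packages can be checked with `importlib.util.find_spec` and the MMseqs2 binary with `shutil.which`. This helper is hypothetical and not part of IGUA. It assumes the MMseqs2 executable is on `PATH` under its usual name, `mmseqs`.

```python
import importlib.util
import shutil


def check_dependencies(packages=("scipy", "anndata", "pandas"),
                       executable="mmseqs") -> dict:
    """Report which of the listed dependencies are available (hypothetical helper)."""
    # find_spec locates a top-level module without importing it; None means missing.
    status = {name: importlib.util.find_spec(name) is not None for name in packages}
    # MMseqs2 is an external binary, so look it up on PATH instead of importing it.
    status[executable] = shutil.which(executable) is not None
    return status
```

Calling `check_dependencies()` returns a mapping such as `{"scipy": True, ..., "mmseqs": False}`. Any `False` entry points to the component still to install.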

Citation

IGUA is scientific software, with a preprint currently available on bioRxiv.

License

This library is provided under the GNU General Public License 3.0 or later. See the Copyright Notice section for more information including the complete license text.

This project was co-developed by:

Leiden University Medical Center (LUMC)
Martin Larralde, PhD Candidate
Georg Zeller, Associate Professor

Umeå University
Laura M. Carroll, Assistant Professor
Josefin Blom, PhD Student
Hadrien Gourlé, Postdoctoral Fellow

EPFL
Hale-Seda Radoykova, Doctoral Assistant
Lucas Paoli, Associate Professor