Clustering

class igua.clustering.ClusteringStrategy

An abstract strategy for clustering compositional data.

abstract cluster(X, weights=None)

Cluster the given observations.

Parameters:
Returns:

numpy.ndarray – A flat array of shape \(m\), where \(m\) is the number of observations in X, assigning an arbitrary cluster number to each observation (see the scipy.cluster.hierarchy.fcluster documentation).
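The interface above can be illustrated with a minimal sketch of the strategy pattern. The names `Strategy` and `SingletonClustering` are hypothetical stand-ins and are not part of igua; the sketch only assumes that a strategy exposes `cluster(X, weights=None)` and returns one label per observation. Accepting a dense NumPy array for X is also an assumption made for this example.

```python
from abc import ABC, abstractmethod

import numpy as np


class Strategy(ABC):
    """Hypothetical stand-in for an abstract clustering strategy."""

    @abstractmethod
    def cluster(self, X, weights=None):
        """Return one cluster number per row of X."""


class SingletonClustering(Strategy):
    """Toy strategy: every observation becomes its own cluster."""

    def cluster(self, X, weights=None):
        # `weights` is accepted for interface compatibility but ignored here.
        m = np.asarray(X).shape[0]
        # Cluster numbers are arbitrary labels; start at 1 as fcluster does.
        return np.arange(1, m + 1)


# Three made-up observations (rows).
X = np.array([[0.5, 0.5], [0.9, 0.1], [0.2, 0.8]])
labels = SingletonClustering().cluster(X)
```

Here `labels` is a flat array with one entry per row of X. Because `cluster` is abstract, instantiating `Strategy` directly raises a `TypeError`.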

class igua.clustering.HierarchicalClustering(ClusteringStrategy)

A clustering strategy implementing hierarchical clustering.

__init__(*, method, distance=0.8, precision='double', jobs=1)

Create a new hierarchical clustering strategy.

Parameters:
  • method (str) – The name of the linkage method to use: average, single, complete, weighted, centroid, median or ward.

  • distance (float) – The distance cutoff for the flat clusters created with scipy.cluster.hierarchy.fcluster.

  • precision (str) – The floating-point precision to use: half, single or double. Note that changing the precision (in particular to half precision) may change the resulting clustering.

  • jobs (int) – The number of parallel threads to use to perform the pairwise distance computation.
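To see why the precision setting can affect the result, consider the following small sketch. It does not use igua and makes no claim about how igua applies the precision internally; it only shows, with a made-up distance value, that a comparison against the default cutoff of 0.8 can flip when both numbers are rounded to half precision.

```python
import numpy as np

cutoff = 0.8
d = 0.80002  # a made-up pairwise distance just above the cutoff

# In double precision the distance exceeds the cutoff.
within_double = np.float64(d) <= np.float64(cutoff)

# In half precision both values round to the same float16 number,
# so the same comparison now succeeds.
within_half = np.float16(d) <= np.float16(cutoff)
```

With these values `within_double` is false and `within_half` is true, so two observations at that distance could end up in different clusters or in the same one depending on the precision.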

cluster(X, weights=None)

Cluster the given observations.

Parameters:
Returns:

numpy.ndarray – A flat array of shape \(m\), where \(m\) is the number of observations in X, assigning an arbitrary cluster number to each observation (see the scipy.cluster.hierarchy.fcluster documentation).
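As a rough illustration of how a linkage method and a distance cutoff produce such a flat array, the sketch below calls SciPy directly. It is not igua's implementation: the Euclidean metric, the `criterion="distance"` setting and the toy data are all assumptions chosen for the example.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist

# Four toy observations (rows); the values are made up for illustration.
X = np.array([
    [1.0, 0.0],
    [0.9, 0.1],
    [0.0, 1.0],
    [0.1, 0.9],
])

# Condensed pairwise distance matrix. Euclidean is an assumption here.
D = pdist(X, metric="euclidean")

# Build the dendrogram with one of the documented linkage methods.
Z = linkage(D, method="average")

# Cut the dendrogram at a distance threshold to obtain flat clusters.
labels = fcluster(Z, t=0.8, criterion="distance")
```

The first two rows share one label and the last two share another, since each pair lies well within the 0.8 cutoff while the two pairs are more than 0.8 apart. The label values themselves are arbitrary.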

class igua.clustering.LinearClustering(ClusteringStrategy)

A clustering strategy similar to MMseqs2 linear clustering.

__init__(*, distance=0.8)

Create a new linear clustering strategy.

Parameters:
  • distance (float) – The distance cutoff to use for clustering observations together.
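The general idea of clustering against a distance cutoff without building a full dendrogram can be sketched with a simplified greedy scheme: each observation joins the first existing representative within the cutoff, or becomes a new representative. This is neither igua's implementation nor the MMseqs2 linear clustering algorithm; `greedy_cluster` is a hypothetical helper, and the Euclidean distance and the toy points are assumptions.

```python
import math


def greedy_cluster(points, cutoff=0.8):
    """Assign each point to the first representative within `cutoff`.

    Hypothetical helper for illustration; Euclidean distance is an assumption.
    """
    representatives = []  # indices of points chosen as cluster centres
    labels = []
    for point in points:
        for label, rep in enumerate(representatives, start=1):
            if math.dist(point, points[rep]) <= cutoff:
                labels.append(label)
                break
        else:
            # No representative is close enough: start a new cluster.
            representatives.append(len(labels))
            labels.append(len(representatives))
    return labels


# Made-up 2D points forming two well-separated groups.
points = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.0, 5.2), (0.0, 0.3)]
labels = greedy_cluster(points)
```

For these points `labels` is `[1, 1, 2, 2, 1]`. Each observation is compared only against the current representatives rather than against every other observation, which is what keeps this kind of scheme cheaper than computing all pairwise distances.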

cluster(X, weights=None)

Cluster the given observations.

Parameters:
Returns:

numpy.ndarray – A flat array of shape \(m\), where \(m\) is the number of observations in X, assigning an arbitrary cluster number to each observation (see the scipy.cluster.hierarchy.fcluster documentation).