Clustering

class igua.clustering.ClusteringStrategy

An abstract strategy for clustering compositional data.

abstract cluster(X, weights=None)

Cluster the given observations.

Parameters:

X (scipy.sparse.csr_matrix) – A matrix of shape \(m \times n\) with compositional data.
weights (numpy.ndarray or None) – The weights (of shape \(n\)) to use for computing distances. If None, use uniform weights.

Returns:

numpy.ndarray – A flat array of shape \(m\) which assigns an arbitrary cluster number to each observation of X (see scipy.cluster.hierarchy.fcluster documentation).
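The contract above (a sparse matrix in, one cluster number per row out) can be illustrated with a minimal sketch. Both classes below are hypothetical stand-ins written for this example and are not part of igua: `ClusteringStrategyLike` only mirrors the documented `cluster(X, weights=None)` signature, and `ExactDuplicateClustering` is a toy strategy that groups rows which are identical on every feature with a positive weight.

```python
from abc import ABC, abstractmethod

import numpy as np
import scipy.sparse


class ClusteringStrategyLike(ABC):
    """Hypothetical stand-in mirroring the documented interface."""

    @abstractmethod
    def cluster(self, X, weights=None):
        """Return a flat array with one cluster number per row of X."""


class ExactDuplicateClustering(ClusteringStrategyLike):
    """Toy strategy (assumption): identical rows share a cluster."""

    def cluster(self, X, weights=None):
        m, n = X.shape
        if weights is None:
            # Uniform weights, matching the documented default for None.
            weights = np.ones(n)
        # Keep only the features that carry a positive weight.
        dense = X.toarray()[:, np.asarray(weights) > 0]
        # Rows that compare equal receive the same inverse index.
        _, inverse = np.unique(dense, axis=0, return_inverse=True)
        # Shift to 1-based numbers, the convention used by fcluster.
        return inverse.ravel() + 1


X = scipy.sparse.csr_matrix([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
labels = ExactDuplicateClustering().cluster(X)
```

The cluster numbers themselves carry no meaning; only whether two observations share a number does.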

class igua.clustering.HierarchicalClustering(ClusteringStrategy)

A clustering strategy implementing hierarchical clustering.

__init__(*, method, distance=0.8, precision='double', jobs=1)

Create a new hierarchical clustering strategy.

Parameters:

method (str) – The name of the linkage method to use: average, single, complete, weighted, centroid, median or ward.
distance (float) – The distance cutoff for the flat clusters created with scipy.cluster.hierarchy.fcluster.
precision (str) – The floating-point precision to use: half, single or double. Note that changing precision (in particular with half-precision) may lead to differences in the generated clustering.
jobs (int) – The number of parallel threads to use for the pairwise distance computation.
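To make the roles of `method` and `distance` concrete, the following sketch calls SciPy directly rather than igua. It is an illustration under stated assumptions, not the library's implementation: the Manhattan (`cityblock`) metric and the toy matrix are chosen for the example only, and the input is dense for brevity.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist

# Four observations (rows) over three features; each row sums to 1.
X = np.array([
    [0.50, 0.50, 0.00],
    [0.45, 0.55, 0.00],
    [0.00, 0.10, 0.90],
    [0.00, 0.05, 0.95],
])

# Condensed pairwise distance vector (the metric is an assumption).
D = pdist(X, metric="cityblock")

# Build the dendrogram with the chosen linkage method.
Z = linkage(D, method="average")

# Cut the dendrogram: clusters merged above the cutoff stay separate.
labels = fcluster(Z, t=0.8, criterion="distance")
```

Here rows 0 and 1 lie at distance 0.1 from each other, as do rows 2 and 3, while the two pairs are about 1.85 apart on average. A cutoff of 0.8 therefore yields two flat clusters.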

cluster(X, weights=None)

Cluster the given observations.

Parameters:

X (scipy.sparse.csr_matrix) – A matrix of shape \(m \times n\) with compositional data.
weights (numpy.ndarray or None) – The weights (of shape \(n\)) to use for computing distances. If None, use uniform weights.

Returns:

numpy.ndarray – A flat array of shape \(m\) which assigns an arbitrary cluster number to each observation of X (see scipy.cluster.hierarchy.fcluster documentation).

class igua.clustering.LinearClustering(ClusteringStrategy)

A clustering strategy similar to MMseqs2 linear clustering.

__init__(*, distance=0.8)

Create a new linear clustering strategy.

Parameters:

distance (float) – The distance cutoff to use for clustering observations together.

cluster(X, weights=None)

Cluster the given observations.

Parameters:

X (scipy.sparse.csr_matrix) – A matrix of shape \(m \times n\) with compositional data.
weights (numpy.ndarray or None) – The weights (of shape \(n\)) to use for computing distances. If None, use uniform weights.

Returns:

numpy.ndarray – A flat array of shape \(m\) which assigns an arbitrary cluster number to each observation of X (see scipy.cluster.hierarchy.fcluster documentation).
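Because the returned array maps each observation to a cluster number, a common follow-up step is to invert it into a list of members per cluster. The sketch below assumes a hypothetical `labels` array standing in for the output of a `cluster` call; the values are made up for illustration.

```python
from collections import defaultdict

import numpy as np

# Hypothetical output of a `cluster` call for five observations.
labels = np.array([2, 1, 2, 3, 1])

# Collect the row indices of the observations assigned to each cluster.
members = defaultdict(list)
for index, label in enumerate(labels.tolist()):
    members[label].append(index)

clusters = dict(members)
```

With these values, observations 0 and 2 end up together under cluster number 2, observations 1 and 4 under number 1, and observation 3 alone under number 3.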