Clustering
- class igua.clustering.ClusteringStrategy
An abstract clustering strategy to cluster compositional data.
- abstract cluster(X, weights=None)
Cluster the given observations.
- Parameters:
X (scipy.sparse.csr_matrix) – A matrix of shape \(m \times n\) with compositional data.
weights (numpy.ndarray or None) – The weights (of shape \(n\)) to use for computing distances. If None, use uniform weights.
- Returns:
numpy.ndarray – A flat array of shape \(m\) which assigns an arbitrary cluster number to each observation of X (see the scipy.cluster.hierarchy.fcluster documentation).
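To make the cluster() contract concrete, here is a minimal standalone sketch that does not import igua. ClusteringStrategySketch and IdenticalCompositionClustering are hypothetical names, and the toy strategy simply puts observations with identical rows into the same cluster. It only illustrates the documented input (an \(m \times n\) csr_matrix plus optional weights) and output (a flat array of \(m\) arbitrary labels, here 1-based like those returned by fcluster).

```python
from abc import ABC, abstractmethod
from typing import Optional

import numpy as np
import scipy.sparse


class ClusteringStrategySketch(ABC):
    """Hypothetical stand-in mirroring the documented cluster() contract."""

    @abstractmethod
    def cluster(
        self,
        X: scipy.sparse.csr_matrix,
        weights: Optional[np.ndarray] = None,
    ) -> np.ndarray:
        """Return one cluster label per row of X."""


class IdenticalCompositionClustering(ClusteringStrategySketch):
    """Toy strategy: rows with identical compositions share a cluster."""

    def cluster(self, X, weights=None):
        # Weights cannot change whether two rows are identical, so they
        # are accepted only to honour the documented signature.
        dense = X.toarray()
        # `inverse` maps every row to the index of its unique row.
        _, inverse = np.unique(dense, axis=0, return_inverse=True)
        # Shift to 1-based labels, the convention used by fcluster.
        return inverse.ravel() + 1


X = scipy.sparse.csr_matrix(np.array([[1, 0, 2], [0, 1, 0], [1, 0, 2]]))
labels = IdenticalCompositionClustering().cluster(X)
# labels == [2, 1, 2]: rows 0 and 2 are identical
```

Because the cluster numbers are arbitrary, code consuming the result should only compare labels for equality, never rely on their actual values.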
- class igua.clustering.HierarchicalClustering(ClusteringStrategy)
A clustering strategy implementing hierarchical clustering.
- __init__(*, method, distance=0.8, precision='double', jobs=1)
Create a new hierarchical clustering strategy.
- Parameters:
method (str) – The name of the linkage method to use: average, single, complete, weighted, centroid, median or ward.
distance (float) – The distance cutoff for the flat clusters created with scipy.cluster.hierarchy.fcluster.
precision (str) – The floating-point precision to use: half, single or double. Changing the precision (half precision in particular) may change the resulting clustering.
jobs (int) – The number of parallel threads used to compute the pairwise distances.
- cluster(X, weights=None)
Cluster the given observations.
- Parameters:
X (scipy.sparse.csr_matrix) – A matrix of shape \(m \times n\) with compositional data.
weights (numpy.ndarray or None) – The weights (of shape \(n\)) to use for computing distances. If None, use uniform weights.
- Returns:
numpy.ndarray – A flat array of shape \(m\) which assigns an arbitrary cluster number to each observation of X (see the scipy.cluster.hierarchy.fcluster documentation).
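The parameters above suggest a pipeline of pairwise distances, then scipy.cluster.hierarchy.linkage with the chosen method, then fcluster with criterion="distance" and the cutoff as t. The following is a rough SciPy-only sketch of that pipeline under stated assumptions, not IGUA's implementation. The metric is an assumption (columns are weighted, rows are scaled to sum to 1, and two rows are compared with half their L1 difference, so distances fall in [0, 1]); jobs is not modelled; and the hypothetical dtype argument merely rounds the computed distances as a crude stand-in for precision. hierarchical_sketch is a hypothetical name. SciPy documents centroid, median and ward as correct only for Euclidean distances, so the example uses average.

```python
from typing import Optional

import numpy as np
import scipy.sparse
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist


def hierarchical_sketch(
    X: scipy.sparse.csr_matrix,
    weights: Optional[np.ndarray] = None,
    *,
    method: str = "average",
    distance: float = 0.8,
    dtype=np.float64,
) -> np.ndarray:
    """Hypothetical sketch: distances -> linkage -> flat clusters."""
    n = X.shape[1]
    if weights is None:
        weights = np.ones(n)  # uniform weights, as documented for None
    # ASSUMPTION: weight each column, then scale every row to sum to 1
    # (each row needs at least one non-zero entry).
    P = X.toarray().astype(np.float64) * weights
    P /= P.sum(axis=1, keepdims=True)
    # ASSUMPTION: half the L1 distance between scaled rows, in [0, 1].
    d = 0.5 * pdist(P, metric="cityblock")
    # Crude stand-in for `precision`: round the distances to `dtype`.
    d = d.astype(dtype).astype(np.float64)
    Z = linkage(d, method=method)
    # Flat clusters whose cophenetic distance does not exceed the cutoff.
    return fcluster(Z, t=distance, criterion="distance")


X = scipy.sparse.csr_matrix(np.array([[4, 0, 0], [3, 1, 0], [0, 0, 5], [0, 1, 4]]))
labels = hierarchical_sketch(X, method="average", distance=0.8)
# Rows 0-1 and rows 2-3 end up in two separate clusters.
```

With dtype=np.float16 the distances keep only about three significant decimal digits, so a pair lying very close to the cutoff can be rounded to the other side of it. This is one way a lower precision can alter the clustering.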
- class igua.clustering.LinearClustering(ClusteringStrategy)
A clustering strategy similar to MMseqs2 linear clustering.
- __init__(*, distance=0.8)
Create a new linear clustering strategy.
- Parameters:
distance (float) – The distance cutoff to use for clustering observations together.
- cluster(X, weights=None)
Cluster the given observations.
- Parameters:
X (scipy.sparse.csr_matrix) – A matrix of shape \(m \times n\) with compositional data.
weights (numpy.ndarray or None) – The weights (of shape \(n\)) to use for computing distances. If None, use uniform weights.
- Returns:
numpy.ndarray – A flat array of shape \(m\) which assigns an arbitrary cluster number to each observation of X (see the scipy.cluster.hierarchy.fcluster documentation).
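MMseqs2's linear clustering avoids an all-against-all comparison by comparing the members of each group only against a center sequence. The sketch below borrows that idea for compositional data in a simple greedy form: each still-unassigned observation, visited in order of decreasing weighted total, becomes a representative and absorbs every unassigned observation within the cutoff. It is not IGUA's implementation. It omits linclust's k-mer prefilter, its cost grows with m times the number of clusters rather than linearly, and the visiting order, the metric (a normalized half-L1 distance in [0, 1]) and the name greedy_representative_sketch are all assumptions.

```python
from typing import Optional

import numpy as np
import scipy.sparse


def greedy_representative_sketch(
    X: scipy.sparse.csr_matrix,
    weights: Optional[np.ndarray] = None,
    *,
    distance: float = 0.8,
) -> np.ndarray:
    """Hypothetical greedy representative clustering (not IGUA's code)."""
    m, n = X.shape
    if weights is None:
        weights = np.ones(n)  # uniform weights, as documented for None
    P = X.toarray().astype(np.float64) * weights
    totals = P.sum(axis=1)
    # ASSUMPTION: scale rows to sum to 1 and use half the L1 distance.
    P /= totals[:, None]
    # ASSUMPTION: largest weighted totals become representatives first,
    # loosely echoing linclust's choice of the longest sequence as center.
    order = np.argsort(-totals, kind="stable")
    labels = np.zeros(m, dtype=np.int64)  # 0 means "not assigned yet"
    n_clusters = 0
    for i in order:
        if labels[i]:
            continue
        n_clusters += 1
        labels[i] = n_clusters  # observation i represents the new cluster
        pending = np.flatnonzero(labels == 0)
        # Compare the pending observations with the representative only.
        d = 0.5 * np.abs(P[pending] - P[i]).sum(axis=1)
        labels[pending[d <= distance]] = n_clusters
    return labels


X = scipy.sparse.csr_matrix(np.array([[4, 0, 0], [3, 1, 0], [0, 0, 5], [0, 1, 4]]))
labels = greedy_representative_sketch(X, distance=0.8)
# labels == [2, 2, 1, 1]
```

Comparing against representatives only trades accuracy for speed. Two observations that are close to each other but far from every representative can end up in different clusters, which an all-against-all hierarchical clustering would not do.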