Python API Reference

KMeans clustering algorithm implemented in C.

Functional API

kmeans.kmeans(data, k, max_iterations=100, tolerance=0.0001)

Perform k-means clustering on the given data.

Parameters:

data (array-like of shape (n_samples, n_features)) – The input data to cluster.
k (int) – The number of clusters.
max_iterations (int, optional) – Maximum number of iterations (default: 100).
tolerance (float, optional) – Convergence tolerance (default: 1e-4).

Returns:

centroids (ndarray of shape (k, n_features)) – The final cluster centroids.
labels (ndarray of shape (n_samples,)) – Index of the cluster each sample belongs to.
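To make the inputs and return values above concrete, here is a minimal pure-Python sketch of a Lloyd-style k-means loop with the same parameter names. It is not the library's C implementation: the function name kmeans_sketch, the seeding with the first k samples, and the reading of tolerance as the largest centroid shift between iterations are all assumptions made for illustration, and plain lists stand in for ndarrays.

```python
import math


def kmeans_sketch(data, k, max_iterations=100, tolerance=1e-4):
    """Illustrative k-means loop (hypothetical; not the library's C code)."""
    # Assumption: seed the centroids with the first k samples.
    centroids = [list(p) for p in data[:k]]
    labels = [0] * len(data)
    for _ in range(max_iterations):
        # Assignment step: each sample goes to its nearest centroid.
        labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in data
        ]
        # Update step: each centroid becomes the mean of its members.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(data, labels) if lab == j]
            if members:
                new_centroids.append(
                    [sum(col) / len(members) for col in zip(*members)]
                )
            else:
                # An empty cluster keeps its previous centroid.
                new_centroids.append(centroids[j])
        # Assumption: converged when no centroid moved more than tolerance.
        shift = max(math.dist(a, b) for a, b in zip(centroids, new_centroids))
        centroids = new_centroids
        if shift < tolerance:
            break
    return centroids, labels


points = [(0.0, 0.0), (10.0, 10.0), (0.0, 1.0), (10.0, 11.0)]
centroids, labels = kmeans_sketch(points, k=2)
print(centroids)  # [[0.0, 0.5], [10.0, 10.5]]
print(labels)     # [0, 1, 0, 1]
```

The returned pair has the documented layout: one centroid row per cluster and one label per sample.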

Object-Oriented API

KMeans Class

class kmeans.KMeans(n_clusters, max_iter=100, tol=0.0001)

Bases: object

K-Means clustering.

Parameters:

n_clusters (int) – The number of clusters to form.
max_iter (int, optional) – Maximum number of iterations (default: 100).
tol (float, optional) – Convergence tolerance (default: 1e-4).

__init__(n_clusters, max_iter=100, tol=0.0001)

fit(X)

Compute k-means clustering.

Parameters: X (array-like of shape (n_samples, n_features)) – Training data.
Returns: self – Fitted estimator.
Return type: KMeans

predict(X)

Predict the closest cluster for each sample.

Parameters: X (array-like of shape (n_samples, n_features)) – New data to predict.
Returns: labels – Index of the cluster each sample belongs to.
Return type: ndarray of shape (n_samples,)

fit_predict(X)

Compute clustering and return cluster labels.

Parameters: X (array-like of shape (n_samples, n_features)) – Training data.
Returns: labels – Index of the cluster each sample belongs to.
Return type: ndarray of shape (n_samples,)

Attributes

After calling fit(), the following attributes are available on the fitted KMeans instance:

kmeans.centroids_

Type: numpy.ndarray of shape (n_clusters, n_features)

Coordinates of cluster centers.

kmeans.labels_

Type: numpy.ndarray of shape (n_samples,)

Labels of each point indicating cluster assignment.
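The sketch below shows how fit(), predict(), fit_predict(), and the two fitted attributes might relate to one another. KMeansSketch is a hypothetical pure-Python stand-in, not the library class: the first-samples seeding, the centroid-shift convergence test, and fit_predict() being equivalent to fit(X).labels_ are assumptions borrowed from common estimator conventions, and lists stand in for ndarrays.

```python
import math


class KMeansSketch:
    """Hypothetical stand-in mirroring the documented KMeans interface."""

    def __init__(self, n_clusters, max_iter=100, tol=1e-4):
        self.n_clusters = n_clusters
        self.max_iter = max_iter
        self.tol = tol

    def _nearest(self, point):
        # Index of the centroid closest to `point` (Euclidean distance).
        return min(
            range(self.n_clusters),
            key=lambda j: math.dist(point, self.centroids_[j]),
        )

    def fit(self, X):
        # Assumption: seed with the first n_clusters samples.
        self.centroids_ = [list(p) for p in X[: self.n_clusters]]
        for _ in range(self.max_iter):
            self.labels_ = [self._nearest(p) for p in X]
            updated = []
            for j, old in enumerate(self.centroids_):
                members = [p for p, lab in zip(X, self.labels_) if lab == j]
                # Mean of the members; an empty cluster keeps its centroid.
                updated.append(
                    [sum(c) / len(members) for c in zip(*members)]
                    if members
                    else old
                )
            shift = max(
                math.dist(a, b) for a, b in zip(self.centroids_, updated)
            )
            self.centroids_ = updated
            if shift < self.tol:
                break
        return self  # returning self allows KMeansSketch(...).fit(X).labels_

    def predict(self, X):
        # Uses the centroids stored by fit(); does not refit.
        return [self._nearest(p) for p in X]

    def fit_predict(self, X):
        # Assumption: same result as calling fit(X) and reading labels_.
        return self.fit(X).labels_


train = [(0.0, 0.0), (10.0, 10.0), (0.0, 1.0), (10.0, 11.0)]
model = KMeansSketch(n_clusters=2)
train_labels = model.fit_predict(train)
new_labels = model.predict([(1.0, 1.0), (9.0, 9.0)])
print(model.centroids_)  # [[0.0, 0.5], [10.0, 10.5]]
print(train_labels)      # [0, 1, 0, 1]
print(new_labels)        # [0, 1]
```

In this sketch centroids_ has one row per cluster and labels_ has one entry per training sample, matching the shapes listed above.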

C Extension Module

Note

The _kmeans module is a low-level C extension. Most users should use the high-level Python API instead.

kmeans._kmeans.fit(data, k, max_iterations, tolerance)

Low-level k-means fitting function.

Parameters:

data (numpy.ndarray) – Input data array of shape (n_samples, n_features)
k (int) – Number of clusters
max_iterations (int) – Maximum iterations
tolerance (float) – Convergence tolerance

Returns:

Tuple of (centroids, labels)

Return type:

tuple[numpy.ndarray, numpy.ndarray]

kmeans._kmeans.predict(data, centroids)

Predict cluster labels for data points.

Parameters:

data (numpy.ndarray) – Input data array of shape (n_samples, n_features)
centroids (numpy.ndarray) – Cluster centroids of shape (k, n_features)

Returns:

Cluster labels

Return type:

numpy.ndarray
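As a rough picture of what a nearest-centroid prediction computes, the following pure-Python sketch labels each sample with the index of its closest centroid. The name predict_sketch, the use of Euclidean distance, and the tie-breaking toward the lowest index are assumptions for illustration; the C extension's actual internals are not documented here.

```python
import math


def predict_sketch(data, centroids):
    """Return, for each sample, the index of its nearest centroid."""
    labels = []
    for point in data:
        # Assumption: Euclidean distance to every centroid.
        distances = [math.dist(point, c) for c in centroids]
        # list.index returns the first match, so ties go to the lowest index.
        labels.append(distances.index(min(distances)))
    return labels


example_centroids = [(0.0, 0.5), (10.0, 10.5)]
example_labels = predict_sketch(
    [(1.0, 1.0), (9.0, 9.0), (5.0, 5.0)], example_centroids
)
print(example_labels)  # [0, 1, 0]
```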

Examples

See the Examples page for further usage.