Python API Reference

K-means clustering implemented in C, with a functional and an object-oriented Python interface.

Functional API

kmeans.kmeans(data, k, max_iterations=100, tolerance=0.0001)

Perform k-means clustering on the given data.

Parameters:
  • data (array-like of shape (n_samples, n_features)) – The input data to cluster.

  • k (int) – The number of clusters.

  • max_iterations (int, optional) – Maximum number of iterations (default: 100).

  • tolerance (float, optional) – Convergence tolerance (default: 1e-4).

Returns:

  • centroids (ndarray of shape (k, n_features)) – The final cluster centroids.

  • labels (ndarray of shape (n_samples,)) – Index of the cluster each sample belongs to.
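
The contract above (array in, centroids and labels out) can be illustrated with a pure-NumPy sketch of Lloyd's algorithm. This is not the C implementation behind kmeans.kmeans: the name kmeans_sketch, the seed parameter, the random choice of initial centroids, and the reading of tolerance as a bound on total centroid movement are all assumptions made for illustration.

```python
import numpy as np


def kmeans_sketch(data, k, max_iterations=100, tolerance=1e-4, seed=0):
    """Hypothetical stand-in with the same argument order as kmeans.kmeans."""
    X = np.asarray(data, dtype=float)
    rng = np.random.default_rng(seed)
    # Assumption: start from k distinct samples picked at random.
    centroids = X[rng.choice(len(X), size=k, replace=False)].copy()

    def assign(points, centers):
        # Squared Euclidean distance from every point to every centroid.
        sq_dists = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        return sq_dists.argmin(axis=1)

    for _ in range(max_iterations):
        labels = assign(X, centroids)
        updated = centroids.copy()
        for j in range(k):
            members = X[labels == j]
            # An empty cluster keeps its previous centroid.
            if len(members) > 0:
                updated[j] = members.mean(axis=0)
        shift = np.linalg.norm(updated - centroids)
        centroids = updated
        # Assumption: stop once the centroids have moved by at most `tolerance`.
        if shift <= tolerance:
            break
    # Recompute labels so they always match the returned centroids.
    return centroids, assign(X, centroids)


data = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]])
centroids, labels = kmeans_sketch(data, k=2)
```

Here centroids has shape (k, n_features) and labels has shape (n_samples,), matching the return values documented above. Which cluster receives index 0 depends on the initial centroids, so code that consumes labels should not rely on a particular numbering.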

Object-Oriented API

KMeans Class

class kmeans.KMeans(n_clusters, max_iter=100, tol=0.0001)

Bases: object

K-Means clustering.

Parameters:
  • n_clusters (int) – The number of clusters to form.

  • max_iter (int, optional) – Maximum number of iterations (default: 100).

  • tol (float, optional) – Convergence tolerance (default: 1e-4).

__init__(n_clusters, max_iter=100, tol=0.0001)

fit(X)

Compute k-means clustering.

Parameters:

X (array-like of shape (n_samples, n_features)) – Training data.

Returns:

self – Fitted estimator.

Return type:

KMeans

predict(X)

Predict the closest cluster for each sample.

Parameters:

X (array-like of shape (n_samples, n_features)) – New data to predict.

Returns:

labels – Index of the cluster each sample belongs to.

Return type:

ndarray of shape (n_samples,)

fit_predict(X)

Compute clustering and return cluster labels.

Parameters:

X (array-like of shape (n_samples, n_features)) – Training data.

Returns:

labels – Index of the cluster each sample belongs to.

Return type:

ndarray of shape (n_samples,)
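
The three methods compose in a simple way: fit returns the estimator itself, so calls can be chained, and predict assigns new samples to the centroids learned during fit. The sketch below mirrors that interface with a hypothetical stand-in class, KMeansSketch, written in NumPy so that it runs without the compiled extension. Seeding the centroids with the first n_clusters samples and having fit_predict return the labels computed during fit are assumptions of this sketch, not documented behaviour of kmeans.KMeans.

```python
import numpy as np


class KMeansSketch:
    """Hypothetical stand-in mirroring the documented KMeans interface."""

    def __init__(self, n_clusters, max_iter=100, tol=1e-4):
        self.n_clusters = n_clusters
        self.max_iter = max_iter
        self.tol = tol

    def _nearest(self, X):
        # Index of the closest stored centroid for every row of X.
        diffs = X[:, None, :] - self.centroids_[None, :, :]
        return (diffs ** 2).sum(axis=2).argmin(axis=1)

    def fit(self, X):
        X = np.asarray(X, dtype=float)
        # Assumption: seed the centroids with the first n_clusters samples.
        self.centroids_ = X[: self.n_clusters].copy()
        for _ in range(self.max_iter):
            labels = self._nearest(X)
            previous = self.centroids_.copy()
            for j in range(self.n_clusters):
                if np.any(labels == j):
                    self.centroids_[j] = X[labels == j].mean(axis=0)
            if np.linalg.norm(self.centroids_ - previous) <= self.tol:
                break
        self.labels_ = self._nearest(X)
        return self  # returning self is what allows fit(X).predict(...)

    def predict(self, X):
        return self._nearest(np.asarray(X, dtype=float))

    def fit_predict(self, X):
        return self.fit(X).labels_


X_train = np.array([[1.0, 1.0], [9.0, 9.0], [1.0, 2.0], [9.0, 8.0]])
X_new = np.array([[0.0, 0.0], [10.0, 10.0]])

model = KMeansSketch(n_clusters=2).fit(X_train)   # chained: fit returns self
new_labels = model.predict(X_new)                 # labels for unseen samples
train_labels = KMeansSketch(n_clusters=2).fit_predict(X_train)
```

Use fit followed by predict when the model will label data it was not trained on; fit_predict is the shorter path when only the training samples need labels.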

Attributes

After calling fit(), the following attributes are available on the fitted estimator:

kmeans.centroids_
Type:

numpy.ndarray of shape (n_clusters, n_features)

Coordinates of cluster centers.

kmeans.labels_
Type:

numpy.ndarray of shape (n_samples,)

Labels of each point indicating cluster assignment.
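
Because labels_ holds row indices into centroids_, the two arrays combine directly with NumPy indexing. The sketch below uses hand-written stand-in arrays with the documented shapes (they are not output from this library) to compute cluster sizes and the within-cluster sum of squared distances; the variable names X, cluster_sizes, assigned, and inertia are illustrative.

```python
import numpy as np

# Stand-in values shaped like the documented attributes (not library output).
X = np.array([[0.0, 0.0], [0.0, 2.0], [4.0, 4.0], [6.0, 4.0], [5.0, 7.0]])
centroids_ = np.array([[0.0, 1.0], [5.0, 5.0]])   # (n_clusters, n_features)
labels_ = np.array([0, 0, 1, 1, 1])               # (n_samples,)

# Number of samples assigned to each cluster.
cluster_sizes = np.bincount(labels_, minlength=len(centroids_))

# Indexing centroids_ with labels_ yields the centroid assigned to each
# sample, an array of shape (n_samples, n_features).
assigned = centroids_[labels_]

# Within-cluster sum of squared distances, often called inertia.
inertia = float(((X - assigned) ** 2).sum())
```

With a fitted estimator, the same expressions would apply with the estimator's centroids_ and labels_ substituted for the stand-in arrays.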

C Extension Module

Note

The _kmeans module is a low-level C extension. Most users should use the high-level Python API instead.

kmeans._kmeans.fit(data, k, max_iterations, tolerance)

Low-level k-means fitting function.

Parameters:
  • data (numpy.ndarray) – Input data array of shape (n_samples, n_features).

  • k (int) – Number of clusters.

  • max_iterations (int) – Maximum number of iterations.

  • tolerance (float) – Convergence tolerance.

Returns:

Tuple of (centroids, labels)

Return type:

tuple[numpy.ndarray, numpy.ndarray]

kmeans._kmeans.predict(data, centroids)

Predict cluster labels for data points.

Parameters:
  • data (numpy.ndarray) – Input data array of shape (n_samples, n_features).

  • centroids (numpy.ndarray) – Cluster centroids of shape (k, n_features).

Returns:

Index of the closest centroid for each row of data.

Return type:

numpy.ndarray
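
Unlike the high-level API, which accepts any array-like, the low-level functions are documented as taking numpy.ndarray arguments. A caller that does use them directly may want to normalise inputs first. The sketch below is one way to do that; the helper names as_c_matrix and check_predict_inputs are hypothetical, and the requirement of C-contiguous float64 data is an assumption (common for C extensions, but not stated in this reference).

```python
import numpy as np


def as_c_matrix(array, name="data"):
    """Return the input as a 2-D, C-contiguous float64 array."""
    # Assumption: the extension reads a contiguous buffer of doubles.
    matrix = np.ascontiguousarray(array, dtype=np.float64)
    if matrix.ndim != 2:
        raise ValueError(f"{name} must be 2-D, got {matrix.ndim} dimension(s)")
    return matrix


def check_predict_inputs(data, centroids):
    """Validate the shapes documented for predict(data, centroids)."""
    data = as_c_matrix(data, "data")
    centroids = as_c_matrix(centroids, "centroids")
    if data.shape[1] != centroids.shape[1]:
        raise ValueError("data and centroids disagree on n_features")
    return data, centroids


# A transposed float32 view is neither C-contiguous nor float64,
# so it is converted into a fresh array.
raw = np.arange(6, dtype=np.float32).reshape(3, 2).T   # shape (2, 3)
data, centroids = check_predict_inputs(raw, [[0.0, 1.0, 2.0]])
```

The returned arrays could then be passed on to the low-level functions; if the extension already performs these conversions internally, the helper is redundant but harmless.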

Examples

See the Examples page for further usage examples.