K means clustering is a method of vector quantization, originally from signal processing, that aims to partition n observations into k clusters in which each observation belongs to the cluster with the nearest mean (cluster centers or cluster centroid), serving as a prototype of the cluster. This results in a partitioning of the data space into Voronoi cells. k-means clustering minimizes within-cluster variances (squared Euclidean distances), but not regular Euclidean distances, which would be the more difficult Weber problem: the mean optimizes squared errors, whereas only the geometric median minimizes Euclidean distances. For instance, better Euclidean solutions can be found using k-medians and k-medoids.
The problem is computationally difficult (NP-hard); however, efficient heuristic algorithms converge quickly to a local optimum. These are usually similar to the expectation-maximization algorithm for mixtures of Gaussian distributions via an iterative refinement approach employed by both k-means and Gaussian mixture modeling. They both use cluster centers to model the data; however, k-means clustering tends to find clusters of comparable spatial extent, while the Gaussian mixture model allows clusters to have different shapes.
Simple explanation
- Define k = 3 clusters
- Randomly initialize 3 cluster centers
- Assign each point to the nearest cluster center based oneuclidean_distance
- Calculate the mean value of all points assigned to each cluster
- Update the cluster centers to the new means calculated in step 4
- Repeat steps 3 to 5 until the cluster assignments do not change (convergence)
Javascript code
const k = 3; // Number of clusters
function clusterPoints() {
const clusterCenters = initializeCenters();
while (true) {
const assignments = assignPoints(clusterCenters);
const newCenters = calculateCenters(assignments);
if (areCentersEqual(clusterCenters, newCenters)) {
return assignments;
}
clusterCenters = newCenters;
}
}
function initializeCenters() {
return [
{ x: 0, y: 0, value: 0 },
{ x: 0, y: 0, value: 1 },
{ x: 0, y: 0, value: 2 }
];
}
function assignPoints(centers) {
return points.map(point => {
const closest = centers.reduce((a, c) => distance(a, point) < distance(c, point) ? a : c);
return { ...point, cluster: closest.value };
});
}
function calculateCenters(assignments) {
const centers = [];
for (let i = 0; i < k; i++) {
const points = assignments.filter(a => a.cluster == i);
const { value: mean } = points.reduce(sumPoints)/points.length;
centers.push({ value: mean });
}
return centers;
}For parsing GeoJson
const clusters = clusterPoints(points, 3) // Get 3 clusters
function clusterPoints(points, k) {
// Initialize cluster centers
const centers = initializeCenters(points, k);
while (true) {
const assignments = assignPoints(points, centers);
const newCenters = calculateCenters(assignments);
if (areCentersEqual(centers, newCenters)) {
return convertToPolygons(assignments);
}
centers = newCenters;
}
}
function convertToPolygons(assignments) {
const clusters = [];
for (let i = 0; i < k; i++) {
const points = assignments.filter(a => a.cluster == i);
// Generate GeoJSON polygon for points in cluster
const coordinates = points.map(p => p.geometry.coordinates);
clusters.push({
type: "Feature",
geometry: {
type: "Polygon",
coordinates: [coordinates]
}
});
}
return clusters;
}function euclideanDistance(x1, y1, x2, y2) {
return Math.sqrt((x2 - x1)**2 + (y2 - y1)**2);
}K Means cluster won’t work for wave height because I can’t give initial values, it may be a classification problem and not clustering problem.
Questions
- What is the difference between K means clustering and K nearest neibours?
- In this K Means clusters why do I need to keep recalculation the center?
- CASE STUDY
- In Voyage Optimizer, for getting mean of wave height the calculation is stuck on recalculation of mean values. The threshold provided was 0.1 which may not be practical, had tried with other values by didn’t work.