K Means Cluster

K means clustering is a method of vector quantization, originally from signal processing, that aims to partition n observations into k clusters in which each observation belongs to the cluster with the nearest mean (cluster centers or cluster centroid), serving as a prototype of the cluster. This results in a partitioning of the data space into Voronoi cells. k-means clustering minimizes within-cluster variances (squared Euclidean distances), but not regular Euclidean distances, which would be the more difficult Weber problem: the mean optimizes squared errors, whereas only the geometric median minimizes Euclidean distances. For instance, better Euclidean solutions can be found using k-medians and k-medoids.

The problem is computationally difficult (NP-hard); however, efficient heuristic algorithms converge quickly to a local optimum. These are usually similar to the expectation-maximization algorithm for mixtures of Gaussian distributions via an iterative refinement approach employed by both k-means and Gaussian mixture modeling. They both use cluster centers to model the data; however, k-means clustering tends to find clusters of comparable spatial extent, while the Gaussian mixture model allows clusters to have different shapes.

Simple explanation

Define k = 3 clusters
Randomly initialize 3 cluster centers
Assign each point to the nearest cluster center based oneuclidean_distance
Calculate the mean value of all points assigned to each cluster
Update the cluster centers to the new means calculated in step 4
Repeat steps 3 to 5 until the cluster assignments do not change (convergence)

Javascript code

    const k = 3; // Number of clusters
 
  
 
function clusterPoints() {
 
  const clusterCenters = initializeCenters();
 
  while (true) {
 
    const assignments = assignPoints(clusterCenters);
 
    const newCenters = calculateCenters(assignments);
 
    if (areCentersEqual(clusterCenters, newCenters)) {
 
      return assignments;
 
    }
 
    clusterCenters = newCenters;    
 
  }
 
}
 
  
 
function initializeCenters() {
 
  return [
 
    { x: 0, y: 0, value: 0 },
 
    { x: 0, y: 0, value: 1 },
 
    { x: 0, y: 0, value: 2 }
 
  ];
 
}
 
  
 
function assignPoints(centers) {
 
  return points.map(point => {
 
     const closest = centers.reduce((a, c) => distance(a, point) < distance(c, point) ? a : c);
 
     return { ...point, cluster: closest.value };  
 
  });  
 
}
 
  
 
function calculateCenters(assignments) {
 
  const centers = [];
 
  for (let i = 0; i < k; i++) {
 
    const points = assignments.filter(a => a.cluster == i);      
 
    const { value: mean } = points.reduce(sumPoints)/points.length;
 
    centers.push({ value: mean });  
 
  }  
 
  return centers;      
 
}

For parsing GeoJson

const clusters = clusterPoints(points, 3) // Get 3 clusters
 
function clusterPoints(points, k) {
  // Initialize cluster centers 
  const centers = initializeCenters(points, k);
  
  while (true) {
    const assignments = assignPoints(points, centers);
    
    const newCenters = calculateCenters(assignments);
    
    if (areCentersEqual(centers, newCenters)) {  
      return convertToPolygons(assignments);     
    }
    
    centers = newCenters;    
  }
}
 
function convertToPolygons(assignments) {
  const clusters = [];
  
  for (let i = 0; i < k; i++) {    
    const points = assignments.filter(a => a.cluster == i); 
    
    // Generate GeoJSON polygon for points in cluster    
    const coordinates = points.map(p => p.geometry.coordinates);
    
    clusters.push({
      type: "Feature",
      geometry: {
        type: "Polygon",
        coordinates: [coordinates]        
      }      
    });
  }  
  
  return clusters;
}

euclidean_distance

function euclideanDistance(x1, y1, x2, y2) { 
return Math.sqrt((x2 - x1)**2 + (y2 - y1)**2); 
}

K Means cluster won’t work for wave height because I can’t give initial values, it may be a classification problem and not clustering problem.

Questions

What is the difference between K means clustering and K nearest neibours?
In this K Means clusters why do I need to keep recalculation the center?
CASE STUDY
- In Voyage Optimizer, for getting mean of wave height the calculation is stuck on recalculation of mean values. The threshold provided was 0.1 which may not be practical, had tried with other values by didn’t work.

Knowledge Base | Daily Notes

Explorer

K Means Cluster

Simple explanation

Javascript code

For parsing GeoJson

Questions

Graph View

Backlinks