Sorting a graveyard of bones

Dividing skeletons with elbows

A lot of bones in the human body have funny shapes and names.

There’s even a spot called the funny bone, although it’s really a nerve (the ulnar nerve) rather than a bone.

But suppose I gave you the skeletal remains of 100 deceased humans and asked you to sort the bones into groups.

How would you sort them?

It sounds strange to even be asked that, but many of your bones serve similar functions.

Some protect key organs, some let you move around, and some support your layer of flesh.

But how many groups of bones should you make?

Most of your bones have a clear job, but a few seem to do very little.

There’s no perfect number that fits every type of bone, but there should be a number that works best.

So how do we find the ideal number?

The Elbow Method.

We are essentially trying to divide a dataset into groups based on similarities in features.

We do this with K-means clustering, a technique that does exactly what we want in the bones example above.

The Elbow Method helps us determine the optimal number of clusters (k) in K-means clustering.

This is done by running K-means for a range of values of k and calculating the Within-Cluster Sum of Squares (WCSS) for each.

What is WCSS?

Each cluster has a centroid (the mean of its points) and a bunch of points around it.

You take the distance between each point and the centroid of its own cluster and square it.

Then you take the sum of all the squared distances, across every cluster.

That’s WCSS.
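To make the recipe concrete, here is a minimal sketch in plain Python. The function name `wcss`, the toy `points`, and the idea of describing each bone by a (length, width) pair are all assumptions made up for illustration.

```python
def wcss(points, centroids, labels):
    """Sum of squared Euclidean distances from each point to its assigned centroid."""
    total = 0.0
    for point, label in zip(points, labels):
        centroid = centroids[label]
        # Squared distance = sum of squared differences along each feature.
        total += sum((p - c) ** 2 for p, c in zip(point, centroid))
    return total


# Hypothetical toy data: each "bone" is a (length, width) pair.
points = [(1.0, 1.0), (3.0, 1.0), (10.0, 5.0), (10.0, 7.0)]
centroids = [(2.0, 1.0), (10.0, 6.0)]  # mean of each group
labels = [0, 0, 1, 1]  # which centroid each point belongs to

print(wcss(points, centroids, labels))  # 4.0
```

Every point here sits exactly one unit from its centroid, so each contributes 1 and the total is 4.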

When you plot WCSS against k, the curve looks like a bent arm, with an elbow where it bends.

As k increases, WCSS will decrease, with the largest value of WCSS when k=1.

The more groups (k) you have, the more specific each group gets, meaning every point sits closer and closer to its centroid, which decreases the WCSS.
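You can watch this happen with a bare-bones K-means written in plain Python. This is only a sketch under assumptions: the helper names, the toy data, and the deterministic "farthest point" starting rule are all made up for illustration, and real libraries use more careful initialization.

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_wcss(points, k, iterations=50):
    """Run a minimal K-means and return the final WCSS."""
    # Deterministic start (an assumption, not the only option): take the first
    # point, then keep adding the point farthest from the centroids so far.
    centroids = [points[0]]
    while len(centroids) < k:
        farthest = max(
            points, key=lambda p: min(squared_distance(p, c) for c in centroids)
        )
        centroids.append(farthest)

    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: squared_distance(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its old centroid).
        new_centroids = [
            tuple(sum(dim) / len(cluster) for dim in zip(*cluster))
            if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break
        centroids = new_centroids

    # WCSS: squared distance from every point to its nearest centroid.
    return sum(min(squared_distance(p, c) for c in centroids) for p in points)


# Hypothetical toy data: three obvious groups of (length, width) pairs.
points = [
    (1.0, 1.0), (1.0, 2.0), (2.0, 1.0),
    (8.0, 8.0), (8.0, 9.0), (9.0, 8.0),
    (15.0, 1.0), (15.0, 2.0), (16.0, 1.0),
]

wcss_by_k = {k: kmeans_wcss(points, k) for k in range(1, 6)}
for k, value in wcss_by_k.items():
    print(k, round(value, 2))
```

On this toy data, WCSS should be largest at k=1 and fall off sharply up to k=3, the number of groups built into the data, after which extra clusters buy very little.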

The optimal number of clusters (k) is found at the point where the curve bends and then flattens into something close to a straight line.

In other words, the rate of decrease in WCSS starts to slow down.

This point (the elbow) strikes a good balance between compactness and separation of clusters.
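Most people spot the elbow by eye, but you can also approximate it in code. The sketch below uses one simple heuristic, the discrete second difference, to find where the curve bends most sharply. The function name `find_elbow` and the WCSS numbers are made up for illustration, and other heuristics can give different answers on noisier curves.

```python
def find_elbow(wcss_by_k):
    """Return the k where the WCSS curve bends most sharply.

    Uses the second difference WCSS(k-1) - 2*WCSS(k) + WCSS(k+1) as a
    rough measure of the bend. Assumes the k values are evenly spaced.
    Returns None if there are fewer than three points.
    """
    ks = sorted(wcss_by_k)
    best_k, best_bend = None, float("-inf")
    for prev_k, k, next_k in zip(ks, ks[1:], ks[2:]):
        bend = wcss_by_k[prev_k] - 2 * wcss_by_k[k] + wcss_by_k[next_k]
        if bend > best_bend:
            best_k, best_bend = k, bend
    return best_k


# Made-up WCSS values: a steep drop until k=3, then nearly flat.
example = {1: 400.0, 2: 150.0, 3: 10.0, 4: 8.0, 5: 7.0}
print(find_elbow(example))  # 3
```

The first and last k can never be picked, because the bend needs a neighbor on each side. That is usually fine, since the elbow is by definition somewhere in the middle of the curve.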

In short, the Elbow Method helps us determine the optimal number of clusters (k) in K-means clustering by analyzing the relationship between k and WCSS, and the k it picks balances compactness and separation of clusters.

Tweet of the week

Let’s deep dive into a chart that is underused but can yield great results.

Yes! It’s the violin chart, a hybrid of a box plot and a kernel density estimate (KDE).
