For #wisdomwednesdays I've been asked by someone trying to learn Data Science what the difference is between Variance and Standard Deviation in terms of purpose.
For the sake of the uninitiated, I'll share an article that gives the simplest explanation of what Variance and Standard Deviation are.
Now, back to the purpose of each: the easiest analogy I can give you is how the Red and Blue Pills work in the movie The Matrix.
Let me explain.
Standard Deviation = Blue Pill
Standard Deviation is easier to visualize in the real world because it is in the same unit of measure as the Mean or Average, the Median, the Minimum and Maximum values, and so forth.
So if your goal is to visualize your data and plot it on a graph so you can make decisions based on what you see, then Standard Deviation is the better choice.
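To make this concrete, here's a minimal sketch in Python (the data set and numbers are made up for illustration): notice that the standard deviation comes out in the same unit as the mean, median, minimum, and maximum, while the variance comes out in that unit squared.

```python
import numpy as np

# Hypothetical sample: daily commute times in minutes
commute_minutes = np.array([32, 41, 28, 35, 39, 45, 30, 38])

print(f"Mean:     {commute_minutes.mean():.1f} minutes")
print(f"Median:   {np.median(commute_minutes):.1f} minutes")
print(f"Min/Max:  {commute_minutes.min()} / {commute_minutes.max()} minutes")
print(f"Std dev:  {commute_minutes.std(ddof=1):.1f} minutes")    # same unit: minutes
print(f"Variance: {commute_minutes.var(ddof=1):.1f} minutes^2")  # squared unit
```

Because the standard deviation is in minutes, you can plot it directly on the same axis as the data itself; the variance, in minutes squared, has no natural place on that chart.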
Variance = Red Pill
Now if you want to do some higher-level analysis, that's when you'll appreciate the benefits of Variance, which is essentially the Standard Deviation, squared.
An example I can give you is when you want to combine algorithms or mathematical models to form a new one: you can add Means directly, but you cannot do the same with Standard Deviations.
What you can do is square the Standard Deviations to get the Variances, and then combine the models.
Afterwards, if you want the Standard Deviation of the resulting model so you can create a visual representation via a graph or chart, simply take the square root of the Variance.
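Here's a minimal sketch of that workflow in plain Python, with made-up numbers, and assuming the two models are independent (variances only add directly under independence):

```python
import math

# Hypothetical component models, each summarized by (mean, standard deviation)
mean_a, sd_a = 100.0, 3.0
mean_b, sd_b = 250.0, 4.0

# Step 1: square the standard deviations to get variances
var_a = sd_a ** 2   # 9.0
var_b = sd_b ** 2   # 16.0

# Step 2: combine the models (means add, and variances add under independence)
combined_mean = mean_a + mean_b   # 350.0
combined_var = var_a + var_b      # 25.0

# Step 3: take the square root to get back to a plottable standard deviation
combined_sd = math.sqrt(combined_var)   # 5.0

print(f"Combined model: mean={combined_mean}, sd={combined_sd}")
```

Note that naively adding the standard deviations (3 + 4 = 7) would overstate the spread; going through the variances gives the correct combined standard deviation of 5.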
Ending note
So what do you think? Do you agree? Can you think of any other situations where Standard Deviation or Variance is better to use than the other?