How to Properly Interpret Error Bars

“The meaning of error bars is often misinterpreted, as is the statistical significance of their overlap.”

Jeffrey Boschman
One Minute Machine Learning

--

Krzywinski, M., Altman, N. Error bars. Nat Methods 10, 921–922 (2013). https://doi.org/10.1038/nmeth.2659

Given a figure comparing some mean values with error bars, the author is likely trying to do one of two things: 1. show the variability of observations in that sample (descriptive statistics), or 2. show how well the sample mean represents the population mean (inferential statistics; where the term “statistical significance” comes into play).

Unfortunately, people often forget this and misinterpret error bars. For example, a gap between the error bars of two sample means is often assumed to imply a statistically significant difference (statistical significance at the standard threshold of P = 0.05 means there is less than a 5% chance of observing a difference this large if the true means were actually equal), but this is not always true: it depends on what type of error bar is being used.
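A small sketch makes the pitfall concrete. The summary numbers below are made up for illustration: with equal standard errors, s.e.m. bars stop overlapping once the gap between the means exceeds 2 × s.e.m., but a large-sample two-sided test at P = 0.05 needs the gap to exceed about 1.96 × √2 ≈ 2.77 × s.e.m., so a gap in between gives non-overlapping bars without significance.

```python
import math

# Hypothetical summary statistics for two groups (illustrative numbers only).
sem = 1.0            # standard error of each group's mean
m1, m2 = 10.0, 12.2  # gap of 2.2 between the sample means

# s.e.m. bars span mean +/- sem, so they fail to overlap exactly when the
# gap exceeds the sum of the two bar half-widths (here, 2 * sem).
bars_overlap = abs(m1 - m2) < (sem + sem)

# Large-sample z test for a difference between two means: the standard
# error of the difference is sqrt(sem1^2 + sem2^2), not sem1 + sem2.
z = abs(m1 - m2) / math.sqrt(sem**2 + sem**2)
significant = z > 1.96  # two-sided P < 0.05 threshold for large samples

print(bars_overlap, round(z, 2), significant)  # → False 1.56 False
```

The bars do not overlap, yet the test statistic (1.56) falls short of 1.96, so the difference is not significant at P = 0.05.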

The second article in Nature Methods' Points of Significance series, "Error Bars", discusses the differences between three common types of error bars: standard deviation, standard error of the mean, and confidence interval.

Prerequisite info: Sampling and estimation

Standard Deviation

Error bars based on the standard deviation (s.d.) only tell us about the spread of the observations (i.e. what values new samples drawn from the population might take; part of descriptive statistics). They do not, by themselves, indicate anything about the uncertainty of the mean or about statistical significance. You can think of the sample s.d. as an estimate of the variability of individual observations.
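A quick simulation illustrates this property (the population parameters here are assumed values, not from the article): the sample s.d. hovers around the true spread of the observations no matter how large the sample gets, because it describes the observations, not the mean.

```python
import random
import statistics

random.seed(0)

# Assumed population: normal with known spread sigma = 2.0.
mu, sigma = 5.0, 2.0

# The sample s.d. estimates sigma, and unlike the s.e.m. it does not
# shrink as the sample grows -- it describes the spread of observations.
for n in (20, 200, 2000):
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    print(n, round(statistics.stdev(sample), 2))
```

Each printed s.d. stays near 2.0; collecting more data sharpens the estimate of the spread but does not reduce the spread itself.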

Standard Error of the Mean

Error bars based on the standard error of the mean (s.e.m.) do reflect the uncertainty of the mean (i.e. they indicate the reliability of a measurement; part of inferential statistics). You can think of the s.e.m. as an estimate of the variability of possible sample means, as long as the sample size is large enough (beware of small sample sizes). With non-overlapping s.e.m. error bars, you cannot assume that the P value is less than 0.05 (i.e. that there is a significant difference between the means); as Figure 3 in the article shows, s.e.m. bars still fail to overlap even at P = 0.1.
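One way to see what the s.e.m. estimates is to simulate it directly (again with assumed population parameters): draw many samples, record each sample mean, and compare the spread of those means with the theoretical s.e.m., s.d. / √n.

```python
import math
import random
import statistics

random.seed(1)
mu, sigma, n = 5.0, 2.0, 50  # assumed population and sample size

# Draw many independent samples and record each sample mean.
means = [statistics.mean(random.gauss(mu, sigma) for _ in range(n))
         for _ in range(2000)]

# The spread of these sample means is exactly what a single sample's
# s.e.m. (s.d. / sqrt(n)) tries to estimate.
spread_of_means = statistics.stdev(means)
theoretical_sem = sigma / math.sqrt(n)
print(round(spread_of_means, 3), round(theoretical_sem, 3))
```

Both numbers land near 0.283, confirming that the s.e.m. describes the variability of the mean, not of the observations.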

Confidence Intervals

Error bars based on confidence intervals (C.I.) also indicate the reliability of a measurement (i.e. part of inferential statistics): they indicate a range that would capture the population mean the stated percentage of the time (e.g. with a 95% C.I., if you collected 20 samples and calculated each one's mean and C.I., about 19 of the 20 ranges would contain the true population mean). You can relate the C.I. to the s.e.m. using the t-statistic, and for large sample sizes the s.e.m. is approximately a 67% C.I. With non-overlapping 95% C.I. error bars, you can probably assume that the P value is less than 0.05, but this is a bit of overkill, as 95% C.I. bars only start to overlap at around P = 0.005 (Figure 3 from the article).
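The coverage interpretation can be checked by simulation (assumed population parameters; for this large sample the t multiplier is approximated by the normal value 1.96): build a 95% C.I. from each sample and count how often it contains the true mean.

```python
import math
import random
import statistics

random.seed(2)
mu, sigma, n = 5.0, 2.0, 50  # assumed population and sample size

# For a reasonably large sample, the 95% C.I. is roughly
# mean +/- 1.96 * s.e.m. (the exact multiplier comes from the
# t distribution with n - 1 degrees of freedom).
covered = 0
trials = 1000
for _ in range(trials):
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    m = statistics.mean(sample)
    sem = statistics.stdev(sample) / math.sqrt(n)
    half_width = 1.96 * sem
    if m - half_width <= mu <= m + half_width:
        covered += 1

print(covered / trials)  # close to 0.95, i.e. roughly 19 intervals in 20
```

The observed coverage comes out near 95%, matching the "19 ranges out of 20" reading of a 95% C.I.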

Again, from Krzywinski, M., Altman, N. Error bars. Nat Methods 10, 921–922 (2013). https://doi.org/10.1038/nmeth.2659

Summary

You cannot get any information about statistical significance from s.d. error bars. You can draw conclusions about statistical significance from s.e.m. or 95% C.I. error bars, but whether the P value falls below 0.05 is not intuitive from the overlap alone.

--
