What is the p-value? (In layman's terms)

9/03/2018

For a stats novice like me, understanding what the p-value is can be difficult. This is because, when asked, professional statisticians tend to give a complete and accurate description of what the p-value is and how it is derived. For example, here is the definition from the American Statistical Association:

In statistical hypothesis testing, the p-value or probability value or asymptotic significance is the probability for a given statistical model that, when the null hypothesis is true, the statistical summary (such as the sample mean difference between two compared groups) would be the same as or of greater magnitude than the actual observed results.

I have also heard descriptions that start with an example of a coin that is flipped 1000 times. At that point I go: "buckle up Tim, it's going to be a bumpy ride".

While these are very accurate descriptions of the p-value, as an engineer looking into the stats world from the outside, I just want a simple definition that gives me some intuition as to what I'm looking at when I see a reported p-value.

So here we go:

To understand what the p-value is, you first need to understand what a null hypothesis is. When running a hypothesis test / experiment, the null hypothesis says that there is no difference or no change between the two things being compared. The alternate hypothesis is the opposite of the null hypothesis and states that there is a difference between them. The goal of the experiment is usually to find enough evidence to reject the null hypothesis in favor of the alternate hypothesis. Let me illustrate this with some examples:
If you are trying to test whether a new marketing campaign generates more revenue, the null hypothesis is that there is no change in the revenue as a result of the new marketing campaign, and the alternate hypothesis is that the new marketing campaign performs better (or worse) than the previous campaign.

If you are trying to show that a new drug lowers cholesterol, the null hypothesis states that there is no difference in cholesterol between the group with the drug and the group without, while the alternate hypothesis states that the new drug does have an effect on cholesterol levels.

If you are trying to test whether a new server version has better or worse performance than the previous version, the null hypothesis is that both server versions have equal performance, and the alternate hypothesis is that there is a real difference in the performance of the old and new server.

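So what is the simple layman's definition of the p-value? The p-value is the probability of seeing a result at least as extreme as the one you actually observed, assuming the null hypothesis is true. That's it! A low p-value means the observed result would be surprising if the null hypothesis were true, which counts as evidence against the null hypothesis. A high p-value means the result is the kind of thing you would often see even if the null hypothesis were true. Note that the p-value is not the probability that the null hypothesis itself is true; that is a common misreading.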

In the example where we are trying to test whether a new marketing campaign generates more revenue, the p-value answers this question: if the null hypothesis were true and the new campaign really made no difference to revenue, how often would we see a revenue difference at least as large as the one we measured, just by chance? If the p-value is 0.25, a difference that large would show up about 25% of the time even with no real effect, so our result is not very surprising. If the p-value is 0.04, a difference that large would show up only about 4% of the time with no real effect. As you can surmise, the lower the p-value, the stronger the evidence against the null hypothesis, which in this case means stronger evidence that the new marketing campaign causes an increase or decrease in revenue.
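To make the "how often would chance alone produce this" idea concrete, here is a minimal sketch of one way such a p-value could be estimated, using a permutation test. The revenue numbers are made up for illustration and the function name permutation_p_value is my own; this is not the only way to compute a p-value, just one that is easy to follow. The idea: if the campaign made no difference, the labels "old" and "new" are arbitrary, so we shuffle them many times and count how often the shuffled data shows a difference at least as large as the real one.

```python
import random

def average(values):
    return sum(values) / len(values)

def permutation_p_value(old, new, n_shuffles=10_000, seed=0):
    """Estimate a two-sided p-value for a difference in means by shuffling labels."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    observed = abs(average(new) - average(old))
    pooled = list(old) + list(new)
    n_old = len(old)
    at_least_as_extreme = 0
    for _ in range(n_shuffles):
        # Under the null hypothesis the group labels carry no information,
        # so any random split of the pooled data is equally plausible.
        rng.shuffle(pooled)
        diff = abs(average(pooled[n_old:]) - average(pooled[:n_old]))
        if diff >= observed:
            at_least_as_extreme += 1
    return at_least_as_extreme / n_shuffles

# Hypothetical daily revenue figures, invented purely for illustration.
old_campaign = [102, 98, 110, 95, 101, 99, 104, 97]
new_campaign = [108, 112, 105, 115, 109, 103, 111, 107]

p = permutation_p_value(old_campaign, new_campaign)
print(f"estimated p-value: {p:.4f}")
```

The returned number is simply the fraction of random shuffles that looked at least as extreme as the real data, which is the definition of the p-value applied directly.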

So what do p-values really tell us? p-values help us judge whether an observation could easily be explained by random occurrences alone, or whether it points to a real effect of the change that was made. To treat a test result as a real effect, we want the p-value to be low. How low, you may ask? Well, that depends on what standard you want to set / follow. In many fields the p-value is expected to be under 0.05, while other fields require a p-value under 0.01. This cut-off number is known in statistics as the alpha, and results from experiments with p-values below this threshold are considered to be statistically significant. So when a result has a p-value of 0.05 or lower, a difference at least that large would show up no more than 5% of the time if only random variation were at work. That is rare enough that we have reasonable grounds to reject the null hypothesis in favor of the alternate hypothesis.
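One way to get a feel for what the alpha cut-off means is to simulate many experiments in which the null hypothesis really is true and count how often the p-value still lands below 0.05. The sketch below does that under some simplifying assumptions of my own: both groups are drawn from the same normal distribution with a known standard deviation of 1, and the p-value comes from a simple two-sided z-test. The names z_test_p_value and false_positive_rate are made up for this example.

```python
import random
from statistics import NormalDist, fmean

def z_test_p_value(a, b, sigma=1.0):
    """Two-sided z-test p-value for a difference in means, assuming known sigma."""
    standard_error = sigma * (1 / len(a) + 1 / len(b)) ** 0.5
    z = (fmean(a) - fmean(b)) / standard_error
    return 2 * (1 - NormalDist().cdf(abs(z)))

rng = random.Random(42)  # fixed seed so the sketch is repeatable
alpha = 0.05
n_experiments = 5_000
group_size = 30

significant = 0
for _ in range(n_experiments):
    # Both groups come from the SAME distribution, so the null hypothesis is true
    # and any difference between them is pure random variation.
    group_a = [rng.gauss(0, 1) for _ in range(group_size)]
    group_b = [rng.gauss(0, 1) for _ in range(group_size)]
    if z_test_p_value(group_a, group_b) < alpha:
        significant += 1

false_positive_rate = significant / n_experiments
print(f"share of no-effect experiments with p < {alpha}: {false_positive_rate:.3f}")
```

The share printed should come out close to alpha. In other words, with a 0.05 cut-off, roughly 1 in 20 experiments with no real effect will still look statistically significant, which is why a stricter alpha such as 0.01 is used when false alarms are costly.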