The 95th percentile is a way of describing, in a single value, a surprisingly large outcome for any quantity which can vary. As in ‘surprisingly large, but not astonishingly large’.
For example, heights vary across people. Consider adult UK women, who have a mean height of about 5’4’’ with a standard deviation of about 3’’. A woman who is 5’7’’ would be tall, and one who is 5’9’’ would be surprisingly tall. 5’9’’ is the 95th percentile for adult UK women. To see what this means, try a thought experiment: line every adult UK woman up by height, from shortest to tallest, and walk along the line until you have passed 95% of all women, and then stop. The height of the woman you are standing in front of is the 95th percentile of heights for adult UK women.
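The line-up can be mimicked in a few lines of Python. This is only a sketch under stated assumptions: the text gives a mean and a standard deviation but no distribution, so the code assumes heights are normally distributed, and it uses a simulated population of 100,000 rather than real data. The function name `lineup_percentile` is made up for this illustration.

```python
import random
from statistics import NormalDist

MEAN_IN = 64.0  # 5'4'' in inches, as quoted in the text
SD_IN = 3.0     # 3'' standard deviation, as quoted in the text


def lineup_percentile(heights, fraction=0.95):
    """Sort people by height, walk past `fraction` of them, then stop."""
    line = sorted(heights)                # shortest to tallest
    passed = int(fraction * len(line))    # number of people walked past
    index = min(passed, len(line) - 1)    # stay inside the line
    return line[index]                    # the person you are standing in front of


# ASSUMPTION: a normal distribution; the text does not specify one.
rng = random.Random(0)
heights = [rng.gauss(MEAN_IN, SD_IN) for _ in range(100_000)]

empirical = lineup_percentile(heights)
theoretical = NormalDist(MEAN_IN, SD_IN).inv_cdf(0.95)

print(f"line-up value:     {empirical:.1f} inches")
print(f"normal assumption: {theoretical:.1f} inches")
```

Under the normal assumption the 95th percentile is the mean plus about 1.645 standard deviations, roughly 68.9 inches, which rounds to the 5’9’’ quoted above. The simulated line-up should land close to the same value.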
The formal definition of the 95th percentile is in terms of a probability distribution. Probabilities describe beliefs about uncertain quantities. What they represent is a very deep question, which I will not get into! I recommend Ian Hacking, ‘An introduction to probability and inductive logic’ (CUP, 2001), if you would like to know more. If H represents the height of someone selected at random from the population of adult UK women, then H is uncertain, and the 95th percentile of H is 5’9’’. Lest you think this is obvious and contradicts my point about probabilities being mysterious, let me point out the difficulty of defining the notion ‘selected at random’ without reference to probability; defining it with reference to probability would be circular.
So the formal interpretation of the 95th percentile is only accessible after a philosophical discussion about what a probability distribution represents. In many contexts the philosophy does not really matter, because the 95th percentile is not really a precise quantity, but a conventional label representing the qualitative property ‘surprisingly large, but not astonishingly large’. If someone is insisting that only the 95th percentile will do, then they are advertising their willingness to have a long discussion about philosophy.
Blog post by Prof. Jonathan Rougier, Professor of Statistical Science.
First blog in the series here.
Second blog in the series here.