Scaling up probabilities in space

Suppose you have some location or small area, call it location A, and you have decided that the 1-in-100 year event for some magnitude at this location is ‘x’. That is to say, the probability of an event with magnitude exceeding ‘x’ in the next year at location A is 1/100. For clarity, I prefer to state this exact definition rather than say ‘1-in-100 year event’.

Now suppose you have a second location, call it location B, and you are worried about an event exceeding ‘x’ in the next year at either location A or location B. For simplicity suppose that ‘x’ is the 1-in-100 year event at location B as well, and suppose also that the magnitudes of events at the two locations are probabilistically independent. In this case “an event exceeding ‘x’ in the next year at either A or B” is the logical complement of “no event exceeding ‘x’ in the next year at A, AND no event exceeding ‘x’ in the next year at B”; in logic this equivalence is known as De Morgan’s Law. This gives us the result:

Pr(an event exceeding ‘x’ in the next year at either A or B) = 1 – (1 – 1/100) * (1 – 1/100).

This argument generalises to any number of locations. Suppose our locations are numbered from 1 up to n, and let ‘p_i’ be the probability that the magnitude exceeds some threshold ‘x’ in the next year at location i. I will write ‘somewhere’ for ‘somewhere in the union of the n locations’. Then, assuming probabilistic independence as before,

Pr(an event exceeding ‘x’ in the next year somewhere) = 1 – (1 – p_1) * … * (1 – p_n).

If the sum of all of the p_i’s is less than about 0.1, then there is a good approximation to this value, namely

Pr(an event exceeding ‘x’ in the next year somewhere) = p_1 + … + p_n, approximately.

But don’t use this approximation if the result is more than about 0.1, use the proper formula instead.
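
For readers who like to see the arithmetic, here is a minimal Python sketch of the exact formula and the sum approximation (the function names are my own, not from the blog):

def prob_somewhere(ps):
    # Exact: the complement of 'no exceedance at any location',
    # assuming independence across locations.
    q = 1.0
    for p in ps:
        q *= 1.0 - p
    return 1.0 - q

def prob_somewhere_approx(ps):
    # Sum approximation: only reliable when the result is below about 0.1.
    return sum(ps)

# Two locations, each with a 1-in-100 year event at threshold 'x':
print(prob_somewhere([0.01, 0.01]))          # 0.0199
print(prob_somewhere_approx([0.01, 0.01]))   # 0.02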

One thing to remember is that if ‘x’ is the 1-in-100 year event for a single location, it is NOT the 1-in-100 year event for two or more locations. Suppose that you have ten locations, and ‘x’ is the 1-in-100 year event for each location, and assume probabilistic independence as before. Then the probability of an event exceeding ‘x’ in the next year somewhere is approximately 1/10. In other words, ‘x’ is roughly the 1-in-10 year event over the union of the ten locations. Conversely, if you want the 1-in-100 year event over the union of the ten locations then you need to find (approximately) the 1-in-1000 year event at an individual location.
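
Using the prob_somewhere sketch from above, both of these ten-location figures check out numerically:

print(prob_somewhere([0.01] * 10))    # 0.0956, approximately 1/10
print(prob_somewhere([0.001] * 10))   # 0.00996, approximately 1/100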

These calculations all assumed that the magnitudes were probabilistically independent across locations. This was for simplicity: the probability calculus tells us exactly how to compute the probability of an event exceeding ‘x’ in the next year somewhere, for any joint distribution of the magnitudes at the locations. This is more complicated: ask your friendly statistician (who will tell you about the awesome inclusion/exclusion formula). The basic message doesn’t change, though. The probability of exceeding ‘x’ somewhere depends on the number of locations you are considering. Or, in terms of areas, the probability of exceeding ‘x’ somewhere depends on the size of the region you are considering.
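
For just two locations the dependent case is easy to write down; this is a sketch of the two-event inclusion/exclusion formula, with illustrative joint probabilities of my own choosing:

def prob_either(p_a, p_b, p_both):
    # Inclusion/exclusion: Pr(A or B) = Pr(A) + Pr(B) - Pr(A and B)
    return p_a + p_b - p_both

print(prob_either(0.01, 0.01, 0.0001))   # 0.0199: independent case (0.01 * 0.01 = 0.0001)
print(prob_either(0.01, 0.01, 0.006))    # 0.0140: strong positive dependence lowers the union probability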

Blog post by Prof. Jonathan Rougier, Professor of Statistical Science.

What is Probability?

The paradox of probability

Probability is a quantification of uncertainty. We use probability words in our everyday discourse: impossible, very unlikely, 50:50, likely, 95% certain, almost certain, certain. This suggests a shared understanding of what probability is, and yet it has proved very hard to operationalise probability in a way that is widely accepted.

Uncertainty is subjective

Uncertainty is a property of the mind, and varies between people, according to their learning and experiences, way of thinking, disposition, and mood. Were we being scrupulous we would always say “my probability” or “your probability” but never “the probability”. When we use “the”, it is sometimes justified by convention, in situations of symmetry: tossing a coin, rolling a dice, drawing cards from a pack, balls from a lottery machine. This convention is wrong, but useful — were we to inspect a coin, a dice, a pack of cards, or a lottery machine, we would discover asymmetry.

Agreement about symmetry is an example of a wider phenomenon, namely consensus. If well-informed people agree on a probability, then we might say “the probability”. Probabilities in public discourse are often of this form, for example the IPCC’s “extremely likely” (at least 95% certain) that human activities are the main cause of global warming since the 1950s. Stated probabilities can never be defended as ‘objective’, because they are not. They are defensible when they represent a consensus of well-informed people. People wanting to disparage this type of stated probability will attack the notion of consensus amongst well-informed people, often by setting absurdly high standards for what we mean by ‘consensus’, closer to ‘unanimity’.

Abstraction in mathematics

Probability is a very good example of the development of abstraction in mathematics. Early writers on probability in the 17th century based their calculations strongly on their intuition. By the 19th century mathematicians were discovering that intuition was not a good guide to the further development of their subject. Into the 20th century, mathematics was increasingly defined by mathematicians as ‘the manipulation of symbols according to rules’, which is the modern definition. What was surprising and gratifying is that mathematical abstraction continued (and continues) to be useful in reasoning about the world. This is known as “the unreasonable effectiveness of mathematics”.

The abstract theory of probability was finally defined by the great 20th century mathematician Andrey Kolmogorov, in 1933: the recency of this date showing how difficult this was. Kolmogorov’s definition paid no heed at all to what ‘probability’ meant; only the rules for how probabilities behaved were important. Stripped to their essentials, these rules are:

1. If A is a proposition, then Pr(A) >= 0.
2. If A is certainly true, then Pr(A) = 1.
3. If A and B are mutually exclusive (i.e. they cannot both be true), then Pr(A or B) = Pr(A) + Pr(B).

The formal definition is based on advanced mathematical concepts that you might learn in the final year of a maths degree at a top university.

‘Probability theory’ is the study of functions ‘Pr’ which have the three properties listed above. Probability theorists are under no obligations to provide a meaning for ‘Pr’. This obligation falls in particular to applied statisticians (also physicists, computer scientists, and philosophers), who would like to use probability to make useful statements about the world.
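
As a toy illustration (my own example, not from the blog), the three rules are easy to check on a finite probability space such as a fair die:

from fractions import Fraction

outcomes = {1, 2, 3, 4, 5, 6}   # a fair six-sided die

def Pr(event):
    # 'event' is the set of outcomes for which a proposition is true.
    return Fraction(len(event & outcomes), len(outcomes))

A = {1, 2}   # "the roll is 1 or 2"
B = {6}      # "the roll is a 6"

assert Pr(A) >= 0                    # rule 1
assert Pr(outcomes) == 1             # rule 2: a proposition that is certainly true
assert Pr(A | B) == Pr(A) + Pr(B)    # rule 3: A and B are mutually exclusive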

Probability and betting

There are several interpretations of probability. Out of these, one interpretation has emerged as both subjective and generic: probability is your fair price for a bet. If A is a proposition, then Pr(A) is the amount you would pay, in £, for a bet which pays £0 if A turns out to be false, and £1 if A turns out to be true. Under this interpretation, rules 1 and 2 are implied by the reasonable preference for not losing money. Rule 3 is also implied by the same preference, although the proof is more arcane, involving combinations of bets rather than a single simple bet. The overall theorem is called the Dutch Book Theorem: if probabilities are your fair prices for bets, then your bookmaker cannot make you a sure loser if and only if your probabilities obey the three rules.
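
Here is a small numerical sketch of the sure-loss argument behind the Dutch Book Theorem; the prices are a made-up example of mine for someone whose prices violate rule 3:

# Suppose A and B are mutually exclusive, and your prices are
# Pr(A) = 0.4, Pr(B) = 0.4, Pr(A or B) = 0.9, violating rule 3 (0.9 != 0.8).
# The bookmaker sells you the 'A or B' bet at your price, and buys the
# separate bets on A and on B from you at your prices.
price_A, price_B, price_A_or_B = 0.4, 0.4, 0.9

for outcome in ["A", "B", "neither"]:
    payoff_A_or_B = 1.0 if outcome in ("A", "B") else 0.0   # paid to you
    payoff_A = 1.0 if outcome == "A" else 0.0               # paid by you
    payoff_B = 1.0 if outcome == "B" else 0.0               # paid by you
    net = (payoff_A_or_B - price_A_or_B) + (price_A - payoff_A) + (price_B - payoff_B)
    print(outcome, round(net, 2))   # -0.1 whatever happens: a sure loss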

This interpretation is at once liberating and threatening. It is liberating because it avoids the difficulties of other interpretations, and emphasises what we know to be true, that uncertainty is a property of the mind, and varies from person to person. It is threatening because it does not seem very scientific — betting being rather trivial — and because it does not conform to the way that scientists often use probabilities, although it does conform quite closely to the vernacular use of probabilities. Many scientists will deny that their probability is their fair price for a bet, although they will be hard-pressed to explain what it is, if not.

Blog post by Prof. Jonathan Rougier, Professor of Statistical Science.

Model uncertainties in multispecies ecological models

We live in an increasingly uncertain world. Therefore, when we model environmental processes of interest, it is vital to account for the inherent uncertainties in our analyses and ensure that this information is communicated to relevant parties. Whilst the use of complex statistical models to estimate quantities of interest is becoming increasingly common in environmental sciences, one aspect of uncertainty that is frequently overlooked is that of model uncertainty. Much of the research I conduct considers this additional aspect of uncertainty quantification; that is, not just uncertainty in the quantities of interest, but also in the models that we use to estimate them.

An example of this is in a paper recently published in Ecology and Evolution (Swallow et al., 2016), which looks at how different species of birds that we commonly see in our gardens respond to the same environmental factors (or covariates).  Some of the species have declined rapidly over the past 40 years, whilst others have remained stable or even increased in number.  Possible drivers of these changes that have been suggested include increases in predators, changes in climate and availability of natural food sources.  Statistically speaking, we try to understand and quantify changes in observed numbers of birds by relating them to changes in measured environmental quantities that the birds will be subjected to, such as numbers of predators, weather variables, habitat quality etc.  Most previous analyses have modelled each of the species observed at many different geographical locations (or monitoring sites) independently of each other, and estimated the quantities of interest completely separately, despite the fact that all these species share the same environment and are subject to the same external influences.  So how do we go about accounting for the fact that similar species may share similar population drivers?

This essentially constitutes a model uncertainty problem – that is, which parameters should be shared across which species in our statistical model and which parameters should be distinct?

If we were to consider two different species and use two different environmental factors to explain changes in those species, say habitat type and average monthly temperature, there are four possible models to consider.  That is,

Model | Habitat type | Temperature | No. of parameters
1     | Shared       | Shared      | 2
2     | Distinct     | Shared      | 3
3     | Shared       | Distinct    | 3
4     | Distinct     | Distinct    | 4

This can easily be extended to a higher number of species and covariates.
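
As a toy illustration of how this model space grows (a simple shared-versus-distinct enumeration of my own, not the statistical machinery used in the paper):

from itertools import product

def enumerate_models(covariates, n_species):
    # Each covariate's effect is either 'Shared' (one parameter for all species)
    # or 'Distinct' (one parameter per species).
    for choice in product(["Shared", "Distinct"], repeat=len(covariates)):
        n_params = sum(1 if c == "Shared" else n_species for c in choice)
        yield dict(zip(covariates, choice)), n_params

# Reproduces the four-model table above for two species and two covariates:
for model, n_params in enumerate_models(["Habitat type", "Temperature"], n_species=2):
    print(model, n_params)

# With, say, 5 covariates there are already 2**5 = 32 such models to compare.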

There are also inevitably going to be some aspects of variability shown by some of the species that we cannot account for through the quantities we have measured. We account for this using site-specific random effects, which explain variability that is linked to a specific monitoring site, but which is not accounted for by the environmental covariates in the model. Again, we would usually assume this is a single quantity representing the discrepancy between what we have accounted for using our measured covariates and what is ‘left over’. Following on from the work of previous authors (Lahoz-Monfort et al., 2011), we again split this unexplained variation into two – unexplained variation that is common to all species and unexplained variation that is specific to a single species. The ratio of these two quantities can give us a good idea of what measurements we may be missing. Is it additional environmental factors that are wide-ranging in their effects, or is it something relating to the specific ecology of an individual species?

In the paper, we apply our method to a large dataset spanning nearly 40 years, collected as part of the British Trust for Ornithology’s Garden Bird Feeding Survey. We selected two groups of similar species commonly found in UK gardens during the winter. For ecological reasons, we would expect the species within the two groups to show similar traits, so they act as ideal study species for detecting synchrony in responses to environmental factors. Whilst most of the results were consistent with those from single-species models (e.g. Swallow et al., 2015), studying the species at an ecosystem level also highlighted some additional relationships that would be impossible to study under more simplistic models. The results highlight that there is, unsurprisingly, a large degree of synchrony across many of these species, and that they share many of the traits and drivers of population change. The synchronies observed in the results corresponded both to significant positive or negative relationships with covariates and to species that collectively show no strong relationship with a given environmental factor. There is, however, more to the story, and some of the species showed strong differences in how they respond to external factors. Highlighting these differences may offer important information on how best to halt or reverse population declines.

The results from our analyses showed the importance of considering model uncertainty in statistical analyses of this type, and that by incorporating relevant uncertainties, we can improve our understanding of the environmental processes of interest.  Incorporating more data into the analysis will help in further constraining common or shared parameters and reduce uncertainties in them.  It also allows us to guide and improve future data collection procedures if we can gain a better understanding of what is currently missing from our model.

Blog written by Dr Ben Swallow, a Postdoctoral Research Associate studying ecological and environmental statistics in the School of Chemistry.

References

Lahoz-Monfort, J. J., Morgan, B. J. T., Harris, M. P., Wanless, S., & Freeman, S. N. (2011). A capture-recapture model for exploring multi-species synchrony in survival. Methods in Ecology and Evolution, 2(1), 116–124.

Swallow, B., Buckland, S. T., King, R. and Toms, M. P. (2015). Bayesian hierarchical modelling of continuous non-negative longitudinal data with a spike at zero: An application to a study of birds visiting gardens in winter. Biometrical Journal, 58(2), 357–371.

Swallow, B., King, R., Buckland, S. T. and Toms, M. P. (2016). Identifying multispecies synchrony in response to environmental covariates. Ecology and Evolution, 6(23), 8515–8525.

Figure 1. Blue tits show a highly synchronous response with great tits, and to a lesser degree coal tits, to their surrounding environment.

Figure 2. Male house sparrow feeding on fat balls.  Whilst they show some synchrony in their response to environmental factors, they appear to be subject to a differing ecology to the other two species they were compared with.

The 95th percentile

The 95th percentile is a way of describing, in a single value, a surprisingly large outcome for any quantity which can vary.  As in ‘surprisingly large, but not astonishingly large’.

For example, heights vary across people. Consider adult UK women, who have a mean height of about 5’4’’ with a standard deviation of about 3’’. A woman who is 5’7’’ would be tall, and one who is 5’9’’ would be surprisingly tall. 5’9’’ is the 95th percentile for adult UK women. The thought experiment involves lining every adult UK woman up by height, from shortest to tallest, and walking along the line until you have passed 95% of all women, and then stopping. The height of the woman you are standing in front of is the 95th percentile of heights for adult UK women.
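
As a quick check of these figures, here is a sketch assuming heights are roughly normally distributed (an assumption of mine; the blog only quotes a mean and a standard deviation):

from statistics import NormalDist

heights = NormalDist(mu=64, sigma=3)   # adult UK women: mean 5'4'', sd 3'', in inches
print(heights.inv_cdf(0.95))           # about 68.9 inches, i.e. roughly 5'9''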

The formal definition of the 95th percentile is in terms of a probability distribution. Probabilities describe beliefs about uncertain quantities. Exactly what they represent is a very deep question, which I will not get into! I recommend Ian Hacking, ‘An introduction to probability and inductive logic’ (CUP, 2001), if you would like to know more. If H represents the height of someone selected at random from the population of adult UK women, then H is uncertain, and the 95th percentile of H is 5’9’’. Lest you think this is obvious and contradicts my point about probabilities being mysterious, let me point out the difficulty of defining the notion ‘selected at random’ without reference to probability, which would be tautological.

So the formal interpretation of the 95th percentile is only accessible after a philosophical discussion about what a probability distribution represents.  In many contexts the philosophy does not really matter, because the 95th percentile is not really a precise quantity, but a conventional label representing the qualitative property ‘surprisingly large, but not astonishingly large’.  If someone is insisting that only the 95th percentile will do, then they are advertising their willingness to have a long discussion about philosophy.

Blog post by Prof. Jonathan Rougier, Professor of Statistical Science.

1-in-200 year events

You often read or hear references to the ‘1-in-200 year event’, or ‘200-year event’, or ‘event with a return period of 200 years’. Other popular horizons are 1-in-30 years and 1-in-10,000 years. This term applies to hazards which can occur over a range of magnitudes, like volcanic eruptions, earthquakes, tsunamis, space weather, and various hydro-meteorological hazards like floods, storms, hot or cold spells, and droughts.

‘1-in-200 years’ refers to a particular magnitude. In floods this might be represented as a contour on a map, showing an area that is inundated. If this contour is labelled as ‘1-in-200 years’ this means that the current rate of floods at least as large as this is 1/200 /yr, or 0.005 /yr. So if your house is inside the contour, there is currently a 0.005 (0.5%) chance of being flooded in the next year, and a 0.025 (2.5%) chance of being flooded in the next five years. The general definition is this:

‘1-in-200 year magnitude is x’ = ‘the current rate for events with magnitude at least x is 1/200 /yr’.
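
As a quick check of the five-year figure quoted above (a sketch assuming the 0.005 /yr rate stays constant and that years are independent):

print(1 - (1 - 0.005) ** 5)   # 0.0248, i.e. roughly the 2.5% chance over five years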

Statisticians and risk communicators strongly deprecate the use of ‘1-in-200’ and its ilk. First, it gives the impression, wrongly, that the forecast is expected to hold for the next 200 years. It is not: 0.005 /yr is our assessment of the current rate, and this could change next year, in response to more observations or modelling, or a change in the environment.

Second, even if the rate is unchanged for several hundred years, 200 yr is not the typical waiting time until the next large-magnitude event. It is the mathematical expectation of the waiting time, which is a different thing. The typical waiting time is better represented by the median, which is about 30% lower, i.e. about 140 yr. This difference between the expectation and the median arises because the waiting-time distribution has a strong positive skew, so that lots of short waiting times are balanced out by a few long ones. In 25% of all outcomes the waiting time is less than 60 yr, and in 10% of outcomes it is less than 20 yr.
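
These waiting-time figures follow from assuming that events occur at a constant rate (a Poisson process), so that the waiting time is exponentially distributed; a quick sketch:

from math import exp, log

rate = 1 / 200   # 0.005 events per year

# Exponential waiting-time quantiles: t_q = -ln(1 - q) / rate
print(-log(1 - 0.50) / rate)   # median: about 139 yr, roughly 30% below the 200 yr expectation
print(-log(1 - 0.25) / rate)   # about 58 yr: a 25% chance of waiting less than this
print(-log(1 - 0.10) / rate)   # about 21 yr: a 10% chance of waiting less than this

# Probability of at least one event in the next 20 years:
print(1 - exp(-rate * 20))     # about 0.095, i.e. roughly the 10% in 20 yr suggested below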

So to use ‘1-in-200 year’ in public discourse is very misleading. It gives people the impression that the event will not happen even to their children’s children, but in fact it could easily happen to them. If it does happen to them, people will understandably feel that they have been very misled, and science and policy will suffer a reputational loss that degrades their future effectiveness.

So what to use instead? ‘Annual rate of 0.005 /yr’ is much less graspable than its reciprocal, ‘200 yr’. But ‘1-in-200 year’ gives people the misleading impression that they have understood something. As Mark Twain said, “It ain’t what you don’t know that gets you into trouble. It’s what you know for sure that just ain’t so.” To demystify ‘annual rate of 0.005 /yr’, it can be associated with a much larger probability, such as 0.1 (or 10%). So I suggest ‘event with a 10% chance of happening in the next 20 yr’.

Blog post by Prof. Jonathan Rougier, Professor of Statistical Science.

Converting probabilities between time-intervals

This is the first in an irregular sequence of snippets about some of the slightly more technical aspects of uncertainty and risk assessment.  If you have a slightly more technical question, then please email me and I will try to answer it with a snippet.

Suppose that an event has a probability of 0.015 (or 1.5%) of happening at least once in the next five years. Then, to a good approximation, the probability of the event happening at least once in the next year is 0.015 / 5 = 0.003 (or 0.3%), and the probability of it happening at least once in the next 20 years is 0.015 * 4 = 0.06 (or 6%).

Here is the rule for scaling probabilities to different time intervals: if both probabilities (the original one and the new one) are no larger than 0.1 (or 10%), then simply multiply the original probability by the ratio of the new time-interval to the original time-interval, to find the new probability.

This rule is an approximation which breaks down if either of the probabilities is greater than 0.1. For example, to scale a probability of 0.04 in the next 5 years up to 20 years we cannot simply multiply by 4, because the result, 0.16 (or 16%), is larger than 0.1. In this case we have to use the proper rule, which is

p_new = 1 – (1 – p_orig)^(int_new / int_orig)

where ‘^’ reads ‘to the power of’. The example above becomes

p_new = 1 – (1 – 0.04)^(20 / 5) = 0.15 (or 15%).

So the approximation would have been 1 percentage point out in this case. The formula =1-(1-0.04)^(20/5) can be pasted directly into a spreadsheet cell (the answer is 0.1507).

Of course it is unlikely to matter in practice whether the probability is 0.15 or 0.16.  But the difference gets bigger as the probabilities get bigger.  For example, it would definitely be a mistake to multiply a 0.25 one-year probability by 5 to find the five-year probability, because the result would be greater than 1.  Using the formula, the correct answer is a five-year probability of 0.76.
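
Here is a minimal sketch of the proper rule and the multiply-by-the-ratio approximation (the function names are my own):

def convert_probability(p_orig, int_orig, int_new):
    # Proper rule: p_new = 1 - (1 - p_orig)^(int_new / int_orig)
    return 1 - (1 - p_orig) ** (int_new / int_orig)

def convert_probability_approx(p_orig, int_orig, int_new):
    # Approximation: only reliable when both probabilities are below about 0.1.
    return p_orig * int_new / int_orig

print(convert_probability(0.04, 5, 20))          # 0.1507
print(convert_probability_approx(0.04, 5, 20))   # 0.16
print(convert_probability(0.25, 1, 5))           # 0.76, whereas 0.25 * 5 would exceed 1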

Blog post by Prof. Jonathan Rougier, Professor of Statistical Science.


Image: By Hovik Avetisyan [CC BY-SA 4.0 (http://creativecommons.org/licenses/by-sa/4.0)], via Wikimedia Commons