Bayes' Theorem 1, Mandatory Filtering 0
Unfortunately the Rudd government are pressing forward with their proposal for mandatory internet filtering. Recently, Electronic Frontiers Australia summarised the results of an analysis of current ISP-level filters commissioned by my old mates at ACMA. The figures are frankly begging to be plugged into Bayes' Theorem, so let's do that.
Firstly some terms. Let "P" be the event of discovering a Porn site on the internet. Let "N" be the converse event: discovering a Non-porn site. Let "+" be the event of a positive detection by an ISP filter. And obviously by "porn" I mean "inappropriate material", the definition of which may or may not coincide with the Government's; for the (admittedly rhetorical) purposes of this exercise it doesn't matter too much.
One of the key assumptions we have to make is P(P), or the probability of discovering a porn site. Obviously this depends greatly on how hard you're searching for it! Now I don't know about you, but I find that I almost never stumble upon a porn site by accident. Almost all of my regular news sources are relatively clean, or use the NSFW tag generously. But I'm quite happy to concede that my web habits are non-representative. So let's just assume a regular family internet connection with a moderate amount of parental supervision, and say there's a 5% chance of accidentally stumbling on porn. This still seems quite high, but it will suffice for now.
To do the calculation we also need:
- P(N), which is pretty obviously 1 - P(P), or 0.95.
- P(+|P) is the probability of a positive detection by the filter, given a porn site. According to the report, this ranges from 87% to 98%. Let's be generous and say a probability of 0.95.
- P(+|N) is the probability of a false positive detection by the filter. Again the results vary, this time from 1.3% to 7.8%. Let's use a similarly generous probability of 0.04.
- P(+) is the probability of a positive detection given any input. This is calculated by adding P(+|P) × P(P) and P(+|N) × P(N).
Now using Bayes' Theorem we can calculate P(P|+). In other words: if a filter blocks a given site, what is the probability that it was porn?
P(P|+) = P(+|P) × P(P) / P(+) = 0.95 × 0.05 / (0.95 × 0.05 + 0.04 × 0.95) ≈ 0.56
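For anyone who wants to play with the numbers themselves, the calculation is trivial to script. A minimal sketch (the function and variable names are my own, not from the ACMA report):

```python
def posterior_porn(prior_porn, hit_given_porn=0.95, hit_given_nonporn=0.04):
    """P(P|+) via Bayes' Theorem: the probability a blocked site was porn.

    prior_porn       -- P(P), our assumed chance of stumbling on porn
    hit_given_porn   -- P(+|P), the filter's true positive rate
    hit_given_nonporn -- P(+|N), the filter's false positive rate
    """
    prior_nonporn = 1 - prior_porn
    # P(+) = P(+|P) * P(P) + P(+|N) * P(N)
    p_hit = hit_given_porn * prior_porn + hit_given_nonporn * prior_nonporn
    return hit_given_porn * prior_porn / p_hit

print(round(posterior_porn(0.05), 2))  # → 0.56
```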
In other words, each time the filter blocks something there is roughly an even chance that it wasn't porn. In my opinion this is sufficiently damning evidence to show the worthlessness of any of these filters.
So obviously we made some assumptions about the prevalence of porn, and hence the probability of discovering it. If we assume the internet is 50% porn, then the filter starts to look vaguely effective:
P(P|+) = P(+|P) × P(P) / P(+) = 0.95 × 0.5 / (0.95 × 0.5 + 0.04 × 0.5) = 0.96
But this is clearly a ridiculous assumption. If, on the other hand, we say that P(P) is lower, maybe a 1% chance of stumbling on porn — which frankly still sounds high to me — then the filters look even more useless:
P(P|+) = P(+|P) × P(P) / P(+) = 0.95 × 0.01 / (0.95 × 0.01 + 0.04 × 0.99) ≈ 0.19
So if the filter is blocking something, there's an 80% chance that it wasn't porn! Fantastic! For some reason these calculations seem to be missing from the ACMA report.
See the EFA analysis for more on the mandatory filtering, and while you're there, join up. I have.