By Chuan-Zheng Lee
I get that racism stirs emotions, but I try to give attempts at statistical analysis a fair hearing. About half of the work Labour released on Saturday is actually sound, and that half is also the one that has received the most criticism. Bayesian inference is a perfectly good way to build probabilistic models of questions like “based on their name, what is this person’s ethnicity?” Having read Rob Salmond’s explanation of it yesterday, I see nothing obviously untoward about this part of their methodology. I know a lot of people have been offended by an apparent conflation of last name with role in the housing market, but strictly speaking, Labour’s analysis doesn’t imply it.
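For readers who want to see the mechanics, here is textbook Bayes’ rule applied to a single surname, as a short Python sketch. The `posterior` function and every number in it are hypothetical, chosen only for illustration (the 9% prior echoes the population figure Labour cites). This is not a reconstruction of Mr Salmond’s actual model.

```python
def posterior(prior, likelihood):
    """Bayes' rule: P(eth | surname) is proportional to P(surname | eth) * P(eth)."""
    joint = {eth: prior[eth] * likelihood[eth] for eth in prior}
    evidence = sum(joint.values())  # P(surname), summed over all ethnicities
    return {eth: value / evidence for eth, value in joint.items()}


# Made-up illustrative numbers, not Labour's model or data.
prior = {"Chinese": 0.09, "European": 0.60, "Other": 0.31}             # P(eth)
likelihood_wang = {"Chinese": 0.02, "European": 0.00001, "Other": 0.0005}  # P(Wang | eth)

post = posterior(prior, likelihood_wang)
print({eth: round(p, 3) for eth, p in post.items()})
```

With these invented inputs, a Wang comes out about 92% likely to be Chinese: high, but not certain, which is exactly the kind of hedged guess the rest of the analysis builds on.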
In statistical jargon, what Mr Salmond’s Bayesian analysis computes is an expectation (over Bayesian probabilities) of the number of buyers of each ethnicity, among those who bought a house between February and April with an unidentified agency representing 45% of the market in that period. You can think of this as, “if you picked 3,922 people with these surnames but otherwise at random, and repeated this experiment lots of times, on average, any such 3,922 people would have this many of each ethnicity: 40.7% of them would be European, 39.5% of them would be Chinese,” and so on.
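A minimal Python sketch of that expectation, assuming a hypothetical lookup table of per-surname probabilities P(ethnicity | surname): each buyer contributes their probabilities, and summing over buyers gives the expected count for each ethnicity. The surnames, the probabilities and the `expected_counts` helper are invented for illustration.

```python
from collections import Counter

# Hypothetical P(ethnicity | surname) table; these numbers are invented.
POSTERIOR = {
    "Wang":  {"Chinese": 0.95, "European": 0.02, "Other": 0.03},
    "Smith": {"Chinese": 0.01, "European": 0.90, "Other": 0.09},
    "Lee":   {"Chinese": 0.40, "European": 0.35, "Other": 0.25},
}


def expected_counts(surnames, posterior):
    """Expected number of buyers of each ethnicity: sum of P(eth | surname) over buyers."""
    totals = Counter()
    for name in surnames:
        for eth, prob in posterior[name].items():
            totals[eth] += prob
    return totals


buyers = ["Wang", "Smith", "Lee", "Lee"]  # stand-in for the 3,922 surnames
counts = expected_counts(buyers, POSTERIOR)
shares = {eth: c / len(buyers) for eth, c in counts.items()}
print(shares)
```

The counts come out fractional (1.76 expected Chinese buyers out of 4 here), which is the point: no individual buyer is ever labelled with an ethnicity, only the group total is estimated.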
If both of the following assumptions hold:
- The 3,922 in their data set are representative of the 8,790 who bought houses in that period.*
- Knowing whether someone bought a house tells you nothing statistically more about their ethnicity than their surname does. For example, a Wang who bought a house is no more or less likely to be Chinese than a Wang who did not. (In probability speak, ethnicity and buying a house are conditionally independent given one’s surname.)
then the 3,922 people in their data set are “otherwise (than surnames) at random” for the purposes of determining the ethnicity of house buyers in Auckland between February and April.
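To see what the second assumption buys us, here is a small Python sketch that works within a single surname group. The function name, the 0.9 figure for Wangs and the buying rates are all hypothetical. It applies Bayes’ rule to get P(Chinese | surname, bought) and compares it with P(Chinese | surname).

```python
def p_chinese_given_surname_and_bought(p_chinese_given_surname, buy_rate_chinese, buy_rate_other):
    """P(Chinese | surname, bought), via Bayes' rule within a single surname group.

    buy_rate_* are P(bought | ethnicity, surname) for people sharing that surname.
    """
    chinese = buy_rate_chinese * p_chinese_given_surname
    other = buy_rate_other * (1 - p_chinese_given_surname)
    return chinese / (chinese + other)


p_surname = 0.9  # hypothetical P(Chinese | surname = Wang) in the whole population

# Assumption holds: within the surname, buying is unrelated to ethnicity.
independent = p_chinese_given_surname_and_bought(p_surname, 0.05, 0.05)
# Assumption fails: Chinese Wangs buy at twice the rate of other Wangs.
dependent = p_chinese_given_surname_and_bought(p_surname, 0.06, 0.03)
print(independent, dependent)
```

In the first case buying tells us nothing new and the probability stays at 0.9. In the second it rises to about 0.95, so a surname-only estimate would undercount ethnically Chinese buyers; with the rates reversed it would overcount. Either way, the expected counts are only unbiased when the assumption holds.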
Even intuitively, this should look fine. Whenever we see names before faces, we’ll often make guesses about ethnicity based on last name. Sometimes we’ll be wrong—for example, someone in an interracial marriage who changed her last name—but the odd error doesn’t stop us from guessing. We keep doing this because it’s a fairly good heuristic. If you think you don’t guess ethnicities based on names, I politely suggest that you ask yourself again.
Where, then, did Labour go so horribly wrong? In pretty much all of the other half, the part where they tried to make the leap from ethnicity to residency.
Here are Phil Twyford’s and Mr Salmond’s claims, emphasis added:
- “39.5% of the Auckland houses sold went to people who appear to be ethnically Chinese. This is a large discrepancy from the 9% of the Auckland population who are ethnically Chinese.” (ref)
- “It’s staggering evidence that strongly suggests there’s a significant offshore Chinese presence in the Auckland real estate market. It could not possibly be all Chinese New Zealanders buying; that’s implausible.” (ref)
Claim 1 follows from the Bayesian analysis and it’s fine. Claim 2, however, is useless. Depending on how you read it, the italicised part is either trivially true or flatly wrong.
If he meant it literally, so that finding a single Chinese foreign buyer would prove him right, then it’s obviously true but meaningless (i.e., trivially true). Presumably what he actually meant is that it’s implausible that claim 1 could be explained by anything other than a “significant offshore Chinese presence”. Not just unlikely, but implausible.
Yet, despite the rigour involved in arriving at claim 1, neither Mr Twyford nor Mr Salmond says precisely what they mean by “implausible”. To me, this means some very high posterior probability (if you’re a Bayesian), like 0.999 or something, or some extraordinarily low p-value (if you’re a frequentist), like 0.001 (well below the typical 0.05). But maybe they had something else in mind, and that’s okay. They also neglected to specify what they meant by “significant”: say, “enough to affect prices”, or “comprises 10% of the market”.
Now, I know what they’ll say: it’s not possible to quantify this sort of hypothesis. Well, firstly, it is: I just did, in two different ways. They might not have the data to meet either criterion, but that’s another matter. Secondly, that’s no excuse for the sort of magic trick they performed, especially when the claim is that something is “implausible”. If they want to give useful political sound-bites, they should at least keep to “it merits further investigation” or something like that. That’s defensible. The sweeping statement Mr Twyford paraded on Saturday is not.
The leap from claim 1 to claim 2 that Messrs Twyford and Salmond made requires an implicit assumption: that a resident’s propensity to buy a house is roughly independent of ethnicity. That is, ethnically Chinese residents are just as likely to buy a house as Indian, European, Māori, Pasifika and other residents.
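A back-of-envelope way to see what this assumption does, treating the 9% and 39.5% figures as exact (they aren’t: one is a population share, the other an expectation over a sample) and assuming, for the moment, that every buyer is a resident. Let p be the ethnically Chinese share of residents, s the ethnically Chinese share of buyers, and r the ratio of the per-capita buying rate of ethnically Chinese residents to that of all other residents. Then

$$
s = \frac{r\,p}{r\,p + (1 - p)}
\quad\Longleftrightarrow\quad
r = \frac{s/(1-s)}{p/(1-p)} = \frac{0.395/0.605}{0.09/0.91} \approx 6.6.
$$

The independence assumption amounts to setting r = 1, which gives s = p = 9%. Explaining the whole gap with residents alone would need r of about 6.6. On its own, this data set can’t distinguish a large r from a significant share of offshore buyers, or from any mix of the two.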
Keith Ng and Thomas Lumley, among others, have offered plenty of reasons to believe this assumption might be false. Maybe recent migrants tend to buy houses and have cash, for example. Maybe Chinese residents just prefer real estate to stocks, tend to save more, move more often, or get more help from their parents. There are all sorts of hypotheses that Labour failed to rule out.
In response, Mr Salmond tried to address two of them by comparing the resident Chinese and resident Indian populations. Even this, though, only “contra-indicates” at most two such explanations (both relating to recent migrants), and at best in a way that suggests they can’t be responsible for the whole difference between 39.5% and 9%. Here’s the thing, though: with so many variables and alternative hypotheses, it could easily be that all of them are responsible, each in a small way, adding up to the difference observed in claim 1. And, to be sure, there might also be an “offshore Chinese presence” somewhere in there too. The problem is that, based on this data set, we don’t know how big that presence is. For this reason, it’s perfectly “plausible” that the “offshore Chinese presence” Mr Twyford asserts is not “significant” (whatever that means).
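To make the “each in a small way” point concrete, here is a Python sketch with entirely made-up numbers. It assumes each candidate explanation multiplies the relative buying rate of ethnically Chinese residents by a modest factor, and that the factors combine multiplicatively (as in a log-linear model); both the factor values and that combination rule are illustrative assumptions, not estimates.

```python
import math

# Made-up multipliers on the relative buying rate, one per candidate explanation.
# These are NOT estimates; they only illustrate how modest effects can compound.
hypothetical_factors = {
    "recent migration": 1.5,
    "higher savings rate": 1.4,
    "preference for property over shares": 1.4,
    "moving house more often": 1.3,
    "more help from parents": 1.6,
}

population_share = 0.09  # ethnically Chinese share of Auckland residents (from the article)

# Assumes the effects combine multiplicatively (as in a log-linear model).
rate_ratio = math.prod(hypothetical_factors.values())

# Buyer share implied by a per-capita rate ratio, if every buyer is a resident.
buyer_share = rate_ratio * population_share / (rate_ratio * population_share + 1 - population_share)
print(f"combined rate ratio {rate_ratio:.2f}, implied buyer share {buyer_share:.1%}")
```

No single factor here exceeds 1.6, yet together they give a rate ratio of about 6.1, enough in this toy model to turn a 9% population share into roughly 38% of buyers with no offshore buyers at all. The real sizes of these effects are unknown, which is precisely why the data can’t settle the question.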
Is it a tall ask for Labour to rule out all alternative explanations in combination? Yes, of course it is. And that is precisely the point: there are too many competing explanations for the metric Labour have used—ethnicity—to tell us anything useful about residency. The root problem is that Labour’s analysis doesn’t measure residency. It measures ethnicity. As Professor Lumley put it:
If you have a measure of ‘foreign real estate ownership’ that includes my next-door neighbours and excludes James Cameron, you’re doing it wrong, and in a way that has a long and reprehensible political history.
It may be frustrating for Labour that the data it wants is hard to find, but that doesn’t entitle it to present a half-baked “analysis” to try to back claims that Chinese foreigners are responsible for soaring house prices. And this is putting aside the artificial restriction of housing supply through zoning and height restrictions, and a continued, concerning tendency for Labour to talk a lot about Chinese people and not very much about Canadian, British and American foreign investors.
* Even if they’re not, it’s worth noting that 45% is such a large share of the whole market that even if every ethnically Chinese buyer used this agency, there would still be an overrepresentation of ethnic Chinese.
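The footnote’s arithmetic, sketched in Python using only figures from the article and treating the 39.5% expectation as if it were an exact count:

```python
sample_size = 3922          # buyers in the agency's data set
market_size = 8790          # all Auckland buyers, February to April
sample_chinese_share = 0.395
population_chinese_share = 0.09

agency_share = sample_size / market_size                # about 0.45 of the market
chinese_in_sample = sample_chinese_share * sample_size  # about 1,549 expected buyers

# Extreme case: every ethnically Chinese buyer in the market used this agency,
# so there are no ethnically Chinese buyers outside the sample.
lowest_market_share = chinese_in_sample / market_size

print(f"agency share {agency_share:.1%}, minimum Chinese share of buyers {lowest_market_share:.1%}")
```

Even in that extreme case, ethnically Chinese buyers would make up about 17.6% of the market, nearly double their 9% share of the population.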