Friday, August 7, 2015

Eva Vivalt Did Not Show QALYs/$ of Interventions Follow a Gaussian Curve

(Epistemic status: I have zero statistics background, but damned if I won’t give this a shot anyway.)

In a recent blog post, Robin Hanson said Eva Vivalt's data indicate that the distribution of charity impacts is close to Gaussian, and is not a fat-tailed power law as Effective Altruists claim. If that's true, it pretty much undermines Effective Altruism altogether, because it means that there's not a big difference between a decent intervention and the best intervention.
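To see why the shape of the distribution matters so much, here's a minimal Python sketch that compares how much better a top-1% intervention is than a median one under each assumption. The parameters (a Gaussian with mean 10 and standard deviation 2; a Pareto power law with tail index 1.16) are made up purely to illustrate the contrast. They aren't estimates from Eva's data or from anyone else's.

```python
from statistics import NormalDist
from typing import Callable

# Hypothetical parameters chosen only for illustration; NOT estimates
# from Vivalt's data or from any Effective Altruism source.
gaussian = NormalDist(mu=10.0, sigma=2.0)  # impact units per dollar
PARETO_ALPHA = 1.16  # tail index: smaller alpha means a fatter tail
PARETO_XMIN = 1.0    # smallest possible impact


def pareto_quantile(q: float, alpha: float = PARETO_ALPHA, xmin: float = PARETO_XMIN) -> float:
    """Inverse CDF of a Pareto distribution with F(x) = 1 - (xmin / x) ** alpha."""
    return xmin * (1.0 - q) ** (-1.0 / alpha)


def best_to_typical(quantile_fn: Callable[[float], float]) -> float:
    """Ratio of a 99th-percentile intervention to the median intervention."""
    return quantile_fn(0.99) / quantile_fn(0.5)


gaussian_ratio = best_to_typical(gaussian.inv_cdf)
pareto_ratio = best_to_typical(pareto_quantile)

print(f"Gaussian: top 1% is {gaussian_ratio:.1f}x the median")
print(f"Pareto:   top 1% is {pareto_ratio:.1f}x the median")
```

With these made-up numbers, the Gaussian's top 1% comes out only about 1.5 times the median, while the power law's comes out around 29 times. That gap is what's at stake in the Gaussian-vs-power-law question.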

Suppose the following three interventions had identical effect sizes: feeding people carrots, handing out chess strategy manuals, and deworming.

I hope you're currently wondering what the hell I just asked you to suppose, because the previous sentence was nonsense. You can only talk about the “effect size” of carrots if you’re measuring an additional thing besides carrots. The additional thing is probably not “handing out chess strategy manuals”, because then the effect size of carrots would be a measure of how good carrots are at handing out chess strategy manuals.

How about if I’m studying “feeding people carrots to make them taller”, “handing out chess strategy manuals to make them smarter”, and “deworming to eliminate intestinal parasites”?

That’s much better! Now it makes sense to talk about effect sizes for these things. There’s some amount of taller people get when you give them carrots, some amount of smarter people get when you give them chess strategy manuals, and some amount of dewormed people become when you give them wormicide.

Now what does it mean for these things to have identical effect sizes?

There are actually several reasonable answers, but here’s the one I’m seeing in Eva’s slides.

Say that the average height is 5’7”, and most people are between 5’5” and 5’9”. So the usual variation in height is “within 2 inches of average”, or a range of 4 inches. When I give somebody a bundle of carrots, they grow an inch: 5’8” plus a bundle of carrots equals 5’9”. We can express the effectiveness of carrots in terms of variation in the general population: if everybody gets their Height stat by rolling a four-sided die to adjust away from the human average of 5’7”, eating a bundle of carrots gives you a plus one to your Height roll. It’s a 25% bonus compared to randomness.

Now say that the average IQ is 100, and most people are between 95 and 105: the usual variation in IQ is “within 5 points of average”, or a range of 10 points. When I give somebody a chess book, they gain two and a half IQ points. We can express the effectiveness of the book in terms of variation in the general population: if everybody gets their Intelligence stat by rolling a ten-sided die to adjust away from the human average of 100, reading a chess book gives you a plus 2.5 to your Int roll. It’s a 25% bonus compared to randomness.

Then we can (sort of) compare the effect sizes of carrots and chess books: Carrots give a 25% bonus, and chess books also give a 25% bonus.
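The arithmetic behind that comparison is just dividing each raw effect by the width of the usual range, so the units (inches, IQ points) cancel out. Here's a minimal Python sketch using the numbers above. The class and method names are my own, and real studies usually divide by a standard deviation (as in Cohen's d) rather than a range, but the idea is the same.

```python
from dataclasses import dataclass


@dataclass
class Intervention:
    name: str
    raw_effect: float     # change in the outcome, in the outcome's own units
    typical_range: float  # width of the "usual variation" in the population

    def standardized_effect(self) -> float:
        # The post's toy measure: the effect as a fraction of the usual range.
        return self.raw_effect / self.typical_range


carrots = Intervention("carrots -> height (inches)", raw_effect=1.0, typical_range=4.0)
chess = Intervention("chess books -> IQ (points)", raw_effect=2.5, typical_range=10.0)

for iv in (carrots, chess):
    print(f"{iv.name}: {iv.standardized_effect():.0%}")
```

Both lines print 25%, which is exactly why the comparison is only "sort of" meaningful: the number says nothing about whether an inch of height is worth as much as two and a half IQ points.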

Is that useful information, though? Why does it matter if carrots give the same size of bonus to growth that chess books give to IQ?

Now, if carrots also happened to make people smarter, then comparing effect sizes would be useful. We’d be talking about two different interventions aimed at the same outcome. Furthermore, we could dispense with the standardized statistical effect size stuff, and look directly at the absolute number of IQ points gained from a dollar’s worth of chess books, and the number of IQ points gained from a dollar’s worth of carrots.

If it turned out that all intelligence interventions gave the same Int bonus per dollar, then we might as well flip a coin to decide between carrots and chess books. Same thing if it turned out that we weren’t good enough at measuring things to tell the difference between the effects of carrots and chess books on intelligence. Any time spent “picking the best one” would be wasted.

But what if you don't know how much chess books and carrots cost? And what if you don't know how many carrots are in a bundle? Maybe you know that a "bundle of carrots" - whatever number of carrots the charity is distributing per person - has the same effect as a chess book, but you don't know that the chess book costs four times as much as the bundle of carrots. It would be premature to say that time spent choosing between carrots and chess books is wasted, because if you learned the cost, you'd fund the carrots.
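Once you know the prices, the comparison becomes simple division. A minimal sketch: only the 4:1 price ratio and the 2.5-point gain come from the examples above; the dollar amounts themselves are invented.

```python
# Hypothetical prices: only the 4:1 ratio comes from the example above.
CARROT_BUNDLE_COST = 1.00  # dollars (assumed)
CHESS_BOOK_COST = 4.00     # dollars (assumed: four times the bundle)

# In this example both interventions raise IQ by the same amount.
IQ_GAIN_PER_UNIT = 2.5


def points_per_dollar(gain: float, cost: float) -> float:
    """IQ points bought per dollar spent."""
    return gain / cost


carrots = points_per_dollar(IQ_GAIN_PER_UNIT, CARROT_BUNDLE_COST)
chess = points_per_dollar(IQ_GAIN_PER_UNIT, CHESS_BOOK_COST)

print(f"carrots: {carrots:.3f} IQ points/$, chess books: {chess:.3f} IQ points/$")
print(f"carrots are {carrots / chess:.0f}x as cost-effective")
```

Identical effect sizes, but the carrots buy four times as many IQ points per dollar.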

It might even be that the effect of carrots only seems to match the effect of chess books because the carrot people kept putting more carrots into the bundles until a bundle was as good as a chess book, and then they stopped. (Because that's where they reach some socially agreed-upon level of 'statistical significance', perhaps?) Maybe if you put twelve carrots into the bundle instead of six, you get twice as many IQ points as a chess book causes, and for half the price! But you don't know; you just know that as things are now, the carrot charity is making people about as much smarter as the chess charity.

And that, it seems to me, is what Eva's data actually say. When I emailed her for clarification, she said that, "Most of the interventions can't be statistically distinguished from each other for a given outcome." She also said cost wasn't factored in yet (though she suspected it wouldn't change much). So if she's right, then as far as we can tell, all the deworming interventions are currently equally good at killing worms, all the microfinance interventions are equally good at alleviating poverty, and so on for the top 20 international development programs.
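"Can't be statistically distinguished" has a specific meaning: the gap between two estimates is small relative to their combined uncertainty. Here's a minimal sketch of one common way to check that, a two-sided z-test on the difference of two independent estimates. It assumes roughly normal sampling errors, the estimates and standard errors are made up, and I don't know that this is the exact test Eva used.

```python
from math import sqrt
from statistics import NormalDist


def difference_p_value(est_a: float, se_a: float, est_b: float, se_b: float) -> float:
    """Two-sided p-value for H0: the two true effects are equal.

    Assumes the two estimates are independent and approximately normal.
    """
    se_diff = sqrt(se_a**2 + se_b**2)   # uncertainty of the difference
    z = (est_a - est_b) / se_diff       # difference in standard-error units
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Hypothetical effect sizes and standard errors, NOT figures from Vivalt's dataset.
p = difference_p_value(est_a=0.30, se_a=0.10, est_b=0.15, se_b=0.10)
print(f"p = {p:.2f}")
```

Here one estimate is twice the other, yet p comes out around 0.29, far from any conventional significance threshold. "Can't be distinguished" means "we can't tell them apart yet", not "they're the same".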

If it’s true that the top 20 international development programs are just as good at whatever they do as all the other programs they’re directly competing with, even when you factor in cost, this has significant implications for Effective Altruism. It means we can stop evaluating individual charities once we've identified the "pretty good" ones.

But there’s a stronger claim it’s easy to confuse this with. (Eva’s presentation was called “Everything You Know Is Wrong”, and a couple of her slides said, “Anyone who claims they know what works is lying”. I tend to expect confusion when such strong System 1 language accompanies abstract statistical analysis.) The stronger claim is “the top 20 interventions are equally good at saving lives, regardless of how they go about it”. If that were true, it would chuck the premises of Effective Altruism right out the window.

If you want to compare two interventions with different outcomes - medicated mosquito nets vs. microfinance - you’re going to need some way of converting between malaria and poverty. When we’re talking about altruism, the common currency is Quality-Adjusted Life Years (QALYs).

There’s some amount of better a person’s life is when they don’t have malaria, and some amount of time they remain malaria-free after you give them mosquito nets. There’s also some amount of better a person’s life gets when they’re fifty bucks less poor, and some amount of time they stay fifty bucks less poor after you give them fifty bucks. So you can compare bug nets to microfinance through QALYs once you've got data on 1) effect sizes, 2) how nice it is to be healthy or less poor, 3) how long people stay healthy or less poor once they get that way via bug nets or microfinance.

You need all three of those things. The fact that the effect sizes are identical doesn't matter if a medicated mosquito net is worth orders of magnitude more QALYs than fifty bucks. Eva's data only include effect sizes.
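Here's a minimal sketch of how the three pieces combine. Everything in it is hypothetical: I'm treating "effect" as the number of people moved into the better state per dollar, which is a simplification of a statistical effect size, and the QALY weights and durations are invented to show how two programs with identical effects can still differ by orders of magnitude.

```python
from dataclasses import dataclass


@dataclass
class Program:
    name: str
    effect_per_dollar: float  # 1) people moved into the better state per dollar (hypothetical)
    qaly_weight: float        # 2) QALYs gained per person-year in that state, 0 to 1 (hypothetical)
    years: float              # 3) how long the improvement lasts, in years (hypothetical)

    def qalys_per_dollar(self) -> float:
        # The three factors multiply; leave any one out and the comparison breaks.
        return self.effect_per_dollar * self.qaly_weight * self.years


# Identical "effect", very different value per person helped.
nets = Program("mosquito nets", effect_per_dollar=0.01, qaly_weight=0.2, years=3.0)
cash = Program("microfinance ($50)", effect_per_dollar=0.01, qaly_weight=0.002, years=1.0)

ratio = nets.qalys_per_dollar() / cash.qalys_per_dollar()
print(f"{nets.name} deliver {ratio:.0f}x the QALYs per dollar of {cash.name}")
```

Looking only at `effect_per_dollar`, these two programs are tied; with all three factors in, one is 300 times better under these invented numbers.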

It doesn't make sense to compare "how many worms a deworming intervention kills" with "how much AIDS a box of condoms prevents" until you know how much those problems affect quality of life, and for how long. So even if all deworming interventions are equally effective, the choice between “deworming” and “condoms” could still be massively important.

And that’s the central claim I take EA to be making.

Eva's analysis says nothing about the distribution of QALYs over all EA interventions under consideration. Maybe it’s Gaussian after all, but this isn’t new evidence either way.
