Apples, Oranges, and Author Earnings

I mentioned on Twitter last week that a recent update on Author Earnings comparing extrapolated Amazon data to Bookscan numbers is actively misleading. I thought that claim might bear a little unpacking by way of analogy.

Imagine there is a grocery store selling both apples and oranges, and you need to figure out how many of each fruit are sold in a week (or at least which sells more than the other). So you camp out in the store for an hour and count how many of each fruit the customers buy.

You're probably going to get some useful information from that, to be sure -- whether apples and oranges sell in roughly comparable numbers, for example. You can even extrapolate from that hour -- multiply by how many hours the store is open, and you might get a ballpark number for how much fruit is sold. But that number risks being wildly inaccurate, because you're relying on that single sample hour to be perfectly typical, and a store has busy hours and slow hours. Some hours, nobody's buying. Some hours, maybe someone's buying fruit for a world-record-size fruit salad. Some hours, you get a run of people allergic to citrus. All you can get is a very rough idea.
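To put rough numbers on that, here is a small sketch in Python. Everything in it is made up for illustration: the hourly counts, the twelve-hour day, and the `extrapolate` helper are all assumptions, not data from any real store. It simply shows how far apart the "multiply by hours open" estimates can land depending on which hour you happened to sit in.

```python
# Hypothetical apples sold in each of the 12 hours a store is open.
# These counts are invented purely for illustration.
hourly_apples = [0, 2, 5, 9, 14, 30, 12, 6, 4, 8, 3, 1]


def extrapolate(sample_count, hours_open):
    """Scale one hour's count up to a full day, assuming that hour is typical."""
    return sample_count * hours_open


hours_open = len(hourly_apples)
actual_total = sum(hourly_apples)

# One estimate for each hour you might have picked as your sample.
estimates = [extrapolate(count, hours_open) for count in hourly_apples]

lowest, highest = min(estimates), max(estimates)
print(f"actual: {actual_total}, estimates range from {lowest} to {highest}")
```

With these invented numbers the store really sold 94 apples, but the one-hour estimates run anywhere from 0 to 360, depending on whether you sat through the dead hour or the fruit-salad hour.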

You can also ask a couple of orchard owners how much they get paid for their fruit, look at the prices in circulars, and try to work out how much money the grocery store is making off fruit. But it would be a terrible mistake to try to, say, calculate the orchards' operating income from that loose guess of yours. The picture is a lot bigger than that one hour at one store, and is influenced by a lot of other factors.

Now let's say you get your hands on another source of information -- maybe the inventory records of a competing grocery store showing how many oranges it sold that same week. That's hard data, and it's great -- you can learn a little more about the size of the orange market in town from that.

But you can't then combine those two kinds of information as if they were the same to draw conclusions about, say, whether Grocery Store A sells more oranges than Grocery Store B, and certainly not about whether Sunny Orange Productions is making more money than Crisp Apple Growers.

One of them is a cobbled-together piece of data and guesswork; the other is hard data, but for only part of the equation. Each of them tells some interesting stories, to be sure, but it's just as important to know what the data can't tell you.