Although the normal distribution -- pictured in the header thanks to Wikipedia -- is beautiful in its simplicity, the beer scores coming from untappd aren't necessarily going to fall into that kind of shape.
For one, there's a maximum score. You can only give a beer five stars, so there will be a cliff at the top end where users are forced to stop. A second point is less exciting, statistically speaking, but just as important: not all untappd users had access to half-stars throughout the year. The app added that functionality earlier in the year, but the desktop version didn't have half-stars until mid-summer. A third factor is also likely to push scores towards the maximum: a bad beer won't sell enough to keep getting ratings. There's a survivor bias inherent in these numbers.
The upshot of these caveats is that our beer check-ins probably won't sit in a normal distribution. They're likely to skew towards the higher scores, and the half-stars are likely to be well behind the full-star ratings in terms of number.
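One way to put a number on that lopsidedness is the skewness of the rating histogram. The sketch below is only an illustration: the function name `histogram_skewness` and both sets of counts are made up for the example and are not untappd data. Note the sign convention: ratings piled up against the five-star cap, with a tail trailing off towards the low scores, come out as negative skewness.

```python
def histogram_skewness(counts):
    """Skewness (third standardized moment) of a rating histogram.

    counts maps a star rating (e.g. 3.5) to the number of check-ins at it.
    """
    n = sum(counts.values())
    mean = sum(r * c for r, c in counts.items()) / n
    variance = sum(c * (r - mean) ** 2 for r, c in counts.items()) / n
    third_moment = sum(c * (r - mean) ** 3 for r, c in counts.items()) / n
    return third_moment / variance ** 1.5


# Hypothetical counts, NOT real untappd data.
capped = {1.0: 2, 2.0: 5, 3.0: 18, 4.0: 45, 5.0: 30}       # piled up near the cap
symmetric = {1.0: 10, 2.0: 20, 3.0: 40, 4.0: 20, 5.0: 10}  # bell-like around three

print("capped:", histogram_skewness(capped))
print("symmetric:", histogram_skewness(symmetric))
```

A value near zero means a roughly symmetric shape; the further below zero, the more the check-ins crowd the top of the scale.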
When we look at all check-ins for the American IPA style, we see these effects immediately:
Maybe there were still a couple of surprises in the data. Did you think that twice as many IPAs would get four stars as got three? While we were running the replacement level data, it became clear to us that three wasn't the middle ground that you might suppose it to be. A three comes with a bit of shame now. And this graph hammers that point home. The new three is four.
But IPAs are the meat and potatoes of craft beer. Other styles inspire more disparate reactions. Think of sours and the pucker face you get from a non-fan. Is it possible that the shape of sour scores is very different from that of American IPAs?
There is a certain amount of love-it-or-leave-it in these numbers. Look how sour ratings show fewer threes (11% versus 18% for IPAs), about as many fours (33% versus 36%) and a ton more fives (17% versus 11%). We have a hard time with zeroes around here -- a zero can be zero stars or just a rating-less check-in -- but if all styles get hit by the star-less check-in equally, there are likely more people giving sours zeroes. All told, the extremes captured 33.1% of the sour check-ins, while they only described 24.1% of the IPA ratings.
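The "extremes" figure is just the share of check-ins that land at either end of the scale. Here is a minimal sketch of that arithmetic, with loud caveats: the function name, the cutoffs (one star or lower, five stars) and the counts are all assumptions for illustration, not the cutoffs or data behind the percentages above.

```python
def extreme_share(counts, low_cutoff=1.0, high_cutoff=5.0):
    """Fraction of check-ins rated at or below low_cutoff, or at or above high_cutoff.

    counts maps a star rating to the number of check-ins at that rating.
    The default cutoffs are an assumption, not the ones used for the article.
    """
    total = sum(counts.values())
    extreme = sum(
        c for rating, c in counts.items()
        if rating <= low_cutoff or rating >= high_cutoff
    )
    return extreme / total


# Hypothetical counts, NOT real untappd data.
example_style = {0.0: 10, 1.0: 5, 2.0: 10, 3.0: 15, 4.0: 40, 5.0: 20}

print(f"{extreme_share(example_style):.1%} of check-ins sit at the extremes")
```

Running the same function over each style's histogram would give directly comparable numbers, as long as the zero-star ambiguity is handled the same way for every style.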
Smoked beers look like sours if nobody really loved sours. That's a tough percentage of five-star beers right there.
But let's put the beers up against each other, throwing pumpkin beers into the mix.
Well, maybe people don't really love pumpkin beers after all, since they end up short at almost all the prime rating spots. You can see a 'more normal' distribution for IPAs and pumpkins, and a more extreme one for sours and smoked beers, too.
It seems like the shape of the scores will be interesting for discussions of replacement level at some point. Any use of arithmetic means will be skewed for styles that produce the extreme distribution, for example. Maybe the shape of the curve itself can help us identify replacement level.
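To see why the arithmetic mean gets shaky on a love-it-or-leave-it style, compare it with the median on two made-up histograms. This is a sketch only: the counts and the helper `expand` are invented for the example, and the median is offered as one possible point of comparison, not as the method used for replacement level.

```python
from statistics import mean, median


def expand(counts):
    """Turn a {rating: count} histogram into a flat, sorted list of ratings."""
    return [rating for rating, c in sorted(counts.items()) for _ in range(c)]


# Hypothetical counts, NOT real untappd data.
peaked = {2.0: 5, 3.0: 20, 4.0: 50, 5.0: 25}      # most drinkers agree on a four
polarized = {0.0: 20, 3.0: 10, 4.0: 30, 5.0: 40}  # lots of fives, lots of zeroes

for name, counts in (("peaked", peaked), ("polarized", polarized)):
    ratings = expand(counts)
    print(name, "mean:", mean(ratings), "median:", median(ratings))
```

In this toy data both styles share the same median, but the block of zeroes drags the polarized style's mean well below the peaked one, even though it collects more five-star check-ins.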
We'll have to figure out how to use it best. Your ideas are welcome.