The Metrics of Imperials

John Choiniere, June 12, 2013

Do you like imperial-style beers?

I’m guessing you do; after all, you’re at a website dedicated to looking at beer analytically – you’re probably not just a standard Bud/Bud Light/etc. drinker.  You probably understand and appreciate well-crafted beers across many different styles.  And at least anecdotally, if you like craft beer in general, you like imperial-style craft beers.

But this is BeerGraphs.  We’re not interested in anecdotes, we’re interested in data.  Do craft-beer drinkers like those in our data sample prefer their beer imperial-style? And, if so, is it something that can be shown with numbers? 

To examine this question, I used our February/March data set (about 1.5 million check-ins) and selected pairs of beers in corresponding imperial/non-imperial styles – IPA with double IPA, etc.  I limited myself to pairing beers from within the same brewery, in an effort to isolate just the effects of “imperial”-izing a beer and avoid accidentally measuring the overall difference in quality from one brewery to the next.  This also should minimize distribution differences, check-in location differences, etc.  I also chose only those beers that had at least 50 check-ins, to try to avoid any small-sample-size issues (I originally used those with 250+, but the total sample was too small then).  Lastly, once I had the data narrowed down that far, I manually inspected the pairings to exclude any obviously non-comparable beers – for example, I didn’t think it was appropriate to compare a brewery’s standard American stout with its imperial coffee vanilla oaked stout (if such a thing exists).  I also had to exclude beers with incomplete information (e.g., no ABV value).  I did, however, keep separate any beer with a year in its name, reasoning that if a beer vintage was specified, the recipe likely changes (or at least gets tweaked) from year to year and as such should be considered an independent brew.  This left me with 252 pairings from 138 breweries.
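
For readers who want to try something similar, here is a minimal sketch of the automated part of that selection (same-brewery pairing, the 50 check-in floor, and dropping beers with no ABV).  It is not the code used for this article: the Beer record, its field names, and the STYLE_PAIRS mapping are all hypothetical, and the manual inspection step is not something code can stand in for.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional

# Hypothetical mapping of a standard style to its imperial counterpart.
STYLE_PAIRS = {"IPA": "Double IPA", "Stout": "Imperial Stout"}
MIN_CHECKINS = 50  # the cutoff described in the text


@dataclass(frozen=True)
class Beer:
    # Field names are assumptions, not the real data set's schema.
    brewery: str
    name: str
    style: str
    abv: Optional[float]
    checkins: int
    rating: float


def candidate_pairs(beers, min_checkins=MIN_CHECKINS):
    """Return (standard, imperial) pairs drawn from the same brewery."""
    by_brewery_style = defaultdict(list)
    for beer in beers:
        # Drop beers with missing ABV or too few check-ins.
        if beer.abv is None or beer.checkins < min_checkins:
            continue
        by_brewery_style[(beer.brewery, beer.style)].append(beer)

    pairs = []
    for (brewery, style), standards in by_brewery_style.items():
        imperial_style = STYLE_PAIRS.get(style)
        if imperial_style is None:
            continue  # not a standard style we know how to pair
        imperials = by_brewery_style.get((brewery, imperial_style), [])
        for standard in standards:
            for imperial in imperials:
                pairs.append((standard, imperial))
    return pairs
```

The output is only a list of candidates; each pair would still need the by-hand check for comparability described above.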

The very first thing I discovered in this data set was just how rare imperial beers that aren’t IPAs or stouts are; together, those two style families accounted for 227 of the 252 pairings (90%).  In fact, none of the other styles had enough pairs to give their statistics any sort of meaning, so while I left all 252 in when looking at general effects, analysis of style-specific effects was limited to just IPAs and stouts.

For both IPAs and stouts, imperial-type beers were rated higher, on average, than their regular counterparts.  The stouts had the more pronounced effect; by weighted average (based on number of check-ins), the imperial label was worth 0.473 extra stars (standard deviation: 0.454), while for IPAs it was only worth an extra 0.164 stars (SD: 0.161).  If you prefer relative terms, imperial stouts were rated 15% higher (± 11%) than standards; imperial IPAs were 5% higher (± 3%) than standards.  Across all styles, imperials earned 0.188 extra stars (SD: 0.185), or a ratings bump of 6% (SD: 4%).
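
As a rough illustration of what a check-in-weighted average and standard deviation look like, here is a small sketch.  The choice of weight (for instance, the combined check-ins of both beers in a pair) is an assumption on my part for the example, as are the function name and the toy numbers.

```python
import math


def weighted_mean_sd(values, weights):
    """Weighted mean and (population-style) weighted standard deviation."""
    if len(values) != len(weights) or not values:
        raise ValueError("values and weights must be non-empty and equal length")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    mean = sum(w * v for v, w in zip(values, weights)) / total
    variance = sum(w * (v - mean) ** 2 for v, w in zip(values, weights)) / total
    return mean, math.sqrt(variance)


# Toy data: score differences (imperial minus standard) for two pairs,
# weighted by a hypothetical check-in count per pair.
diffs = [0.5, 0.1]
checkins = [100, 300]
mean, sd = weighted_mean_sd(diffs, checkins)
```

With these toy inputs the heavily checked-in pair pulls the mean toward its smaller difference, which is the point of weighting.
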

I was also curious as to whether the amount/degree of imperializing (for lack of a better word) had any effect on the ratings; do people prefer these imperials simply because they’re boozier?  Within the sample I was looking at, I found mostly no correlation either overall or within individual styles – stouts showed a very mild correlation between relative ABV and relative score difference (r^2 = 0.17), but all other correlations were below 0.09.  I can’t say I’m surprised by this; intuitively, I think the thing imperial IPA/stout/whatever drinkers like about their beverage(s) of choice is the amplification of the flavors, not the ability to get more drunk off the same number of drinks.
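
The r^2 figures here are squared correlation coefficients.  A minimal sketch of that calculation follows, assuming a plain (unweighted) Pearson correlation and assuming "relative ABV" means something like imperial ABV divided by standard ABV; both are assumptions for illustration, and the numbers are made up.

```python
def r_squared(xs, ys):
    """Square of the Pearson correlation coefficient between xs and ys."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return (sxy * sxy) / (sxx * syy)


# Made-up example: relative ABV vs. relative score bump for three pairs.
relative_abv = [1.0, 2.0, 3.0]
relative_bump = [1.0, 3.0, 2.0]
example_r2 = r_squared(relative_abv, relative_bump)
```

An r^2 of 0.17 means roughly 17% of the variation in one quantity lines up with a straight-line relationship to the other, which is why I call the stout result very mild.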

I looked at the correlations between score differences and individual scores as well; perhaps unsurprisingly, there’s both a(n admittedly small) positive correlation between relative score bump and imperial score (r^2 = 0.12) and a negative correlation between relative score bump and standard score (r^2 = 0.23).  Maybe this is obvious – the better the imperial and/or the worse the standard, the more space there is for separation between the scores – but even if it was something most people could have guessed, it’s good to be able to show it with data, too.

I’d also like to point out that despite the large margin of error in the ratings bumps, you, the beer drinker, have a pretty decent chance at getting a better (or at least better-rated) beer if you go for imperial over standard.  By any comparison, and within either style with sufficient sample size to draw conclusions, the average score bump is still positive even after subtracting a full standard deviation; this matches well with the ratios of positive score differences to negative score differences (2.54 for IPAs, 5.13 for stouts, 2.69 overall).
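
Both checks in that paragraph are simple arithmetic, sketched below.  The mean and SD values are the ones reported earlier in this article; the win_loss_ratio helper is hypothetical and assumes an unweighted count of pairs with ties ignored, which may not match exactly how the ratios above were tallied.

```python
def win_loss_ratio(diffs):
    """Ratio of positive to negative score differences (ties ignored)."""
    wins = sum(1 for d in diffs if d > 0)
    losses = sum(1 for d in diffs if d < 0)
    if losses == 0:
        raise ValueError("no negative differences; ratio is undefined")
    return wins / losses


# (mean bump in stars, standard deviation) as reported in this article.
reported = {
    "IPA": (0.164, 0.161),
    "stout": (0.473, 0.454),
    "all styles": (0.188, 0.185),
}

# Mean minus one standard deviation; a positive value means the bump
# survives a full standard deviation of downside.
one_sd_below = {style: round(mean - sd, 3) for style, (mean, sd) in reported.items()}
```

The margins are thin for IPAs and overall, and somewhat wider for stouts, which fits with stouts also having the largest positive-to-negative ratio.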

So, lastly, I’d like to ask the question: what’s wrong with those imperial-style ales that score lower than their standard counterparts?  The answer appears to vary depending on whether you’re talking about IPAs or stouts.  While of course there are a number of factors, two things really jump out of the data.

Looking at the IPAs, on the standard side there’s only a difference of 0.1 stars between the weighted average of those with positive differences and those with negative; however, there’s a difference of 0.5 stars on the imperial side!  With the stouts, it’s the opposite – it’s not so much that the imperials are disliked (though they certainly come in below the others), it’s that people particularly love the standard stouts: those in pairs with negative differences averaged 3.46, versus 3.09 for those in pairs with positive differences.

Are there ways to improve this study?  Sure.  For one, there’s nothing saying the 50+ check-in cutoff is at the right level – it was chosen fairly arbitrarily, though with the goals in mind that were stated above.  For what it’s worth, I ran the same data/comparisons using a cutoff of 250 check-ins, and found nearly identical results; some details changed, and the standard deviations were larger, but the conclusions were the same.  It also seems quite possible I made a mistake or two in the manual data culling; if my adjustments were at all skewed based on something like familiarity with the beers in question, I may have included or excluded things where I otherwise wouldn’t have.  In general, though, I think the methodology is fairly sound, and that there are good conclusions to be drawn here.

Imperial-style beers do, in fact, rate more highly than their standard counterparts.
