I love that BeerGraphs combines the worlds of suds and stats. Generally speaking, I’m a fan of drawing links between disparate ideas, and I enjoy gaining deeper insight into the things I love – or, at the very least, acquiring a fresh perspective on them. When I heard rumblings about this endeavor, then, I knew I wanted to participate in some capacity or other.
Truth be told, though, I was skeptical, too.
I’m no mathematician; nor am I some untapped (pun intended) genius with applied statistics. That said, I’m not a complete noob, either, and it gave me pause to discover that Untapp’d would be the data source for BeerGraphs’ metrics.
I should emphasize that my concern had nothing to do with Untapp’d itself (it’s a great app); rather, it centered on the fundamentally subjective nature of the data culled therefrom. After all, they're ratings. They're people's opinions. And opinions are a little trickier to organize, codify, and give credence to than other, comparatively objective, phenomena.
To wit, I thought of surveys.
Surveys are vital, to be sure. Among other uses, surveys help organizations collect feedback. Survey responses inform product or program review/development and decision-making processes. And the administration of surveys implicitly represents a gesture of good will from an organization, as if to say, “hey, what you think is important to us.”
All this is to say that surveys, by way of respondents’ answers, produce potentially valuable information. The problem, though, is that any number of unanticipated (or altogether hidden) mediating and moderating variables could have an effect on respondents’ answers. For example, when most people take surveys, they are subtly influenced by the human needs to be liked and noticed (by way of illustration, Twitter and Facebook do a great job of capitalizing on those needs).
Given those psychological variables, it’s easy to see how respondents’ satisfaction ratings in particular may be unduly influenced – especially when they know, as is the case with Untapp’d reviews, that their answers aren’t anonymous. People behave differently when they know (or think they know) they are being (or potentially could be) observed.
The other broad variable which concerned me about Untapp’d data is what folks refer to as “park effects.” When a survey is administered in a controlled environment, the data analysts can rest easy knowing each of the respondents was under basically the same environmental influence at the time of administration. And even though two people might respond very differently to the same environment, it's still much easier, given the need to cull meaning from the data, to control that variable. In the case of Untapp’d, though…let’s just say there is likely significant environmental variation among the folks who rate beers with that app.
Given these (and other) potential pitfalls with the raw data, why would a suds geek even try to find meaning therein? Well, because there’s an antidote. (And also because it’s wicked fun to try.)
The antidote is what I like to call “The Ballad of Ted Berg.” That is to say, the antidote to the subjective nature of survey responses is sample size. The larger the sample, the more meaningful the data. Why? Because there can be incredible variation between individual responses, for any number of silly reasons unrelated to – gee-whiz! – what the person really thinks about the object of interest (in this case, beer). The Saber-dudes and dudettes among you know this to be true even with regard to data culled from the comparatively objective events of baseball.
Let’s talk more about psychology.
Say there’s a guy in a bar somewhere in Sausalito who loves Pliny the Elder (because duh), but he’s never rated it before, since he just downloaded Untapp’d. He’s an emotionally unstable drunk under any circumstance; and on this particular night, he's just been dumped by his beloved long-term girlfriend. As you might imagine, he’s having a rough go of it: he’s been pumping cryptic drivel into the social mediasphere all night long, and he’s building a really good head of lyrical steam. He remembers Untapp’d, and decides he’s going to kick things off there with a bang. He picks the handle, “H8_U_GRL,” orders his Elder, and trashes it with astonishing eloquence, because, hey, this guy wants to watch the world burn right now.
Is this review representative of his opinion? No, it isn't: because he was under the influence of other variables, his response was skewed.
Intrepid survey administrators run this risk all the time; but it's a risk mitigated by sample size: not everyone who completes a survey is going to be under extreme circumstantial influence when they take the survey.
So if all we had were five total check-ins (n=5) for Pliny the Elder, Misery Guy’s ill-considered response would have an enormous impact on the whole. It would be folly for an analyst to draw conclusions about the value of PTE from such a small sample, because when n=5, each response means too much.
Push n above 100, though, and the silly rating becomes a drop in a much more meaningful pond. (In the case of Pliny the Elder, which I have yet to imbibe, incidentally, it’s a delicious pond indeed.)
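If you like, you can see the dilution effect in a few lines of arithmetic. A quick sketch (with made-up ratings; Misery Guy’s numbers are entirely hypothetical) of how one spite-fueled check-in moves the average at n=5 versus n=105:

```python
def mean(ratings):
    """Plain arithmetic mean of a list of ratings."""
    return sum(ratings) / len(ratings)

# Hypothetical sincere check-ins, clustered around a ~4.5-star consensus.
honest_small = [4.5, 4.75, 4.25, 4.5]                   # 4 check-ins
honest_large = [4.5] * 50 + [4.75] * 30 + [4.25] * 24   # 104 check-ins

outlier = 0.5  # Misery Guy burns the world down with half a star

small_sample = mean(honest_small + [outlier])  # n = 5
large_sample = mean(honest_large + [outlier])  # n = 105

print(f"n = 5:   average is {small_sample:.2f}")   # 3.70 -- dragged way down
print(f"n = 105: average is {large_sample:.2f}")   # 4.48 -- barely a ripple
```

One angry rating knocks the five-check-in average down by nearly a full star, while the 105-check-in average barely notices. That’s the Ballad of Ted Berg in miniature.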
So help us out! Encourage your friends to utilize Untapp’d. Help spread the word and keep those sample sizes robust. The more data we have, the more we’ll learn about beer. And the more we learn about beer, well…the possibilities are endless, my friends.