Judgments of Paris, Princeton, and Lenox, Part 4
As indicated earlier, it was striking in 1976 when Californian wines “sort of” beat French wines in the Paris tasting. I say “sort of” because Orley Ashenfelter and Richard E. Quandt found that while Californian wines received the highest rankings in both the red and white categories, the judges found very little difference between the French and US reds and whites overall. The American Association of Wine Economists staged a re-enactment of the Paris tasting in Princeton last summer, but there the French wines were compared to New Jersey wines. The New Jersey wines did quite well, but again the judges’ rankings differed significantly from one another.
The Lenox Wine Club
The Lenox Wine Club (LWC) was created in November 2012. Consisting of 14 “veteran” wine drinkers, it decided to start with four tastings: “heavy whites”, “heavy reds”, “light whites”, and “light reds”. Each tasting addresses the following questions:
- Among comparably-priced wines, are the judgments of the tasters similar enough to identify a significant preference among the wines, and
- Does price matter?
Blind tastings are done at a restaurant with very light hors d’oeuvres. Tasters are asked to score the wines on a scale of 1 to 5, with 5 best and 1 worst. As reported earlier, 3-liter box wines got the best scores in the “Heavy Reds”, “Heavy Whites”, and (tied for best) “Light Reds” tastings. In all three cases, the boxes (priced at $4 per 750 mL equivalent) beat out wines costing as much as $80 or more. But just as at Paris and Princeton, the results were hardly definitive because of the scoring differences among the judges.
The “Heavy Reds Blend” Tasting
On March 29th, the Lenox Wine Club tasted “Heavy Red Blends”. The wines included:
- a Bordeaux;
- a Côtes du Rhône;
- a Shiraz/Cabernet Sauvignon/Merlot blend;
- a Carménère/Syrah blend, and
- a box blend of Merlot, Cabernet Sauvignon, Zinfandel, Syrah, and Petite Sirah.
The results for the tasting are presented in Table 1. It is again notable that while the Falernia got the highest score, the Bota Box was not far behind. And the two French wines – the Bordeaux and the Côtes du Rhône – rated next to worst and worst, respectively.
Table 1. – Lenox Wine Club Scores, “Heavy Red Blends” (5 – best, 1 – worst)
As has been the case in the earlier tastings, the correlation between price and score was negative. That means the higher-priced wines tended to receive lower scores.
Table 2 presents the correlation between each taster’s scores and the average scores for the “Heavy Whites”, “Heavy Reds”, and “Light Reds” tastings. A high positive number indicates a taster is close to the overall average. For example, KM’s correlation of 1.00 in the “Heavy Reds” tasting means KM’s scores moved perfectly in step with the overall average. Low or negative numbers indicate the opposite. If you look at the average for individuals, it appears we have several “rogues” in the group: the scores of BB, JM, LR, LS, and MS have virtually no correlation with the overall average scores.
Table 2. – How Tasters’ Scores Correlated to Average Scores
And finally, we see that the overall average correlation of tasters’ scores was highest for Heavy Reds and lowest for Heavy Red Blends.
One other measure is worth mentioning. Kendall’s W statistic measures the degree of agreement among the tasters’ rankings, on a scale from 0 (no agreement) to 1 (complete agreement). The Kendall’s W for the Heavy Red Blends was only 0.022, indicating there should be very little confidence in the judges’ overall ratings.
Wine with Food
Since most wine is consumed with meals, it is odd that most wine tastings are done without food. In our prior tasting (Light Reds), we asked tasters to indicate whether their scores changed after consuming the main course. We abandoned this approach for the Heavy Red Blends tasting but will be working on a different approach to scoring wines with food in future tastings.
How Good Are LWC Members as Judges?
Since our last tasting, Neal Hulkower, a friend and expert on wine scoring, rating, and ranking methods, introduced me to the writings of Robert Hodgson. Hodgson has been analyzing judge performance at the California State Fair Commercial Wine Competition for over a decade. His key result: only about 10% of the judges are consistent in their ratings. How does Hodgson judge consistency? He includes more than one glass of the same wine in each blind tasting for judge candidates. If a candidate does not score glasses of the same wine nearly the same, he or she should not be a judge. And apparently, many do not. This leads him to the conclusion that competition awards are stochastic in nature, i.e., the awards have a major random component. In short, Hodgson rejects the notion that competition awards represent a reliable basis for informed consumer choice.
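The logic of a Hodgson-style screen is simple enough to sketch: pour the same wine into several glasses for each judge and flag anyone whose scores for those identical glasses spread too far apart. The judge labels, scores, and the one-point threshold below are all hypothetical, not taken from Hodgson’s actual protocol.

```python
# Sketch of a Hodgson-style consistency screen. A judge scores several
# glasses of the SAME wine; a wide spread flags the judge as inconsistent.
# Judges, scores, and the threshold are hypothetical.

def is_consistent(duplicate_scores, max_spread=1):
    # duplicate_scores: one judge's scores for identical glasses
    return max(duplicate_scores) - min(duplicate_scores) <= max_spread

judges = {
    "Judge A": [4, 4, 3],  # spread of 1 point: consistent
    "Judge B": [5, 2, 4],  # spread of 3 points: inconsistent
}

for name, scores in judges.items():
    verdict = "consistent" if is_consistent(scores) else "inconsistent"
    print(name, verdict)
```

On a 5-point scale a 3-point swing on the same wine means the judge’s rating carries almost no information, which is exactly Hodgson’s point about randomness in awards.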
Hodgson’s findings intrigue me and other members of the LWC. So at the final tasting of the Lenox Wine Club later this month, another 5 wines will be tasted, but there will be a sixth glass containing one of the 5 wines being tasted. Admittedly, this is far less rigorous than what Hodgson would recommend, but it should be interesting and fun nonetheless.
While the “robust” performance of the box wines in our tastings is amusing and somewhat remarkable, our results reflect the common pattern of most tastings: the “judges’” ratings and scores are all over the map. As noted in my earlier reports, this could be because the tasters could not pick up the taste differences among the wines, or because the scores are dominated by the judges’ differing taste preferences. But as Hodgson’s work suggests, it could also be because our members cannot really distinguish between wines. Stay tuned….