Break Accuracy: Optimizing the Sorting Power of Preliminary Rounds Using Tapered Points

by R. Eric Barnes, Paul Kehle, Nick McKenny and Chuan-Zheng Lee • HWS

Ideally, preliminary rounds at tournaments sort teams so that the best teams break into elimination rounds. At the World Championships of debate, the scoring system used during the nine preliminary rounds does a poor job of sorting teams accurately. Adding rounds would increase accuracy and fairness, but this is impractical. Using mathematical models and computer simulations of tournaments, we show that a slightly different scoring system over the nine preliminary rounds would improve the accuracy of the break even more than doubling the number of preliminary rounds to 18 would. Other implications and insights into tabulation and sorting accuracy are also discussed.

Section 1: Motivation

Famously, at the 2007 WUDC in Vancouver, the top team from Yale was coming off a series of big victories, including the Oxford IV, and was one of the teams favored to win the tournament. After the nine preliminary rounds, they finished as the highest-ranked team on 17 points and failed to break.[1] They performed very well in the early rounds but picked up very few points in the later rounds, finishing 1st, 1st, 1st, 1st, 3rd, 3rd, 3rd, 4th, 2nd. Their speaker points averaged 83.3, the 5th highest at the tournament. It seems clear that their debating in the preliminary rounds was better than that of many of the teams that advanced to elimination rounds, but the sorting system excluded them. There are many other stories like this one, but our argument is not founded on anecdotes; these simply provide vivid illustrations.
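To make the arithmetic behind that result concrete, the tally below is a minimal sketch. It assumes the standard BP scoring of 3, 2, 1 and 0 points for 1st through 4th place, which this section does not spell out, and the names BP_POINTS and team_points are ours, chosen only for illustration.

```python
# Assumption: standard BP scoring, points awarded per finishing position in a room.
BP_POINTS = {1: 3, 2: 2, 3: 1, 4: 0}


def team_points(ranks):
    """Return the cumulative team points after each round."""
    totals = []
    running = 0
    for rank in ranks:
        running += BP_POINTS[rank]
        totals.append(running)
    return totals


# Finishing positions reported above for the Yale team, rounds 1 through 9.
yale_ranks = [1, 1, 1, 1, 3, 3, 3, 4, 2]
yale_totals = team_points(yale_ranks)

print("Cumulative points by round:", yale_totals)
print("Points after round 4:", yale_totals[3])
print("Final total:", yale_totals[-1])
```

Under this scoring, 12 of the team's 17 points came from the first four rounds, and the remaining five rounds added only five more.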

Figure 1: Easy and hard paths to the break

It is worth noting that a complementary problem also occurs. Some teams achieve very modest results in the first six rounds and then earn many points in the last three rounds (in much weaker rooms), allowing them to break. Figure 1 shows two teams taking very different paths to 17 points after nine rounds. This chart is not even the most extreme possible version. In the past three years, teams have broken at the WUDC having scored at least half of their points in the final three rounds. But, again, the important point is not about isolated dramatic instances; it is that there is so large a difference between the easy way and the hard way to get enough points to break. Any power-matched (i.e., Swiss-system) tournament will have somewhat easier and harder paths, but our concern is the width of the gap between the skill required for these two paths in the status quo. We want a system that allows teams to make up for mistakes (their own or the judging panel’s), but not one that makes it more likely that teams who debated worse in preliminary rounds will take a spot in the break that should have gone to a team who debated better.

Several other concerns exist regarding the status quo. First, because of the scoring in BP, teams competing in lower rooms often score enough points to advance on the tab above a large number of teams who were competing in higher rooms. This leap-frogging over teams in higher rooms is widely acknowledged, though people may not fully understand how disruptive it is to getting an accurate ordering on the tab, as will be shown below. Second, the outcome of early rounds at tournaments (particularly the WUDC) has a surprisingly small influence on a team’s ability to break, whereas the outcome of the final preliminary round is startlingly important, many times more influential than that of earlier rounds. Third, there is strong and growing concern that speaker points are unreliable, particularly for speakers from certain groups who are traditionally underrepresented in the break. This is relevant here because the status quo system makes speaker points quite important in determining who breaks, since it creates very large ties (bubbles) that are broken using speaker points.

All that said, our motivation is ultimately a single underlying problem, of which the above problems are merely symptoms. The status quo simply does a poor job of sorting out which teams deserve to break. This assertion is not based on anecdotal evidence or on limited data from some subset of actual tournaments. Instead, it is based on extensive mathematical modeling of tournaments and on simulations of literally millions of tournaments. An obvious solution to the problem of inaccurate sorting would be to add more rounds. The WUDC could run 12, 15, or even 18 rounds, and this would indeed improve the accuracy of the break. Obviously, such a plan is entirely impractical because of the additional time and money that it would entail. But what if there were a way to get the same quality of sorting that we would get from additional rounds, without actually having more rounds? Surely, this would be ideal. This is what the scoring system we propose below will do.

[1]  In 2007, when only 32 teams broke at the WUDC, it took at least 18 points to break.