Break Accuracy: Optimizing the Sorting Power of Preliminary Rounds Using Tapered Points

by R. Eric Barnes, Paul Kehle, Nick McKenny and Chuan-Zheng Lee • HWS

Ideally, preliminary rounds at tournaments sort teams so that the best teams break into elimination rounds. At the World Championships of debate, the scoring system used during the nine preliminary rounds does a poor job of sorting teams accurately. Adding rounds would increase accuracy and fairness, but this is impractical. Using mathematical models and computer simulations of tournaments, we show that using a slightly different scoring system over the nine preliminary rounds would improve the accuracy of the break even more than would doubling the number of preliminary rounds to 18. Other implications and insights into tabulation and sorting accuracy are also discussed.

7.2 | The “Less Fun” Objection

The second objection we have encountered does not dispute our conclusion that the break would be more accurate using a tapered system.  Instead, the concern is that tapered systems result in more teams falling out of contention earlier in the tournament, which may make the experience less fun for them.  Some also argue that this could cause some teams to take their remaining preliminary debates less seriously, which could affect others who are still in contention for the ESL or EFL break.  We acknowledge that there is significant value in having fun at tournaments and that debate competitions are more fun when one perceives that there is still a chance to make the break.  It is also plausible that some teams will take debates less seriously once they know that they are no longer in contention.

Our first response focuses on the potential distortion to the ESL and EFL breaks from teams who perceive they are no longer in contention for the open break.  While it is likely that some teams don’t debate as intensely or try as hard to win once they are out of contention, it is very rare that they debate so poorly as to disrupt the competition and undermine the validity of a round’s results, as opposed to just making it somewhat easier for other teams to pick up points.  This is because most debaters at Worlds don’t just care about breaking; they also care about winning individual debates (since they are competitive people) and about where they end up on the tab (since the tab is public).  If giving disruptively bad speeches were to become a problem, steps could be taken to deter it, but we don’t believe this is likely.[33]

Our second response here focuses on the lessened enjoyment of participants because they are knocked out of contention sooner.  Promoting this kind of enjoyment is admirable, at least insofar as it does not excessively interfere with core goals of the tournament.  We could ensure that every team was still in contention for all preliminary rounds by adopting a system that gave vastly more points in the last round than in all the others.  But doing this would dramatically reduce the accuracy of the break, so it’s a terrible idea.  We want people to enjoy the tournament, but not at the expense of achieving an accurate break, which is the primary function of preliminary rounds.[34]  Also, there is an argument to be made that because rounds between teams of similar ability are more fun, a tapered system will result in debaters enjoying their debates more, since tapered systems sort teams more quickly into accurate skill levels.  But even aside from this, since tapered systems dramatically increase the accuracy of the break, it is worth sacrificing the fun that a few teams may have otherwise experienced.  It very much is not fun to be excluded from the break when you really deserve to be included.[35]  
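The trade-off between staying in contention and accuracy can be illustrated with a small sketch.  It assumes, purely for illustration, that a first place earns 3 points multiplied by a per-round weight, and it uses a deliberately crude test: a team is "alive" if winning every remaining round would at least match the current total of the team sitting on the break line.  This is a necessary condition for overtaking that team, not a sufficient one.  The function name, the weights, and the point totals are all hypothetical and are not the Early Taper values.

```python
def in_contention(points, break_line_points, remaining_weights, top_score=3):
    """Crude check: could this team still match the break-line team's current total?

    points             -- the team's total so far
    break_line_points  -- current total of the last team inside the break
    remaining_weights  -- hypothetical multiplier for each round still to come
    top_score          -- assumed base points for a first place
    """
    best_case = points + sum(top_score * w for w in remaining_weights)
    return best_case >= break_line_points


# After 8 of 9 rounds (all weighted 1 so far), a team sits on 6 points
# while the break line is at 15 points. These numbers are made up.
flat_last_round = [1]    # last round worth the same as the others
heavy_last_round = [10]  # last round worth vastly more than the others

alive_flat = in_contention(6, 15, flat_last_round)
alive_heavy = in_contention(6, 15, heavy_last_round)
```

Under the flat weighting the team is already out, while under the heavily weighted last round it is still alive.  The sketch shows why such a system keeps everyone in contention; it says nothing about accuracy, which is where that system fails.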

7.3 | The “Better Alternatives” Objection

The third objection we consider is that there are better scoring systems than the Early Taper system and that if a major reform of scoring is going to be adopted, we should adopt another system instead.  The primary alternative that people have raised here is some version of Elo ratings, which are used in other competitive activities (e.g., chess).[36]  Again, we have two responses here.

Our first response is to the more general criticism.  In all honesty, we welcome any system that does outperform the Early Taper system that we are proposing here.  If anyone is aware of a practical system that will more dramatically improve the accuracy of the break, but not be significantly disruptive to the tournament (e.g., making it much longer or fundamentally changing the debating or judging), then we are happy to advocate for that system instead of ours.  But, at this time, we are not aware of any such systems.

Our second response is to explain our rejection of the Elo system in particular.  To be clear, there are two very different ways in which an Elo system could be used at Worlds: as a local system or as a global system.  A local Elo system would have every team enter Worlds each year with the same number of points, which would then be adjusted over the course of preliminary rounds against other teams.  There is nothing fundamentally unfair about this system, but it also does not offer any advantage at all in improving the accuracy of the break.  Our extensive modeling of this system shows that it performs exactly as well as the status quo, which is to say, not particularly well.  Others who initially found a local Elo system appealing have reached the same conclusion.
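For readers unfamiliar with how such ratings move, the following is a minimal sketch of one local Elo update.  It uses the standard two-player Elo formulas and assumes, as one possible adaptation and not necessarily the one modeled here, that a four-team room is scored as six pairwise results, with each higher-placed team counted as beating each lower-placed team.  The starting rating of 1500, the step size K = 32, and the function names are illustrative assumptions.

```python
from itertools import combinations

START_RATING = 1500.0  # assumed common starting rating for every team
K = 32.0               # assumed step size for each pairwise update


def expected_score(rating_a, rating_b):
    """Standard Elo expectation that A beats B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_room(ratings, ranking):
    """Return new ratings after one room.

    ratings -- dict mapping team name to current rating
    ranking -- team names ordered from first place to last place
    Assumption: each higher-placed team counts as beating each lower-placed one.
    """
    deltas = {team: 0.0 for team in ranking}
    # combinations() preserves order, so `higher` always placed above `lower`.
    for higher, lower in combinations(ranking, 2):
        surprise = 1.0 - expected_score(ratings[higher], ratings[lower])
        deltas[higher] += K * surprise
        deltas[lower] -= K * surprise
    updated = dict(ratings)
    for team, delta in deltas.items():
        updated[team] += delta
    return updated


# Local system: every team enters with the same rating.
teams = ["A", "B", "C", "D"]
before = {team: START_RATING for team in teams}
after = update_room(before, ranking=["A", "B", "C", "D"])
```

Because all four teams start level in a local system, the first round's adjustments depend only on placement, and the total number of rating points in the room is unchanged.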

In contrast, a global Elo system would allow teams to bring different numbers of points into the tournament, under a system where having more points grants a team a distinct advantage.  These points would be earned in other local and regional tournaments.  There are many problems here, but we will focus on just a few.  First, this certainly abandons the sense that at the beginning of the tournament, everyone is starting on a level playing field.  Some teams will have a head start in the race to break, and this seems deeply unfair.  The obvious response to this is that the teams who have head starts have earned them and that these head starts are reliably correlated with team skill.

If this response were correct, it would be worth having the argument about whether we want teams with higher skill to begin the tournament with an advantage or not, but the response is based on a false premise.  Because there are both limited interactions between teams in different regions and significant differences in regional skill levels, the skill ratings that teams would bring into Worlds in a global Elo system would be wildly inaccurate and so very unfair.  This is enough to discredit the suggestion, but on top of that, the requisite international system would also be an enormous task to maintain and would create perverse incentives about which tournaments to attend.  And, on top of all that, although we have no proof that this system would produce less accurate breaks (since this would require modeling the entire international debating circuit), there is also no evidence that a global Elo system would result in a more accurate break.[37]


[33]  Two of the authors have judged at numerous WUDC tournaments since 2007 and neither of us has seen this occur, nor have we heard any first-person accounts of this occurring.  This is obviously not to say that it has not happened, but it may be a much more popular story than it is a common occurrence.

[34]  We hope that no one is tempted to make a related objection based on the absurd claim that inaccuracies in the break are somehow a desirable aspect of the competition and that a more accurate break would be worse for some reason.  The same critique would seem to argue against trying to make the judging at tournaments better (say, by having briefings).  This bizarre objection claims that ET would make the break too accurate.  So, if ET were used, then these objectors would presumably support intentionally adding inaccuracy into the team tab before selecting teams for the break.  Perhaps they would require that every team draw a random card that moved them up or down the team tab by between 1 and 50 places.  Yes, this is an absurd idea.  Indeed, it’s exactly as absurd as the claim that introducing greater accuracy into the break is a problem.

[35]  We would also encourage those teams to develop an appreciation of the intrinsic value of the activity and its long-term extrinsic value (i.e., skill development), instead of being so focused on the immediate extrinsic rewards (i.e., competitive success).

[36]  For a simple description of the Elo rating system, see:

[37]  None of this says anything against a desire to establish a global Elo rating system that functions alongside tournaments, but plays no role in deciding who breaks.  Such a system may indeed be worthwhile and is totally compatible with running tournaments using tapered scoring systems like ET.