Break Accuracy: Optimizing the Sorting Power of Preliminary Rounds Using Tapered Points

by R. Eric Barnes, Paul Kehle, Nick McKenny and Chuan-Zheng Lee • HWS

Ideally, preliminary rounds at tournaments sort teams so that the best teams break into elimination rounds. At the World Championships of debate, the scoring system used during the nine preliminary rounds does a poor job of sorting teams accurately. Adding rounds would increase accuracy and fairness, but this is impractical. Using mathematical models and computer simulations of tournaments, we show that a slightly different scoring system over the nine preliminary rounds would improve the accuracy of the break even more than doubling the number of preliminary rounds to 18 would. Other implications and insights into tabulation and sorting accuracy are also discussed.

Section 8: Limitations

So far, our research has focused on the annual WUDC tournament, which has nine rounds and between 240 and 440 teams. We have begun research that would apply directly to smaller tournaments with fewer rounds. In particular, we are working on identifying ideal taper systems for common major tournament sizes (80 to 160 teams) and numbers of rounds (usually 5, 6 or 8). Our initial findings give us excellent reason to believe that the SQ system is inferior to a tapered system for tournaments of these sizes, but these findings are not yet ready for publication.

In constructing our tournament simulator, we tried to hew as closely as possible to current WUDC practice, but we did not model tabbing complications like position rotation. Since position rotation places an additional constraint on how teams can be placed into particular rooms, it likely disrupts the accuracy of any sorting process to some degree. But it causes this disruption regardless of which scoring system is used, and there is no reason to think that the constraint would be more harmful to one system than to any other.

There may exist more radically different point distribution systems that produce even better results. For example, we could investigate systems that don’t conform to the (3,2,1,0) proportions, or we could experiment with systems that insert a randomly paired round into the middle of the tournament. But these are more likely to offend people’s sensibilities and so are less likely to be adopted in practice. We may look into these from a more purely mathematical perspective in the future, setting aside the practical constraint of having to convince debate practitioners to try something radically different.

Currently, tabulation software is not capable of applying different point values in different rounds. Running a tournament on a tapered scoring system would therefore require tabulating it by hand, which is impractical. Fortunately, one of the authors contributes to a popular tabbing program (Tabbycat) and is planning to make the necessary modifications so that we can test this out at a real tournament. If test runs are successful, this software modification can then be made available to other tournaments, including Worlds.
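To make concrete what "different point values for different rounds" would involve, here is a minimal sketch in Python. It is not Tabbycat code and does not use any Tabbycat API. The function names and the multiplier values are hypothetical and chosen only for illustration; they are not the taper recommended in this article. The sketch assumes that each round scales the familiar (3,2,1,0) room points by a single per-round multiplier.

```python
# Points for finishing 1st..4th in a room under the current system.
BASE_POINTS = (3, 2, 1, 0)


def round_points(place, multiplier):
    """Points for one round; `place` is 0 for first, 3 for fourth."""
    return BASE_POINTS[place] * multiplier


def total_points(places, multipliers):
    """Sum a team's points over all rounds, one multiplier per round."""
    if len(places) != len(multipliers):
        raise ValueError("need exactly one multiplier per round")
    return sum(round_points(p, m) for p, m in zip(places, multipliers))


# Hypothetical values, for illustration only (NOT the article's taper).
example_multipliers = [4, 3, 2, 1, 1]

# A flat system is the special case where every multiplier is 1.
flat_multipliers = [1] * 5
```

Under this framing, the current system is just the case where every multiplier equals 1, so the change a tab program needs is limited to storing one number per round and applying it when points are awarded.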

Some readers may be concerned about drawing significant conclusions based largely on computer simulations of tournaments. We understand where this concern comes from, but it is crucial to realize that “real world” data can almost never provide relevant evidence for conclusions of this kind. In a real tournament, one cannot get the necessary insight into the actual demonstrated skill of teams, and that insight is what allows one to distinguish how deserving various teams are (i.e., what their rank on the tab should be). Only a simulation that starts by stipulating performance quality can take the necessary “God’s eye” perspective here, and only a computer can run enough trials to assure us that the results we are seeing are statistically significant. In short, there is simply no better approach to these questions than computer simulation.
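The following Python sketch illustrates the “God’s eye” idea in its simplest form. It is not the simulator used for this research. It rests on several assumptions made only for illustration: team skill is drawn from a normal distribution, each round's performance is skill plus normally distributed noise, rooms are formed by sorting on points with random tie-breaks, and "deserving" teams are simply those with the highest stipulated skill. The function name and all parameter names are hypothetical, and position rotation is not modeled.

```python
import random

BASE_POINTS = (3, 2, 1, 0)  # points for 1st..4th in a room


def simulate_break_overlap(n_teams=64, n_rounds=5, break_size=16,
                           noise_sd=1.0, seed=0):
    """Fraction of the most skilled teams that actually break in one run."""
    if n_teams % 4 != 0:
        raise ValueError("n_teams must be a multiple of 4")
    rng = random.Random(seed)

    # Stipulate each team's true skill: this is the ground truth that
    # real tournament data can never reveal.
    skill = [rng.gauss(0.0, 1.0) for _ in range(n_teams)]
    points = [0] * n_teams

    for _ in range(n_rounds):
        # Power pairing: sort by points, breaking ties at random.
        order = sorted(range(n_teams),
                       key=lambda t: (-points[t], rng.random()))
        for i in range(0, n_teams, 4):
            room = order[i:i + 4]
            # Performance in this round = true skill + noise (assumed model).
            perf = {t: skill[t] + rng.gauss(0.0, noise_sd) for t in room}
            ranked = sorted(room, key=lambda t: -perf[t])
            for place, team in enumerate(ranked):
                points[team] += BASE_POINTS[place]

    # Teams that break on points versus teams that deserve to break.
    final_order = sorted(range(n_teams),
                         key=lambda t: (-points[t], rng.random()))
    actual = set(final_order[:break_size])
    deserving = set(sorted(range(n_teams),
                           key=lambda t: -skill[t])[:break_size])
    return len(actual & deserving) / break_size


if __name__ == "__main__":
    # Average over many seeded trials to smooth out run-to-run variation.
    trials = [simulate_break_overlap(seed=s) for s in range(200)]
    print(sum(trials) / len(trials))
```

Because the skill values are known to the program, the overlap between the actual break and the deserving break can be measured directly, and repeating the run over many seeds gives an average that a single real tournament could never supply.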
