Section 1: Introduction
In our previous paper “Break Accuracy: Optimizing the Sorting Power of Preliminary Rounds Using Tapered Points”, we wrote about improving the accuracy of the break at British Parliamentary debate tournaments. For that research, we used computer simulations of millions of tournaments to analyze how the accuracy of the break could be improved. A more accurate break is one in which teams that debated better at that tournament are more likely to break, and in which the teams that do break are more accurately ordered by that demonstrated skill. We first confirmed the widely held belief that running more preliminary rounds would increase break accuracy significantly, and then we showed that using a different scoring system would also produce a much more accurate break. Across five different metrics, we showed that a new scoring system called Early Taper (ET), run over 9 rounds, could achieve a more accurate break than the Status Quo (SQ) system, even if SQ were run over 18 rounds. The ET scoring system awards more points in the early rounds of the tournament and then the usual number of points in later rounds, as shown in the accompanying table.
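To make the difference between a flat and a tapered schedule concrete, the sketch below totals a team's points from its finishing positions across 9 rounds. SQ uses the standard British Parliamentary allocation of 3, 2, 1 and 0 points for first through fourth place in every round. The tapered schedule is modelled here as a hypothetical per-round multiplier on those points; the names `total_points` and `HYPOTHETICAL_TAPER` and the multiplier values are illustrative assumptions, not the actual ET values from the table.

```python
# Status Quo (SQ) British Parliamentary scoring: 3 points for 1st place,
# 2 for 2nd, 1 for 3rd and 0 for 4th, identical in every round.
SQ_POINTS = {1: 3, 2: 2, 3: 1, 4: 0}

# HYPOTHETICAL tapered schedule: per-round multipliers that start high and
# settle at 1 (the usual number of points) for the later rounds.
# These are placeholders, NOT the Early Taper values from the paper's table.
HYPOTHETICAL_TAPER = [3, 3, 2, 2, 1, 1, 1, 1, 1]


def total_points(placements, multipliers=None):
    """Sum a team's points over a sequence of rounds.

    placements: the team's finishing position (1-4) in each round, in order.
    multipliers: optional per-round weights; None means SQ (every weight is 1).
    """
    if multipliers is None:
        multipliers = [1] * len(placements)
    if len(multipliers) != len(placements):
        raise ValueError("need exactly one multiplier per round")
    # Each round contributes its SQ points scaled by that round's weight.
    return sum(SQ_POINTS[p] * m for p, m in zip(placements, multipliers))


# Two hypothetical teams with the same placements in a different order.
team_a = [1, 1, 1, 1, 4, 4, 4, 4, 2]  # strong early, weak late
team_b = [4, 4, 4, 4, 1, 1, 1, 1, 2]  # weak early, strong late

print(total_points(team_a), total_points(team_b))
print(total_points(team_a, HYPOTHETICAL_TAPER), total_points(team_b, HYPOTHETICAL_TAPER))
```

Under SQ both teams finish on 14 points, because every round is worth the same. Under this placeholder taper, team A finishes on 32 and team B stays on 14, which shows only the arithmetic effect of weighting early rounds more heavily; it says nothing by itself about break accuracy, which the simulations measure.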
The Break Accuracy paper explained why tapered point systems such as ET work better than the SQ system and answered various objections to adopting the new system. The present paper looks more deeply at how scoring systems work, how to measure their success, and how to address more fully the most common concerns about implementing ET. Section 2 examines a few randomly selected pairs of simulations that share identical inputs, to see how the results of ET and SQ differ when everything else is held constant. We also introduce two new metrics that can be aggregated over thousands of simulations. Section 3 explores how much the result of each of the 9 rounds at Worlds influences a team’s final standing under both the SQ and ET scoring systems. We also look closely at the claim that ET scoring might harm the competitive success of teams from disadvantaged institutions, and at how ET might help these teams. Section 4 considers what our model reveals about which round produces the most reliable and valid decisions, under a range of assumptions about judge packing; in other words, in which round judge panels are most likely to make the correct call. Section 5 looks at how ET affects pull-ups. Section 6 examines how ET affects a team’s ability to recover from bad rounds. Section 7 discusses how to weigh the fairness trade-offs among the advantages and disadvantages of each scoring system. Finally, Section 8 steps back from the details to discuss the overall robustness of our model and to address a range of concerns about its applicability to actual debate tournaments.