Section 9: Conclusion
The purpose of this paper has been to deepen the community’s understanding of how scoring systems work at large debate tournaments in the British Parliamentary style, such as the World Championships. Our previous paper made a strong case for tapered scoring; here we show that the case is even stronger. New metrics show that, even where tapered scoring produces unfair results, those results are consistently fairer than the ones produced by the status quo scoring system.
Perhaps the most important insight here is our analysis of round influence. It was common knowledge in debate circles that the later rounds were more influential in determining one’s success than the earlier rounds, but we don’t think anyone fully appreciated the actual gap in how much different rounds impact one’s final ranking. This analysis demonstrates that far from making some rounds too influential, tapered scoring actually equalizes the impact that the various rounds have. This round influence analysis likely has interesting connections to call accuracy in various rounds, which we are excited to explore more in a future paper.
Interestingly, initial objections to tapered scoring often turned out to point to advantages of the tapered system. Most obviously, an initially popular objection claimed that tapered scoring is unfair because it makes some rounds more influential than others. In fact, tapered scoring solves a problem in the status quo, where some rounds are radically more influential than others, which amplifies bad luck from imbalanced motions, bad judging, etc. Similarly, our analysis of how pull-ups work confirms that tapered scoring would be fairer than the status quo in this regard as well.
Finally, we have addressed two other objections to our earlier research, one simple and one quite subtle. On the simple objection, we have shown that tapered scoring allows teams to recover from a bad round or two, and so is not as unforgiving as it might appear. Indeed, the ability of teams to recover is part of what allows the system to be as accurate as it is. The subtler objection came from those who challenged how our model was constructed, such as our decisions regarding which elements of a tournament to include in the model and how we set our “noise” variables. We took some time here to justify the choices we made, explaining why different, seemingly reasonable choices would likely not improve the model and would be very unlikely to alter the outcome in any case.
At several points, we have indicated that more research is necessary to fully explore an area. Call accuracy is one area that will likely require a paper of its own. Several other aspects of scoring systems that we have not mentioned here also deserve discussion and will be part of our future work.