Fairness, Scoring Systems and the World Universities Debating Championship

by R. Eric Barnes, Paul Kehle, Chuan-Zheng Lee & Hugh N. McKenny

Tapered scoring creates more accurate breaks than traditional scoring in British Parliamentary (BP) debate tournaments. This paper further examines how BP scoring systems work and addresses some concerns that have been raised about tapered scoring. We begin by deploying some intuitive metrics that have been used in the past to evaluate break accuracy. A significant portion of the paper is devoted to evaluating the level of influence that different rounds have. We consider whether there is any good justification for different rounds having different levels of influence and also whether tapered scoring would unjustly impact certain teams. The paper explores various other topics relevant to understanding scoring systems, such as how call accuracy changes by round, the effect of pulling up teams, and the ability of teams to recover. We end by discussing two broader themes: how to rationally compare competing scoring systems and how to assess the fundamental model that we have used to justify many of our conclusions. This paper assumes a familiarity with our previous paper “Break Accuracy: Optimizing the Sorting Power of Preliminary Rounds Using Tapered Points”.

Section 9: Conclusion

The purpose of this paper has been to deepen the community’s understanding of how scoring systems work at large debate tournaments in the British Parliamentary style, such as the World Championships.  A strong case was made in our previous paper for the use of tapered scoring, but here we show that the case is even stronger.  New metrics show that, to whatever extent tapered scoring produces unfair results, those results are consistently fairer than the ones produced by the status quo scoring system.

Perhaps the most important insight here is our analysis of round influence.  It was common knowledge in debate circles that the later rounds were more influential in determining one’s success than the earlier rounds, but we don’t think anyone fully appreciated the actual gap in how much different rounds impact one’s final ranking.  This analysis demonstrates that far from making some rounds too influential, tapered scoring actually equalizes the impact that the various rounds have.  This round influence analysis likely has interesting connections to call accuracy in various rounds, which we are excited to explore more in a future paper.

Interestingly, initial objections to tapered scoring often turned out to point to advantages of the tapered system.  Most obviously, an initially popular objection claimed that tapered scoring is unfair because it makes some rounds more influential than others.  In fact, tapered scoring solves the problem in the status quo of some rounds being radically more influential, which amplifies bad luck from imbalanced motions, bad judging, etc.  Similarly, our analysis of how pull-ups work confirms that tapered scoring would be fairer than the status quo in this regard as well.

Finally, we have addressed two other objections to our earlier research, one simple and one quite subtle.  We have shown that tapered scoring allows teams to recover from a bad round or two, and so is not as unforgiving as it might appear.  Indeed, the ability of teams to recover is part of what allows the system to be as accurate as it is.  The subtler objection came from those who challenged how our model was constructed, such as our decisions regarding which elements of a tournament to include in the model and how we set our “noise” variables.  We took some time here to justify the choices we made.  We explained why different seemingly reasonable choices would likely not improve the model and would be very unlikely to alter the outcome in any case.

At several points, we have indicated that more research is necessary to fully explore an area.  Call accuracy is one area that will likely require a paper of its own.  Additionally, several other aspects of scoring systems that we do not mention here also deserve discussion and will be part of our future work.

Appendix & Works Cited