Fairness, Scoring Systems and the World Universities Debating Championship

by R. Eric Barnes, Paul Kehle, Chuan-Zheng Lee & Hugh N. McKenny

Tapered scoring creates more accurate breaks than traditional scoring in British Parliamentary (BP) debate tournaments. This paper further examines how BP scoring systems work and addresses some concerns that have been raised about tapered scoring. We begin by deploying some intuitive metrics that have been used in the past to evaluate break accuracy. A significant portion of the paper is devoted to evaluating the level of influence that different rounds have. We consider whether there is any good justification for different rounds having different levels of influence and also whether tapered scoring would unjustly impact certain teams. The paper explores various other topics relevant to understanding scoring systems, such as how call accuracy changes by round, the effect of pulling up teams, and the ability of teams to recover. We end by discussing two broader themes: how to rationally compare competing scoring systems and how to assess the fundamental model that we have used to justify many of our conclusions. This paper assumes familiarity with our previous paper “Break Accuracy: Optimizing the Sorting Power of Preliminary Rounds Using Tapered Points”.

Section 5: Tapered Scoring and Pull-Ups

One primary factor that drives sorting accuracy is power pairing, along with the other rules that are used to create the draw.  Power pairing is a very useful tool in improving break accuracy, especially when combined with tapered scoring.[23]  But the rules for creating the draw go beyond just power pairing, and these other rules matter too.  In particular, our research suggests that improvements could be achieved by a simple rule change around pull-ups, and combining this with the use of tapered points would make the tournament more equitable.

A team being pulled up into a higher bracket is a disadvantage because that team will compete against harder teams and therefore be less likely to get points.  So, other things being equal, if you get pulled up, you want the round you are pulled up in to have as little influence as possible.[24]  Because of this, if you are getting pulled up in the first four rounds, you are better off under the SQ scoring system, since SQ makes the results of these rounds significantly less influential.  But, if you are getting pulled up in the last four rounds, you are better off under the ET scoring system, since the loss of points then is much less influential on your final placement on the tab.  The influence of Round 5 is almost identical in both scoring systems.  Since early round pull-ups are worse in ET and late round pull-ups are worse in SQ (and they are equally harmful in Round 5), one might think that this makes the issue of pull-ups a tie between the two scoring systems.  But this is a mistake.  First, there are no pull-ups in Round 1, so pull-ups are more harmful in ET for three rounds (Rounds 2 – 4), and more harmful in SQ for four rounds (Rounds 6 – 9).  Second, there are far fewer pull-ups in early rounds (about 20% of the total) than in later rounds (about 67% of the total, with the remaining ≈13% in Round 5).  So, far fewer pull-ups occur in circumstances where SQ is preferable.  Third and most importantly, when comparing the relative harm from pull-ups in ET and SQ, the worst-case-scenario point of comparison for ET is Round 4, in which ET scoring makes the round slightly more influential than an average round.  In contrast, the worst-case-scenario point of comparison for a pull-up in SQ is Round 9, in which SQ has almost triple the influence of an average round.  (In SQ, Round 8 has double the influence of a normal round.)  So, if one is worried about the adverse effects of pull-ups, it is clear that pull-ups are more likely to be seriously harmful in SQ than in ET.

Actually, the story here is a bit more complicated than that.  ET awards more team points during the tournament, so teams end up in a wider range of brackets (i.e., point levels) in rounds 3 – 9, and each bracket has fewer teams.  ET more effectively sorts teams of similar skill level into these smaller brackets.  From this, two things follow:  1) the difference in average skill level between adjacent brackets is smaller in ET; 2) there is a need for more pull-ups in ET.  So, pull-ups will happen to more teams in ET, but when they do happen, they will be less detrimental.  

The risk of a random team getting fewer points in a given round because they were pulled up is likely the same in SQ and ET, but if ET is used, then the harms will be spread out more evenly across a larger portion of the teams.  Essentially, ET reduces the negative impact of pull-ups in two ways.  First, teams are less likely to be outmatched when pulled up because the next bracket up isn’t that much stronger.  Second, ET has no ultra-high stakes rounds like those that happen in SQ on day 3, where a pull-up is likely to put a team into a significantly stronger room and where the outcome of that round will have much more influence on where the team ends up on the final tab.

Obviously, BP debate requires rooms of four teams, even though brackets often don’t have teams in multiples of four.  So, someone needs to pay the price (largely at random) for making rooms of four.  Pull-ups are like a necessary tax that is levied on some debaters in order to make tournaments function.  The total pull-up tax paid by teams in ET is likely less than in SQ, so that’s a reason to prefer ET.  But even if the total tax were the same for both scoring systems, SQ imposes a larger tax on fewer debaters, while ET imposes a smaller tax on more debaters, making it more equitable.
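To make the arithmetic of this tax concrete, the following is a minimal Python sketch of how many teams must be pulled up in one round.  It assumes a simplified draw rule that is our illustration rather than a statement of the WUDC rules: brackets are processed from the top down, and any bracket whose size is not a multiple of four is topped up with teams from the nearest bracket(s) below.  The function name count_pull_ups and the example bracket sizes are hypothetical.

```python
def count_pull_ups(bracket_sizes):
    """Return, for each bracket, how many teams are pulled up INTO it.

    bracket_sizes lists the number of teams in each bracket, ordered from
    the highest-points bracket to the lowest. Assumption (illustrative
    only): brackets are filled top-down from the nearest bracket below.
    """
    sizes = list(bracket_sizes)
    if sum(sizes) % 4 != 0:
        raise ValueError("total number of teams must be a multiple of four")

    pulled_into = [0] * len(sizes)
    for i in range(len(sizes)):
        # Teams still needed to make this bracket a multiple of four.
        shortfall = (-sizes[i]) % 4
        j = i + 1
        while shortfall > 0 and j < len(sizes):
            # Take as many teams as needed (or available) from bracket j.
            take = min(shortfall, sizes[j])
            sizes[j] -= take
            sizes[i] += take
            pulled_into[i] += take
            shortfall -= take
            j += 1
    return pulled_into
```

For example, with hypothetical brackets of 5, 6 and 5 teams, this rule pulls three teams up into the top bracket and one team up into the middle bracket, so four of the sixteen teams pay the tax in that round.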

Additionally, with a minor (and long overdue) change in WUDC rules about selecting teams being pulled up, tab software could protect teams from multiple pull-ups.  Right now, when selecting a team to be pulled up, the tab team (i.e., the tab software) is forbidden from considering how many times a team has already been pulled up.  Though not essential to our point here, it is clearly unfair to subject some teams to multiple pull-ups when there is a technological mechanism to make the distribution of this burden more equal.  Having more pull-ups that are less harmful is more equitable, especially if we ensure that no one is disadvantaged twice.  In either system, current rules do nothing to prevent teams from being taxed more than once while others have paid no taxes at all.  We can all agree that it is unfair to not address this injustice.
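One way tab software might implement such a rule is sketched below in Python.  This is an illustration under stated assumptions, not the behaviour of any existing tab program or a rule currently permitted by WUDC: among the teams eligible to be pulled up, it prefers those with the fewest previous pull-ups and breaks ties at random.  The names select_pull_ups and prior_pull_ups are hypothetical.

```python
import random


def select_pull_ups(candidates, prior_pull_ups, n, rng=None):
    """Choose n teams to pull up, favouring teams pulled up least often.

    candidates: teams in the lower bracket that are eligible for a pull-up.
    prior_pull_ups: dict mapping team -> number of earlier pull-ups
                    (teams missing from the dict count as zero).
    rng: optional random.Random instance, useful for reproducible draws.
    """
    rng = rng or random.Random()
    pool = list(candidates)
    # Shuffle first so that ties are broken at random ...
    rng.shuffle(pool)
    # ... then sort by pull-up history; Python's sort is stable, so the
    # random order is preserved among teams with equal history.
    pool.sort(key=lambda team: prior_pull_ups.get(team, 0))
    return pool[:n]
```

With hypothetical teams A, B, C and D, where A has been pulled up once, C twice, and B and D never, a request for two pull-ups always returns B and D (in random order), and C is selected only if all four teams are needed.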


[23]  Interestingly, in SQ, sorting would actually be improved by having more random rounds in the middle of the tournament, combined with other rounds being power paired.  This is not true with ET.

[24]  See Section 3 on round influence.