Fairness, Scoring Systems and the World Universities Debating Championship

by R. Eric Barnes, Paul Kehle, Chuan-Zheng Lee & Hugh N. McKenny

Tapered scoring creates more accurate breaks than traditional scoring in British Parliamentary (BP) debate tournaments. This paper further examines how BP scoring systems work and addresses some concerns that have been raised about tapered scoring. We begin by deploying some intuitive metrics that have been used in the past to evaluate break accuracy. A significant portion of the paper is devoted to evaluating the level of influence that different rounds have. We consider whether there is any good justification for different rounds having different levels of influence and also whether tapered scoring would unjustly impact certain teams. The paper explores various other topics relevant to understanding scoring systems, such as how call accuracy changes by round, the effect of pulling up teams, and the ability of teams to recover. We end by discussing two broader themes, how to rationally compare competing scoring systems and how to assess the fundamental model that we have used to justify many of our conclusions. This paper assumes a familiarity with our previous paper “Break Accuracy: Optimizing the Sorting Power of Preliminary Rounds Using Tapered Points”.

3.4 | Addressing Ex Ante Disadvantage

Although the central focus of our research is accurate scoring systems, addressing systemic disadvantage is a pressing concern and this seems as good a place as any to discuss some of our most relevant findings about this topic, which concern possible bias in awarding speaker points.  But before getting into our findings, it is worth acknowledging that some policies have a large impact in perpetuating or solving a problem, and other policies have a smaller role to play; some policies are costly or difficult to implement, while others are easy.  

There are several reasonable policies that have been proposed, are likely to help teams from EAD institutions, and should be implemented.[18] These policies include:

  • Budget more money for independent adjudicators from disadvantaged regions;
  • Offer a less expensive tier of accommodations at WUDC to increase accessibility;
  • Increase WUDC registration fees in order to support financial aid for EAD teams;
  • Select motions that are more equitable to teams from disadvantaged regions;
  • Increase the availability of online learning opportunities for EAD institutions, led by experienced WUDC judges (e.g., workshops, online tournaments); and
  • Make judges aware of the unfair impact of implicit bias in speaker points.

While none of these directly concerns tapered scoring (our focus here), the need for the last policy regarding speaker points became clear during our research, as we explain in this section.

There is a widespread belief among debaters that speaker points are imprecise, inaccurate, arbitrary, or some combination of these.  On top of this, many people believe that speaker points are a prominent vehicle through which implicit bias is unwittingly expressed by judging panels.  Implicit bias is pervasive around the world and surely also impacts team rankings in some cases, but there is reason to believe that it has a stronger impact on speaker points, which (unlike team points) are not a zero-sum game.  

We used the survey data on EAD teams to look into whether there is consistent bias in the assignment of speaker points.[19] It is not surprising that EAD teams would generally end the WUDC ranked lower than non-EAD teams. That’s what systemic disadvantages tend to do, by their very definition. For the same reason, we expect that EAD teams will, as a group, have lower average speaker points than non-EAD teams. But, if an EAD team and a non-EAD team both end the tournament on the same number of team points (i.e., end in the same bracket), there is no reason to expect that the EAD team will have fewer speaker points.[20] Within the range of demonstrated skill that EAD teams occupy, their demonstrated skill level should be quite evenly distributed, just like non-EAD teams. So, there is no justifiable reason to expect EAD teams to have more or fewer speaker points than non-EAD teams in the same point bracket. Thus, since speaker points should represent a cardinal measure of a team’s demonstrated skill, average speaker points for EAD and non-EAD teams should be approximately the same within the same bracket. But when we compared the speaker points earned by EAD teams to non-EAD teams within the same bracket, we found that EAD teams were awarded an average of 10 fewer speaker points.
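The within-bracket comparison described above can be sketched as follows. The data here are invented purely for illustration (the actual analysis used the WUDC survey data and tab), and the record format is our own assumption:

```python
# Sketch of the within-bracket comparison: group teams by final team points,
# then compare mean speaker points of EAD vs. non-EAD teams inside each
# bracket.  Records are (team_points_bracket, is_ead, total_speaker_points)
# and are fabricated for illustration only.
from collections import defaultdict
from statistics import mean

records = [
    (17, False, 1520), (17, True, 1508), (17, False, 1515),
    (14, False, 1480), (14, True, 1472), (14, True, 1469), (14, False, 1483),
    (11, False, 1440), (11, True, 1431), (11, False, 1437),
]

by_bracket = defaultdict(lambda: {"ead": [], "non_ead": []})
for bracket, is_ead, speaks in records:
    by_bracket[bracket]["ead" if is_ead else "non_ead"].append(speaks)

# Per-bracket gap: mean non-EAD speaks minus mean EAD speaks.
gaps = []
for bracket, groups in sorted(by_bracket.items()):
    if groups["ead"] and groups["non_ead"]:
        gap = mean(groups["non_ead"]) - mean(groups["ead"])
        gaps.append(gap)
        print(f"bracket {bracket}: non-EAD teams ahead by {gap:.1f} speaker points")

print(f"average within-bracket gap: {mean(gaps):.1f}")
```

Because the comparison is made only among teams tied on team points, a systematic positive gap cannot be explained by EAD teams simply being ranked lower overall; that is what makes the observed average gap of roughly 10 speaker points notable.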

To be clear, we are not saying that this apparent discrimination is directly connected to these teams being ex ante disadvantaged.  Indeed, because ex ante disadvantage concerns what happens before the tournament, it does not make sense as a direct cause.  The more likely culprit here is implicit bias.  Many of the debaters from EAD schools may be more likely to trigger implicit bias in judges, and this may account for the speaker point deficit.

While using tapered points cannot eliminate the apparent discriminatory problem of using speaker points to rank teams on the final tab, it can mitigate the harms that come from using speaker points. Scoring systems that create larger brackets (i.e., sets of teams tied on team points) give more power to speaker points. If a system resulted in just two or three teams in each bracket, then speaker points would have very little power in the final rankings and in determining the break. By cutting the size of brackets by more than half, ET dramatically reduces the power that speaker points have.[21] So, adopting tapered scoring may make it easier for EAD teams to break.


[18]  Some of these recommendations are taken from the “Report of the World Universities Debating Championships” (written by Tshiamo Malatji after the 2019 WUDC in Cape Town) and a letter to the WUDC Council written by Jamie Mighti sent before the 2019 WUDC.

[19]  For the purposes of this section “EAD teams” are teams who were categorized as either moderate EAD or high EAD by our survey analysis, described in the appendix.  In contrast, we use “non-EAD teams” to refer to teams who were categorized as either not EAD or low EAD by our survey analysis.

[20]  As an analogy, imagine that shares of stock in biotech corporations tend to cost more than shares of real estate corporations.  Perhaps the average price of the former is $60 while the average price of the latter is $40, with shares selling in one cent increments.  If we focus just on those stock shares selling between $27 and $28 (or any $1 interval that both occupy), then there’s no reason to expect that the biotech stocks in that interval will be more expensive than real estate stocks.
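The analogy in this footnote can be checked with a quick simulation. The two price distributions and all parameters below are invented for illustration; the point is only that conditioning on a narrow shared price window removes the difference in overall means:

```python
# Two populations with different overall mean prices, compared only within
# a narrow shared price window ($27-$28).  Distributions are illustrative.
import random
from statistics import mean

random.seed(42)

biotech = [random.gauss(60, 15) for _ in range(200_000)]      # pricier overall
real_estate = [random.gauss(40, 10) for _ in range(200_000)]  # cheaper overall


def in_window(prices, lo=27.0, hi=28.0):
    """Keep only shares priced inside the shared window."""
    return [p for p in prices if lo <= p <= hi]


bio_window = in_window(biotech)
re_window = in_window(real_estate)

print(f"overall means: biotech {mean(biotech):.1f}, real estate {mean(real_estate):.1f}")
print(f"within $27-28: biotech {mean(bio_window):.2f}, real estate {mean(re_window):.2f}")
# Within the shared window, the two conditional means are nearly identical,
# even though the overall means differ by roughly $20.
```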

[21]  As we note in Break Accuracy, “There are other ways to break ties in team points, other than using speaker points, and using these would reduce the impact of systemic bias. Unfortunately, our research strongly suggests that the most obvious alternate method to break ties (using strength of opponents) leads to significantly less accurate breaks. We have not found any methods of breaking ties on the bubble that is clearly superior to using speaker points.”  (Barnes, Kehle, McKenny, & Lee, 2020)  Using ET and breaking ties with opponents’ strength is more accurate than using SQ and breaking ties with speaker points, but it isn’t as accurate as using ET and breaking ties with speaker points.  Still, we are open to the idea that it may be worth sacrificing some accuracy in the break to eliminate the use of speaker points entirely from the team tab.