
Break Accuracy: Optimizing the Sorting Power of Preliminary Rounds Using Tapered Points

by R. Eric Barnes, Paul Kehle, Nick McKenny and Chuan-Zheng Lee • HWS

Ideally, preliminary rounds at tournaments sort teams so that the best teams break into elimination rounds. At the World Championships of debate, the scoring system used during the nine preliminary rounds does a poor job of sorting teams accurately. Adding rounds would increase accuracy and fairness, but this is impractical. Using mathematical models and computer simulations of tournaments, we show that a slightly different scoring system over the nine preliminary rounds would improve the accuracy of the break even more than doubling the number of preliminary rounds to 18 would. Other implications and insights into tabulation and sorting accuracy are also discussed.

6.3 | The Most Accurate System

Of the scoring systems examined so far, the Day Taper system is the most appealing, but we wanted to identify the best possible system for a 9-round tournament of WUDC’s size. To ensure we are endorsing the best system, we took a systematic approach, testing every point pattern over the course of 9 rounds within the following constraints: 1) Team points remain in proportions of 3, 2, 1, 0; 2) Team points are always whole numbers; 3) Team points received by the first-place team in a single round never exceed 36; 4) Team points do not increase in subsequent rounds.[19]
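To give a sense of the size of this search space, the four constraints can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code used for our simulations: it assumes each round's points are a whole-number multiple of the 3, 2, 1, 0 pattern, that the multiplier is at least 1, and that a multiplier of 1 in every round corresponds to the status quo. The names `candidate_patterns` and `round_points` are hypothetical.

```python
from itertools import combinations_with_replacement

ROUNDS = 9
BASE = (3, 2, 1, 0)           # points for 1st to 4th place at multiplier 1
MAX_MULTIPLIER = 36 // 3      # first place may earn at most 36 points in a round


def candidate_patterns(rounds=ROUNDS, max_multiplier=MAX_MULTIPLIER):
    """Return an iterator of per-round multiplier tuples, one per candidate system.

    Assumption: multipliers are whole numbers from 1 to max_multiplier.
    Drawing with replacement from a descending range yields exactly the
    tuples whose values never increase from one round to the next.
    """
    return combinations_with_replacement(range(max_multiplier, 0, -1), rounds)


def round_points(multiplier):
    """Team points for 1st to 4th place in a round with the given multiplier."""
    return tuple(multiplier * p for p in BASE)


patterns = list(candidate_patterns())
print(len(patterns))      # 167960 candidate patterns under these assumptions
print(patterns[-1])       # (1, 1, 1, 1, 1, 1, 1, 1, 1), i.e. the status quo
print(round_points(12))   # (36, 24, 12, 0), the largest allowed round
```

Under these assumptions, some patterns are scalar multiples of others (for example, a multiplier of 2 in every round ranks teams exactly as the status quo does), so the number of genuinely distinct systems is smaller than the raw count.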

Table 6: Point allocations in each round for Early Taper

After this fairly exhaustive search, the scoring system that clearly performed best is what we will call the “Early Taper” system (see Table 6).  There was a significant gap between the performance of this system and any other system.  The charts in Figures 9.1 – 9.6 show how the Early Taper system (ET) performs according to our standard metrics.  Because ET performs so much better than the alternatives, this is the system that we recommend be adopted at Worlds.

Figures 9.1 to 9.6: Metrics for SQ (9 rounds), Early Taper, SQ (18 rounds), DT and RT (360 teams, with noise)

The most important information here is a comparison between ET and the status quo system.  It is clear enough from the data set in these charts that the status quo system performs worse according to all five metrics, but we worry that the charts might not sufficiently emphasize how much worse the status quo performs.  Table 7 calculates the degree to which the status quo falls short of ET.  Another way of understanding the improvement that ET represents is to notice that the average ET score on every metric is better than the 75th percentile of SQ results.  Indeed, in most metrics, the 25th percentile of ET is better than the 75th percentile of SQ.  It would surely be worth accepting some significant costs to achieve improvements that are this dramatic, but in fact, these improvements can be achieved at essentially no cost.
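The percentile comparison described above can be checked mechanically for any one metric. The sketch below is illustrative only: the function name `percentile_comparison` is hypothetical, the `lower_is_better` flag is an assumption about how a given metric is oriented, and the numbers in the usage lines are placeholders rather than results from our simulations.

```python
from statistics import mean, quantiles


def percentile_comparison(et, sq, lower_is_better=True):
    """Compare ET and SQ results on one metric.

    Returns two booleans:
      1. the average ET result beats SQ's 75th-percentile result;
      2. ET's 25th-percentile result beats SQ's 75th-percentile result.
    Percentiles are taken on a "goodness" scale where higher is better,
    so results are negated when a lower raw value is better.
    """
    sign = -1 if lower_is_better else 1
    et_good = [sign * x for x in et]
    sq_good = [sign * x for x in sq]
    et_q1, _, _ = quantiles(et_good, n=4, method="inclusive")
    _, _, sq_q3 = quantiles(sq_good, n=4, method="inclusive")
    return mean(et_good) > sq_q3, et_q1 > sq_q3


# Placeholder error scores (lower is better), not simulation output.
print(percentile_comparison([1, 2, 3, 4, 5], [6, 7, 8, 9, 10]))  # (True, True)
print(percentile_comparison([1, 2, 3, 4, 5], [3, 4, 5, 6, 7]))   # (True, False)
```

When both values are true, the middle half of ET results lies entirely above the middle half of SQ results on that metric.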

Table 7: How much worse than Early Taper is the status quo?

For readers who find raw numbers unpersuasive, we have also simulated tournaments that use the status quo system for double the number of rounds (18 preliminary rounds).

As we noted in Section 6.1, running double the number of preliminary rounds definitely makes the break more accurate. But even running 18 rounds of SQ generates less accurate breaks than running an Early Taper system for just 9 rounds. In short, the WUDC community can achieve better break accuracy than 18 preliminary rounds would deliver simply by changing how points are awarded in the first four rounds of the tournament, with no other changes necessary.[20]



[19]  After performing this search for scoring systems assuming a field of 360 teams, we took the 25 best-performing systems and compared them to each other under tournament sizes of 260 to 440 teams (at 20-team intervals).  ET was consistently the best-performing system at all of these tournament sizes.

Additionally, we did not arbitrarily ignore scoring systems that increased in value as the tournament progressed or (more plausibly) increased and then decreased in value.  We tested many versions of these systems and none performed well.  So, these variations were excluded only after due consideration.

[20]  In the Early Taper system, the last five rounds award points just like the status quo system.