Age Handicapping Competitive Runners, Part 2: Tables for Speed Handicaps


Introduction

Can the age-related decline in running speed seen in single age world record holders be meaningfully translated into an age handicapping system for local competitive runners?  I use the term “competitive” runners to designate the subset of runners in local races who prepare for and attempt to give their best performance in the race.  Competitors are essentially distinct from the relatively large group of social and recreational participants who are looking for a “fun” run, an opportunity to share an activity with a friend or friends, or to support some greater community cause.

When we consider the full spectrum of local race participants, whether social, recreational, or competitive, current models based on world records clearly do not work very well, as was shown in Racing Among the Ages.  However, it may be inherently less useful to age handicap the recreational and social participants than the truly competitive runners who strive for their best possible performance.  One might suspect that five year age group winners, especially in larger local races, consist largely of truly competitive runners.  Certainly, not every competitive runner will win his or her age group.  However, as we go deeper into the finishing order, it becomes progressively more difficult to distinguish between competitive and non-competitive participants based solely on their times.  Consequently, in this article, the terms “local runner” and “local class” refer to data and models based on the records of age group winners in local races.  The term “world class” refers to models and projections based on single age world records.

With this clarification, the initial question can be reframed as follows: Can the age related decline in speed among world class runners be used to generate an age handicapping system for local class runners (and everyone in between)?

Several popular web sites are built on this premise, which is largely untested. Two popular age grading calculators are Aging in Sports and Chess and the WMA Age-grading calculator.  Many other age grading sites are derived, directly or indirectly, from these two.  In a 2007 publication, the author of the first site, Ray C. Fair, asked “Does a person of average talent … who is in good shape slow down at a similar percent rate as elite athletes?” (p. 53, italics added).  The second site also uses a model that assumes a comparable percent decline for world record holders and more average competitors.  In “Age-graded performances”, the principal author of this second site, Howard Grubb, stated that “super-veteran (aged over 60 or so) athletes run more slowly at the moment than expected.”

So it is reasonable to be skeptical of the untested assumption that world and local athletes slow down at the same percentage rate with age.  However, there are other ways to model the decline in speed.

A Metric Based on the Absolute Change in Speed

This article examines a simple alternative to the “Percent by Age” method used by current age grading systems. With the proposed alternative, which I will call “Age Speed Addition”, age related performance changes are modelled as absolute differences in speed, whereas current age grading methods assume age related changes can be expressed on a relative (i.e. percent) scale.

To illustrate these two methods, I started with the single age world records for the male road 5K from the Association of Road Racing Statisticians, www.arrs.net.  The values in this dataset were equalized for the underlying single age population sizes as described in “Age Handicapping Competitive Runners, Part 1: Quantifying the Population Effect”. The dataset was also smoothed using the Savitzky-Golay filter, as described in the Appendix to this article, to give the following equivalent speeds based on world records:

  • World 25 year old male: 14.11 mph
  • World 82 year old male: 8.28 mph

Note that the world 82 year old runs at 58.6% of the speed of the 25 year old and that he is 5.84 mph slower.

The “Percent by Age” method (as used by most current age grading systems) would suggest that the 82 year old competitive runner in a local race should run at 58.6% of the speed of his equivalent 25 year old competitor. The absolute speed method suggests the local 82 year old should run 5.84 mph slower.

To illustrate the application of these methods to local competitors, I will use the single year equivalent performance of male age group winners in 356 local 5K races having between 500 and 999 total participants (see Racing Among the Ages).  As with the world records, these local data were also equalized for population and smoothed per the Appendix.  From this we find that the equalized speed of a local 25 year old is 10.84 mph, whereas the equivalent speed of a local 82 year old is 4.76 mph.  The following table summarizes these results:

                   Age 25       Age 82
  World class     14.11 mph    8.28 mph
  Local class     10.84 mph    4.76 mph

The “Percent by Age” method suggests that the handicapped speed of the local 82 year old be calculated as 4.76/0.586 = 8.12 mph.  On the other hand, the absolute “Age Speed Addition” method handicaps the speed of the 82 year old at 4.76 + 5.84 = 10.60 mph.  As you can see, in this case, the “Age Speed Addition” model provides a handicapped speed that is much closer to the target 10.84 mph of the equalized 25 year old local competitor.
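The arithmetic above can be reproduced with a short Python sketch. The function and constant names are my own, chosen for illustration. The 58.6% factor and the 5.84 mph addition are taken as quoted in the text rather than recomputed from the rounded speeds, on the assumption that they were derived before rounding.

```python
# Smoothed, population-equalized local speeds quoted in the text (mph).
LOCAL_25 = 10.84
LOCAL_82 = 4.76

# Age adjustments for an 82 year old, as quoted from the world-class curve.
PERCENT_FACTOR_82 = 0.586   # world 82 speed as a fraction of world 25 speed
SPEED_ADDITION_82 = 5.84    # mph by which the world 82 is slower than the world 25


def percent_by_age(speed_mph, factor):
    """Handicap by dividing the observed speed by the age factor."""
    return speed_mph / factor


def age_speed_addition(speed_mph, addition_mph):
    """Handicap by adding a fixed number of mph for the runner's age."""
    return speed_mph + addition_mph


pct = percent_by_age(LOCAL_82, PERCENT_FACTOR_82)
add = age_speed_addition(LOCAL_82, SPEED_ADDITION_82)

# The target is the local 25 year old's speed; report how far each method misses.
print(f"Percent by Age:     {pct:.2f} mph (shortfall {LOCAL_25 - pct:.2f} mph)")
print(f"Age Speed Addition: {add:.2f} mph (shortfall {LOCAL_25 - add:.2f} mph)")
```

A perfect handicap would return 10.84 mph from either function, so the shortfall is a direct measure of how well each method equalizes this one pair of ages.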

The graph below compares the handicapped speeds for local 5K male competitors between the ages of 25 and 85. The formulas described in Age Handicapping Competitive Runners, Part 1: Quantifying the Population Effect were used to get speeds representing the same percentile among the populations for each age.  Consequently, a perfect age handicapping system should produce handicapped speeds that are the same for all ages.

In the graph, note that the “Age Speed Addition” method gives handicapped speeds that stay approximately within +/-0.5 mph for the entire range of ages. However, even though it does very well prior to the mid-sixties, the “Percent by Age” method fails rapidly after the mid-sixties, confirming Howard Grubb’s earlier concern.  By way of comparison, the average deviation of speed handicapped by the “Percent by Age” method was 3 times larger than the average deviation of speed handicapped by the “Age Speed Addition” method.

A future article will provide an in-depth comparison of the Age Speed Addition method proposed here with current Age Grading methodology. Suffice it to say here that Age Speed Addition represents a substantial improvement on current methods.

****

Tables of Speed Additions for Age Handicapping Competitive Runners

Single age world records for the Road 5K, 10K, Half Marathon, and Marathon were combined to generate the tables shown below. This data was provided by the Association of Road Racing Statisticians, www.arrs.net.  Incidentally, with age, the absolute speed declines comparably for all of these distances, so, for each gender, a single table is applicable for all distances between 5K and the Marathon.  Note that the “Age Speed Additions” are expressed as MPH, Miles Per Hour.

Appendix: Data Smoothing

Alan Jones has done a good job of explaining the current Age Grading methodology in his article “Age grading running races”.  The methodology is used to create a curve which dominates all single age records and still comes as close to the data as possible.

On the other hand, for the “Age Speed Addition” tables developed here, I use a non-parametric (or, more accurately, pan-parametric) data smoothing methodology. This has the advantage of producing a more adaptive curve and also of incorporating information from every data point.  In the area of signal processing, this smoothing technique is called the Savitzky-Golay filter.  The graph below shows the population adjusted world records for the 5K smoothed with a quadratic S-G filter having a range of 9 below age 30 and a range of 21 for age 30 and above.   All population adjustments use the formulas developed in Part 1 of this series and adjust to the equivalent population at 30 years of age.
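For readers who want to experiment, here is a minimal pure-Python sketch of a quadratic Savitzky-Golay smoother with an age-dependent window. It is not the code used to produce the tables. I am assuming that “range” means the number of points in the window, and that near the ends of the series the nearest full window is reused; the text does not specify either. The function names and the test series are hypothetical, and the series is synthetic rather than record data.

```python
def fit_quadratic(xs, ys):
    """Least-squares fit of a + b*x + c*x^2; returns [a, b, c]."""
    # Power sums and moments for the 3x3 normal equations.
    S = [sum(x ** k for x in xs) for k in range(5)]
    T = [sum(y * x ** k for x, y in zip(xs, ys)) for k in range(3)]
    M = [[S[r + c] for c in range(3)] + [T[r]] for r in range(3)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(3):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * b for a, b in zip(M[r], M[col])]
    return [M[i][3] / M[i][i] for i in range(3)]


def savgol_quadratic(ages, speeds, window_for_age):
    """Smooth speeds with a local quadratic fit around each age."""
    n = len(ages)
    smoothed = []
    for i, age in enumerate(ages):
        window = min(window_for_age(age), n)
        half = window // 2
        # Assumption: shift the window inward at the edges of the series.
        lo = max(0, min(i - half, n - window))
        # Center x on the current age so the fitted value is the intercept.
        xs = [a - age for a in ages[lo:lo + window]]
        ys = speeds[lo:lo + window]
        smoothed.append(fit_quadratic(xs, ys)[0])
    return smoothed


def window_rule(age):
    """Window of 9 points below age 30, 21 points from age 30 upward."""
    return 9 if age < 30 else 21


ages = list(range(20, 91))
# Synthetic, made-up speeds (NOT record data): an exact quadratic in age.
speeds = [14.0 - 0.0015 * (a - 25) ** 2 for a in ages]
smoothed = savgol_quadratic(ages, speeds, window_rule)
max_error = max(abs(s - t) for s, t in zip(smoothed, speeds))
print(f"largest change on an exactly quadratic series: {max_error:.2e}")
```

A quadratic filter should leave an exactly quadratic series essentially unchanged, which is what the final lines check. On real data the filter damps isolated spikes while following the broader curvature of the age curve.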

To get single year equivalent performances based on 5 year age group winners in local races, I used rolling 5 year intervals and interpolated to integer ages. The results were then adjusted for population and smoothed with an S-G filter as indicated above.

 

Age Handicapping Competitive Runners, Part 1: Quantifying the Population Effect

Introduction

Handicapping has been applied to a wide range of human and animal competitive endeavors.  Wikipedia defines handicapping sporting events as “the practice of assigning advantage . . . to equalize the chances of winning.”

Equalization of performance is the essential feature of any handicapping system.  Most people have a passing familiarity with “golf handicaps”, which according to the USGA enable “players of differing abilities to compete on an equitable basis.”  Similarly, according to HorseRacing.com, handicapping involves “the practice of adding weight to horses in an effort to equalize their performance.”

Within the sport of human long distance running, handicapping the performances of runners according to age is sometimes referred to as “Age Grading.”   The goal of Age Grading is to equalize the performance and thus provide a “level playing field” for runners of differing ages.  For example, a 30 year old and an 80 year old can compare their Marathon performances to see who performed better for their age.  Or commonly, a 60 year old runner might compare his or her current speed with their speed from 20 years ago after adjusting for the effects of age.

Currently, the best known methods for age handicapping long distance running leverage single age world records in track and field and in road racing.  Several individuals who have lent their expertise to this endeavor are Howard Grubb, R. C. Fair, Elmer Sterken, and Alan Jones.  Most of these age-grading systems differ only slightly, based on model assumptions and the date they were developed (i.e. some models may have had access to more recent world records).  Two popular calculators are the WMA Age-grading calculator and Aging in Sports and Chess.

Nevertheless, these methods of age-grading are not without controversy.  In “Age-graded performances” Howard Grubb has worried that “super-veteran (aged over 60 or so) athletes run more slowly at the moment than expected.”  In a 2003 article titled “From the cradle to the grave: How fast can we run?” Elmer Sterken reached a similar conclusion, as did I in a more recent large study of U.S. based 5K races.

Age Handicapping Based on Population

Since equalization is the essential feature of any handicapping system, and of Age Grading in particular, we need to consider the sense in which the world record for, say, an 80 year old can be equated to the world record for a 30 year old. Or similarly, how can the 30-34 age group winner in a local race be equated to the 75-79 age group winner?

Among adults, almost all of the single age world record holders for marathons, half-marathons, 10K road races, and 5K road races come from a country in the developed world, or from Kenya or Ethiopia. Combining all of these countries shows that the male population of 80 year olds is only 30% of the population of 30 year olds.  There are two possible reasons for the smaller population of 80 year olds:  either fewer people were born 80 years ago than 30 years ago, or more of the older group has died.  In either case, the smaller number of potential competitors makes the older group somewhat less competitive.

Frequently (as in “How Fast Do Old Men Slow Down?” by R.C. Fair and “Age grading running races” by Alan Jones), the very best single age world records for each distance are fitted with a model in an attempt to estimate the upper, “biological limit” or frontier of human performance.  Factors derived from these models are then used by the above referenced Age Grading calculators.

However, another, and potentially more generalizable, way to understand these single age world records is to view them as the speeds attained by (single year) age group winners in an extremely large “race” consisting of everyone who has lived in the last 100 years or so.  Thus, for example, with the road 10K, the world record for 63 year old males was set in 1994 by Ed Whitlock.  Ed, then, is the winner among all 63 year olds who have ever been alive at some point in the past century.

Whether we consider age group winners in local races with 5-year age group intervals or single age world records, it is possible to formalize the impact of the size of the underlying age group population.  For example, to compare the winners of two age groups, symbolized by “j” and “k”, let

Pj, Pk = the total populations, summed over the relevant geography, that fall into the jth and kth age groups, respectively.

Wj(s), Wk(s) = Wj, Wk = the cumulative probability distribution functions (cdf) for the speeds, s, of the winners of the jth and kth age groups, respectively.

As is shown in the appendix, the winners of the two age groups will be at the same percentile among their peers and hence have equivalent age-adjusted performances when

Wj = Wk^(Pj/Pk)

where “^” is the power operator, i.e. Wj equals Wk raised to the power Pj/Pk.

For example, Racing Among the Ages  presented information on 1283 5K races from all across the U.S.  Included among these races were 356 which are classified into the “large race” category, having between 500 and 999 total finishers.  Letting “j” be males aged 75-79 and “k” be males 30-34, we can use these 356 large races to illustrate how age-group population size can be employed to provide an equalized comparison of the age group winners in these two age groups.

At the last census (2010), the U.S. Census Bureau estimated the U.S. population of males aged 75-79 was Pj = 3,182,388 and the population of males ages 30-34 was Pk = 9,996,500.  Therefore Pj/Pk = 0.32.

By definition, the median speed for age group winners among males 30-34 occurs at the 50th percentile, i.e. when Wk = 0.50. Substituting into the above formula shows

Wj = 0.50^0.32 = 0.80

Consequently, the 80th percentile among the M75-79 age group winners is equivalent to the 50th percentile among the M30-34 age group winners.
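The percentile calculation above is easy to reproduce. The following Python sketch uses the census figures quoted in the text; the function name and its argument checks are my own illustrative choices.

```python
def equivalent_percentile(w_k, pop_j, pop_k):
    """Return Wj = Wk ** (Pj / Pk), the percentile among group j winners
    that is equivalent to percentile w_k among group k winners."""
    if not 0.0 <= w_k <= 1.0:
        raise ValueError("w_k must be a probability between 0 and 1")
    if pop_j <= 0 or pop_k <= 0:
        raise ValueError("populations must be positive")
    return w_k ** (pop_j / pop_k)


P_J = 3_182_388   # U.S. males aged 75-79 (2010 census, as quoted)
P_K = 9_996_500   # U.S. males aged 30-34 (2010 census, as quoted)

w_j = equivalent_percentile(0.50, P_J, P_K)
print(f"Pj/Pk = {P_J / P_K:.2f}")
print(f"Median of M30-34 winners corresponds to Wj = {w_j:.2f} among M75-79 winners")
```

Only the ratio of the two populations enters the calculation, so any pair of counts with the same ratio gives the same answer.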

The inclusive median speed among 356 races occurs at the midpoint between the 178th and 179th fastest age group winners. For M30-34 this value is 10.30 mph (18:05), and for M75-79 this median value is 4.94 mph (37:45).  However, the median of the M75-79 does not represent an equivalent performance among the peer group.  The 80th percentile for age group winners among M75-79 is 6.26 mph (29:49).  Thus, a time of 18:05 for M30-34 is equivalent to a time of 29:49 for M75-79.

In summary, the winners of the “j”th and “k”th age groups in a particular race will be at the same percentile among their peers and hence have equivalent performance if

Wj = Wk^(Pj/Pk)

Note that, in order to compare the performances of different age groups, it is not necessary to know anything about the distribution of speeds for individuals within either underlying age group population.  Nor is it necessary to know the precise sizes of the underlying age group populations, Pk and Pj.  All that is needed is the population ratio and the distribution of speeds among age group winners.

For both age group winners in local races and single age state running records, the distribution of the winning speeds for each age group can be determined by examining several local races or the single age records across several different states.   However, by definition, there is just one world record for each age.  Nevertheless, it is possible to look at the residuals from a fitted model to estimate the distribution of speeds among single age world record holders.  In doing this, we note that the standard deviations from the fitted model increase with age and must be estimated appropriately.

Future Article on Age Handicapping Competitive Running

In a future article, we will apply this simple but elegant formula to age handicap 11 different racing venues with distances ranging between the 5K and the Marathon and competitiveness ranging from age group winners in small local races to single age world record holders.   Moreover, the age handicapping system thus obtained is both simpler and substantially more accurate than current methods.

Appendix:  Computational Outline

By definition any event or venue that is open to all comers has a sampling intensity or “Reach” (R) that is similar for each age group in the applicable geography.  However, this does not mean that the expected number of actual participants in the race will be proportional to the population (Pi) for each age group.  The expected number of participants in a given age group will be proportional to the product of the underlying population and the fraction (Fi) of that population that is Fit and motivated enough to compete in a given race or venue.  Thus the expected number of participants in an age group is R(Fi)Pi.

In the earlier example, we saw that the U.S. population of males between 75 and 79 is 32% of the population aged between 30 and 34.  However, in 1283 U.S. based races, there were only 3% as many individuals in the M75-79 group as in the M30-34 group.  Since the expected number of participants is proportional to (Fi)Pi, the Fit and motivated fraction of the older group is only about 0.03/0.32, or roughly 10%, of that of the younger group.  Undoubtedly, physical limitations prevent many older adults from participating.

The sampling intensity or “Reach” factor, R, would not come into play for world records (except possibly for the impact of various international political considerations), i.e. it has a value of 1.  However, based on marketing, each local race can have its own value for R since some individuals who are fit and willing to participate in a race may not hear about it in time to register; or since some individuals may not participate in a particular event because they have chosen another more desirable event that occurs at the same time.

For any particular event, the number of individuals in the applicable age group population who are unfit or unwilling to compete is (1-Fi)Pi.  Had these individuals been fit and willing to participate, we would expect R(1-Fi)Pi of them to have participated in the event.   Nevertheless, in evaluating an age group winner’s performance among his peers, it is reasonable to consider him faster than both the peers who participated in the race and the expected number of potential participants who did not participate because they are unable or too slow to complete the race successfully.  Thus the age-group winner is the fastest among R(Fi)Pi + R(1-Fi)Pi = RPi peers.

Then for a given distance (e.g. marathon, half marathon, 10K, 5K) and gender, let

s = speed of an individual at the event.

R = the “Reach” of the event, i.e. the fraction of all fit and otherwise capable individuals in the population who actually participate in the event.

Pi = the total population, summed over the relevant geography, that falls into the “i”th age group.

Fi = the fraction of the “i”th age group that is fit and motivated enough to compete in a given race or venue

Ei(s) = Ei = the cumulative probability distribution function (cdf) for speed in the “i”th age group; i.e., it is the fraction of the entire population in the “i”th age group whose speed is less than or equal to “s”.  Note that the fraction of individuals in the “i”th age group who are either unable or unwilling to compete in the race is simply Ei(0).

Wi(s) = Wi = the cumulative probability distribution function (cdf) for the speed of the winners in the “i”th age group.

Since Wi is the cdf of the maximum for a sample of size RPi with cdf Ei,

Wi = Ei^(RPi) 

thus

Ei = Wi^(1/(RPi))
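The relationship Wi = Ei^(RPi) is the standard result for the cdf of the maximum of independent draws, and it can be checked with a small simulation. In this Python sketch everything is hypothetical: speeds are replaced by Uniform(0, 1) scores, so that Ei(s) = s, and a sample of 50 stands in for RPi.

```python
import random


def simulated_winner_cdf(s, n, trials, rng):
    """Fraction of simulated races whose winner scores at or below s."""
    hits = 0
    for _ in range(trials):
        # The winner is the largest of n independent draws.
        winner = max(rng.random() for _ in range(n))
        if winner <= s:
            hits += 1
    return hits / trials


rng = random.Random(12345)   # fixed seed so the run is repeatable
n = 50                        # hypothetical stand-in for R * Pi
s = 0.98                      # for Uniform(0, 1) scores, Ei(s) = s

simulated = simulated_winner_cdf(s, n, trials=20_000, rng=rng)
predicted = s ** n            # Wi = Ei ** (R * Pi)
print(f"simulated Wi = {simulated:.3f}, predicted Ei^n = {predicted:.3f}")
```

The two printed values should agree to within sampling error. The uniform distribution is used only for convenience, since the result does not depend on the shape of Ei.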

Suppose two individuals belong to different age groups, the “k”th age group and the “j”th age group.  Among their peers, their performances will be equivalent if they each achieve the same percentile; i.e. if

Ek = Ej  

Consequently, the winners of these age groups will be at the same percentile among their peers when

Wj^(1/(RPj)) = Wk^(1/(RPk))

Raising both sides to the power R cancels the Reach factor:

Wj^(1/Pj) = Wk^(1/Pk)

Raising both sides to the power Pj then yields

Wj = Wk^(Pj/Pk)

It is important to note that this result does not depend on the functional form of the population cdf, Ek(s) and Ej(s), for either age group.  Nor does it depend on knowledge of the exact population, Pk and Pj, of either age group.  All that is needed is the population ratio.
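As a numerical sanity check on the derivation, the Python sketch below picks a shared population percentile, computes the winners' percentiles for two age groups under several values of R, and confirms that Wj = Wk^(Pj/Pk) holds each time. The populations are the census figures quoted earlier. The percentile and the Reach values are hypothetical, chosen only so that the results are not vanishingly small.

```python
import math


def winner_cdf(e, reach, population):
    """Wi = Ei ** (R * Pi): cdf of the fastest among R * Pi peers."""
    return e ** (reach * population)


P_J, P_K = 3_182_388, 9_996_500   # census figures quoted in the text
e = 1 - 1e-7                       # hypothetical shared percentile, Ej = Ek

for reach in (0.001, 0.01, 1.0):   # hypothetical Reach values
    w_j = winner_cdf(e, reach, P_J)
    w_k = winner_cdf(e, reach, P_K)
    # The relation between Wj and Wk should hold whatever R is.
    assert math.isclose(w_j, w_k ** (P_J / P_K), rel_tol=1e-9)
    print(f"R = {reach}: Wj = {w_j:.4f}, Wk = {w_k:.4f}")
```

The individual values of Wj and Wk change with R, but the relation between them does not, which is why the Reach of a particular race never needs to be estimated.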