Grading Age Grading: Evaluating Methods for Handicapping Competitive Runners

Introduction

Several methods have been proposed for age handicapping distance runners (5K through Marathon). The purpose of handicapping is two-fold:  it allows runners of different ages to compete on a “level playing field”, and it gives us some insight into the aging process itself.

In this article, I give letter grades (A, B, C, D, and F) to four different handicapping methods based on the accuracy of each when it is applied to runners competing in eleven different venues representing the local, state, and world competitive levels. As you will hopefully agree, I have taken care to ensure that the grading system is both fair and objective. Three of these methods are separately applicable to both males and females and one (YALE_AG) has been developed using data for males only.  Consequently, with 11 venues and 2 genders, three of these methods receive a letter grade in 22 different courses, and one method receives a letter grade in 11 courses.  Finally, after grades have been calculated for all of the courses, the overall GPA is calculated for each of the four methods.

For those desiring a more conventional approach to evaluating these four models, I also fit each model to the eleven venue test data and calculate generalized Coefficients of Determination (R2) for each model across the various venues. As you will see, there is excellent agreement between the letter grade and the Coefficient of Determination.

The four methods and links to references and principal authors are as follows:

Although they use different approaches, the first three of these methods are all based on the single age world records maintained by the Association of Road Racing Statisticians at arrs.net.  The 2017 factors for the YALE_AG method are given for ages 40 and above.  However, the authors recommend an adjustment to extend the age range back to 35 years.  This article uses the recommended adjustment.

The fourth method, UD_5KH, is unique in that it is based on the maximal oxygen uptake during treadmill tests using subjects of different ages. Originally it was developed for only the 5K but, more recently, it has been extended for distances up to the Marathon.  In addition to age, UD_5KH also includes an adjustment for bodyweight.  In this evaluation, the CDC 10th percentile for bodyweight by age was used for UD_5KH since this provided a slightly better fit than did higher percentiles and since competitive runners typically are leaner than the general populations.  Additional observations on each of these methods are discussed later in this article.

There are many age handicapping or “age grading” calculators available on-line; however, virtually all are derivatives of one of the four methods evaluated here (particularly of WMA_AG or its earlier versions sometimes denoted with the acronym “WAVA”).

Test Data: World, State, and Local Competition

The accuracy of the four methods is evaluated by measuring their performance against eleven different test venues representing a wide range of competitive abilities. For each gender, these eleven venues were split into 3 distinct classes of competitors representing the world, state, and local competitive levels.

Between the ages of 35 and 85, male and female single age speeds were obtained for each venue.   (Some venues did not have single age data that went up to age 85, in which case I used the maximum upper age available; only data for runners, avg. speed > 4.5 mph, were used.)

To ensure that the speeds were equitable among the various ages within each venue, speeds were adjusted for the size of the relevant population as described in Age Handicapping Competitive Runners, Part1: Quantifying the Population Effect.  The data were also smoothed as described in the appendix to Age Handicapping Competitive Runners, Part 2: Tables for Speed Handicaps.

The test venues were as follows:

  1. The World single age records were obtained from the Association of Road Racing Statisticians, arrs.net, as follows:
  •       World Single Age Records for the Marathon
  •       World Single Age Records for the Half Marathon
  •       World Single Age Records for the 10K
  •       World Single Age Records for the 5K
  2. State Single Age Records were obtained or linked from StateRunningRecords.com for the following four venues:
  •       Marathon (AL,MO,MN,MS,NH,OK,TN)
  •       Half Marathon (AL,MO,MN,MS,NH,OK,TN)
  •       10K (AL,MO,MN,MS,NH,OK,TN)
  •       5K (AL,AZ,MO,MN,MS,NH,OK,TN)
  3. Local 5-year Age Group Winners in the 5K were obtained from data summarized in Racing Among the Ages. Based on race size, there were three Local venues:
  •     Very Large 5K races (302 local races, each with between 1000 and 8641 finishers)
  •     Large 5K races (356 local races, each with between 500 and 999 finishers)
  •     Medium 5K races (313 local races, each with between 300 and 499 finishers)

[Note that the AZ state records were available for only the 5K due to a broken link at the time these data were compiled. More information about the local races can be obtained at Median Times for Top Finishers in 5K Road Races.]

I did not include local races with fewer than 300 finishers because of concerns that too many of the age groups would be won by non-competitive individuals (i.e. recreational walkers and runners) due to small age group sizes and the absence of a competitive runner in some groups.

Final GPA for Each Handicapping Method

Before we get too far into the weeds, here are the final Grade Point Averages (based on a 4 point scale) and overall letter grades for each of the four methods:

  •    YALE_AG:   1.27 D+
  •    WMA_AG:   2.45 C+
  •    BDR_AH:       3.64 A-
  •    UD_5KH:       0.00 F

Later I will provide more detail on the methodology to calculate grades and will break down the competitive levels where each method did or did not do well. I also discuss the technical reasons for the performances of each.

Lest the proponents of the WMA_AG methodology take some comfort in the “gentleman’s C+” received by their candidate, I point out here that the WMA_AG method received an “F” in three required courses. Also, our valedictorian, BDR_AH, will graduate with only an “A-” average, indicating that there may still be room for improvement.

Coefficients of Determination, R2

Because the observed speeds for all of the single year data points within each of the eleven venues have been adjusted for the population at each age, these speeds represent equivalent (same population percentile) performances among the various ages (typically 35-85) within each venue.

Similarly, each of the four age-grading/handicapping models defines a family of performance isoquants, where an isoquant is a series of combinations of speed and age representing an equal level of performance. Thus, for example, using WMA_AG, a 35 year old male who completes a 5K road race at 11.3 mph (time=16:30) is on the same isoquant and has a performance equal to an 85 year old man running at approximately 6.3 mph (time=29:40).  On the other hand, using the isoquants from the BDR_AH method suggests approximately 5.1 mph (36:35) for an 85 year old is equivalent to 11.3 mph at age 35.
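The speed and time pairs quoted above can be checked with a small conversion helper. This is only a convenience sketch: the function names are my own for illustration, and it assumes a 5K is exactly 5 km (about 3.107 miles).

```python
KM_PER_MILE = 1.609344
FIVE_K_MILES = 5.0 / KM_PER_MILE  # assumes the course is exactly 5 km


def time_for_distance(speed_mph, miles=FIVE_K_MILES):
    """Return the finishing time as 'M:SS' for a given average speed."""
    total_seconds = round(miles / speed_mph * 3600)
    minutes, seconds = divmod(total_seconds, 60)
    return f"{minutes}:{seconds:02d}"


def speed_for_time(minutes, seconds=0, miles=FIVE_K_MILES):
    """Return the average speed in mph for a given finishing time."""
    hours = (minutes * 60 + seconds) / 3600
    return miles / hours


print(time_for_distance(11.3))            # 16:30
print(round(speed_for_time(16, 30), 1))   # 11.3
```

Because the speeds in the text are rounded to one decimal place, times computed from them can differ from the quoted times by a few seconds.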

So the question becomes, how well does each of the four models fit the data in each of the 11 venues? To answer this question, the isoquant with the least mean squared error from the observed speeds (population adjusted) was determined for each model and each venue. Since YALE_AG is only applicable to males, this gives a total of 77 families of isoquants.  (Even though these models are, in general, non-linear in speed, each of the 77 families of isoquants is defined by a single parameter, so finding the optimum fit is relatively straightforward.)

The Coefficient of Determination (R2) was used to measure the fit of each method across the eleven venues. Tables 1a and 1b show these Coefficients of Determination.  The Coefficients of Determination are color coded as under 90% = Red, between 90% and 95% = Yellow, and over 95% = Green.  Note that only BDR_AH was Green across all venues.
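As a rough illustration of the fit-then-score procedure, the sketch below fits a one-parameter family to observed speeds and computes R2. Several things here are assumptions rather than the method actually used: the family is taken to be multiplicative (speed = theta x an age factor, as in a percent-type model), R2 is the conventional 1 - SSE/SST rather than whatever generalized form the tables use, and the function names are hypothetical.

```python
def fit_scale(observed, shape):
    """Least-squares theta for the one-parameter family theta * shape[i].

    Assumes a multiplicative (percent-type) family; an additive family
    would instead use the mean of (observed[i] - shape[i]).
    """
    numerator = sum(o * s for o, s in zip(observed, shape))
    denominator = sum(s * s for s in shape)
    return numerator / denominator


def r_squared(observed, fitted):
    """Conventional coefficient of determination, 1 - SSE/SST."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - f) ** 2 for o, f in zip(observed, fitted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def color_code(r2):
    """Color bands described in the text: <90% Red, 90-95% Yellow, >95% Green."""
    if r2 < 0.90:
        return "Red"
    if r2 <= 0.95:
        return "Yellow"
    return "Green"
```

A typical use would be `theta = fit_scale(observed, shape)`, then `r_squared(observed, [theta * s for s in shape])`, with `shape` holding the model's age factors for the ages in the venue.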

The second best Coefficients of Determination were obtained by WMA_AG. Even though this method did well at the world and state levels, its performance was not very good at the local level. For example, in Table 1a, the Coefficient of Determination using the WMA_AG method for Age Group Winners in Local races with 500-999 finishers is 89%.  Thus the WMA_AG method explains only 89% of the variability in running speed across the various ages.  Figure 1 shows the best fit of the WMA_AG method to the male data for age group winners in Local races with between 500 and 999 finishers as well as for 5K world and state records.  As you can see from this figure, at the Local level, the WMA_AG method gives an “unfair” advantage to runners between the mid-forties and early fifties.  On the other hand, starting in the late sixties, this method progressively disadvantages the older runners.

Nevertheless, WMA_AG is substantially better than UD_5KH, which is the worst method across all venues. Figure 2 gives an example of the fit of this method.

Grading Age Grading

In the previous section we looked at the performance of four handicapping methods in terms of how well each fits the observed speeds between the ages of 35 and 85 and across eleven different venues representing a wide range of competitive ability.  In this section, we look directly at the handicapped speeds generated by each model.

Quite simply, an ideal handicapping system will meet two requirements:

  1. Performances which are equal, as they are within each venue, receive the same handicapped speed.
  2. The differences in competitive ability between the different venues are preserved in the handicapped speeds.

Since all of the methods equate the handicapped speed and the actual speed at some point between the ages of 25 and 35, the second of the above requirements will be fulfilled provided the first is met. Consequently, I will focus on the first requirement and grade each method on its ability to achieve consistency among the handicapped speeds within each venue.

One measure of within venue consistency is the average deviation (i.e. average absolute error) among the handicapped speeds between the ages of 35 and 85.  Thus, if the handicapped speeds within a venue are all the same, the handicapping method performs ideally and the average error is zero.  On the other hand, as the differences among the handicapped speeds become larger, the average error grows proportionally.

For example, using the YALE_AG method, the average deviation among the male 5K state record handicapped speeds is 0.471 mph (miles per hour).

To make the interpretation of the average deviation in handicapped speeds somewhat more intuitive, miles per hour can be converted to the same scale as age (i.e. years) as follows:

Between the ages of 35 and 85, the male 5K actual state record speeds declined at an average rate of 0.125 mph/year.  Thus the average deviation of 0.471 mph in handicapped speeds is equivalent to (0.471 mph) ÷ (0.125 mph/year) = 3.77 years.
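The conversion from mph to years can be sketched in a few lines. I am assuming here that the "average deviation" is the mean absolute deviation of the handicapped speeds from their own mean; the function names are illustrative only.

```python
def average_deviation(handicapped_speeds):
    """Mean absolute deviation from the mean (assumed definition)."""
    mean = sum(handicapped_speeds) / len(handicapped_speeds)
    return sum(abs(s - mean) for s in handicapped_speeds) / len(handicapped_speeds)


def error_in_years(avg_dev_mph, decline_mph_per_year):
    """Express a deviation in mph as the equivalent number of years of aging."""
    return avg_dev_mph / decline_mph_per_year


# YALE_AG, male 5K state records, using the figures quoted in the text
print(round(error_in_years(0.471, 0.125), 2))  # 3.77
```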

So for this venue, the average error in the handicapped speeds is 3.77 years. Is this good or bad and what letter grade should be assigned to the performance of the YALE_AG method in the male 5K state record venue?  To answer this question, consider the following two observations:

  1. Differences in age among individuals who have the same integer age are generally regarded as inconsequential by all race venues. Although we may record the exact age in days, state and world records are generally not maintained for intervals shorter than a year. Thus we do not see separate records maintained for ages 50.0, 50.1, 50.2, etc. Local race results generally show finishers’ ages in whole numbers. Consequently, for a particular venue, a handicapping system having an average absolute error rate of less than one year can be considered to have very good performance.
  2. Differences in age of 5 years or more are usually regarded as quite significant. Larger local races will most commonly separate individuals differing in age by 5 or more years into separate age groups. Even though ARRS reports single age records, above 40 they highlight the best performance in each 5-year interval. USATF maintains American individual masters records in 5 year intervals. If a handicapping system has an average absolute error rate greater than 5 years for a particular venue, we can conclude that it fails for that venue.

Having established that an average error of less than one year is an “A” and an average error of five or more years is an “F”, the intervening spread can be partitioned uniformly to provide the following grade scale:

  •    “A”: Error less than 1.00 years
  •    “B”: Error between 1.00 and 2.33 years
  •    “C”: Error between 2.33 and 3.67 years
  •    “D”: Error between 3.67 and 5.00 years
  •    “F”: Error equal to 5.00 years or more
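The grade scale above is easy to express in code. The sketch below assumes the conventional grade points A=4 through F=0 for the GPA, which is consistent with the 4 point scale mentioned earlier but is not spelled out in the text; the function names are my own.

```python
GRADE_POINTS = {"A": 4, "B": 3, "C": 2, "D": 1, "F": 0}  # assumed 4-point mapping


def letter_grade(error_years):
    """Map an average error in years to a letter grade.

    The span between 1 and 5 years is split into three equal bands
    of 4/3 years each for B, C, and D.
    """
    step = (5.0 - 1.0) / 3.0
    if error_years < 1.0:
        return "A"
    if error_years < 1.0 + step:
        return "B"
    if error_years < 1.0 + 2 * step:
        return "C"
    if error_years < 5.0:
        return "D"
    return "F"


def gpa(grades):
    """Unweighted grade point average over a list of letter grades."""
    return sum(GRADE_POINTS[g] for g in grades) / len(grades)


print(letter_grade(3.77))  # D
```

With this scale, the 3.77 year error for YALE_AG in the male 5K state record venue falls just inside the “D” band.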

Using this scale, report cards with the grades achieved are shown for each handicapping method in tables 2a and 2b.

Summary of Results

For the local races, only the BDR_AH method showed a reasonable performance, with grades of B and Coefficients of Determination at 97% or above for both males and females in each local venue. On the other hand, the other three methods all had Coefficients of Determination at 90% or below on all local venues and received straight “F’s” on the male local venues.

The UD_5KH failed on all eleven venues for both genders. Nonetheless, at the world level, the other three methods all performed reasonably well with BDR_AH getting straight “A’s”, WMA_AG getting “A’s” and “B’s”, and YALE_AG receiving “C’s” and one “B”.  However, this might be expected since all three of these methods are directly derived from the same single age world records that were used in the evaluation process.  Possibly a useful question is “Why didn’t all three of these methods get straight “A’s” at the world level?”

The following discussion provides more detail on each of the methods.

UD_5KH Discussion

Even though the UD_5KH method performed poorly, with some methodological adjustments, it does have the potential to provide a significant insight into the aging process itself: Can a unified approach explain the effect of age on individuals ranging in ability from the general population (measuring VO2max) to world class athletes achieving a single age record?

Among the methodological issues with the UD_5KH, as published by Vanderburgh and Laubach in 2007, is that it relies on two studies on the “Changes in Aerobic Power” of Men and Women by Jackson et al., 1995 and 1996. The studies by Jackson et al. assume a linear effect of age; however, a 2005 study by Fleg et al. showed “Accelerated longitudinal decline of aerobic capacity in healthy older adults.”  (See also Ades and Toth, 2005).  Also, the Jackson age coefficients, as used by Vanderburgh, are “corrected” for physical activity and body composition, which are themselves highly correlated with age.  This causes the decline with age to be substantially underestimated, as is graphically illustrated in Figure 2.

It should additionally be noted that the Jackson studies, especially for women, underrepresent older adults in that the oldest woman was only 64.

Vanderburgh and Laubach also published “Validation of a 5K Age and Weight Run Handicap Model” in 2006.  Unfortunately this study is of limited value due to questionable statistical methodology.  Two assumptions should have been challenged during peer review.  First, the assumption is made that the absence of a linear correlation between age and handicapped run times implies there is no relation between age and handicapped run times.  Second, individuals whose times were outliers to the model were successively excluded for (supposed) lack of sufficient effort (i.e. when data that did not fit the model very well is excluded, the model fits the remaining data better!)

Nevertheless, this approach has significant potential, and I hope it is revisited with the appropriate methodological corrections.

YALE_AG and WMA_AG Discussion

A fundamental assumption of both the YALE_AG and the WMA_AG methods is that for each age a unique frontier or upper biological limit to human performance exists, and that individuals of different ages who have performances at the upper biological limit for their age can be regarded as having equal performances.

A second assumption is that individuals of differing ages who perform below the biological limit but at the same percentage of the biological limit for their age can be regarded as having equal performance.

As far as I can tell, these assumptions and the resulting models have not been tested until this article. The WMA_AG performs somewhat better than the YALE_AG, primarily because it has more parameters which allow a closer fit to the data.

The second assumption is particularly problematic because not only has it not been tested, but the alternatives do not appear to have been considered. For example, with males, the YALE_AG 5K “biological limit” at 80 takes 44% more time and is about 4.36 mph slower than at 40 years. Now suppose a less than world class 40 year old can run a 5K in 18 minutes. What is the equivalent time for an 80 year old?  44% more time suggests about 26 minutes, whereas 4.36 mph slower suggests about 31 minutes.
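The arithmetic behind these two answers can be sketched as follows. The function names are hypothetical, and the 5K is assumed to be exactly 5 km.

```python
KM_PER_MILE = 1.609344
FIVE_K_MILES = 5.0 / KM_PER_MILE  # assumes the course is exactly 5 km


def equivalent_time_percent(time_min, extra_time_fraction):
    """Scale the time by the same percentage as the 'biological limit'."""
    return time_min * (1.0 + extra_time_fraction)


def equivalent_time_absolute(time_min, speed_drop_mph, miles=FIVE_K_MILES):
    """Subtract the same absolute speed drop as the 'biological limit'."""
    speed = miles / (time_min / 60.0)
    return miles / (speed - speed_drop_mph) * 60.0


print(round(equivalent_time_percent(18.0, 0.44), 1))   # 25.9 minutes
print(round(equivalent_time_absolute(18.0, 4.36), 1))  # 31.1 minutes
```

The gap of roughly five minutes between the two results is what makes the choice of scale (percent of time versus absolute speed) matter so much for slower runners.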

As in the above example, I reworked the YALE_AG handicapped performances for the 3 male Local venues using absolute changes in speed rather than % change in time. This simple adjustment to the YALE_AG model resulted in a dramatic improvement, reducing the average deviation by more than 50% and changing the Local venue grades from three “F’s” to three “C’s”!

BDR_AH Discussion

Although BDR_AH did well for all of the venues considered here, even the least competitive venue (Age Group Winners in midsized 5K races) represents athletes who are well above average.  This raises the question, can the effect of aging on maximal athletic performance be modelled across the full spectrum of human ability, ranging from ordinary individuals up to the most elite world class athletes?  (Since a large percentage of older individuals may not be able to complete an endurance race at a running speed, their maximal performance may need to be measured on a treadmill or some other more controlled venue.)

At the other end of the spectrum, what is the best way to estimate the frontier or upper limit of human performance at each age? Should it be directly estimated as with YALE_AG and WMA_AG using only “non-dominated” single age running records?  Or should it be extrapolated from models, such as BDR_AH, which are initially derived to fit the entire complement of single age running records?

Future research into all of these questions promises to deliver important insights into the aging process and human performance.

Age Handicapping Competitive Runners, Part 2: Tables for Speed Handicaps


Introduction

Can the age-related decline in running speed seen in single age world record holders be meaningfully translated into an age handicapping system for local competitive runners?  I use the term “competitive” runners to designate the subset of runners in local races who prepare for and attempt to give their best performance in the race.  Competitors are essentially distinct from the relatively large group of social and recreational participants who are looking for a “fun” run, an opportunity to share an activity with a friend or friends, or to support some greater community cause.

When we consider the full spectrum of local race participants, whether social, recreational, or competitive, current models based on world records clearly do not work very well as was shown in Racing Among the Ages.  However, perhaps it is inherently less useful to age handicap the recreational and social participant subgroups than it is to age handicap the truly competitive runners who strive for the best performance that is possible for them.  One might suspect that five year age group winners, especially in larger local races, largely consist of truly competitive runners.  Certainly, not every competitive runner will win his or her age group.  However, as we go deeper into the order, it becomes progressively more difficult to distinguish between competitive and non-competitive participants based solely on their time.  Consequently, in this article, the word “local” runner or “local class” refers to data and models based on the records of age group winners in local races.  The term “world class” will refer to models and projections based on single age world records.

With this clarification, the initial question can be reframed as follows: Can the age related decline in speed among world class runners be used to generate an age handicapping system for local class runners (and everyone in between)?

Several popular web sites are constructed on this premise, which is largely untested. Two popular age grading calculators are Aging in Sports and Chess and the WMA Age-grading calculator.  Many other age grading sites are derived, directly or indirectly, from these two sites.  In a 2007 publication, the author of the first site, Ray C. Fair, has questioned “Does a person of average talent … who is in good shape slow down at a similar percent rate as elite athletes?”, p53, (italics added).  The second site also uses a model that assumes a comparable percent decline between world record and more average competitors.  In “Age-graded performances”, the principal author of this second site, Howard Grubb, has stated that “super-veteran (aged over 60 or so) athletes run more slowly at the moment than expected.”

So it is reasonable to be skeptical of the untested assumption that world and local athletes slow down at the same percent with age.  However, there are other ways to model the decline in speed.

A Metric Based on the Absolute Change in Speed.

This article examines a simple alternative to the “Percent by Age” method used by current age grading systems. With the proposed alternative, which I will call “Age Speed Addition”, age related performance changes are modelled as absolute differences in speed, whereas current age grading methods assume age related changes can be expressed on a relative (i.e. percent) scale.

To illustrate these two methods, I started with the single age world records for the male road 5K from the Association of Road Racing Statisticians, www.arrs.net.  The values in this dataset were equalized for the underlying single age population sizes as described in “Age Handicapping Competitive Runners, Part1: Quantifying the Population Effect”. The dataset was also smoothed using the Savitzky-Golay filter as described in the Appendix to this article to give the following equivalent speeds based on world records:

  • World 25 year old male: 14.11 mph
  • World 82 year old male: 8.28 mph

Note that the world 82 year old runs at 58.6% of the speed of the 25 year old and that he is 5.84 mph slower.

The “Percent by Age” method (as used by most current age grading systems) would suggest that the 82 year old competitive runner in a local race should run at 58.6% of the speed of his equivalent 25 year old competitor. The absolute speed method suggests the local 82 year old should run 5.84 mph slower.

To illustrate the application of these methods to local competitors, I will use the single year equivalent performance of male age group winners in 356 local 5K races having between 500 and 999 total participants (see Racing Among the Ages).  As with the world records, these local data were also equalized for population and smoothed per the Appendix.  From this we find that the equalized speed of local 25 year olds is 10.84 mph whereas the equivalent speed of a local 82 year old is 4.76 mph.  The following table summarizes these results:

The “Percent by Age” method suggests that the handicapped speed of the local 82 year old be calculated as  4.76/.586 = 8.12 mph.  On the other hand, the absolute “Age Speed Addition” method handicaps the speed of the 82 year old at 4.76 + 5.84 = 10.60 mph.  As you can see, in this case, the “age speed addition” model provides a handicapped speed that is much closer to the target 10.84 mph of the equalized 25 year old local competitor.
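For readers who want to reproduce this comparison, here is a minimal sketch of the two calculations using the figures quoted above. The function names are mine, and the world-record ratio (0.586) and speed difference (5.84 mph) are taken directly from the text rather than recomputed from the rounded speeds.

```python
def percent_by_age(local_speed, world_ratio):
    """Handicap by dividing by the world-record speed ratio for that age."""
    return local_speed / world_ratio


def age_speed_addition(local_speed, world_speed_drop):
    """Handicap by adding back the world-record speed drop for that age."""
    return local_speed + world_speed_drop


target = 10.84                            # equalized local 25 year old, mph
pct = percent_by_age(4.76, 0.586)         # about 8.12 mph
add = age_speed_addition(4.76, 5.84)      # about 10.60 mph

print(round(pct, 2), round(abs(pct - target), 2))
print(round(add, 2), round(abs(add - target), 2))
```

The second number on each printed line is the distance from the 10.84 mph target, which makes the difference between the two methods for this one runner explicit.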

The graph below compares the handicapped speeds for local 5K male competitors between the ages of 25 and 85. The formulas described in Age Handicapping Competitive Runners, Part1: Quantifying the Population Effect were used to get speeds representing the same percentile among the populations for each age.  Consequently a perfect age handicapping system should produce handicapped speeds that are the same for all ages.

In the graph, note that the “Age Speed Addition” method gives handicapped speeds that stay approximately within +/-0.5 mph for the entire range of ages. However, even though it does very well prior to the mid-sixties, the “Percent by Age” method fails rapidly after the mid-sixties, confirming Howard Grubb’s earlier concern.  By way of comparison, the average deviation of speed handicapped by the “Percent by Age” method was 3 times larger than the average deviation of speed handicapped by the “Age Speed Addition” method.

A future article will provide an in depth comparison of the Age Speed Addition method proposed here versus current Age Grading methodology. Suffice it to say here that Age Speed Addition represents a substantial improvement on current methods.

****

Tables of Speed Additions for Age Handicapping Competitive Runners

Single age world records for the Road 5K, 10K, Half Marathon, and Marathon were combined to generate the tables shown below. This data was provided by the Association of Road Racing Statisticians, www.arrs.net.  Incidentally, with age, the absolute speed declines comparably for all of these distances, so, for each gender, a single table is applicable for all distances between 5K and the Marathon.  Note that the “Age Speed Additions” are expressed as MPH, Miles Per Hour.

Appendix: Data Smoothing

Alan Jones has done a good job of explaining the current Age Grading methodology in his article “Age grading running races”.  The methodology is used to create a curve which dominates all single age records and still comes as close to the data as possible.

On the other hand, for the “Age Speed Addition” tables developed here, I use a non-parametric (or, more accurately, pan-parametric) data smoothing methodology. This has the advantage of producing a more adaptive curve and also of incorporating information from every data point.  In the area of signal processing, this smoothing technique is called the Savitzky-Golay filter.  The graph below shows the population adjusted world records for the 5K smoothed with a quadratic S-G filter having a range of 9 below age 30 and a range of 21 for age 30 and above.   All population adjustments use the formulas developed in Part 1 of this series and adjust to the equivalent population at 30 years of age.
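A quadratic Savitzky-Golay smoother of this kind can be sketched in plain Python as below. Several details are assumptions on my part: that a "range" of 9 or 21 means a window of that many consecutive single ages centered on the age being smoothed, that the window is chosen by the age at its center, and that the window simply shrinks symmetrically near the ends of the data. The function names are hypothetical. (Libraries such as SciPy provide a `savgol_filter` function for the fixed-window case.)

```python
def quadratic_center_value(window):
    """Center value of a least-squares quadratic fitted to an odd-length window.

    With x = -h..h the odd moments vanish, so the intercept has a closed form.
    """
    n = len(window)
    half = n // 2
    if half == 0:
        return window[0]  # a single point cannot be smoothed
    xs = range(-half, half + 1)
    s2 = sum(x * x for x in xs)
    s4 = sum(x ** 4 for x in xs)
    sy = sum(window)
    sx2y = sum(x * x * y for x, y in zip(xs, window))
    return (s4 * sy - s2 * sx2y) / (n * s4 - s2 * s2)


def smooth(ages, speeds, young_window=9, old_window=21, split_age=30):
    """Smooth speeds given for consecutive integer ages.

    Uses a 9-point window below split_age and a 21-point window at or above it,
    shrinking the window symmetrically near either end (an assumed edge rule).
    """
    smoothed = []
    last = len(speeds) - 1
    for i, age in enumerate(ages):
        window = young_window if age < split_age else old_window
        half = min(window // 2, i, last - i)
        smoothed.append(quadratic_center_value(speeds[i - half:i + half + 1]))
    return smoothed
```

One property worth noting is that data lying exactly on a quadratic curve pass through this filter unchanged, so the smoothing removes scatter without flattening a genuinely curved trend.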

To get single year equivalent performances based on 5 year age group winners in local races, I used rolling 5 year intervals and interpolated to integer ages. The results were then adjusted for population and smoothed with an S-G filter as indicated above.

 

Age Handicapping Competitive Runners, Part1: Quantifying the Population Effect

Introduction

Handicapping sporting events has been applied to a wide range of human and animal competitive endeavors.  Wikipedia defines handicapping sporting events as “the practice of assigning advantage . . . to equalize the chances of winning.”

Equalization of performance is the essential feature of any handicapping system.  Most people have passing familiarity with “golf handicaps” which according to the USGA enable “players of differing abilities to compete on an equitable basis.”  Similarly, according to HorseRacing.com, handicapping involves “the practice of adding weight to horses in an effort to equalize their performance.”

Within the sport of human long distance running, handicapping the performances of runners according to age is sometimes referred to as “Age Grading.”   The goal of Age Grading is to equalize the performance and thus provide a “level playing field” for runners of differing ages.  For example, a 30 year old and an 80 year old can compare their Marathon performances to see who performed better for their age.  Or commonly, a 60 year old runner might compare his or her current speed with their speed from 20 years ago after adjusting for the effects of age.

Currently, the best known methods for age handicapping long distance running leverage single age world records in track and field and in road racing.  Several individuals who have lent their expertise to this endeavor are Howard Grubb, R. C. Fair, Elmer Sterken, and Alan Jones.  Most of these systems for age-grading differ only slightly based on model assumptions and the date they were developed (i.e. some models may have had access to more recent world records).  Two popular calculators are the WMA Age-grading calculator and Aging in Sports and Chess.

Nevertheless, these methods of age-grading are not without controversy.  In “Age-graded performances” Howard Grubb has worried that “super-veteran (aged over 60 or so) athletes run more slowly at the moment than expected.”  In a 2003 article titled “From the cradle to the grave: How fast can we run?” Elmer Sterken reached a similar conclusion, as did I in a more recent large study of U.S. based 5K races.

Age Handicapping Based on Population

Since equalization is the essential feature of any handicapping system and for Age Grading in particular, we need to consider the sense in which the world record for, say, an 80 year old can be equated to the world record for a 30 year old. Or similarly how can the 30-34 age group winner in a local race be equated to the 75-79 age group winner?

Among adults, almost all of the single age world record holders for marathons, half-marathons, 10K road races, and 5K road races come from a country in the developed world, or from Kenya and Ethiopia. However, combining all of these countries shows that the male population of 80 year olds is only 30% of the population of 30 year olds.  Obviously there are two possible reasons for the smaller population of 80 year olds:  either fewer people were born 80 years ago than 30 years ago, or more of the older group has died.  In either case, the smaller number of potential competitors makes the older group somewhat less competitive.

Frequently (as in “How Fast Do Old Men Slow Down?” by R.C. Fair and “Age grading running races” by Alan Jones), the very best single age world records for each distance are fitted with a model in an attempt to estimate the upper, “biological limit” or frontier of human performance.  Factors derived from these models are then used by the above referenced Age Grading calculators.

However, another, and potentially more generalizable, way to understand these single age world records is to view them as the speeds attained by (single year) age group winners in an extremely large “race” consisting of everyone who has lived in the last 100 years or so.  Thus, for example, with the road 10K, the world record for 63 year old males was set in 1994 by Ed Whitlock.  Ed, then, is the winner among all 63 year olds who have ever been alive at some point in the past century.

Whether we consider age group winners in local races with 5-year age group intervals or single age world records, it is possible to formalize the impact of the size of the underlying age group population.  For example, to compare the winners of two age groups, symbolized by “j” and “k”, let

Pj, Pk = the total populations, summed over the relevant geography, that fall into the jth and kth age groups, respectively.

Wj(s), Wk(s) = Wj, Wk = the cumulative probability distribution functions (cdf) for the speeds, s, of the winners of the jth and kth age groups, respectively.

As is shown in the appendix, the winners of the two age groups will be at the same percentile among their peers and hence have equivalent age-adjusted performances when

Wj = Wk^(Pj/Pk)

where “^” is the power operator, i.e. Wk is raised to the power Pj/Pk.

For example, Racing Among the Ages  presented information on 1283 5K races from all across the U.S.  Included among these races were 356 which are classified into the “large race” category, having between 500 and 999 total finishers.  Letting “j” be males aged 75-79 and “k” be males 30-34, we can use these 356 large races to illustrate how age-group population size can be employed to provide an equalized comparison of the age group winners in these two age groups.

At the last census (2010), the U.S. Census Bureau estimated the U.S. population of males aged 75-79 was Pj = 3,182,388 and the population of males ages 30-34 was Pk = 9,996,500.  Therefore Pj/Pk = 0.32.

By definition, the median speed for age group winners among males 30-34 occurs at the 50th percentile, i.e. when Wk = 0.50. Substituting into the above formula shows

Wj = 0.50^0.32 = 0.80

Consequently, the 80th percentile among the M75-79 age group winners is equivalent to the 50th percentile among the M30-34 age group winners.

The inclusive median speed among 356 races occurs at the midpoint between the 178th and 179th fastest age group winners. For M30-34 this value is 10.30 mph (18:05), and for M75-79 this median value is 4.94 mph (37:45).  However, the median of the M75-79 does not represent an equivalent performance among the peer group.  The 80th percentile for age group winners among M75-79 is 6.26 mph (29:49).  Thus, a time of 18:05 for M30-34 is equivalent to a time of 29:49 for M75-79.
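The arithmetic in this example can be reproduced with a short Python sketch. The function name equivalent_winner_percentile is a hypothetical helper introduced here for illustration only; the population figures are the 2010 Census values quoted above.

```python
def equivalent_winner_percentile(w_k, pop_j, pop_k):
    """Return Wj, the winner percentile in group j that matches
    winner percentile w_k in group k, using Wj = Wk^(Pj/Pk)."""
    return w_k ** (pop_j / pop_k)


# 2010 U.S. Census male populations quoted in the text.
POP_M75_79 = 3_182_388
POP_M30_34 = 9_996_500

ratio = POP_M75_79 / POP_M30_34
w_j = equivalent_winner_percentile(0.50, POP_M75_79, POP_M30_34)
print(f"Pj/Pk = {ratio:.2f}")  # 0.32
print(f"Wj    = {w_j:.2f}")    # 0.80
```

Only the population ratio enters the calculation, so any pair of populations with the same ratio would give the same equivalent percentile.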

In summary, the winners of the “j”th and “k”th age groups in a particular race will be at the same percentile among their peers and hence have equivalent performance if

Wj = Wk^(Pj/Pk)

Note that, in order to compare the performances of different age groups, it is not necessary to know anything about the distribution of speeds for individuals within either underlying age group population.  Nor is it necessary to know the precise sizes of the underlying age group populations, Pk and Pj.  All that is needed is the population ratio and the distribution of speeds among age group winners.

For both age group winners in local races and single age state running records, the distribution of the winning speeds for each age group can be determined by examining several local races or the single age records across several different states.   However, by definition, there is just one world record for each age.  Nevertheless, it is possible to look at the residuals from a fitted model to estimate the distribution of speeds among single age world record holders.  In doing this, we note that the standard deviations from the fitted model increase with age and must be estimated appropriately.

Future Article on Age Handicapping Competitive Running

In a future article, we will apply this simple but elegant formula to age handicap 11 different racing venues with distances ranging between the 5K and the Marathon and competitiveness ranging from age group winners in small local races to single age world record holders.   Moreover, the age handicapping system thus obtained is both simpler and substantially more accurate than current methods.

Appendix:  Computational Outline

By definition any event or venue that is open to all comers has a sampling intensity or “Reach” (R) that is similar for each age group in the applicable geography.  However, this does not mean that the expected number of actual participants in the race will be proportional to the population (Pi) for each age group.  The expected number of participants in a given age group will be proportional to the product of the underlying population and the fraction (Fi) of that population that is Fit and motivated enough to compete in a given race or venue.  Thus the expected number of participants in an age group is R(Fi)Pi.

In the earlier example, we saw that the U.S. population of males between 75 and 79 is 32% of the population aged between 30 and 34.  However, in 1283 U.S. based races, there were only 3% as many individuals in the M75-79 group as were in the M30-34 group.  Thus, among the older group approximately 10% as many are sufficiently Fit and motivated to participate in races.  Undoubtedly, physical limitations prevent many older adults from participating.

The sampling intensity or “Reach” factor, R, would not come into play for world records (except possibly for the impact of various international political considerations), i.e. it has a value of 1.  However, based on marketing, each local race can have its own value for R since some individuals who are fit and willing to participate in a race may not hear about it in time to register; or since some individuals may not participate in a particular event because they have chosen another more desirable event that occurs at the same time.

For any particular event, the number of individuals in the applicable age group population who are unfit or unwilling to compete is (1-Fi)Pi.  Had these individuals been fit and willing to participate, we would expect R(1-Fi)Pi of them to have participated in the event.   Nevertheless, in evaluating an age group winner’s performance among his peers, it is reasonable to consider him faster not only than all of his peers who participated in the race, but also than the expected number of potential participants who did not participate because they are unable or too slow to complete the race successfully.  Thus the age-group winner is the fastest among R(Fi)Pi + R(1-Fi)Pi = RPi peers.

Then for a given distance (e.g. marathon, half marathon, 10K, 5K) and gender, let

s = speed of an individual at the event.

R = the fraction of individuals in the entire population who participate in the event among all those who are fit and otherwise capable.

Pi = the total population, summed over the relevant geography, that falls into the “i”th age group.

Fi = the fraction of the “i”th age group that is fit and motivated enough to compete in a given race or venue

Ei(s) = Ei = the cumulative probability distribution function (cdf) for speed in the “i”th age group; i.e., it is the fraction of the entire population falling into the “i”th age group that is slower than or equal to a speed of “s”.  Note that the fraction of individuals in the “i”th age group who are either unable or unwilling to compete in the race is simply Ei(0).

Wi(s) = Wi = the cumulative probability distribution function (cdf) for the speed of the winners in the “i”th age group.

Since Wi is the cdf of the maximum for a sample of size RPi with cdf Ei,

Wi = Ei^(RPi) 

 thus

Ei = Wi^(1/(RPi))

Suppose two individuals belong to different age groups, the “k”th age group and the “j”th age group.  Among their peers, their performances will be equivalent if they each achieve the same percentile; i.e. if

Ek = Ej  

Consequently, the winners of these age groups will be at the same percentile among their peers when

Wj^(1/(RPj)) = Wk^(1/(RPk))

Raising both sides to the power R cancels the Reach factor, yielding

Wj^(1/Pj) = Wk^(1/Pk)

and raising both sides to the power Pj gives

Wj = Wk^(Pj/Pk)

It is important to note that this result does not depend on the functional form of the population cdf, Ek(s) and Ej(s), for either age group.  Nor does it depend on knowledge of the exact population, Pk and Pj, of either age group.  All that is needed is the population ratio.
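The cancellation of R can also be checked numerically. The sketch below is an illustration under stated assumptions, not part of the original derivation: winner_cdf is a hypothetical helper, and the populations, Reach values, and shared percentiles are made-up numbers chosen small enough to avoid floating-point underflow.

```python
import math


def winner_cdf(e, reach, population):
    """cdf of the fastest of reach*population peers, where e is the
    population cdf value at the speed in question: W = E^(R*P)."""
    return e ** (reach * population)


# Hypothetical illustration values (not from the article's data).
pop_j, pop_k = 320, 1000
for reach in (0.01, 0.05, 0.25):
    for e in (0.90, 0.95, 0.99):  # shared peer percentile, Ej = Ek
        w_j = winner_cdf(e, reach, pop_j)
        w_k = winner_cdf(e, reach, pop_k)
        # Wj = Wk^(Pj/Pk) holds whatever the values of R and e.
        assert math.isclose(w_j, w_k ** (pop_j / pop_k), rel_tol=1e-9)
print("Wj = Wk^(Pj/Pk) held for every tested R and percentile")
```

The check passes for every Reach value tried, which is consistent with the statement that only the population ratio matters.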

WMA Age-Grade Standards for Winners of the 30-34, 35-39, and 40-44 Age Groups

Current WMA Age-Grading Standards are extremely aggressive. In fact, in the entire modern era of sports statistics, only about a half dozen isolated performances have met the Standard for 5K road races.  The great majority of single age world record holders have never had a performance that met the Standard.  Consequently, ordinary athletes may have difficulty connecting with these Standards.  The purpose of this article is to re-express the WMA standards in terms that almost every 5K participant can relate to:  the age-group winner.

For road and track racing as well as other sports, World Masters Athletics (WMA) has developed and maintains performance standards for each single year of age. Athletes of various ages are evaluated in terms of how well they stack up against the event standard for their age.  For example, a runner in a 5K road race might be 70% as fast as the standard for his or her age.  Based on how well an athlete compares to the standard for his or her age, WMA has created labels for various levels of performance as follows:

  • Above 90%     World Class Level
  • Above 80%     National Class Level
  • Above 70%     Regional Class Level
  • Above 60%     Local Class Level

This raises the question of how we might relate these various levels, especially the Local, Regional, and National classes, to concrete performances that are familiar to athletes who are somewhat below World class. For example, if you are a “Regional” class athlete, how often might you win your age group in a 5K road race?

One of the most complete sets of WMA age standards and probably the best known is the WMA Age-Grading Calculator.  This calculator is also the basis for many (almost all?) of the other on-line age-grading calculators.  And although the methodology produces significant biases for the youngest and oldest race participants (see Racing Among the Ages), it appears to be reasonably consistent within gender across a wide spectrum of abilities in the heart of the age range, i.e. between the ages of 30 and 44.  Consequently, in this post we will look at the following age groups:  30-34, 35-39, and 40-44.

Computational Example

[If you are not interested in the computational details, skip to the results section below.]

As an example, consider “Joe”, a 32 year old male who can run a 5K in 21:33. The WMA 2015 standard for a 5 km road race is 13:05.  With a time of 21:33, Joe performs at 60.7% of the standard, and hence is just above the threshold for a “local class” athlete.  Note that a time of 21:33 also corresponds to the 87.29th percentile among 32 year old males.
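Joe's age grade can be reproduced with a few lines of Python. This is a minimal sketch: the helper names to_seconds, age_grade, and wma_class are hypothetical, and the age grade is taken to be the standard time divided by the runner's time, which matches the 60.7% figure above.

```python
def to_seconds(clock):
    """Convert a 'mm:ss' string to seconds."""
    minutes, seconds = clock.split(":")
    return int(minutes) * 60 + int(seconds)


def age_grade(runner_time, standard_time):
    """Age grade in percent: the standard time divided by the runner's time."""
    return 100.0 * to_seconds(standard_time) / to_seconds(runner_time)


def wma_class(grade):
    """Map an age-grade percentage to the WMA labels listed earlier."""
    for threshold, label in ((90, "World Class Level"),
                             (80, "National Class Level"),
                             (70, "Regional Class Level"),
                             (60, "Local Class Level")):
        if grade > threshold:
            return label
    return None  # below the Local Class threshold


joe = age_grade("21:33", "13:05")
print(f"{joe:.1f}% -> {wma_class(joe)}")  # 60.7% -> Local Class Level
```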

Among all male and female 5K finishers of all ages, 4.9% are males between 30 and 34[ref].  Suppose “Joe” decides to participate in a very small race expected to have just 40 total runners in addition to himself.  He would then expect to compete against an average of 40 x 4.9% = 1.96 other runners in the M30-34 age group.  If the expected 40 runners are a representative sample of all runners, then the actual number of runners in the M30-34 age group will follow a Poisson distribution with mean 1.96.

Using a Poisson distribution with a mean of 1.96 suggests there is a reasonable probability (0.141) that no other competitor shows up for Joe’s age group, in which case he has a 100% chance of winning his age group. There is a probability of 0.276 that exactly one other competitor shows up in the M30-34 age group.  Since Joe is calculated to be at the 87.29th percentile among his peers, he will have a 0.8729 probability of defeating this competitor.  Thus the probability that exactly one other age group competitor shows up and that Joe beats him is 0.276 x 0.8729 = 0.241.  The Poisson probability that Joe has exactly 2 competitors is 0.271 and the probability that he beats both is 0.8729 x 0.8729 = 0.7620.  Thus, the combined probability that exactly 2 competitors show up and that Joe beats them both is 0.271 x 0.7620 = 0.206.

It is possible to make similar calculations for every possible number of competitors for Joe in the M30-34 age group, i.e. for 0,1,2,3,4,5,6 . . . etc. When we add up the probability that Joe wins his age group across all possible numbers of competitors, we can calculate that Joe’s overall probability of winning is 0.141+0.241+0.206+0.118+0.050+0.017+0.005+. . . = 0.779.  Thus, a male between 30 and 34 who competes at 60.7% of the WMA standard will have approximately 0.779 chance of winning his 5-yr age group in a race expected to have a total of only 40 other participants.  Since Joe, competing at 60.7% of the WMA standard, will usually win his age group in these very small races, we can correctly state that a 60.7% age-grade is significantly superior to the typical age-group winner in a race with just 40 participants.

Now suppose that Joe participates in another race expected to have 110 total participants in addition to himself. In this case, if we go through the above calculations for the larger race, we find that Joe will have “only” a 0.501 probability of winning the M30-34 age group.  Thus, half of the time Joe will win his age group and half the time he will not.  Consequently, we can conclude that, among males 30-34, a 60.7% age-grade is equivalent to the median or typical age-group winner in races having a total of 110 participants.
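The calculations for Joe can be sketched in Python as follows. The function names are hypothetical, and the closed form is an added observation rather than something stated in the original: summing Poisson(n) x p^n over all n gives exp(-mean x (1 - p)). With the rounded inputs used here, the 110-runner race comes out at about one half; the small gap from the 0.501 reported above presumably reflects rounding of the inputs.

```python
import math


def win_probability(other_runners, group_share, percentile, max_rivals=60):
    """Probability of winning an age group when the number of rivals is
    Poisson with mean other_runners * group_share and each rival is
    beaten independently with probability `percentile`."""
    mean = other_runners * group_share
    total = 0.0
    for n in range(max_rivals + 1):
        poisson_n = math.exp(-mean) * mean ** n / math.factorial(n)
        total += poisson_n * percentile ** n
    return total


def win_probability_closed_form(other_runners, group_share, percentile):
    """Same quantity via sum_n Pois(n; m) * p^n = exp(-m * (1 - p))."""
    return math.exp(-other_runners * group_share * (1.0 - percentile))


small = win_probability(40, 0.049, 0.8729)
larger = win_probability(110, 0.049, 0.8729)
print(f"{small:.3f}")   # 0.779
print(f"{larger:.2f}")  # 0.50
```

The series is truncated at 60 rivals, which is far beyond any plausible count for these race sizes, so the truncation error is negligible.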

Results: WMA Age Grade for Age-Group Winners in 5K Races

The graph below is based on the age group winners for six age groups: F30-34, F35-39, F40-44, M30-34, M35-39, and M40-44.  For each of these age groups the median WMA Age Grade was calculated for each of three race sizes: 110, 500, and 3,000.  The averages are as follows:

Total Race Participants: Avg. WMA AG for Age Group Winners

  • 110: 60% (Local Class Level)
  • 500: 70% (Regional Class Level)
  • 3,000: 80% (National Class Level)

In the graph you will note that, although the combined average of males and females matches the WMA classes very well, the WMA Age Grade assigned to female age group winners is consistently below that given to males. For example, for races with 110 total participants, the average AG assigned to females is 3.5 percentage points below that assigned to males.  For races with 500 participants the difference is 4.9 percentage points, and for races with 3,000 participants it is 3.4 percentage points.

Consequently, within the range of abilities and the range of ages considered here, there is an average bias of about 4 percentage points against the females. If the WMA AG of males and females are directly compared, 4 percentage points should first be added to the female AG.  For example, a female with a 66% AG is performing at a level equivalent to a 70% male AG.

Optimum Age Groupings in 5K races

Summary

Based on the criteria suggested in this article, the most efficient age grouping structures have 3 awards per age group and use the following adult age group divisions:

  • Races with under 70 total finishers:   18,35,50,65+
  • Races with 70 to 129 total finishers:   18,30,40,50,60,70,80+
  • Races with over 129 finishers:  18,25,30,35,40,45,50,55,60,65,70,75,80+

Introduction

Most races divide participants into age groups within gender. Awards are then given for the first place and (frequently) for the second and third places in each age group.  (Rarely, some larger races may award more than three places within each age group.)  Typically each age group may span 5, 10, 15 or another number of years.

Race participants place significant value on award ceremonies where the top finishers in each age group are recognized. However, there is a limit to how much time participants are willing to devote to an awards ceremony.  Generally, the interest among participants tends to wane if the ceremony extends beyond about 45 minutes to an hour.

This raises the questions: What is the best way to structure age groups, and how many awards should be offered in each age group?  As we will see, the answers depend heavily on the size of the race, i.e. on the total number of finishers.

Example

Let’s look at an example of age grouping – a bad example. Since this is an example of what can go wrong when you have poorly structured age groups, I will not give the identity of the race.  Suffice it to say, several of my friends participated in this race and there was significant dissatisfaction with the way age groups and awards were handled.

Among adults, the age groupings were: 18-24, 25-29, 30-34, 35-39, 40-44, 45-54, 55 and over (this age grouping can be abbreviated as 18,25,30,35,40,45,55+).  The first and second place in each age group received an award.  There was a combined total of 193 finishers, which includes Youth, Adult Females, and Adult Males.

In this race, a 79 year old man had a rather remarkable 10K time of 53:05, but he received no award since he had to compete with much younger men in the 55+ age group. This man’s 10K time (and all other participants’ times) can be converted to their 5K equivalent using the MCMILLAN RUNNING CALCULATOR.   In this case, the 5K equivalent time for this 79 year old is 25:34.  We see from bigdatarunning.com/5k_percentiles/ that this performance places him at the 99.9th percentile for his age.  By contrast, the percentiles for the 7 adult males actually given first place awards ranged between the 83rd and 96th percentiles.  There were also two individuals at the 97th percentile; one received no award and one was given a second place award, but both out-performed all of the individuals receiving a first place award.  Clearly, in this case, the age groups and award schedule selected by the race director were problematic.

Age equivalent performance

In order to quantify the differences among runners after adjusting for age, all performances are converted to a 25 year old equivalent basis. This is the age at which top athletes peak and is the average age of Olympic medalists [see Peak Performance, part 2].  For example, a 25 year old male at the 99.9th percentile has a 5K time of 14:02 corresponding to an average speed of 13.28 miles per hour.
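The conversion from a 5K time to an average speed is simple enough to sketch in Python. The helper name speed_mph is hypothetical; the only outside fact used is that one mile is 1.609344 km.

```python
KM_PER_MILE = 1.609344


def speed_mph(clock, distance_km=5.0):
    """Average speed in miles per hour for a 'mm:ss' finish time."""
    minutes, seconds = clock.split(":")
    hours = (int(minutes) * 60 + int(seconds)) / 3600.0
    return (distance_km / KM_PER_MILE) / hours


print(f"{speed_mph('14:02'):.2f} mph")  # 13.28 mph
```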

Metric

A discrepancy occurs whenever two runners in a race have different age adjusted speeds but both receive the same award (or, equivalently, both receive no award at all). A natural way to quantify the discrepancy between two runners in the same award category is to look at the squared difference in their age adjusted speeds.  With this definition, the average discrepancy across all pairs of runners is mathematically equivalent to twice the statistical variance among the runners, i.e. it is twice the Mean Squared Error (MSE) among the age adjusted speeds.  Consequently, for consistency with conventional statistical terminology, I will define discrepancy in terms of ½ the squared difference in speeds.

The giving of awards for 1st, 2nd, etc. in each age group is intended to correct or reduce the discrepancy among race participants.  Thus the discrepancy between two runners is eliminated when the faster runner receives a more prestigious award than the slower runner.

On the other hand, and especially with poorly designed age groupings, a slower runner may actually be given a better award than a faster runner. In this case, the overall discrepancy is increased in proportion to the squared difference in the rank of the awards given.  For example, suppose someone running at an age adjusted speed of 7 mph is given a 1st place award and another runner travelling at 11 mph receives a 3rd place award.  The magnitude of this discrepancy is then ½*[(11-7)*(3-1)]^2 = 32.
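The three cases described in this section (same award, correctly ordered awards, and inverted awards) can be sketched in Python. The function name pair_discrepancy is hypothetical, and award ranks are assumed to be integers with 1 as the best award; how a runner with no award should be ranked against an award winner is not specified in the text, so that case is left out. The last few lines check the claim that the average of ½ the squared difference over all pairs equals the variance, using made-up speeds.

```python
import math
from itertools import combinations
from statistics import variance


def pair_discrepancy(speed_a, rank_a, speed_b, rank_b):
    """Discrepancy between two runners given age adjusted speeds (mph)
    and award ranks (1 = best)."""
    speed_gap = abs(speed_a - speed_b)
    if rank_a == rank_b:
        # Same award: half the squared difference in speed.
        return 0.5 * speed_gap ** 2
    if speed_a > speed_b:
        faster_rank, slower_rank = rank_a, rank_b
    else:
        faster_rank, slower_rank = rank_b, rank_a
    if faster_rank < slower_rank:
        # Faster runner holds the better award: discrepancy eliminated.
        return 0.0
    # Slower runner holds the better award: scaled by the rank difference.
    return 0.5 * (speed_gap * (faster_rank - slower_rank)) ** 2


print(pair_discrepancy(7, 1, 11, 3))  # 32.0, the example in the text

# Hypothetical speeds: mean of half squared differences equals the variance.
speeds = [6.0, 7.5, 9.0, 11.0]
pairs = list(combinations(speeds, 2))
mean_half_sq = sum(0.5 * (a - b) ** 2 for a, b in pairs) / len(pairs)
assert math.isclose(mean_half_sq, variance(speeds))
```

Note that the pairwise identity holds for the sample variance (divisor n-1) when the average is taken over distinct pairs.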

Using these definitions of “discrepancy”, an age group efficiency can be defined based on the reduction in variance caused by the awarding of medals. For example, if, for a particular age group schedule, the awarding of medals reduces the variance by 70% (to 30% of its original value), then we would say that the age group schedule has an efficiency of 70%.  The tables at the end of this article represent the average of male and female age group efficiencies for hundreds of races.

Data

The data from the 1283 5K races discussed in the book Racing Among the Ages was used to evaluate the relative efficiency of various age grouping schedules.  Based on the total number of finishers, several different race sizes were examined for each age group schedule:

  1. 50: 50 finish records randomly selected from each of 1283 races
  2. 100: 100 finish records randomly selected from each of 1283 races
  3. 200: 203 races (161-256 total finishers); median race size was 200
  4. 400: 204 races (323-458 total finishers); median race size was 400
  5. 800: 202 races (645-977 total finishers); median race size was 802

Age grouping efficiency is very much dependent on the number of awards given. However, the aforementioned time constraints, as well as a desire not to “cheapen” the awards, put limits on the number of awards.  For present purposes, I only look at schedules where fewer than 50% of finishers receive an award, three or fewer awards are given per age group, and an average of 36 or fewer total awards are given to each adult gender (18 and over).  Including the awards for the youth, this will be about as many awards as can be given within a ceremony not exceeding an hour.   (Note that the average number of awards given may be slightly less than the number of awards actually offered because some age groups may have fewer participants than the number of awards offered to each age group.)

Results

The tables below show the efficiency for selected adult age grouping schedules. (All of the schedules shown start at 18 years; however, starting them at 20 years gives essentially the same conclusions.)

For races with 50 total finishers, age groups 15 years wide are optimal; for races with 100 finishers, 10 year age groups are optimal; and for races with 200, 400, or 800 finishers, 5 year age groups are optimal. For races of all sizes, the optimal age grouping schedule was associated with 3 awards per age group rather than 1 or 2.

In addition, for races with 50 finishers the top age group should be 65+. For races of all other sizes, the top age group should be either 75+ or 80+.

Based on race size, the best age grouping schedules were as follows:

  • 50 Finishers:        18,35,50,65+
  • 100 Finishers:     18,30,40,50,60,70,80+
  • 200 Finishers:     18,25,30,35,40,45,50,55,60,65,70,75,80+
  • 400 Finishers:     18,25,30,35,40,45,50,55,60,65,70,75,80+
  • 800 Finishers:     18,25,30,35,40,45,50,55,60,65,70,75,80+
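These recommendations can be turned into a small lookup, sketched below in Python. The names SCHEDULES, schedule_for, and age_group_label are hypothetical, and the cut-offs of 70 and 130 total finishers are taken from the Summary at the top of this article.

```python
import bisect

# (exclusive upper limit on total finishers, lower bounds of adult age groups)
SCHEDULES = (
    (70, (18, 35, 50, 65)),
    (130, (18, 30, 40, 50, 60, 70, 80)),
    (None, (18, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80)),
)


def schedule_for(total_finishers):
    """Return the recommended age group lower bounds for a race size."""
    for upper, bounds in SCHEDULES:
        if upper is None or total_finishers < upper:
            return bounds


def age_group_label(age, bounds):
    """Return a label such as '75-79' or '80+', or None for youth."""
    if age < bounds[0]:
        return None
    i = bisect.bisect_right(bounds, age) - 1
    if i == len(bounds) - 1:
        return f"{bounds[i]}+"
    return f"{bounds[i]}-{bounds[i + 1] - 1}"


# The 193-finisher race from the earlier example: the 79 year old would
# have competed in a 75-79 group rather than 55+.
print(age_group_label(79, schedule_for(193)))  # 75-79
```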

These results may seem intuitively obvious, and in fact many races use grouping schedules that are consistent with them. However, many other races still use very inefficient age grouping schedules.

TABLES: THE EFFICIENCY OF SELECTED AGE GROUPING SCHEDULES

50 Finishers:

100 Finishers:

200 Finishers:

400 Finishers:

800 Finishers:

 

 

Median Times for Top Finishers in 5K Road Races

The tables below show the median times for the top finishers in 5K road races. The races are broken down into size categories based on the total number of finishers (male plus female) in each race.   The categories are as follows: small races with 100-299 finishers, medium races with 300-499 finishers, large races with 500-999 finishers, and very large races with 1000 or more finishers.  The number of races and other statistics for each race category are as follows:

race size categories

These results are based on data reported in Racing Among the Ages.

overall MALE

overall FEMALE

Also see a related article on the Median 5K Times of Age Group Winners.

 

 

Peak Performance Part 2: At What Age Do We Run the Fastest?

At what age does 5K performance peak? To address this question and as a continuation of a two part series, I look in more depth at the dataset previously reported in Racing Among the Ages.  This dataset consisted of records of more than a million 5K finishers from almost 1300 races all across the United States.

The graphs and table presented here are based on “percentiles”. Most people are somewhat familiar with the concept of percentiles since percentiles are used in many standardized academic achievement tests.  Basically, as used here, the percentile tells an individual what percentage of same age peers are slower.  For example, if a 35 year old  female is at the 60th percentile, this means that 60% of other 35 year old females are slower than she is; and 40% are faster.  For a person at the 50th percentile, half of his or her same age peers are faster and half are slower.  Thus the 50th percentile is the median performance.
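The percentile definition used here can be sketched in Python. The function name and the finish times below are hypothetical and serve only to illustrate the definition (percent of same age peers who are slower).

```python
def percentile_among_peers(time_seconds, peer_times_seconds):
    """Percent of same-age peers who are slower (have a longer time)."""
    slower = sum(1 for t in peer_times_seconds if t > time_seconds)
    return 100.0 * slower / len(peer_times_seconds)


# Hypothetical finish times (seconds) for ten same-age peers.
peers = [1500, 1620, 1710, 1800, 1845, 1900, 1980, 2100, 2250, 2400]
print(percentile_among_peers(1790, peers))  # 70.0
```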

The graphs show the speed in miles per hour for 5K participants at the 50th, 90th, 99th, and 99.9th percentiles.  For the years ’96, ’00, ’04, ’08, and ’12, the average age and speed of Olympic medalists (5000m) is also plotted.  The age at peak performance is shown for each of the four selected percentiles by small black triangles.

Age at peak performance GRAPH females

Age at peak performance GRAPH males

Athletes in the 99.9th percentile are very elite and most likely would be considered world class.  Athletes in the 90th percentile are faster than 9 out of 10 of their peers and certainly should be considered very good athletes.

Consequently, the graphs suggest the following conclusion for both male and female 5K participants: Average and even very good athletes peak in their late teens, but elite, world class, and Olympic athletes peak at around twenty-five years of age.

Details are shown in the table below.

Age at peak performance TABLE

*Age shown for community runners is the average of the whole year age plus 0.5 years. (e.g. someone listing their age as 16 in a community 5K race is between exactly 16 years and 16 years plus 364 days.  Thus the average age of all 16 year olds is 16.5 years)
*The age of the Olympians is based on the difference between their date of birth and the date of the Olympic competition.
**Average of Gold, Silver, and Bronze Medalists for '96, '00, '04, '08, and ‘12 games

Peak Performance Part 1: Do We Run Faster at 17 or at 25?

At what age does athletic performance peak? As a first cut at this question, one might ask “who can run faster in a 5K race, a 17 year old or a 25 year old?”

When I have asked friends and relatives this second question, the opinions are split about evenly between the 17 year old and the 25 year old. However, a number of articles and studies of world class athletes, Olympians, and world record holders have uniformly concluded that for events requiring physical exertion comparable to the 5K, the age of peak performance occurs in the mid-twenties.  For example:

For Athletes Peak Performance, Age is Everything, in Wired

Athletes and age of peak performance, by Axon Sports

Peak Performance and Age Among Superathletes, in The Journal of Gerontology

So are my friends who think a 17 year old is faster than a 25 year old just uninformed? The answer appears to be “it depends”.  The dataset reported in Racing Among The Ages allows us to explore this question in more depth.  In this large dataset of 5K finishers, there are approximately 7,600 seventeen year old males and 9,000 twenty-five year old males.  Among females the numbers of seventeen and twenty-five year olds are approximately 7,600 and 15,300, respectively.

For males, the median 5K time for 17 year olds was 23:57, whereas the median time for 25 year olds was considerably slower at 26:38. As Table 1 shows, almost 40% of 17 year olds can run a 5K in under 22 minutes, but only 20% of 25 year olds can run this fast.  Clearly, among typical male 5K participants, the 17 year olds are much faster than 25 year olds.

5K Participants Achieving Selected Time Thresholds

Although less dramatic, females show a similar pattern.   The median time for 17 year old females is 30:49 whereas the median for 25 year olds is over a minute slower at 31:52.  5.4% of 17 year old females can beat 22 minutes, but only 2.7% of 25 year olds can beat this mark.

So how can we reconcile this observed superiority of seventeen year old athletes with the almost universal finding that world class athletes peak in their mid-twenties?

The answer is hinted at in Table 1. If we look at the very fastest athletes, e.g. males completing a 5K in less than 16 minutes, we see that the numbers are reversed from what is seen with more typical athletes. For example, among this elite group, the older athletes are much better represented (1.4%) than are the younger athletes (0.3%).

Age Related changes in 5K Participation Rates: Implications for Age-Grading

Have you ever noticed how few older individuals participate in 5K races? Have you noticed how many races don’t even have separate age groups for the oldest individuals?  Typically these races might advertise five year age groups which cut off abruptly at 60 years of age, e.g.:

“. . . . . . 25-29, 30-34, 35-39, 40-44, 45-49, 50-54, 55-59, and 60+”

Why would this be? In terms of athletic ability and running speed, the difference between a 70 year old and a 60 year old is much greater than . . .