Reflections on a racing form: Research study, part I: The future success of first-out winners, as moderated by time of year and winning track

                                      What I was thinking and what I did

I am now going to embark on a series of posts about debut winners. The general topic is, how can we tell if one is going to be good? And what is the general outlook for them?

I had to create a database. When you create a database, the general rule of thumb is to include as many variables as you can, because you'll be surprised when one of them comes in handy. You don't know what your final analysis is going to look like, and the focus may shift. On the other hand, in the slow, largely manual way that I compile databases, I have learned that too many variables make the data entry unmanageable. Additionally, while a wide array of variables leaves the world open to you, that can also be a curse: the analysis, while far-reaching and powerful in its conclusions, takes a long time to complete.

So in this study, I limited the number of variables I collected, fully aware of all of the questions I would be leaving open. I was also interested in a very specific aspect of debut winners' prospects: namely, which running style(s) translated to future success? Countless times I have looked at a Racing Form, and fancied this second-time starter or that, depending on whether leading the field from the gate seemed more impressive to me at the moment, or coming on late like a Zenyatta or a Grindstone. Horses run fastest at the beginning of races, at least on dirt, so perhaps that is how true ability is most often signaled. But the horse who comes from behind and wins does something rare, and may be the one with potential down the road. The closer may be the late bloomer, the diamond in the rough. The truth was that, without doing research, I just didn't know which style was more impressive.

In choosing the parameters of my study, I wanted them to mirror the exact conundrum I had been in countless times at the races. I restricted myself to studying dirt, sprint maiden special weight winners. State-bred races were excluded, as were maiden claimers. I only looked at horses who won at the 'A' meets in the country, which I defined as Gulfstream in the first three months of the year; Santa Anita in the first three months of the year (plus the 12/26 - 12/31 period); all NYRA racing, June through the Thanksgiving weekend; Churchill in June and July; Keeneland in October; Churchill occasionally in October, and always in November, through Thanksgiving weekend; Hollywood in June and July; Del Mar; and Santa Anita and Hollywood in the fall. Certainly, Keeneland in April is an 'A' meet, for instance, but I was interested in tracking the best 2-year-olds, and April is very early for good 2-year-old racing. I studied 2-year-olds, plus 3-year-old-only maiden special weight winners for the winter Santa Anita and Gulfstream meets. I had a couple of other picayune rules: no off-the-turf races; no fields of fewer than five horses. Sprints were defined as 7f or less (the 'about' 7f distance at Keeneland is 7.28 furlongs and so was not included).
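The race-level rules above can be written down as a simple filter. This is only a sketch: the data for the study were compiled by hand, the `Race` record and its field names are hypothetical (they are not Equibase's chart fields), and the meet, date, and age restrictions are deliberately left out.

```python
from dataclasses import dataclass

MAX_SPRINT_FURLONGS = 7.0  # "7f or less"; Keeneland's about-7f (7.28f) is over the limit
MIN_FIELD_SIZE = 5         # no fields of fewer than five horses


@dataclass
class Race:
    # Hypothetical record layout, for illustration only.
    surface: str        # "dirt", "turf", ...
    race_type: str      # "MSW" = maiden special weight, "MCL" = maiden claiming
    furlongs: float
    field_size: int
    state_bred: bool
    off_the_turf: bool


def qualifies(race: Race) -> bool:
    """Return True if the race passes the race-level rules described above."""
    return (
        race.surface == "dirt"
        and race.race_type == "MSW"
        and race.furlongs <= MAX_SPRINT_FURLONGS
        and race.field_size >= MIN_FIELD_SIZE
        and not race.state_bred
        and not race.off_the_turf
    )


six_furlong_msw = Race("dirt", "MSW", 6.0, 9, False, False)
keeneland_about_7f = Race("dirt", "MSW", 7.28, 10, False, False)
print(qualifies(six_furlong_msw))     # True
print(qualifies(keeneland_about_7f))  # False
```

A full version would also need a calendar of qualifying meets and the age conditions of each race, which are easier to check by eye from the charts than to encode.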

I wanted to study the most recent data possible, but I needed to use a number of years to obtain adequate sample size, and I also needed enough years to have passed since the winners' debuts to see how they turned out. I needed them to be retired. So I used the years 1996-2005.

I believe some of the truths I found are timeless, basic realities of racing, and others are of questionable longevity. Not being current is a problem with longitudinal research.

I used Equibase's on-line historical charts, found the debut winners who fit my criteria, and then recorded their career accomplishments as shown by "Equibase Horse Search." My outcome variables were whether the horse was a graded stakes winner; whether he or she was a graded stakes winner in races up to a mile; whether he or she was a graded stakes winner in races over a mile; and the horse's career earnings. I did not record whether the horse was a plain stakes winner; there are lots of phony stakes winners going around. Graded-stakes-placed might have been nice to have. But I don't think other outcome variables would have improved more than very marginally on graded stakes-winning status, particularly with earnings supplementing it.

Grade I winners are just too rare to be able to make comparisons across groups; plus, again, I had earnings. This enabled me to look at the very high end of accomplishment if I wanted to.

Debut winners were placed in one of four categories, according to the following method. Only the first call of the chart was used (in other words, the first one where there's a rendering of how far back or in front the horse is). Horses on the lead went into the "led" category; horses who were 2nd at the first call in fields of 8 or fewer horses, or 2nd or 3rd in fields of 9 or more horses, went into the "pressed" category; horses who were last or second-to-last at the first call in fields of 8 or fewer horses, or last, second-to-last, or third-to-last in fields of 9 or more horses, went into the "behind" category. All other debut winners went into the "middle" category. These divisions were decided from the beginning of the data collection, and were not a post-hoc rendering to make particular points. In fact, I did not record the horse's precise initial position, just the category.
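For anyone who wants to apply the same categories to their own charts, the rules above reduce to a short function. This is a sketch of the rules as stated, not the procedure actually used (the categories were assigned by hand); the function name and arguments are my own labels. Because the study excluded fields of fewer than five, the "pressed" and "behind" groups never overlap there; for smaller fields the order of the checks below is an assumption.

```python
def running_style(position: int, field_size: int) -> str:
    """Classify a debut winner by first-call position (1 = on the lead)."""
    if not 1 <= position <= field_size:
        raise ValueError("position must be between 1 and field_size")

    big_field = field_size >= 9
    pressers = 2 if big_field else 1   # 2nd only, or 2nd and 3rd in big fields
    trailers = 3 if big_field else 2   # last two, or last three in big fields

    if position == 1:
        return "led"
    if position <= 1 + pressers:
        return "pressed"
    if position > field_size - trailers:
        return "behind"
    return "middle"


print(running_style(3, 8))  # middle
print(running_style(3, 9))  # pressed
print(running_style(7, 8))  # behind
```

Note how the same first-call position (3rd) lands in different categories depending on field size, which is the point of the 8-or-fewer versus 9-or-more split.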

A final variable I collected was margin of victory.

                                                Overall success rate, and success rate by meet

A first question does not compare debut winners but instead looks at the general prognosis: what is the ordinary outlook for the types of good prospects I collected? We have all heard bearish statistics, and know that a starter's chances of becoming a graded stakes winner never reach 10% no matter who his sire is, but how does this percentage change with a successful debut?

From a sample of 830, my overall finding was 23.4% graded stakes winners, $251,375 mean earnings, and $133,942 median earnings. I think some owners will find these numbers encouraging, and others will find them discouraging. I suppose I should have asked racing-enthusiast friends for predictions, so that I could understand which way the surprise factor leans.

Perhaps you are wondering how to tweak the earnings statistics for the 2012 environment.... Looking at how much money the debut winners went on to make from each year, there isn't a pattern of more recent debut winners going on to do better than the overall mean and median. Data for the 2005 graduates were a $236,314 mean, for instance, and a $135,720 median, with 21.3% eventual graded stakes winners. The years above the overall median in earnings were 1997, 1999, 2002, 2003, and 2005; the years above the overall mean, 1999, 2000, 2002, and 2003. Multiple years in the first half of the study made the top half for both mean and median.

While I'm on the division by years, the range in the number of qualifying debut winners was narrow, ranging from 75 to 89 a year. So we would expect each year to represent 10% of the sample, and in fact the range is from 9.0% to 10.7% of the sample. Changing foal crop numbers, and the emergence of turf racing, have not appreciably changed the number of maiden special weight sprints on the dirt for young horses. Or perhaps these races are more infrequent, but there are more debut wins now.

If I had continued the study past 2005, I would have had a decision to make about including synthetic races, and if I had decided that in the negative, the pool of races would have decreased. But none of the meets I studied had synthetic tracks in the '96-'05 period.

16.7% of the debut winners eventually won short graded stakes, and 11.8% eventually won long graded stakes (see opening section for definitions of short and long). 5.1% were dual short and long graded stakes winners. So horses beginning successfully in the classic way, by winning their sprint debut, were more likely to win short graded stakes than long graded stakes.
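As a quick consistency check, the three percentages in the paragraph above should recombine into the overall graded stakes winner rate reported earlier: short plus long, minus the horses counted in both. The variable names are just labels for this sketch.

```python
short_pct = 16.7  # won a graded stakes at up to a mile
long_pct = 11.8   # won a graded stakes at over a mile
dual_pct = 5.1    # won both kinds, so counted in each figure above

# Subtract the overlap once so dual winners are not double-counted.
any_graded_pct = round(short_pct + long_pct - dual_pct, 1)
short_only_pct = round(short_pct - dual_pct, 1)
long_only_pct = round(long_pct - dual_pct, 1)

print(any_graded_pct)   # 23.4
print(short_only_pct)   # 11.6
print(long_only_pct)    # 6.7
```

The result matches the 23.4% overall figure, and the split shows that horses who won only short graded stakes outnumber those who won only long ones by a wider margin than the headline 16.7% versus 11.8% suggests.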

It would be worthwhile to compile all graded stakes by distance, and see if there are also more short than long. If not, then does winning at a sprint distance signal a sprint proclivity, setting aside style? Does winning first out indicate that a horse is almost "too fast" for his own good, and is not likely to matter much at longer distances? Or, while beginning in routes is comparatively rare, is that perhaps the pathway for horses who most often do best in consequential routes down the road?

The issue would be hard to assess. I saw few winners in my sample logging top turf careers, and the overall ratio of long to short graded stakes is raised greatly by turf races. The fact that dirt sprint debut winners almost always stay on dirt is interesting in itself, particularly if this trend is as extreme as it seemed to me. The idea that a talented turf horse may sometimes start as a dirt horse, or fool his trainer first out with a bang-up dirt performance, may not play out often in reality....

This is not specific to debut winners, and may be better addressed with more expansive datasets, but if you can get a graded stakes winner, what kind of money are you looking at earning? Median earnings for graded stakes winners were $484,761.50. Horses who only won graded stakes at a mile or less had median earnings of $358,148.50, while those who only won graded stakes at over a mile had median earnings of $586,789. Part of the advantage for graded stakes routers probably consists of higher purses in those races; another possibility is just that the graded stakes routers are better, more consistent, or more dominant overall than their counterparts. Short and long graded stakes winners, guaranteed of being multiple graded stakes winners, had the highest-earning careers of all, with median earnings just shy of a million ($965,346).

Despite my hopes of including tracks all of a certain level, there is evidence that the 23.4% graded stakes winners rate is not always 23.4%, and does depend on where the horse wins. Indeed, using a chi-square test, graded stakes status by Meet was significant at the .05 level. Obviously, I wasn't necessarily expecting Aqueduct Fall debut winners to grade out as well as Saratoga ones, but I wanted general comparability so that Meet would not be an overriding factor. To the extent that it is, at least it is one I can account for.
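For readers unfamiliar with the test, here is a minimal sketch of the Pearson chi-square statistic for a table of meets by graded stakes status. It assumes the test was run on a meets-by-(winner, non-winner) table, which the post does not spell out. The only per-meet counts available here are for Hollywood Fall (36 winners) and Oak Tree (32 winners), both at 25.0% as reported later in the post, so the demonstration uses just those two rows.

```python
def chi_square(table):
    """Return (Pearson chi-square statistic, degrees of freedom) for rows of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    statistic = 0.0
    for row, row_total in zip(table, row_totals):
        for observed, col_total in zip(row, col_totals):
            # Count expected if graded stakes status were independent of meet.
            expected = row_total * col_total / grand_total
            statistic += (observed - expected) ** 2 / expected

    dof = (len(table) - 1) * (len(col_totals) - 1)
    return statistic, dof


# Rows: (graded stakes winners, non-winners), reconstructed from 25.0% of 36 and of 32.
two_meets = [
    [9, 27],  # Hollywood Fall
    [8, 24],  # Santa Anita Fall (Oak Tree)
]
print(chi_square(two_meets))  # (0.0, 1)
```

Two meets with identical rates give a statistic of zero, as they should. With all 13 meets in the table there would be 12 degrees of freedom, and the statistic would need to exceed about 21.03 to be significant at the .05 level.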

                                         Rated by percentage of graded stakes winners

1. Hollywood Spring/Summer 35.4%
2. Saratoga 34.7%
3. Del Mar 28.8%
4. Belmont Fall 25.4%
5. Hollywood Fall 25.0%
5. Santa Anita Fall (Oak Tree) 25.0%
7. Santa Anita Winter 22.5%
8. Keeneland Fall 21.7%
9. Belmont Spring/Summer 19.2%
10. Churchill Spring/Summer 18.2%
11. Gulfstream 16.8%
12. Churchill Fall 16.4%
13. Aqueduct Fall 8.5%

Although sample sizes are small, the order is strikingly consistent with reputation. It doesn't always seem to me that Saratoga debut winners run faster than Belmont debut winners, for instance, but either they do, or they have qualities that don't come out in the race time that allow them to do better later.

The surprise may be Hollywood Spring/Summer. Winners from qualifying races from the meet who went on to be graded stakes winners were A. P. Warrior, Buffythecenterfold, Came Home, Diplomat Lady, Dixie Union, Double Honor, Essence of Dubai, Exchange Rate, Hookedonthefeelin, Inspiring, Miss Houdini, Purely Cozzene, Roman Ruler, Ruler's Court, September Secret, What a Song, and Worldly Manner. It's natural to think the success of the graduates may have had more to do with the time of the year than anything else, but if you can find a pattern with time of year and eventual success, looking at the other meets, you're seeing something I don't.

Although only 36 maiden winners from the meet made the study, Hollywood Fall also rated very well. When we shift from graded stakes winners to earnings, Hollywood Fall is on top for both mean and median (barely, but still). In general, the ratings with earnings followed the ratings with graded stakes winners closely. For instance, the meets 2nd, 3rd, and 4th in mean and median earnings (although not in the same order) were Hollywood Spring/Summer, Saratoga, and Del Mar -- 1, 2, and 3 on graded stakes winner percentage.

An exception to the correspondence between earnings and graded stakes winners was Oak Tree; tied for 5th with Hollywood Fall in graded stakes winner percentage, Oak Tree was 10th among the meets in average earnings, and last in median earnings. The smallest sample size in the study (n of 32) is just one reason I wouldn't necessarily expect the poor performance to repeat if I examine later years.

Sprint/long graded stakes success is difficult to make much sense of on a meet by meet basis (again because of sample size), so I tended to aggregate in meaningful ways to find trends. One may be that the California tracks had a higher percentage of their graded stakes graduates earn the distinction sprinting. Aggregating the five California meets, 56 debut winners proceeded to win short graded stakes, and 32 proceeded to win long graded stakes (a ratio of short to long graded stakes winners of 1.75). At non-California tracks, the corresponding numbers were 82 and 66, for a ratio of 1.24.
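The two ratios quoted above are simple divisions of the reported counts. The short sketch below reproduces them and adds the whole-sample ratio implied by summing the two groups; the dictionary labels are mine.

```python
# (short graded stakes winners, long graded stakes winners), as reported above
counts = {
    "California meets": (56, 32),
    "Non-California meets": (82, 66),
}

ratios = {group: short / long_ for group, (short, long_) in counts.items()}

# Whole sample: add the two groups together before dividing.
total_short = sum(short for short, _ in counts.values())
total_long = sum(long_ for _, long_ in counts.values())
ratios["All meets"] = total_short / total_long

for group, ratio in ratios.items():
    print(f"{group}: {ratio:.2f}")
# California meets: 1.75
# Non-California meets: 1.24
# All meets: 1.41
```

Horses who won both short and long graded stakes appear in both counts, so these are ratios of accomplishments, not of mutually exclusive groups of horses.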

The ratio is different, but not overwhelmingly different, to my eye. The most probable cause is simply the ratio of short to long graded stakes run in California compared with other tracks over the past 15 years. Another possibility, and an intriguing one, is that the trainers who win first out in California (e.g., Bob Baffert) are more speed than distance trainers (not that Baffert's nine Triple-Crown race wins are something to gloss over when assessing his distance-training ability).

When I speak of non-California tracks, Saratoga is far and away the biggest contributor of graded stakes winners, so it must be assigned much of the credit (or blame) for the sprint/long graded stakes winner ratio for the whole sample. (The Saratoga ratio was 29-24.)

Saratoga's rate of 19.8% qualifying winners proceeding to win graded stakes over a mile at some point in their careers is very impressive. It also makes it clear that Saratoga's 34.7% 2nd-place overall rate of graded stakes winners is not a function of horses returning in one of the couple of graded stakes that complete the meet. Those races are sprints, and the 19.8% long graded stakes winners shows that Saratoga is not making its own, phony graded stakes winners (I guess my worry here was a little absurd, frankly, since plenty of other meets have 2-year-old stakes intended for previous maiden winners).

Getting back to differences in sprint/route success in California vs. elsewhere, an interesting comparison is the Santa Anita winter meet vs. the Gulfstream winter meet. Santa Anita Winter had 13 sprint graded stakes winners, and just 7 long ones; Gulfstream had 13 of both.

There is additional data that I don't have that could certainly enable us to better understand the findings. It would be nice to know the relative chances of a firster and a non-firster winning at each meet, and it would be nice to know the future success of the non-firsters winning at each meet. Which meets have the best 2-year-olds and early 3-year-olds in their maiden special weights, and which just have the most formful maiden special weights (i.e. have the eventual top horse win first out the most often)? These are fascinating questions to me, but more descriptive of a process, than revealing about future prospects.

Sunday, June 24, 2012
