Thursday, June 28, 2012

Research study, part IV: Interpreting debut margin of victory

This section of the presentation considers margin of victory of the selected debut winners -- what it can tell us about them going forward, and what it reveals about maidens with young horses in general.

First, the sample drops by eight to 822 because I did not register a margin of victory for horses who won by disqualification. There was also the question of what to do with margins of under a half length: I counted nose and head wins as wins by a tenth of a length, neck wins as wins by a quarter of a length, and dead heats as wins by 0 lengths. Those conversions went into all calculations involving margin.
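For anyone recording margins the same way, here is a minimal sketch of those conversions. The function name and the input format (a whole number plus an optional fraction, or a word for the short margins) are assumptions for illustration, not how the original spreadsheet was built.

```python
from fractions import Fraction

# Conventions stated in the post for margins under half a length.
SHORT_MARGINS = {
    "nose": Fraction(1, 10),
    "head": Fraction(1, 10),
    "neck": Fraction(1, 4),
    "dead heat": Fraction(0),
}

def margin_to_lengths(text: str) -> float:
    """Convert a chart-style margin such as '2 3/4', 'neck', or '15' to lengths.

    Hypothetical helper: assumes margins are written as a word for short
    margins, or as space-separated whole and fractional parts.
    """
    t = text.strip().lower()
    if t in SHORT_MARGINS:
        return float(SHORT_MARGINS[t])
    # Sum each space-separated part, e.g. "2 3/4" -> 2 + 3/4.
    return float(sum(Fraction(part) for part in t.split()))
```

Using exact fractions until the final conversion keeps quarter-length margins like 15 3/4 from picking up rounding error along the way.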

Average margin of victory was 2.90 lengths and median margin of victory was 2 lengths. The most lopsided win was eventual Breeders' Cup Juvenile Fillies' winner Cash Run's 15.75-length score at Belmont in 1999.

Below I display wins by "lengths truncated." This means that wins by 1, 1 1/4, 1 1/2, and 1 3/4 lengths all go into the 1-length bucket; wins by 2, 2 1/4, 2 1/2, and 2 3/4 lengths go into the 2-length bucket; and so on. The first number to the right of each bucket is its percentage of total wins, followed by the percentage of total wins by that margin or less (the cumulative frequency). The final column is the percentage of graded stakes winners among horses who won by that margin. Taking you through one row, 11.9% of debut winners won by 3-3.75 lengths, and 72.3% of debut winners won by 3.75 lengths or less. The horses in the 3-3.75-length bucket became graded stakes winners 19.4% of the time.

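Table 1. Debut wins by margin (lengths truncated)

Margin | % of wins | Cumulative % | GSW %
<1 | 25.9 | 25.9 | 16.9
1 and 1+ | 19.2 | 45.1 | 24.1
2 and 2+ | 15.2 | 60.3 | 25.6
3 and 3+ | 11.9 | 72.3 | 19.4
4 and 4+ | 9.4 | 81.6 | 24.7
5 and 5+ | 5.0 | 86.6 | 39.0
6 and 6+ | 4.3 | 90.9 | 20.0
7 and 7+ | 2.7 | 93.6 | 31.8
8 and 8+ | 1.5 | 95.0 | 8.3
9 and 9+ | 1.2 | 96.2 | 30.0
10 and 10+ | 1.0 | 97.2 | 37.5
11 and 11+ | 1.8 | 99.0 | 60.0
12, 13, 14, 15+ | 0.9 | 100.0 | 50.0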
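The bucketing and cumulative-frequency columns can be sketched like this. The bucket labels and the example margins are made up for illustration; only the truncation rule comes from the description above.

```python
import math

def bucket(margin: float) -> str:
    """Truncate a margin to its whole-length bucket, as in Table 1.

    Wins under 1 length go to '<1'; 12 lengths and up share one bucket.
    """
    if margin < 1:
        return "<1"
    whole = math.floor(margin)
    return "12+" if whole >= 12 else str(whole)

def frequency_table(margins):
    """Return (bucket, % of wins, cumulative %) rows in bucket order."""
    order = ["<1"] + [str(n) for n in range(1, 12)] + ["12+"]
    counts = {b: 0 for b in order}
    for m in margins:
        counts[bucket(m)] += 1
    total = len(margins)
    rows, running = [], 0
    for b in order:
        running += counts[b]
        rows.append((b, 100 * counts[b] / total, 100 * running / total))
    return rows

# Hypothetical margins, not the study's data.
for row in frequency_table([0.1, 0.25, 1.5, 1.75, 2, 3.75, 12.5, 15.75]):
    print(row)
```

Because the cumulative column is computed from raw counts, adding up the rounded percentages can miss it by a tenth (60.3 plus 11.9 gives 72.2, not the 72.3 shown in Table 1).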

This is surprising to me, but horses who give their backers anxious moments by winning their debuts by less than a length may really be a lesser group than horses who simply win by 1-2 lengths. As the first column after the margin indicates, those two buckets have the largest samples, and the gap in graded stakes winners between them (16.9% vs. 24.1%) provides food for thought.
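One rough way to gauge whether that 16.9% vs. 24.1% gap is more than noise is a pooled two-proportion z-test. This is not an analysis from the study: the counts below are approximate, back-calculated from the rounded percentages in Table 1 and the 822-horse sample, so they may be off by one.

```python
import math

def two_proportion_z(x1, n1, x2, n2):
    """Pooled two-proportion z-test; returns (z, two-sided p-value)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p2 - p1) / se
    # Two-sided p-value from the standard normal: 2 * (1 - Phi(|z|)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Approximate counts: '<1' bucket ~213 horses with ~36 GSWs (16.9%);
# '1 and 1+' bucket ~158 horses with ~38 GSWs (24.1%).
z, p = two_proportion_z(36, 213, 38, 158)
print(f"z = {z:.2f}, two-sided p = {p:.3f}")
```

With these approximate counts, z comes out near 1.7 and p near 0.09, short of conventional 5% significance, which fits treating the gap as food for thought rather than a settled finding.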

A strict reading of the graded-stakes percentages says that winning by 2 to 4 3/4 lengths does not bode better down the road than just winning by 1-2 lengths, but the best reading is tricky. The individual buckets there, ranging from 77 to 158 horses, are not big enough for us to take their graded stakes percentages too seriously. On the other hand, wins of at least 2 but less than 5 lengths make up 36.5% of the data (300 cases), and wins of at least 1 but less than 5 lengths make up 55.7% of the data (458 cases). The absence of movement in the GSW % between the 1+ and 4+ buckets convinces me that, with more data, I would find at most a weak trend between margin of victory and graded stakes winning in the 1-5-length range.

I'll offer a couple of other breakdowns: horses who won by 5 lengths or more became graded stakes winners a third of the time (33.1%), and horses who won by 8 lengths or more became graded stakes winners 37.7% of the time (20 for 53). Horses who won by at least 5 but less than 8 lengths became graded stakes winners 30.6% of the time.
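The pooled figures can be reproduced by summing buckets. The counts here are approximate, back-calculated from Table 1's rounded percentages and the 822 total; the last bucket's 8 horses is the count needed for the totals to reach 822 and for the 8+ group to come to 20 for 53, even though 8/822 rounds to 1.0% rather than the 0.9% shown.

```python
# Approximate (horses, GSWs) per Table 1 bucket; bucket 0 means '<1' and
# bucket 12 means '12, 13, 14, 15+'. Individual counts may be off by one.
BUCKETS = {
    0: (213, 36), 1: (158, 38), 2: (125, 32), 3: (98, 19), 4: (77, 19),
    5: (41, 16), 6: (35, 7), 7: (22, 7), 8: (12, 1), 9: (10, 3),
    10: (8, 3), 11: (15, 9), 12: (8, 4),
}

def pooled(lo, hi):
    """Horses, GSWs, and GSW % for buckets lo..hi inclusive."""
    horses = sum(BUCKETS[b][0] for b in range(lo, hi + 1))
    gsw = sum(BUCKETS[b][1] for b in range(lo, hi + 1))
    return horses, gsw, 100 * gsw / horses

print(pooled(2, 4))   # at least 2, under 5 lengths
print(pooled(1, 4))   # at least 1, under 5 lengths
print(pooled(5, 12))  # 5 lengths or more
print(pooled(5, 7))   # at least 5, under 8 lengths
print(pooled(8, 12))  # 8 lengths or more
```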

Beginning the breakdowns with 5 lengths is not arbitrary. Not only did the percentage of graded stakes winners shoot up in the 5 and 5+ bucket, but the steep drop in the 6 and 6+ bucket makes it indisputable that, at that point on the chart, the sample sizes have become small enough to render bucket-by-bucket comparisons completely unreliable. Picking 8 lengths as the cut-off for the top margins was mostly arbitrary (or if it wasn't, I can't remember why), but it was not picked to jack up the graded stakes winning percentage; you can see the 8 and 8+ length horses went just 1 for 12 by the graded-stakes-winning measure. The sample size dropped from 22 to 12 between the 7+ and 8+ buckets, and I think that, along with how many horses I wanted in the top group, guided my decision. The good thing is that you have all of the data, and you can slice and dice it any way you like, though I frankly suspect some of you will be far less disciplined and thoughtful than I was.
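To put a number on how unreliable buckets this small are, here is a sketch using 95% Wilson score intervals, a standard interval for a binomial proportion (my choice, not the study's). The counts are approximate, back-calculated from Table 1: about 16 GSWs from 41 horses in the 5-length bucket, and about 7 from 35 in the 6-length bucket.

```python
import math

def wilson_interval(successes, n, z=1.96):
    """Approximate 95% Wilson score interval for a binomial proportion."""
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half

five = wilson_interval(16, 41)   # '5 and 5+' bucket, ~39.0% GSW
six = wilson_interval(7, 35)     # '6 and 6+' bucket, ~20.0% GSW
print(f"5+: {five[0]:.2f}-{five[1]:.2f}   6+: {six[0]:.2f}-{six[1]:.2f}")
```

The two intervals overlap substantially, so the drop from 39.0% to 20.0% is well within what chance alone could produce at these sample sizes.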

I'm a little surprised that margin of victory by itself cannot single out a group that became graded stakes winners much more than about 38% of the time. After all, narrowing to Hollywood Spring/Summer and Saratoga, without knowing margin of victory, gets us to 35%.

Part of the reason that margin of victory didn't factor more heavily is that it's not the best measure of performance. Speed figures are probably much better, and margin over horses back in the field probably is, too.

My study focuses on style in the debut win, not strength of win. That's what makes the approach interesting: I'm asking whether wins that look equivalent in quality are really of different value because of how they are achieved. I included margin because it was easy to record and, with a large enough sample, could serve as a control for strength of win.

Another way of viewing the relationship between margin and success is to start with graded-stakes-winning status rather than with margin. If you do that, you see that the average eventual graded stakes winner won his or her debut by 3.53 lengths, and the average eventual non-graded-stakes winner won by 2.70 lengths. That looks like a reasonable difference, but measured by what we statisticians and statistical dabblers call effect size, it actually comes out as small.
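The post doesn't say which effect-size measure was used or report the standard deviations, so this sketch assumes Cohen's d (the mean difference divided by the pooled standard deviation) and runs the formula backward: for the 0.83-length gap, what pooled standard deviation would produce Cohen's conventional small (0.2), medium (0.5), and large (0.8) benchmarks?

```python
def cohens_d(mean_a, mean_b, pooled_sd):
    """Standardized mean difference (Cohen's d)."""
    return (mean_a - mean_b) / pooled_sd

def sd_for_d(mean_a, mean_b, d):
    """Pooled SD at which the mean difference would equal a given d."""
    return (mean_a - mean_b) / d

mean_gsw, mean_non = 3.53, 2.70   # average debut margins from the post
for label, d in [("small", 0.2), ("medium", 0.5), ("large", 0.8)]:
    sd = sd_for_d(mean_gsw, mean_non, d)
    print(f"d = {d} ({label}) if pooled SD = {sd:.2f} lengths")
```

Read this way, a "small" effect implies a pooled standard deviation somewhere between about 1.7 and 4.2 lengths, which is plausible for margins that average 2.9 lengths but run as high as 15 3/4.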

In Part II, I showed that chances of winning decrease dramatically and consistently as horses get farther from the early lead. Well, horses who do overcome the bias against being well behind early and win, win by less than horses who set or attend the pace. The same trend seen in winning percentage reproduces itself in margin of victory.

Table 2. Mean margin of victory by style

Led 3.82 lengths
Pressed 3.09 lengths
Middle 2.29 lengths
Behind 1.58 lengths

Table 3. Median margin of victory by style

Led 3 lengths
Pressed 2.5 lengths
Middle 1.75 lengths
Behind 1.13 lengths

Table 4. Largest margin of victory by style

Led 15.75 lengths
Pressed 13 lengths
Middle 11 lengths
Behind 6 lengths

During the couple of months that I retrieved and compiled all of this data, I needed diversions to keep up my spirits. One was tracking the biggest-margin "behinder" winner, whose top margin turned out not to be very large by conventional standards. Don't Get Mad took that honor with a 6-length debut win. He was followed by Bella Bellucci, at 5 3/4 lengths. None of the other 84 behinders won by more than 4 1/2 lengths.

To get an idea of how stark the difference in the most one-sided wins was between "behind" and the other styles: wins of more than 6 lengths represented 11.8% of the sample overall. From 86 races won by behind-horses, we might therefore have expected about 10 wins of more than 6 lengths. Instead, we didn't get any.
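A quick binomial check shows just how unlikely a shutout is. This sketch assumes each behind-winner independently had the overall 11.8% chance of a 6-plus-length win; that is only a rough gauge, since the overall figure includes the behind horses themselves.

```python
n_behind = 86   # debut winners who were "behind" early
p_big = 0.118   # share of all wins by more than 6 lengths

expected = n_behind * p_big          # expected number of big wins
p_none = (1 - p_big) ** n_behind     # binomial probability of zero big wins
print(f"expected {expected:.1f}, P(zero) = {p_none:.1e}")
```

Under that model, the chance of seeing no wins of more than 6 lengths in 86 tries is on the order of 1 in 50,000.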

Part III showed that the "behind" group has the highest percentage of graded stakes winners and the highest average earnings. We now know that horses who win by more lengths turn out to be better, on average, than horses who win by fewer lengths. We also know that the "behind" horses often make it close when they win. So the success of the "behind" group is surprising in light of their typical margins of victory. Since future performance does not show strong signs of differing by winning style, but average margin of victory does, I think it would be reasonable to assess margin of victory (and probably speed figures too) entirely in light of style. That is, to say: OK, this horse won by 3 lengths from behind, but from a normative standpoint, that's as good as winning by 6 lengths on the lead (3+-length wins are as frequent among "behind" winners as 6+-length wins are among horses who won after leading at the first call).
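That normative adjustment can be sketched as a percentile match: find how rare a margin is within its own style, then find the margin that is equally rare in another style. The function names and the two margin lists below are made-up toy values chosen for illustration, not the study's data.

```python
from bisect import bisect_left

def share_at_least(margins, m):
    """Fraction of wins in `margins` that were by m lengths or more."""
    ordered = sorted(margins)
    return (len(ordered) - bisect_left(ordered, m)) / len(ordered)

def equivalent_margin(m, from_group, to_group):
    """Smallest margin in to_group that is at least as rare as m is in from_group."""
    target = share_at_least(from_group, m)
    for candidate in sorted(set(to_group)):
        if share_at_least(to_group, candidate) <= target:
            return candidate
    # m is rarer than anything observed in to_group.
    return max(to_group)

# Made-up toy margins for illustration only; not the study's data.
behind = [0.1, 0.25, 0.5, 1, 1, 1.25, 2, 3, 3.5, 6]
led = [0.5, 1, 2, 3, 3, 4, 5, 6, 7.5, 12]
print(equivalent_margin(3, behind, led))
```

With real data, each style's full list of margins would replace the toy lists; the lookup itself stays the same.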

We need to do one more analysis. You might be wondering whether margin of victory is really more important than I'm saying: whether the success that the "behind" horses have despite their low margins of victory is depressing the overall relationship between margin and success. The solution (at least the clunky one, without the benefit of a statistical model) is to compare the average margin of graded stakes winners and non graded stakes winners not just overall, but within each style.

Table 5.

Led

GSW 5.13 lengths
Non GSW 3.41 lengths

Pressed

GSW 3.54 lengths
Non GSW 2.95 lengths

Middle

GSW 2.51 lengths
Non GSW 2.23 lengths

Behind

GSW 2.03 lengths
Non GSW 1.41 lengths

This is a crude comparison, but the overall difference in margin for GSWs and Non GSWs was 0.83 lengths, and if we average the difference in the four groups above, the difference is 0.80 lengths. So the importance of margin in future success is not obviously greater than I found it to be before.
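The within-style arithmetic from Table 5 can be written out as follows. It takes a plain, unweighted average of the four style gaps; a weighted average would need every style group's size, and only the "behind" group's (86) is given here.

```python
# (GSW mean, non-GSW mean) debut margin in lengths by style, from Table 5.
TABLE5 = {
    "Led": (5.13, 3.41),
    "Pressed": (3.54, 2.95),
    "Middle": (2.51, 2.23),
    "Behind": (2.03, 1.41),
}

diffs = {style: gsw - non for style, (gsw, non) in TABLE5.items()}
avg_diff = sum(diffs.values()) / len(diffs)   # unweighted mean of the four gaps

for style, d in diffs.items():
    print(f"{style:8s} {d:.2f}")
print(f"average  {avg_diff:.2f}  (overall gap: {3.53 - 2.70:.2f})")
```

The per-style gaps (1.72, 0.59, 0.28, and 0.62 lengths) also show how much of the average comes from the "led" group.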

Making the comparisons of GSWs and Non GSWs by style had one unexpected benefit, though. It reaffirms that margin does matter for behinders, as it does for typical debut winners. And margin seems much more important for the wire-to-wire pattern than for the other progressions.

This makes sense. It is only when a horse is let go from the beginning that we see how fast he can really run, rather than partly how fast his opponents allowed him to go. It's as if the issue shifts to brilliance, or lack thereof, when horses lead from the start, and winning big is a sign of brilliance (though effect sizes again suggest not overstating the relationship).

A regional difference emerges charting average margin of victory by meet. Margin of victory is greatest at the New York tracks, and smallest at the California tracks. The pattern is almost perfect.

Table 6. Average margin of victory by meet

1. Saratoga 3.51
2. Aqueduct 3.51
3. Belmont Fall 3.31
4. Belmont Spring/Summer 3.14
5. Keeneland Fall 2.99
6. Gulfstream 2.90
7. Churchill Fall 2.75
8. Del Mar 2.70
9. Churchill Spring/Summer 2.63
10. Hollywood Fall 2.42
11. Hollywood Spring/Summer 2.31
12. Santa Anita Winter 2.25
13. Santa Anita Fall 2.16

Of the 260 debut winners at California tracks, none won by more than 11 lengths. Seven of the other eight meets had winners of more than 11 lengths, and six had more than one. Yet it was not just fewer and less lopsided routs that depressed the averages in California: median margin of victory also ranks three California meets at the bottom, with another California meet tied for fourth-lowest.

Something seems to be going on here, but I'm not sure I know what it is. I can well believe that the Belmont maidens might have small fields, leading to large margins of victory. But would small fields be an accurate characterization of Saratoga's maidens, or even Aqueduct's? And if field size is the main explanation, do California's maiden races draw bigger fields than Kentucky's or Florida's? That would be news to me.

I entertained the idea that something I didn't track, the percentage of debut starters who win, might be lower in California, and that the winning margins first-time starters do register there might in turn be more modest. But then why does Hollywood Spring/Summer have such a low average margin of victory, when the races there often pit first-time starters against first-time starters?

In general, the close nature of California maidens just speaks to me of their competitiveness. In California, you have the most graded-stakes-winning graduates, and the closest races. A win in a California maiden special weight means something.

I next tackled whether margin of victory differed in importance by meet. Again, there were indications of a regional difference. It's not a significant difference, but the California graduates who failed to win graded stakes won their debuts by more lengths than the California graduates who did win graded stakes (2.43 to 2.33). Outside California, graded stakes winners outperformed non-graded-stakes winners on margin of maiden victory, 4.22 to 2.82.

Again, this is a bit of a puzzler. Are there (or were there, since this was 1996-2005) different attitudes about runaway wins in California than at prominent meets in other parts of the country? Was it more common for winners to be eased up in California, rendering margin meaningless? You might think we're looking at the effects of synthetic surfaces in California vs. non-synthetic surfaces elsewhere, but remember, these maiden wins came before there were any synthetic tracks in California.