SkyKing162's Baseblog

A fan of the Yankees, Red Sox, and large sample sizes.


Studes mentioned this blog in the reference section of his Angels article over at The Hardball Times, and there have been a lot of extra hits here today. (That is, more than 7.) Thanks for the mention, and I'm glad to have "inspired" somebody else. I used to dislike the Angels, but the attitude taken by Arte Moreno, their new owner, has really impressed me. Their fast start has my prediction of them clearly winning the AL West looking pretty good right now. (And no, I'm not going to mention my bad predictions, like my call that Miguel Cabrera would have a poor sophomore year.)


I'm not quite ready to type up a full entry on the idea, but I've been thinking a lot lately about fantasy valuation. For a long time I've been a big fan of the replacement model, where players get credit for their stats above and beyond positional replacement level, as a percentage of the overall pool. I'm changing my mind. To what, I'm not exactly sure yet.

Replacement makes sense. If everyone has to roster a shortstop, the 8 homeruns that even a freely available shortstop will hit aren't worth anything. If there are 500 homeruns available above replacement, 10 homeruns above replacement should be worth 10/500 of the money allocated to homeruns. You pay for stats based on supply and demand.
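Here's a minimal Python sketch of that replacement arithmetic, just to pin it down. The function names, the sample players, and the $400 league-wide homerun budget are all hypothetical; it handles one counting stat at a time and assumes you already know each position's replacement level.

```python
def pool_above_replacement(players, replacement):
    """Sum of one stat above positional replacement over the draftable pool.

    players: list of (name, position, stat) tuples (hypothetical data)
    replacement: dict of position -> stat a freely available player provides
    """
    return sum(max(stat - replacement[pos], 0) for _, pos, stat in players)


def replacement_value(stat, position, replacement, pool, category_dollars):
    """Dollars for one player's stat: his share of the above-replacement pool."""
    above = max(stat - replacement[position], 0)  # replacement-level stats are worth $0
    return category_dollars * above / pool


# 18 HR from a shortstop whose replacement level is 8 HR is 10 above replacement;
# with 500 HR above replacement league-wide, that's 10/500 of the HR money.
print(replacement_value(18, "SS", {"SS": 8}, pool=500, category_dollars=400))
```

With those made-up numbers the shortstop's homeruns come out to 400 × 10/500 = $8, and a shortstop who hits exactly 8 gets nothing.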

But here's the issue. Every category should have the same amount of money spent on it. (Well, not necessarily, but that's just a different, independent modification, so I'll ignore it for now.) Therefore the average amount spent on homeruns across all teams should equal the average spent on stolen bases, which should equal the average spent on ERA, and so on. And if a team spends that average amount in a category, it should expect to earn average roto points in that category.

Now, what if a team spends more money than average on one category, and that same amount less than average on another category? In theory, the number of points gained in the first category should cancel out the number of points lost in the second. But does replacement theory guarantee that? No. In fact, it's quite possible that spending an extra $15 on one category will get you 3 points, while spending an extra $15 on another category will only get you 2 points.

If you think of the number of stats needed to get a certain number of points in terms of the money you need to spend to get those stats, then you're really dealing with the same unit for every category, that unit being money. (I think of this as normalization, but in a non-statistical way.) Not all categories are spread out in the same way. In some categories, it takes a lot of money to move from the middle of the pack to first place. In other categories, the distance is much shorter.

To quote from a good thread over at Mastersball:

Traditionally, different categories tend to be spread out by differing amounts. Why? I'm not exactly sure, but probably because the stats in some categories are spread among a lot of players (runs, RBIs, strikeouts) while others are dominated by a few (SBs, saves). So let's say the twelve teams' spending is distributed like so in the HR and SB categories (the average team will spend $400/12, or about $33, per category). These example distributions are both denser towards the mean and more spread out towards the ends, which is typical. It's just that SBs are spread out more.
HR: 18, 22, 25, 28, 30, 32, 34, 36, 38, 41, 44, 48

SB: 13, 18, 22, 26, 29, 32, 34, 37, 40, 44, 48, 53

Spending the average gets you middle-of-the-road points in both categories. Spend $11 below average ($22) in HRs and you get 2 points, while $11 below average in SBs gets you 3 points. Spend $11 above average ($44) in HRs and you get 11 points, while $11 above average in SBs gets you 10 points.

Thus, changing from a $22 HR/$44 SB split to a $44 HR/$22 SB split gains your team two points in the standings for doing next to nothing.
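The arithmetic in that example is easy to check with a short Python sketch. It assumes a team's roto points in a category equal the number of teams (itself included) spending no more than it, and it ignores the split points real roto scoring gives for ties; the `roto_points` helper is a hypothetical name, not anything from the thread.

```python
from bisect import bisect_right
from statistics import mean, pstdev

# Dollars each of the twelve teams spent per category, from the example (sorted).
HR = [18, 22, 25, 28, 30, 32, 34, 36, 38, 41, 44, 48]
SB = [13, 18, 22, 26, 29, 32, 34, 37, 40, 44, 48, 53]


def roto_points(spend, league):
    """Roto points (1 = last, 12 = first) for a team whose spend appears in `league`.

    Counts the teams spending no more than this team, itself included.
    Simplification: tied teams do not get split points as real roto scoring would.
    """
    return bisect_right(league, spend)


split_a = roto_points(22, HR) + roto_points(44, SB)  # $22 HR / $44 SB
split_b = roto_points(44, HR) + roto_points(22, SB)  # $44 HR / $22 SB
print(split_a, split_b)  # 12 14

# Same average spend in both categories, but SB dollars are more spread out.
print(mean(HR), mean(SB), round(pstdev(HR), 2), round(pstdev(SB), 2))
```

Both lists average exactly $33, but the SB list has the larger standard deviation, which is the whole reason the swap picks up two points.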

What an SD valuation model does is account for the distribution of each category and motivate you to spend more money on the tighter categories and less on the more spread-out ones. Once each stat is normalized (by subtracting out the mean and dividing by the standard deviation, i.e. a z-score for those who have taken stats), you can just add up each player's z-scores and use one big replacement level. If you want to weight categories differently (hitting vs. pitching, for example, or strikeouts more than wins because they're a more efficient category), just multiply each category's z-score above replacement by some factor.
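A rough Python sketch of that SD approach follows. It's my own illustration under simplifying assumptions, not the thread's code: counting stats only (ERA and other ratio stats would need a sign flip and playing-time weighting, not shown), z-scores computed over whatever player pool you pass in, and the single replacement level taken to be the weighted z-score total of the best player left unrostered. `zscore_values`, `roster_spots`, and the sample players are hypothetical.

```python
from statistics import mean, pstdev


def zscore_values(players, weights, roster_spots):
    """Weighted z-score total above one overall replacement level, per player.

    players: dict of name -> {category: projected counting stat}
    weights: dict of category -> multiplier (1.0 everywhere = equal weighting)
    roster_spots: how many players the league rosters; must be less than
        len(players), since the best player left over sets replacement level.
    """
    totals = {name: 0.0 for name in players}
    for cat, w in weights.items():
        vals = [stats[cat] for stats in players.values()]
        mu, sd = mean(vals), pstdev(vals)
        if sd == 0:
            continue  # every player is identical here, so the category separates no one
        for name, stats in players.items():
            # z-score: distance from the pool mean in standard deviations, then weighted
            totals[name] += w * (stats[cat] - mu) / sd
    # One big replacement level: the total of the best player nobody rosters.
    ranked = sorted(totals.values(), reverse=True)
    replacement = ranked[roster_spots]
    return {name: t - replacement for name, t in totals.items()}


players = {
    "A": {"HR": 40, "SB": 5},
    "B": {"HR": 20, "SB": 30},
    "C": {"HR": 10, "SB": 10},
    "D": {"HR": 30, "SB": 15},
}
values = zscore_values(players, {"HR": 1.0, "SB": 1.0}, roster_spots=2)
print(values)
```

In this toy pool, B's big SB total outweighs A's homeruns, because SB is the tighter category here. With two roster spots, A is the best player left over and ends up at exactly zero.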

So yes, I'm likely to eat a lot of the words I've spewed about the replacement model over the years. But I think there might be some merit to combining the two techniques. It may make sense to take replacement level into account before normalizing each statistic. I'm not sure. It may turn out not to matter.
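For what it's worth, here's one possible reading of that combination as a Python sketch, with loud assumptions. Nothing here is a settled method. For each counting stat, it subtracts the player's positional replacement level first, then divides by the standard deviation of those above-replacement numbers across the pool. No mean is subtracted, because replacement level already acts as the zero point. `replacement_then_normalize` and the sample data are hypothetical.

```python
from statistics import pstdev


def replacement_then_normalize(players, replacement, weights):
    """Hypothetical hybrid: positional replacement first, then scale by spread.

    players: dict of name -> (position, {category: projected counting stat})
    replacement: dict of position -> {category: replacement-level stat}
    weights: dict of category -> multiplier
    Returns dict of name -> weighted total in "standard deviations above replacement".
    """
    totals = {name: 0.0 for name in players}
    for cat, w in weights.items():
        # Step 1: each player's production above his own position's replacement level.
        above = {name: stats[cat] - replacement[pos][cat]
                 for name, (pos, stats) in players.items()}
        # Step 2: divide by the spread of those numbers (no mean subtracted, since
        # replacement level already sets zero).
        sd = pstdev(list(above.values()))
        if sd == 0:
            continue  # the category separates no one in this pool
        for name, a in above.items():
            totals[name] += w * a / sd
    return totals


players = {
    "SS1": ("SS", {"HR": 18, "SB": 10}),
    "SS2": ("SS", {"HR": 8, "SB": 5}),
    "1B1": ("1B", {"HR": 30, "SB": 2}),
    "1B2": ("1B", {"HR": 20, "SB": 0}),
}
replacement = {"SS": {"HR": 8, "SB": 5}, "1B": {"HR": 20, "SB": 0}}
values = replacement_then_normalize(players, replacement, {"HR": 1.0, "SB": 1.0})
print(values)
```

Here the shortstop's 18 homeruns earn the same HR credit as the first baseman's 30, because both are 10 above their positional replacement level. Replacement-level players come out at exactly zero. Whether this ranks players any differently from plain z-scores in a real pool is exactly the open question.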
