SkyKing162's Baseblog
A fan of the Yankees, Red Sox, and large sample sizes.
6.28.2003
NETruns

I've been thinking a lot lately about the correct theoretical concepts behind win shares. Here's the state of the union:

Claim points for hitting win shares are runs created above replacement. I call them extra runs created: extraRC = (RC/out - .5*lgRC/out) * outs.

Claim points for hitting loss shares are runs created below ideal. I call them lacking runs created: lackRC = (1.5*lgRC/out - RC/out) * outs.

Claim points for hitting game shares are expected runs produced - that is, the number of runs a league-average hitter (lgRC/out) would be expected to produce given the player's outs. I call them total runs created: totRC = lgRC/out * outs.

Note that extraRC + lackRC = totRC. This is good. It means win shares + loss shares = game shares.

What's the best metric for valuing players? It should reward players for extraRC and punish them for lackRC. So what better calculation than win shares minus loss shares, or extraRC - lackRC? I call these net runs created: NETruns = (RC/out - lgRC/out) * outs * 2.

That 2 pops into the equation when you run through the algebra of subtracting lackRC from extraRC. It doesn't make a difference when comparing players; it just changes the scale a bit. Leaving the 2 in versus taking it out is like a team being 10 games over .500 but only 5 games ahead of a .500 team: both are correct, they just present the information in slightly different ways.

So, we've got our metric, NETruns. It rewards players for both quantity and quality compared to average. Let's see some 2003 numbers through June 26:

Top 10 Overall:
Bottom 10 Overall:
Top 10 Firstbasemen
Top 10 Secondbasemen
(continued in next post... stupid blogger errors) Top 10 Shortstops
Top 10 Thirdbasemen
Top 20 Outfielders and Designated Hitters
Top 10 Catchers
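The claim-point formulas above can be sketched in a few lines of code. This is a minimal illustration with made-up RC/out numbers, not the full win shares machinery:

```python
# Sketch of the NETruns framework described above.
# Inputs: a player's runs created per out, the league RC/out, and his outs.

def extra_rc(rc_per_out, lg_rc_per_out, outs):
    """Runs created above replacement (replacement = half the league rate)."""
    return (rc_per_out - 0.5 * lg_rc_per_out) * outs

def lack_rc(rc_per_out, lg_rc_per_out, outs):
    """Runs created below ideal (ideal = 1.5 times the league rate)."""
    return (1.5 * lg_rc_per_out - rc_per_out) * outs

def tot_rc(lg_rc_per_out, outs):
    """Expected runs for a league-average hitter using the same outs."""
    return lg_rc_per_out * outs

def net_runs(rc_per_out, lg_rc_per_out, outs):
    """extraRC - lackRC, which the algebra reduces to 2*(RC/out - lgRC/out)*outs."""
    return (rc_per_out - lg_rc_per_out) * outs * 2

# Hypothetical player: 0.21 RC/out over 200 outs in a 0.18 RC/out league.
e = extra_rc(0.21, 0.18, 200)   # 24.0 (win-share claim points)
l = lack_rc(0.21, 0.18, 200)    # 12.0 (loss-share claim points)
t = tot_rc(0.18, 200)           # 36.0 (game-share claim points)

assert abs((e + l) - t) < 1e-9                              # extraRC + lackRC = totRC
assert abs((e - l) - net_runs(0.21, 0.18, 200)) < 1e-9      # NETruns = extraRC - lackRC
```

The asserts confirm the two identities claimed above: the win-share and loss-share claim points sum to the game-share claim points, and NETruns is just their difference.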
For the full list of players and their NETruns, click the link on the sidebar.

6.27.2003
Beating the Ratio Conversion Horse

Motivated by my recent interest in win shares, loss shares, and game shares, I had an idea concerning the conversion of ratio stats into counting stats when valuing fantasy baseball players. Traditionally, a baseline ratio is chosen, and the counting stat becomes how many hits/runs/whatever a player is better than that baseline would be, given the same playing time. For example, if Derek Jeter hits .280 in 600 ABs, he has 12 extra hits compared to a .260 hitter with 600 ABs: (.280-.260)*600 = 12.

There's nothing magical about the baseline - it could be anything. Many people choose the anticipated AVG of the last-place roto team, so that extra hits measures how many hits a player moves your team ahead of last place. I think this choice is popular because players are so often compared to replacement level in baseball analysis. But since the next step in the valuation process is to find the replacement-level number of extra hits anyway, using the replacement-level AVG to compute extra hits is not a requirement. It may turn out to be the best choice, but it isn't the best by definition.

Win shares compare a player to replacement level. Loss shares compare a player to an ideal level. Both are needed to get the full picture of a player's performance. So why not do the same for roto values? Let's compare a batter's AVG to both a replacement level (anticipated last-place AVG) and an ideal level (anticipated first-place AVG). This would measure how much a player pulls you up from the dredges of last place, but also how much he's preventing your team from finishing first. Here's an example:

Derek Jeter: .280 AVG in 600 ABs
Erubiel Durazo: .270 AVG in 450 ABs
last-place AVG: .260
first-place AVG: .290

DJ netHits = (.280-.260)*600 - (.290-.280)*600 = 12 - 6 = 6
Ruby netHits = (.270-.260)*450 - (.290-.270)*450 = 4.5 - 9 = -4.5

What does this number really mean?
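Before the algebra, the example can be checked with a quick sketch (Python, with the post's numbers hard-coded):

```python
def net_hits(avg, ab, rep_avg, ideal_avg):
    """Hits above the replacement baseline minus hits short of the ideal baseline."""
    return (avg - rep_avg) * ab - (ideal_avg - avg) * ab

# Baselines from the example: last-place AVG .260, first-place AVG .290.
jeter = net_hits(0.280, 600, 0.260, 0.290)   # 12 - 6 = 6
durazo = net_hits(0.270, 450, 0.260, 0.290)  # 4.5 - 9 = -4.5

print(round(jeter, 1), round(durazo, 1))  # 6.0 -4.5
```

Jeter beats the last-place baseline by more than he trails the first-place one, so he nets positive; Durazo, despite also beating last place, nets negative.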
Well, let's do some algebra (oooooh):

netHits = (plAVG - repAVG)*AB - (idealAVG - plAVG)*AB
netHits = [(plAVG - repAVG) - (idealAVG - plAVG)]*AB
netHits = [2*plAVG - (repAVG + idealAVG)]*AB
netHits = [2*plAVG - 2*meanAVG]*AB, where meanAVG is the mean of repAVG and idealAVG
netHits = 2*(plAVG - meanAVG)*AB

This is simply twice the number of extra hits when using meanAVG as the baseline. It would have to be tested, but I'm pretty sure the mean of repAVG and idealAVG is pretty close to the mean AVG of the draftable player pool. Since multiplying counting stats by a constant doesn't affect relative value, this method is equivalent to using extra hits compared to meanAVG. This exercise is one more thing that makes me think using meanAVG as the baseline has some merit over repAVG. I remember Todd Zola claiming repAVG is better because the empirical results turn out "better," but I wonder if that isn't more an issue of compensating for faulty projections than of being theoretically correct. I guess it should be tested using year-end stats and values.

6.25.2003
Win Shares + Loss Shares = Game Shares

This was originally a post at RotoJunkie. It's been modified to read more like an article for my blog. If you want to comment, head on over to the Sabrmetrics forum. Enjoy...

As a burgeoning sabermetrics groupie a few years ago when Win Shares hit the market, I ate it up. I thought it was the coolest ranking method. Well, almost the coolest. You see, there were things that just bugged me about Bill James' Win Shares system, mostly the many places where James made decisions more subjectively than objectively: the weird 40/30/20/10 weighted scales for valuing individual defense, the 52/48 split ("I think pitchers are undervalued, so I'll give 'em more points"), and the fact that a team's Win Shares are directly proportional to its wins, even though there's a lot of variability in wins for a given level of ability. But it wasn't until I really started doing lots of my own baseball analysis that something else started to bug me. I couldn't articulate it until I read a pdf document put together by TangoTiger and Rob Wood from Baseball Primer. Plain and simple: Win Shares just aren't a complete, useful metric without their counterpart, Loss Shares.

Take two pitchers, Bob and Nolan. They don't seem like pitchers of equal ability, except that Bill James (via Win Shares) says they are. So, are they? Oh, you want some stats... ok, here you go:

Bob: 200 IP, 3.00 ERA, 0 BB, 0 SO, 0 HR (yes, every batter puts the ball in play - I love extreme examples)

The common assumption in the post-DIPS world is that half the credit for the results of balls in play goes to the fielders and half to the pitcher, so I'll stick with that. Since all of the batters Bob faced put the ball in play, Bob gets half the credit for the runs saved during his time on the mound: (6.75 - 3.00)/9 * 200 * .5 = 41.7 runs prevented, where 6.75 is the replacement-level ERA of 1.5 times the league-average ERA.
Nolan: 200 IP, 4.00 ERA, with peripheral stats such that Nolan garners 75% of the credit for runs prevented (this doesn't mean that only a quarter of the batters he faces put the ball in play - it means half the runs while he pitches trace to balls in play, for which he gets half credit, and half to walks, strikeouts, and homers, for which he gets full credit: .5*.5 + .5*1 = .75)

Nolan's credit: (6.75 - 4.00)/9 * 200 * .75 = 45.8 runs prevented.

Let's assume Bob and Nolan pitch in front of fielding units of equal ability. Win Shares says Nolan deserves more credit, because he prevented more runs, even though he pitched the same number of innings with an ERA a full run higher than Bob's. Seriously, would you rather have Nolan as your pitcher, or Bob? Do you want 200 IP with a 4.00 ERA or a 3.00 ERA? Seems like there's something fishy going on...

The idea behind Win Shares is that it assigns credit (Win Shares) on the basis of responsibility. Nolan's responsible for preventing more runs than Bob, so he receives more Win Shares. But he's also responsible for allowing more runs than Bob, which Win Shares ignores. What's needed is Loss Shares. Bill James even alludes to the fact that he thought about Loss Shares, but left them out because he couldn't figure out how to calculate them. They're critical if you want the whole picture. To compare Bob and Nolan, we need to know not only how many extra runs Nolan was responsible for preventing, but also how many extra runs he was responsible for allowing.

I don't claim to have come up with a way to calculate Loss Shares (although I believe Tango and Rob did in their pdf file), but let's do a little calculation that's on the right track. Define "runs allowed^" as runs allowed above the "ideal pitcher" (the positive mirror of the replacement pitcher). Where the replacement pitcher has an ERA of 1.5*lgERA, the ideal pitcher has an ERA of .5*lgERA = 2.25 in our example.
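The whole Bob-versus-Nolan comparison can be sketched in code. This is a minimal illustration of the calculation above, not Tango and Rob's actual method; `credit_share` is the pitcher's share of responsibility (.5 for Bob, .75 for Nolan):

```python
# Sketch of the runs-prevented / runs-allowed^ comparison described above.
# League ERA is 4.50, so replacement ERA = 1.5 * 4.50 = 6.75
# and ideal ERA = 0.5 * 4.50 = 2.25, matching the post's numbers.

LG_ERA = 4.50
REP_ERA = 1.5 * LG_ERA    # 6.75
IDEAL_ERA = 0.5 * LG_ERA  # 2.25

def runs_prevented(era, ip, credit_share):
    """Runs saved vs. a replacement pitcher, scaled by the pitcher's credit share."""
    return (REP_ERA - era) / 9 * ip * credit_share

def runs_allowed_above_ideal(era, ip, credit_share):
    """Runs allowed beyond an ideal pitcher ("runs allowed^"), same scaling."""
    return (era - IDEAL_ERA) / 9 * ip * credit_share

def net(era, ip, credit_share):
    """Runs prevented minus runs allowed^ - the 'NET Shares' idea."""
    return runs_prevented(era, ip, credit_share) - runs_allowed_above_ideal(era, ip, credit_share)

bob = net(3.00, 200, 0.5)     # 41.7 - 8.3  = 33.3
nolan = net(4.00, 200, 0.75)  # 45.8 - 29.2 = 16.7
# Nolan prevents more runs, but his NET is about half of Bob's.
```

Even though Nolan's runs prevented edge out Bob's, charging both pitchers for runs above the ideal flips the comparison decisively toward Bob.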
(Yes, pitchers often post better ERAs than this ideal ERA, but the point still gets across - and pitchers can, and do, post ERAs worse than replacement level.)

Bob gets charged with allowing^ (3.00 - 2.25)/9 * 200 * .5 = 8.3 runs worse than ideal. Nolan gets charged with allowing^ (4.00 - 2.25)/9 * 200 * .75 = 29.2 runs worse than ideal. Hmmm - Bob's responsible for far fewer runs above ideal than Nolan. Let's combine runs prevented and runs allowed^:

Bob NET = 41.7 - 8.3 = 33.4
Nolan NET = 45.8 - 29.2 = 16.6

Thus, while Win Shares gives Nolan slightly more credit than Bob, "NET Shares" would say Bob's performance was about twice as valuable as Nolan's. Why? Because while Nolan prevents more runs, he's also given more responsibility (aka opportunity) to prevent runs. And with more opportunity, Nolan also allows more runs - so many more that his Win Shares advantage gets negated. It's like saying the Braves are better than the Tigers because they won 94 games to the Tigers' 68, without knowing how many games each team played. If both played 162 games, the Braves are better; but if the Braves played 200 games and the Tigers played 100, the Tigers were actually more impressive. The same idea holds for Win Shares. Nolan uses a bigger chunk of the defensive opportunities (Game Shares) than Bob and should be held accountable for it. So in addition to Win Shares, we need Loss Shares. Together the two imply an all-encompassing stat - Game Shares. Because Nolan's responsible for more of what happens while he pitches, Nolan has more Game Shares.

6.23.2003
DIPS Numbers Through June 15

I've finally figured out how to create and post html at an actual website, so now my "daily" DIPS numbers will be available for all to see. Here's a quick rundown of what each number is:

ERA: tried and true Earned Run Average
XERA: Extrapolated Earned Run Average - what you'd expect a pitcher's ERA to be based on his actual component stats (e.g. singles, walks, strikeouts) and the league-average ER/R rate
$ERA: expected ERA using a pitcher's unadjusted $BB, $SO, and $HR rates combined with his team's average $H, $2B, and $3B rates
dERA: expected ERA using a pitcher's adjusted (to a neutral park) $BB, $SO, and $HR rates combined with MLB-average $H, $2B, and $3B rates. This is the "DIPS ERA."
rdERA: same as dERA, but with a pitcher's $BB, $SO, and $HR rates regressed "appropriately" towards the mean. Currently, appropriately means about .3 for $BB, .2 for $SO, and .5 for $HR, considering only the current season. In the future, I hope to work in a regression rate that's a function of BFP for the current season. And, in the grand scheme of things, the goal is to incorporate past-season performance with current performance to get a good predictor of future performance.

$H = (H-HR)/(AB-SO-HR)
$BB = BB/PA
$SO = SO/AB
$HR = HR/(AB-SO)

6.20.2003
Bret Boone Versus Alfonso Soriano 2003
Ok, so make me an argument that says Soriano should start the All-Star game over Boone. The only things I see in favor of Soriano are these:

- Soriano's had more plate appearances (about 35 more, it looks like)
- Soriano's stealing many more bases at the same success rate

My rebuttal (after laughing profusely) is thus:

- 35 PA isn't very many and, compared to the quality difference, is insignificant
- The stolen base difference equals about 3 runs according to linear weights - not a big deal

Points in favor of Boone, put simply:

- a .040 OBP advantage
- a .090 SLG advantage (for a .130 difference in OPS, for those scoring at home)
- Half the time, Boone hits at Safeco
- Boone plays kick-ass defense, whereas Soriano merely holds his own

Let's do a quick runs created analysis (RC roughly equals OBP * SLG * AB):

Soriano: .341 * .513 * 338 = 59
Boone: .381 * .606 * 304 = 70

Definitely an advantage for Boone, but here's the kicker - consider outs (AB - H + CS) and RC/27 outs:

Soriano: 229 outs yields 7.0 runs/game
Boone: 189 outs yields 10 runs/game

Ok, it's an extremely rough analysis, but it's much more accurate than the general argument of "Soriano's such a great athlete and can hit any pitch." Sure, he can probably do athletic things that Boone (or most others) can't, but that's not the issue - the issue is which player is doing more to help his team win. And it should be obvious that it's Boone.
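The rough analysis above can be reproduced in a few lines, using the RC ≈ OBP * SLG * AB shorthand and the through-June stat lines quoted in the post:

```python
def runs_created(obp, slg, ab):
    """Rough runs created: OBP * SLG * AB (the shorthand used above)."""
    return obp * slg * ab

def rc_per_game(rc, outs):
    """Runs created per 27 outs - a rate stat to go with the counting stat."""
    return rc / outs * 27

soriano_rc = runs_created(0.341, 0.513, 338)  # ~59
boone_rc = runs_created(0.381, 0.606, 304)    # ~70

print(round(rc_per_game(soriano_rc, 229), 1))  # ~7.0 runs/game
print(round(rc_per_game(boone_rc, 189), 1))    # ~10.0 runs/game
```

Boone creates more total runs while using 40 fewer outs, which is why the per-game gap is even wider than the raw RC gap.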