There are two breeds of vanilla, free-as-in-beer zone rating available in the world: STATS and BIS. I already have a dumb projection system for STATS ZR, which could be refined (aging curves and speed/tools scores are the two major refinements I’m musing over).
But first I wanted to introduce BIS’s RZR into it. And therein lies a dilemma, folks. Here are the league-wide figures for RZR (Plays divided by BIZ) and OZR (OOZ divided by BIZ) over the years available at The Hardball Times:
POS | YEAR | Plays | OOZ | BIZ | RZR | OZR |
1B | 2004 | 4070 | 1783 | 5406 | .753 | .330 |
1B | 2005 | 4343 | 1940 | 5493 | .791 | .353 |
1B | 2006 | 3877 | 2012 | 4851 | .799 | .415 |
1B | 2007 | 4963 | 1048 | 6695 | .741 | .157 |
1B | 2008 | 2871 | 847 | 3815 | .753 | .222 |
1B | Total | 20124 | 7630 | 26260 | .766 | .291 |
2B | 2004 | 9863 | 1203 | 12129 | .813 | .099 |
2B | 2005 | 10403 | 1478 | 12825 | .811 | .115 |
2B | 2006 | 10401 | 1211 | 12679 | .820 | .096 |
2B | 2007 | 10120 | 1412 | 12192 | .830 | .116 |
2B | 2008 | 6313 | 649 | 7693 | .821 | .084 |
2B | Total | 47100 | 5953 | 57518 | .819 | .103 |
SS | 2004 | 9872 | 1919 | 11995 | .823 | .160 |
SS | 2005 | 10484 | 1948 | 12821 | .818 | .152 |
SS | 2006 | 10809 | 1659 | 13218 | .818 | .126 |
SS | 2007 | 10625 | 1912 | 13019 | .816 | .147 |
SS | 2008 | 6353 | 999 | 7627 | .833 | .131 |
SS | Total | 48143 | 8437 | 58680 | .820 | .144 |
3B | 2004 | 6215 | 2074 | 9007 | .690 | .230 |
3B | 2005 | 6813 | 2396 | 9271 | .735 | .258 |
3B | 2006 | 7686 | 1636 | 10880 | .706 | .150 |
3B | 2007 | 7221 | 1717 | 10623 | .680 | .162 |
3B | 2008 | 4444 | 1003 | 6344 | .701 | .158 |
3B | Total | 32379 | 8826 | 46125 | .702 | .191 |
CF | 2004 | 9478 | 2034 | 11905 | .796 | .171 |
CF | 2005 | 10266 | 1963 | 12590 | .815 | .156 |
CF | 2006 | 10316 | 2002 | 11534 | .894 | .174 |
CF | 2007 | 10886 | 1944 | 12264 | .888 | .159 |
CF | 2008 | 5922 | 1583 | 6468 | .916 | .245 |
CF | Total | 46868 | 9526 | 54761 | .856 | .174 |
LF | 2004 | 7710 | 847 | 12242 | .630 | .069 |
LF | 2005 | 8686 | 718 | 13712 | .633 | .052 |
LF | 2006 | 7723 | 1634 | 8971 | .861 | .182 |
LF | 2007 | 8014 | 1614 | 9373 | .855 | .172 |
LF | 2008 | 4475 | 1076 | 5060 | .884 | .213 |
LF | Total | 36608 | 5889 | 49358 | .742 | .119 |
RF | 2004 | 8736 | 781 | 13442 | .650 | .058 |
RF | 2005 | 9181 | 695 | 14161 | .648 | .049 |
RF | 2006 | 8376 | 1686 | 9436 | .888 | .179 |
RF | 2007 | 8418 | 1575 | 9597 | .877 | .164 |
RF | 2008 | 4802 | 1205 | 5321 | .902 | .226 |
RF | Total | 39513 | 5942 | 51957 | .760 | .114 |
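(2008 numbers will be slightly different from Studes’ numbers, as these are a few days old.) The projections for infielders are doable. But, as it stands, those outfield numbers are a horror show, taken by themselves: corner outfield RZR jumps from roughly .63-.65 in 2004-05 to .86-.90 from 2006 on, while the BIZ totals drop by about a third.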
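For anyone who wants to check the table, the two rate columns are simple ratios of the counting columns. Here is a minimal Python sketch using the LF rows copied from the table; the function and variable names are mine, chosen only for illustration.

```python
# League-wide LF totals from the table: year -> (Plays, OOZ, BIZ).
LEFT_FIELD = {
    2004: (7710, 847, 12242),
    2005: (8686, 718, 13712),
    2006: (7723, 1634, 8971),
    2007: (8014, 1614, 9373),
    2008: (4475, 1076, 5060),
}


def rzr(plays, biz):
    """Revised zone rating: plays made in zone per ball in zone."""
    return plays / biz


def ozr(ooz, biz):
    """Out-of-zone plays per ball in zone, as defined in this post."""
    return ooz / biz


for year, (plays, ooz, biz) in sorted(LEFT_FIELD.items()):
    print(f"LF {year}  RZR {rzr(plays, biz):.3f}  OZR {ozr(ooz, biz):.3f}")
```

Running it makes the break between 2005 and 2006 easy to see, which is the reason the raw outfield numbers can’t be compared across seasons as they stand.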
So before we can make projections based upon RZR data, we first need to normalize it. I’m sure there are better ways than the one I’m using, but I don’t think I’m using the worst way either, and it’s very expedient for my needs.
What I’m doing is dividing each player’s Plays, OOZ and BIZ by the league totals at that position for that season, and then multiplying by the average of those totals across all five years.
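Here is a rough Python sketch of that normalization, as I read the description above. The league totals are the LF rows from the table; the player line is made up, and the names (`normalize`, `average_totals`) are hypothetical, not taken from any actual script.

```python
STATS = ("plays", "ooz", "biz")

# League-wide LF totals by season, copied from the table.
LF_TOTALS = {
    2004: {"plays": 7710, "ooz": 847, "biz": 12242},
    2005: {"plays": 8686, "ooz": 718, "biz": 13712},
    2006: {"plays": 7723, "ooz": 1634, "biz": 8971},
    2007: {"plays": 8014, "ooz": 1614, "biz": 9373},
    2008: {"plays": 4475, "ooz": 1076, "biz": 5060},
}


def average_totals(season_totals):
    """Average each league total across all seasons supplied."""
    n = len(season_totals)
    return {s: sum(t[s] for t in season_totals.values()) / n for s in STATS}


def normalize(line, year, season_totals):
    """Scale a player's line by (five-year average total / that season's total)."""
    avg = average_totals(season_totals)
    return {s: line[s] / season_totals[year][s] * avg[s] for s in STATS}


# Hypothetical left fielder's 2004 line (invented numbers, for illustration only).
player_2004 = {"plays": 300, "ooz": 30, "biz": 480}
normalized = normalize(player_2004, 2004, LF_TOTALS)
print({stat: round(value, 1) for stat, value in normalized.items()})
print("normalized RZR:", round(normalized["plays"] / normalized["biz"], 3))
```

One consequence of doing it this way: a player who was exactly league average in any given season ends up with the five-year average RZR after normalization, so the 2004-05 and 2006-08 outfield numbers land on the same scale.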
And, since I was rather short with the explanation the last time out, I’ll go ahead and spell out what I’m doing in full:
- First, as above, every player’s performance is “normalized” to an average of the past five seasons.
- Then, a weighted average of each player’s past four seasons (2005-08) is taken, with the most recent season being given a weight of 5, then 4, then 3, then 2.
- Two weights’ worth of a full season of average defensive performance at the position is added as a regression-to-the-mean component.
- 5 + 4 + 3 + 2 + 2 = 16, so everything gets divided by 16. I wouldn’t exactly call the resulting totals a playing time projection, but they’re a rough guide to how much playing time a player might be expected to receive.
- Plays and Runs above average are figured for a full season’s performance, given the number of chances of the average player at that position from 2004 through 2008.
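The steps above can be sketched in Python roughly as follows. This is my reading of the list, not the actual script: the names are hypothetical, a season a player missed is assumed to simply contribute zeros, and `league_average` stands for whatever “a full season of average performance at the position” works out to. The conversion from plays to runs isn’t spelled out here, so the sketch stops at plays above average.

```python
WEIGHTS = (5, 4, 3, 2)      # most recent season first: 2008, 2007, 2006, 2005
REGRESSION_WEIGHT = 2       # weight given to the league-average line
STATS = ("plays", "ooz", "biz")


def project(seasons, league_average):
    """Weighted average of normalized seasons plus regression to the mean.

    `seasons` holds normalized lines, most recent first. Missing trailing
    seasons are assumed to count as zeros, so the divisor is always 16.
    """
    total_weight = sum(WEIGHTS) + REGRESSION_WEIGHT  # 5 + 4 + 3 + 2 + 2 = 16
    projection = {}
    for stat in STATS:
        weighted = sum(w * line[stat] for w, line in zip(WEIGHTS, seasons))
        weighted += REGRESSION_WEIGHT * league_average[stat]
        projection[stat] = weighted / total_weight
    projection["rzr"] = projection["plays"] / projection["biz"]
    return projection


def plays_above_average(projection, league_average):
    """Projected RZR minus average RZR, over a full season of average chances."""
    average_rzr = league_average["plays"] / league_average["biz"]
    return (projection["rzr"] - average_rzr) * league_average["biz"]


# Invented numbers, for illustration only.
league_average = {"plays": 240.0, "ooz": 40.0, "biz": 320.0}
player = [{"plays": 260.0, "ooz": 40.0, "biz": 320.0}] * 4

projection = project(player, league_average)
print(projection)
print(plays_above_average(projection, league_average))
```

Because the divisor stays at 16 no matter how many seasons a player actually has, a player with little or no history gets small projected totals but a rate close to league average, which matches the “rough guide to playing time” reading of step four.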
And… here are the projections. You can compare them to the STATS ZR projections, if you’d like.
(Note: Currently only players with a Baseball Databank ID who appeared in 2008 are included in either projection set. The next step is to take the rest of the players in the RZR set, map them to the appropriate STATS ID, and run both projections side by side for all players who played in 2008, and maybe some who haven’t yet but could.)
So what’s next? Like I said before, these could really benefit from aging curves. (While I’m on the topic, Jon Shepherd over at Camden Depot has published RZR aging curves which are worth taking a look at. I have my own ZR aging curves which I should really try and get straightened out.) I should also run “projections” for past seasons and see how they match up with what actually happened.
And I want to work on combining data from multiple positions. I’ve done some comparisons of players who have played multiple positions, and my feeling from looking at the data is that, for the purposes of projecting a player’s zone rating, there really isn’t a lot of difference in difficulty among the outfield positions. It’s not really much harder to catch fly balls in center field than it is anywhere else, but there are a lot more fly balls to catch, and so a good fielder is worth a lot more there. But that’s worth exploring more, and there are some noteworthy sampling issues in that data; I find it hard to believe that a center fielder is below average as a first baseman defensively, for example. I should rerun this query on the RZR dataset soon and see what that looks like.