The Other Fifteen: Eighty-five percent of the f---in' world is working. The other fifteen come out here. By Colin Wyers.

Another site announcement (2008-08-14)
<p>There's been quite a bit of radio silence around here recently, and hopefully that'll be corrected soon.</p> <p>In the meantime, if you're interested in more data on Kosuke Fukudome than you can shake a stick at, well, <a href="http://www.goatriders.org/node/2412">find a stick and head on over to Goatriders</a>.</p> <p>Also, I've joined the crew over at <a href="http://mvn.com/mlb-stats/">Statistically Speaking</a>, along with Brian Cartwright. I'm real excited to be working with Eric, Pizza and Brian. <a href="http://mvn.com/mlb-stats/2008/08/15/creating-a-dynamic-fip-with-baseruns/">My first post is up</a>, and I should be on a Friday schedule from here on out. 
If you're interested in things like FIP and BaseRuns, you should definitely take a look.</p>

2008 pitcher Marcels (2008-08-01)
<p><a href="http://otherfifteen.blogspot.com/2008/07/marcels-for-hitters.html">Like the hitters.</a> Witness the power of this <em>fully-operational</em> widget!</p> <iframe title="An EditGrid spreadsheet created by user/cwyers" style="border-right: #666 9px solid; border-top: #666 9px solid; border-left: #666 9px solid; width: 100%; border-bottom: #666 9px solid; height: 380px" src="http://www.editgrid.com/publish/html_book/user/cwyers/pitcher_marcels_inseason_2008?bgcolor=%23ffffff&fgcolor=%23000000&version=2&frame_style=border%3A9px%20solid%20%23666%3Bheight%3A380px%3Bwidth%3A100%25" frameborder="0" longdesc="http://www.editgrid.com/user/cwyers/pitcher_marcels_inseason_2008"> </iframe> <p>I haven't updated my table since the 29th, in case you were wondering. New for the pitcher Marcels is a reliability score; higher is better. None of this is park-adjusted. ERA and FIP-ERA are both provided; I’d be happier with either Component ERA or BaseRuns converted to earned runs, I suspect, but both of those sound like work. (And, in fairness to me, they would require a lot more data than I’m capturing now – I suspect I could get more pitching categories if I really, really desperately wanted to, but it would increase the amount of work involved exponentially. 
Remember – spidering these from Baseball-Reference gives me ID mappings and ages easily, both of which are insanely important to doing the projections.)</p> <p>Remember: I am not projecting playing time, I am extrapolating playing time.</p> <p>SQL code:</p> <blockquote> <p>CREATE TABLE pitching <br />AS <br />SELECT ( CASE WHEN p.playerID is not null <br />        THEN p.playerID <br />        ELSE p.Player END ) AS playerID <br />    , 2008 AS yearID <br />    , 1 AS weight <br />    , p.Ag AS Age <br />    , SUM(p.G) AS G <br />    , SUM(p.GS) AS GS <br />    , SUM(p.H) AS H <br />    , SUM(p.ER) AS ER <br />    , SUM(p.HR) AS HR <br />    , SUM(p.BB) AS BB <br />    , SUM(p.SO) AS SO <br />    , SUM(p.HBP) AS HBP <br />    , SUM(p.IP) AS IP <br />    , SUM(p.BFP) AS BFP <br />from ( SELECT * from 7_29_08_pitching p <br />    LEFT JOIN ( select Player AS BPlayer, (AB+BB+SH+SF) AS PA from 7_29_08_batting ) b <br />    ON p.Player = b.BPlayer <br />    LEFT JOIN ( select bbrefID, playerID from bdb.master ) m <br />    ON p.Player = m.bbrefID ) p <br />WHERE ( p.PA < (p.BFP) OR p.PA IS NULL ) <br />    AND p.BFP > 0 <br />GROUP BY Player <br />UNION ALL <br />SELECT p.playerID <br />    , p.yearID <br />    , POW(0.999,(2008-p.yearID)*365) AS weight <br />    , (CASE WHEN m.birthMonth < 7 THEN ( p.yearID - m.BirthYear ) ELSE ( p.yearID - m.BirthYear - 1 ) END) AS Age <br />    , SUM(p.G) AS G <br />    , SUM(p.GS) AS GS <br />    , SUM(p.H) AS H <br />    , SUM(p.ER) AS ER <br />    , SUM(p.HR) AS HR <br />    , SUM(p.BB) AS BB <br />    , SUM(p.SO) AS SO <br />    , SUM(p.HBP) AS HBP <br />    , ROUND(SUM(p.IPouts)/3,1) AS IP <br />    , SUM(p.BFP) AS BFP <br />FROM <br />    ( SELECT * from bdb.pitching p <br />    LEFT JOIN ( select playerID AS bplayerID, yearID AS byearID, (b.AB+b.BB+b.SH+b.SF+b.HBP) AS PA <br />        from bdb.batting b WHERE b.yearID > 2004 ) b <br />        ON p.playerID = b.bplayerID AND p.yearID = b.byearID <br />        WHERE p.yearID > 
2004) p, bdb.master m <br />WHERE p.playerID = m.playerID <br />    AND ( (p.PA) < (p.BFP) OR p.PA IS NULL ) <br />    AND (p.BFP) > 0 <br />GROUP BY yearID, playerID; </p> <p>CREATE TABLE average_pitch <br />AS <br />SELECT yearID <br />    , POW(0.999,(2008-p.yearID)*365) AS weight <br />    , SUM(p.G) AS G <br />    , SUM(p.GS) AS GS <br />    , SUM(p.H) AS H <br />    , SUM(p.ER) AS ER <br />    , SUM(p.HR) AS HR <br />    , SUM(p.BB) AS BB <br />    , SUM(p.SO) AS SO <br />    , SUM(p.HBP) AS HBP <br />    , SUM(p.IP) AS IP <br />    , SUM(p.BFP) AS BFP <br />FROM pitching p <br />GROUP BY yearID; </p> <p>CREATE TABLE pitcher_league_average <br />AS <br />-- league totals for each season the pitcher appeared in, weighted by the season weight and his IP that season <br />SELECT p.playerID <br />    , SUM(a.weight*a.G*p.IP) / SUM(a.weight*p.IP) AS G <br />    , SUM(a.weight*a.GS*p.IP) / SUM(a.weight*p.IP) AS GS <br />    , SUM(a.weight*a.H*p.IP) / SUM(a.weight*p.IP) AS H <br />    , SUM(a.weight*a.ER*p.IP) / SUM(a.weight*p.IP) AS ER <br />    , SUM(a.weight*a.HR*p.IP) / SUM(a.weight*p.IP) AS HR <br />    , SUM(a.weight*a.BB*p.IP) / SUM(a.weight*p.IP) AS BB <br />    , SUM(a.weight*a.SO*p.IP) / SUM(a.weight*p.IP) AS SO <br />    , SUM(a.weight*a.HBP*p.IP) / SUM(a.weight*p.IP) AS HBP <br />    , SUM(a.weight*a.BFP*p.IP) / SUM(a.weight*p.IP) AS BFP <br />    , SUM(a.weight*a.IP*p.IP) / SUM(a.weight*p.IP) AS IP <br />FROM pitching p, average_pitch a <br />WHERE p.yearID = a.yearID <br />GROUP BY playerID; </p> <p>CREATE TABLE pitcher_league_average_prorated <br />AS <br />SELECT playerID <br />    , ( G / IP * 318 ) AS G <br />    , ( GS / IP * 318 ) AS GS <br />    , ( H / IP * 318 ) AS H <br />    , ( ER / IP * 318 ) AS ER <br />    , ( HR / IP * 318 ) AS HR <br />    , ( BB / IP * 318 ) AS BB <br />    , ( SO / IP * 318 ) AS SO <br />    , ( HBP / IP * 318 ) AS HBP <br />    , ( BFP / IP * 
318 ) AS BFP <br />    , 318 AS IP <br />FROM pitcher_league_average; </p> <p>CREATE TABLE player_Age_2008_pitching <br />AS <br />SELECT playerID <br />    , MAX( Age+(2008-yearID) ) AS Age2008 <br />    , ( CASE <br />        WHEN MAX( Age+(2008-yearID) ) > 29 <br />        THEN 1 + ( 29 - MAX( Age+(2008-yearID) ) )*0.003 <br />        ELSE 1 + ( 29 - MAX( Age+(2008-yearID) ) )*0.006 END ) AS Curve <br />FROM pitching <br />GROUP BY playerID; </p> <p>CREATE TABLE player_ip_2008 <br />AS <br />-- IP per game over 110 games, extrapolated to 52 more games and rounded to a third of an inning <br />SELECT playerID, G, IP, (IP/110) AS IP_G, ROUND((IP/110)*52*3)/3 AS IP_LEFT from pitching WHERE yearID = 2008; </p> <p>CREATE TABLE pitching_marcels_2008 <br />AS <br />-- (weighted player totals + league regression component) / (weighted IP + 318), scaled to the remaining IP; rate stats are age-adjusted <br />SELECT p.playerID <br />    , ROUND(( SUM(p.weight*p.G) + w.G ) / ( SUM(p.weight*p.IP) + 318 ) * i.IP_LEFT ) AS G <br />    , ROUND(( SUM(p.weight*p.GS) + w.GS ) / ( SUM(p.weight*p.IP) + 318 ) * i.IP_LEFT ) AS GS <br />    , ROUND(( SUM(p.weight*p.H) + w.H ) / ( SUM(p.weight*p.IP) + 318 ) * i.IP_LEFT / c.Curve) AS H <br />    , ROUND(( SUM(p.weight*p.ER) + w.ER ) / ( SUM(p.weight*p.IP) + 318 ) * i.IP_LEFT / c.Curve) AS ER <br />    , ROUND(( SUM(p.weight*p.HR) + w.HR ) / ( SUM(p.weight*p.IP) + 318 ) * i.IP_LEFT / c.Curve) AS HR <br />    , ROUND(( SUM(p.weight*p.BB) + w.BB ) / ( SUM(p.weight*p.IP) + 318 ) * i.IP_LEFT / c.Curve) AS BB <br />    , ROUND(( SUM(p.weight*p.SO) + w.SO ) / ( SUM(p.weight*p.IP) + 318 ) * i.IP_LEFT * c.Curve) AS SO <br />    , ROUND(( SUM(p.weight*p.HBP) + w.HBP ) / ( SUM(p.weight*p.IP) + 318 ) * i.IP_LEFT / c.Curve) AS HBP <br />    , ROUND(( SUM(p.weight*p.BFP) + w.BFP ) / ( SUM(p.weight*p.IP) + 318 ) * i.IP_LEFT ) AS BFP <br />    , i.IP_LEFT AS IP <br />    , SUM(p.weight*p.IP) / ( SUM(p.weight*p.IP) + 318 ) AS R <br />FROM pitching p, pitcher_league_average_prorated w, player_Age_2008_pitching c, player_ip_2008 i <br />WHERE p.playerID = w.playerID <br />    AND p.playerID = 
i.playerID <br />    AND p.playerID = c.playerID <br />GROUP BY playerID;</p> </blockquote> <p>I’m less confident that I’m implementing Marcels correctly with these, simply because Tango has published less about them. But, outside of any errors I’ve made, the only thing left to do – I think – would be to implement real playing time projections. (I also have to backport the reliability calculation to the hitter projections, but that’s easy enough.) Past that… well, they wouldn’t be Marcels anymore. (Okay, so I should be projecting pitching stats by league as well.)</p> <p>Later this weekend I intend to take a couple of players and walk through the calculations behind this code, so that anyone who’s unsure of exactly what’s going on here gets to see the guts of the system.</p>

Marcels for hitters (2008-07-31)
<p>Of the forecasting systems worthy of the term “projection system,” about the simplest are the <a href="http://www.tangotiger.net/marcel/">Marcel projections</a>. As simple as they are, they <a href="http://www.insidethebook.com/ee/index.php/site/comments/community_forecast_2007_preliminary_results/">match up very well</a> with the results of the more complex forecasters.</p> <p>Sal Baxamusa over at The Hardball Times has kindly provided us with <a href="http://www.hardballtimes.com/main/article/is-this-guy-for-real/">Excel spreadsheets to calculate a player’s in-season Marcel</a>. But I wanted to be able to bulk-produce forecasts for the remainder of the season, so I set about reimplementing Sal’s spreadsheet in SQL.</p> <p>Data prior to this season was taken from the <a href="http://www.baseball-databank.org/">Baseball Databank</a>. 
Data from this season was screenscraped from the <a href="http://www.baseball-reference.com/leagues/NL_2008_bat.shtml">Baseball Reference league pages</a>. Conveniently, the BDB's Master table maps Baseball Reference player IDs to BDB player IDs.</p> <p>If you have the Baseball Databank in MySQL, and can handle scraping the data from BBRef yourself, then you can generate your own Marcels like so:</p> <blockquote> <p>CREATE TABLE batting_pos <br />AS <br />SELECT ( CASE WHEN b.playerID is not null <br />        THEN b.playerID <br />        ELSE b.Player END ) AS playerID <br />    , 2008 AS yearID <br />    , 1 AS weight <br />    , b.Ag AS Age <br />    , SUM(b.G) AS G <br />    , SUM(b.H) AS H <br />    , SUM(b.2B) AS 2B <br />    , SUM(b.3B) AS 3B <br />    , SUM(b.HR) AS HR <br />    , SUM(b.BB) AS BB <br />    , SUM(b.SO) AS SO <br />    , SUM(b.IBB) AS IBB <br />    , 0 AS HBP <br />    , SUM(b.SB) AS SB <br />    , SUM(b.CS) AS CS <br />    , SUM(b.AB+b.BB+b.SH+b.SF) AS PA <br />from ( SELECT * from 7_29_08_batting b <br />    LEFT JOIN ( select Player AS PPlayer, BFP from 7_29_08_pitching ) p <br />    ON b.Player = p.PPlayer <br />    LEFT JOIN ( select bbrefID, playerID from bdb.master ) m <br />    ON b.Player = m.bbrefID ) b <br />WHERE ( (b.AB+b.BB+b.SH+b.SF) > (b.BFP) OR b.BFP IS NULL ) <br />    AND (b.AB+b.BB+b.SH+b.SF) > 0 <br />GROUP BY Player <br />UNION ALL <br />SELECT b.playerID <br />    , b.yearID <br />    , ( 5.62 * EXP( -0.00066 * 365 * ( 2008 - b.yearID) ) ) / 5.62 AS weight <br />    , (CASE WHEN m.birthMonth < 7 THEN ( b.yearID - m.BirthYear ) ELSE ( b.yearID - m.BirthYear - 1 ) END) AS Age <br />    , SUM(b.G) AS G <br />    , SUM(b.H) AS H <br />    , SUM(b.2B) AS 2B <br />    , SUM(b.3B) AS 3B <br />    , SUM(b.HR) AS HR <br />    , SUM(b.BB) AS BB <br />    , SUM(b.SO) AS SO <br />    , SUM(b.IBB) AS IBB <br />    , SUM(b.HBP) AS HBP <br />    , SUM(b.SB) AS SB <br />    , SUM(b.CS) AS CS <br />    , SUM(b.AB+b.BB+b.SH+b.SF+b.HBP) AS PA <br />FROM <br />    ( SELECT * from bdb.batting b <br />    LEFT JOIN ( select playerID AS pplayerID, yearID AS pyearID, BFP <br />        from bdb.pitching p WHERE p.yearID > 2004 ) p <br />        ON b.playerID = p.pplayerID AND b.yearID = p.pyearID <br />        WHERE b.yearID > 2004) b, bdb.master m <br />WHERE b.playerID = m.playerID <br />    AND ( (b.AB+b.BB+b.SH+b.SF+b.HBP) > (b.BFP) OR b.BFP IS NULL ) <br />    AND (b.AB+b.BB+b.SH+b.SF+b.HBP) > 0 <br />GROUP BY yearID, playerID; </p> <p>CREATE TABLE average_pos <br />AS <br />SELECT yearID <br />    , ( 5.62 * EXP( -0.00066 * 365 * ( 2008 - yearID) ) ) / 5.62 AS weight <br />    , SUM(G) AS G <br />    , SUM(H) AS H <br />    , SUM(2B) AS 2B <br />    , SUM(3B) AS 3B <br />    , SUM(HR) AS HR <br />    , SUM(BB) AS BB <br />    , SUM(SO) AS SO <br />    , SUM(IBB) AS IBB <br />    , SUM(HBP) AS HBP <br />    , SUM(SB) AS SB <br />    , SUM(CS) AS CS <br />    , SUM(PA) AS PA <br />FROM batting_pos bp <br />GROUP BY yearID; </p> <p>CREATE TABLE player_league_average <br />AS <br />-- league totals for each season the player appeared in, weighted by the season weight and his PA that season <br />SELECT b.playerID <br />    , SUM(a.weight*a.H*b.PA) / SUM(a.weight*b.PA) AS H <br />    , SUM(a.weight*a.2B*b.PA) / SUM(a.weight*b.PA) AS 2B <br />    , SUM(a.weight*a.3B*b.PA) / SUM(a.weight*b.PA) AS 3B <br />    , SUM(a.weight*a.HR*b.PA) / SUM(a.weight*b.PA) AS HR <br />    , SUM(a.weight*a.BB*b.PA) / SUM(a.weight*b.PA) AS BB <br />    , SUM(a.weight*a.SO*b.PA) / SUM(a.weight*b.PA) AS SO <br />    , SUM(a.weight*a.IBB*b.PA) / SUM(a.weight*b.PA) AS IBB <br />    , SUM(a.weight*a.HBP*b.PA) / SUM(a.weight*b.PA) AS HBP <br />    , SUM(a.weight*a.SB*b.PA) / SUM(a.weight*b.PA) AS SB <br />    , SUM(a.weight*a.CS*b.PA) / SUM(a.weight*b.PA) AS CS <br />    , SUM(a.weight*a.PA*b.PA) / SUM(a.weight*b.PA) AS PA <br />FROM batting_pos b, average_pos a <br />WHERE b.yearID = 
a.yearID <br />GROUP BY playerID; </p> <p>CREATE TABLE player_league_average_prorated <br />AS <br />SELECT playerID <br />    , ( H / PA * 214 ) AS H <br />    , ( 2B / PA * 214 ) AS 2B <br />    , ( 3B / PA * 214 ) AS 3B <br />    , ( HR / PA * 214 ) AS HR <br />    , ( BB / PA * 214 ) AS BB <br />    , ( SO / PA * 214 ) AS SO <br />    , ( IBB / PA * 214 ) AS IBB <br />    , ( HBP / PA * 214 ) AS HBP <br />    , ( SB / PA * 214 ) AS SB <br />    , ( CS / PA * 214 ) AS CS <br />    , 214 AS PA <br />FROM player_league_average; </p> <p>CREATE TABLE player_Age_2008 <br />AS <br />SELECT playerID <br />    , MAX( Age+(2008-yearID) ) AS Age2008 <br />    , ( CASE <br />        WHEN MAX( Age+(2008-yearID) ) > 29 <br />        THEN 1 + ( 29 - MAX( Age+(2008-yearID) ) )*0.003 <br />        ELSE 1 + ( 29 - MAX( Age+(2008-yearID) ) )*0.006 END ) AS Curve <br />FROM batting_pos <br />GROUP BY playerID; </p> <p>CREATE TABLE player_pa_2008 <br />AS <br />-- PA per game played so far, extrapolated over 52 more games <br />SELECT playerID, G, PA, (PA/G) AS PA_G, ROUND((PA/G)*52) AS PA_LEFT from batting_pos WHERE yearID = 2008; </p> <p>CREATE TABLE hitter_marcels_2008 <br />AS <br />-- (weighted player totals + league regression component) / (weighted PA + 214), scaled to the remaining PA and age-adjusted <br />SELECT b.playerID <br />    , ROUND(( SUM(a.weight*b.H) + w.H ) / ( SUM(a.weight*b.PA) + 214 ) * p.PA_LEFT * c.Curve) AS H <br />    , ROUND(( SUM(a.weight*b.2B) + w.2B ) / ( SUM(a.weight*b.PA) + 214 ) * p.PA_LEFT * c.Curve) AS 2B <br />    , ROUND(( SUM(a.weight*b.3B) + w.3B ) / ( SUM(a.weight*b.PA) + 214 ) * p.PA_LEFT * c.Curve) AS 3B <br />    , ROUND(( SUM(a.weight*b.HR) + w.HR ) / ( SUM(a.weight*b.PA) + 214 ) * p.PA_LEFT * c.Curve) AS HR <br />    , ROUND(( SUM(a.weight*b.BB) + w.BB ) / ( SUM(a.weight*b.PA) + 214 ) * p.PA_LEFT * c.Curve) AS BB <br />    , ROUND(( SUM(a.weight*b.SO) + w.SO ) / ( SUM(a.weight*b.PA) + 214 ) * p.PA_LEFT * c.Curve) AS SO <br />    , ROUND(( SUM(a.weight*b.IBB) + w.IBB ) / ( SUM(a.weight*b.PA) + 214 ) * p.PA_LEFT * c.Curve) AS IBB <br />    , ROUND(( SUM(a.weight*b.HBP) + w.HBP ) / ( SUM(a.weight*b.PA) + 214 ) * p.PA_LEFT * c.Curve) AS HBP <br />    , ROUND(( SUM(a.weight*b.SB) + w.SB ) / ( SUM(a.weight*b.PA) + 214 ) * p.PA_LEFT * c.Curve) AS SB <br />    , ROUND(( SUM(a.weight*b.CS) + w.CS ) / ( SUM(a.weight*b.PA) + 214 ) * p.PA_LEFT * c.Curve) AS CS <br />    , p.PA_LEFT AS PA <br />FROM batting_pos b, average_pos a, player_league_average_prorated w, player_Age_2008 c, player_pa_2008 p <br />WHERE b.yearID = a.yearID <br />    AND b.playerID = w.playerID <br />    AND b.playerID = p.playerID <br />    AND b.playerID = c.playerID <br />GROUP BY playerID;</p> </blockquote> <p>I hate publishing code because most of what I write would get me hunted down and burned at the stake in any CS department in the country, but there it is. If you don’t grok SQL, here’s what’s going on here:</p> <ol> <li>First we combine the data from the Databank with the data from this year, excluding anyone who has fewer plate appearances than batters faced – in other words, pitchers. The other thing we do is compute a weight for each year – more recent seasons are worth more in the projection, and the weight is what governs that. </li> <li>We calculate league-wide totals for non-pitchers’ hitting in each of those seasons. </li> <li>Each player is then given a weighted average of the league from those four seasons, prorated out to 214 plate appearances. That’s our regression to the mean component. </li> <li>We figure out an aging curve for each player. </li> <li>We guesstimate how many plate appearances a player will receive the rest of 2008. If you think you know better than my guesstimate, you’re probably right. It’s the rates that I’m projecting. 
</li> <li>This is the part that actually does a projection – it takes a weighted average of the past four seasons, mixes in the regression component from step three, applies the aging curve from step four, and prorates the result out to our guesstimated playing time. </li> </ol> <p>If you don’t want to put in that amount of effort, well, Mr. Widget, take us home!</p> <iframe title="An EditGrid spreadsheet created by user/cwyers" style="border-right: #666 9px solid; border-top: #666 9px solid; border-left: #666 9px solid; width: 100%; border-bottom: #666 9px solid; height: 380px" src="http://www.editgrid.com/publish/html_book/user/cwyers/hitter_marcels_inseason_2008?bgcolor=%23ffffff&fgcolor=%23000000&version=2&frame_style=border%3A9px%20solid%20%23666%3Bheight%3A380px%3Bwidth%3A100%25" frameborder="0" longdesc="http://www.editgrid.com/user/cwyers/hitter_marcels_inseason_2008"> </iframe> <p>I've taken a cursory look at the projections, but I haven't done any serious validation. All projections are provided as-is. The good news is, now that I have the code ironed out, all I need is half an hour or so and I can generate Marcels for over 500 players. (Quicker if I could automate the screenscraping – I’ll have to look into that.)</p> <p>I’m hopeful but not optimistic that I’ll have pitcher projections done similarly by the weekend.</p>

Jeff Samardzija in Pitch F/X (2008-07-25)
<p>If you want actual, well, <em>good</em> analysis, <a href="http://www.cubsfx.com/2008/07/jeff-samardzija-debut-pitchfx.html">go over to Harry’s and take a look</a>. 
He’s been doing this pitch ID stuff a lot longer than I have.</p> <p>But I think I was able to duplicate one of the graphs from Harry’s page, or at least come close.</p> <p>I used <a href="http://kuiper.alal.com/~mkovach/glb-0.0prealpha.zip">Mat Kovach’s parser</a> to download data from MLB’s servers. (It seems to work fine for me, but it’s “pre-alpha” and not documented as of yet, so caveat emptor. Also, I Am Not A Programmer, so all code samples that follow are to be taken with more than a hint of salt.)</p> <p>Then, in MySQL, I ran the following query against the data:</p> <blockquote> <p>-- every pitch from every at-bat where pitcher 502188 (Samardzija) was on the mound <br />SELECT a.*, p.* <br />FROM gameday_atbat a, gameday_pitch p <br />WHERE a.gameid = p.gameid <br />    AND a.num = p.atbat_num <br />    AND a.pitcher = 502188;</p> </blockquote> <p>Not the prettiest SQL I’ve ever written, and it returns more data than I need, but that’s fine. Then I export the data to a CSV file. There’s one pitchout in the dataset, which I remove.</p> <p>Well, now what? I use <a href="http://www.r-project.org/">GNU R</a>, personally, for all my graphing and K-means clustering needs. 
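</p> <p>If K-means is unfamiliar, the idea is simple. Pick k starting centers and assign every pitch to its nearest center. Move each center to the average of the pitches assigned to it, and repeat until the centers stop moving. Below is a minimal Python sketch of that loop (Lloyd's algorithm), meant only for intuition; it is not the R code this post uses. The helper names, the deterministic farthest-point seeding (R's <code>kmeans()</code> uses random starts instead) and the movement readings are all made up for illustration.</p>

```python
def _dist2(p, q):
    """Squared Euclidean distance between two (x, z) points."""
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2


def _nearest(p, centers):
    """Index of the center closest to point p."""
    return min(range(len(centers)), key=lambda c: _dist2(p, centers[c]))


def kmeans(points, k, max_iter=10):
    """Minimal Lloyd's-algorithm K-means for 2-D points such as (pfx_x, pfx_z).

    Hypothetical illustration: centers are seeded with a deterministic
    farthest-point rule, not the random starts R's kmeans() uses.
    """
    # Seeding: begin at the first point, then repeatedly add the point that is
    # farthest from every center chosen so far.
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(_dist2(p, c) for c in centers)))

    for _ in range(max_iter):
        # Assignment step: each point joins its nearest center.
        labels = [_nearest(p, centers) for p in points]
        # Update step: move each center to the mean of its members.
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centers.append((sum(p[0] for p in members) / len(members),
                                    sum(p[1] for p in members) / len(members)))
            else:
                new_centers.append(centers[c])  # an empty cluster keeps its old center
        if new_centers == centers:
            break  # converged: no center moved
        centers = new_centers

    labels = [_nearest(p, centers) for p in points]  # labels that match the final centers
    return centers, labels


# Made-up (pfx_x, pfx_z) movement readings in three loose groups.
pitches = [(-6.0, 9.5), (-5.5, 10.2), (-6.4, 8.8),
           (2.0, 1.0), (2.6, 0.4), (1.7, 1.5),
           (-4.0, 2.0), (-4.5, 1.2), (-3.6, 2.6)]
centers, labels = kmeans(pitches, k=3)
print(labels)  # [0, 0, 0, 1, 1, 1, 2, 2, 2]
```

<p>On those made-up readings the three groups come back as labels 0, 1 and 2. Like the R version, it finds exactly as many clusters as you ask for; picking k is a separate problem.</p> <p>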
Code:</p> <blockquote> <p>Samardzija <- read.table("C:/Retrosheet/saved queries/pitchfx/Samardzija first start.csv", header=TRUE, sep=",") # pitches exported from the MySQL query above <br />cl <- kmeans(Samardzija[, c("pfx_x", "pfx_z")], centers = 3, iter.max = 10, nstart = 10) # base R; keeps the best of 10 random starts, as Rcmdr's KMeans(..., num.seeds = 10) does <br />plot(pfx_z ~ pfx_x, data = Samardzija, col = cl$cluster, xlim = c(-20, 20), ylim = c(-20, 20)) # vertical vs. horizontal movement, colored by cluster</p> </blockquote> <p>Which produces the following graph:</p> <p><a href="http://lh4.ggpht.com/pontifexexmachina/SIrGwGXB7FI/AAAAAAAAAPE/GkbeBJ9LQDA/s1600-h/samardzija_072508%5B3%5D.png"><img title="samardzija_072508" style="border-right: 0px; border-top: 0px; border-left: 0px; border-bottom: 0px" height="578" alt="samardzija_072508" src="http://lh5.ggpht.com/pontifexexmachina/SIrGwpssMVI/AAAAAAAAAPI/xZg2rNzq6Cw/samardzija_072508_thumb%5B1%5D.png?imgmax=800" width="579" border="0" /></a> </p> <p>In fairness to Harry, I cheated – in the second line of the program, I tell the clustering algorithm how many “centers” to look for – in this case, how many pitch types I want it to find. I told it three. Why? Because that’s what Harry’s graph shows. I don’t really know how to determine the “right” number of centers as of yet.</p> <p>Even so, I have one pitch that differs from his – I think he changed that ID manually, but I’m not sure. I can tell you that one cluster is green and one is black, but as far as calling one a splitter and one a slider, that’s something I have to work on.</p> <p>(That graph, by the way, is ugly, and I know it’s ugly. I know I can make it look better, but in this case it’s a question of how much time I really want to invest in prettying up Pitch F/X graphs before I figure out what it is I’m actually doing with them. 
It’s called premature optimization.)</p>

Projecting RZR (2008-07-24)
<p>There are two breeds of vanilla, free-as-in-beer zone ratings available in the world: STATS and BIS. I already have <a href="http://otherfifteen.blogspot.com/2008/07/projecting-zone-rating.html">a dumb projection system for STATS ZR</a>, which could be refined (aging curves and speed/tools scores are the two major refinements I’m musing over).</p> <p>But first I wanted to introduce BIS’s RZR into it. And therein lies a dilemma, folks. Here are the averages for RZR (plays divided by BIZ) and OZR (OOZ divided by BIZ) over the years available at The Hardball Times:</p> <table cellspacing="0" cellpadding="0"><colgroup><col width="64" /></colgroup><tbody> <tr bgcolor="#c0504d" height="20"> <td width="64" height="20"> <div align="center"><strong>POS</strong></div> </td> <td width="64"> <div align="center"><strong>YEAR</strong></div> </td> <td width="64"> <div align="center"><strong>Plays</strong></div> </td> <td width="64"> <div align="center"><strong>OOZ</strong></div> </td> <td width="64"> <div align="center"><strong>BIZ</strong></div> </td> <td width="64"> <div align="center"><strong>RZR</strong></div> </td> <td width="64"> <div align="center"><strong>OZR</strong></div> </td> </tr> <tr height="20"> <td height="20"> <div align="center">1B</div> </td> <td> <div align="center">2004</div> </td> <td> <div align="center">4070</div> </td> <td> <div align="center">1783</div> </td> <td> <div align="center">5406</div> </td> <td> <div align="center">.753</div> </td> <td> <div align="center">.330</div> </td> </tr> <tr bgcolor="#f2dddc" height="20"> <td height="20"> <div align="center">1B</div> </td> <td> <div align="center">2005</div> </td> <td> <div align="center">4343</div> </td> <td> <div align="center">1940</div> </td> <td> <div 
align="center">5493</div> </td> <td> <div align="center">.791</div> </td> <td> <div align="center">.353</div> </td> </tr> <tr height="20"> <td height="20"> <div align="center">1B</div> </td> <td> <div align="center">2006</div> </td> <td> <div align="center">3877</div> </td> <td> <div align="center">2012</div> </td> <td> <div align="center">4851</div> </td> <td> <div align="center">.799</div> </td> <td> <div align="center">.415</div> </td> </tr> <tr bgcolor="#f2dddc" height="20"> <td height="20"> <div align="center">1B</div> </td> <td> <div align="center">2007</div> </td> <td> <div align="center">4963</div> </td> <td> <div align="center">1048</div> </td> <td> <div align="center">6695</div> </td> <td> <div align="center">.741</div> </td> <td> <div align="center">.157</div> </td> </tr> <tr height="20"> <td height="20"> <div align="center">1B</div> </td> <td> <div align="center">2008</div> </td> <td> <div align="center">2871</div> </td> <td> <div align="center">847</div> </td> <td> <div align="center">3815</div> </td> <td> <div align="center">.753</div> </td> <td> <div align="center">.222</div> </td> </tr> <tr bgcolor="#f2dddc" height="20"> <td height="20"> <div align="center"><strong>1B</strong></div> </td> <td> <div align="center"><strong>Total</strong></div> </td> <td> <div align="center"><strong>20124</strong></div> </td> <td> <div align="center"><strong>7630</strong></div> </td> <td> <div align="center"><strong>26260</strong></div> </td> <td> <div align="center"><strong>.766</strong></div> </td> <td> <div align="center"><strong>.291</strong></div> </td> </tr> <tr height="20"> <td height="20"> <div align="center">2B</div> </td> <td> <div align="center">2004</div> </td> <td> <div align="center">9863</div> </td> <td> <div align="center">1203</div> </td> <td> <div align="center">12129</div> </td> <td> <div align="center">.813</div> </td> <td> <div align="center">.099</div> </td> </tr> <tr bgcolor="#f2dddc" height="20"> <td height="20"> <div align="center">2B</div> 
</td> <td> <div align="center">2005</div> </td> <td> <div align="center">10403</div> </td> <td> <div align="center">1478</div> </td> <td> <div align="center">12825</div> </td> <td> <div align="center">.811</div> </td> <td> <div align="center">.115</div> </td> </tr> <tr height="20"> <td height="20"> <div align="center">2B</div> </td> <td> <div align="center">2006</div> </td> <td> <div align="center">10401</div> </td> <td> <div align="center">1211</div> </td> <td> <div align="center">12679</div> </td> <td> <div align="center">.820</div> </td> <td> <div align="center">.096</div> </td> </tr> <tr bgcolor="#f2dddc" height="20"> <td height="20"> <div align="center">2B</div> </td> <td> <div align="center">2007</div> </td> <td> <div align="center">10120</div> </td> <td> <div align="center">1412</div> </td> <td> <div align="center">12192</div> </td> <td> <div align="center">.830</div> </td> <td> <div align="center">.116</div> </td> </tr> <tr height="20"> <td height="20"> <div align="center">2B</div> </td> <td> <div align="center">2008</div> </td> <td> <div align="center">6313</div> </td> <td> <div align="center">649</div> </td> <td> <div align="center">7693</div> </td> <td> <div align="center">.821</div> </td> <td> <div align="center">.084</div> </td> </tr> <tr bgcolor="#f2dddc" height="20"> <td height="20"> <div align="center"><strong>2B</strong></div> </td> <td> <div align="center"><strong>Total</strong></div> </td> <td> <div align="center"><strong>47100</strong></div> </td> <td> <div align="center"><strong>5953</strong></div> </td> <td> <div align="center"><strong>57518</strong></div> </td> <td> <div align="center"><strong>.819</strong></div> </td> <td> <div align="center"><strong>.103</strong></div> </td> </tr> <tr height="20"> <td height="20"> <div align="center">SS</div> </td> <td> <div align="center">2004</div> </td> <td> <div align="center">9872</div> </td> <td> <div align="center">1919</div> </td> <td> <div align="center">11995</div> </td> <td> <div 
align="center">.823</div> </td> <td> <div align="center">.160</div> </td> </tr> <tr bgcolor="#f2dddc" height="20"> <td height="20"> <div align="center">SS</div> </td> <td> <div align="center">2005</div> </td> <td> <div align="center">10484</div> </td> <td> <div align="center">1948</div> </td> <td> <div align="center">12821</div> </td> <td> <div align="center">.818</div> </td> <td> <div align="center">.152</div> </td> </tr> <tr height="20"> <td height="20"> <div align="center">SS</div> </td> <td> <div align="center">2006</div> </td> <td> <div align="center">10809</div> </td> <td> <div align="center">1659</div> </td> <td> <div align="center">13218</div> </td> <td> <div align="center">.818</div> </td> <td> <div align="center">.126</div> </td> </tr> <tr bgcolor="#f2dddc" height="20"> <td height="20"> <div align="center">SS</div> </td> <td> <div align="center">2007</div> </td> <td> <div align="center">10625</div> </td> <td> <div align="center">1912</div> </td> <td> <div align="center">13019</div> </td> <td> <div align="center">.816</div> </td> <td> <div align="center">.147</div> </td> </tr> <tr height="20"> <td height="20"> <div align="center">SS</div> </td> <td> <div align="center">2008</div> </td> <td> <div align="center">6353</div> </td> <td> <div align="center">999</div> </td> <td> <div align="center">7627</div> </td> <td> <div align="center">.833</div> </td> <td> <div align="center">.131</div> </td> </tr> <tr bgcolor="#f2dddc" height="20"> <td height="20"> <div align="center"><strong>SS</strong></div> </td> <td> <div align="center"><strong>Total</strong></div> </td> <td> <div align="center"><strong>48143</strong></div> </td> <td> <div align="center"><strong>8437</strong></div> </td> <td> <div align="center"><strong>58680</strong></div> </td> <td> <div align="center"><strong>.820</strong></div> </td> <td> <div align="center"><strong>.144</strong></div> </td> </tr> <tr height="20"> <td height="20"> <div align="center">3B</div> </td> <td> <div 
align="center">2004</div> </td> <td> <div align="center">6215</div> </td> <td> <div align="center">2074</div> </td> <td> <div align="center">9007</div> </td> <td> <div align="center">.690</div> </td> <td> <div align="center">.230</div> </td> </tr> <tr bgcolor="#f2dddc" height="20"> <td height="20"> <div align="center">3B</div> </td> <td> <div align="center">2005</div> </td> <td> <div align="center">6813</div> </td> <td> <div align="center">2396</div> </td> <td> <div align="center">9271</div> </td> <td> <div align="center">.735</div> </td> <td> <div align="center">.258</div> </td> </tr> <tr height="20"> <td height="20"> <div align="center">3B</div> </td> <td> <div align="center">2006</div> </td> <td> <div align="center">7686</div> </td> <td> <div align="center">1636</div> </td> <td> <div align="center">10880</div> </td> <td> <div align="center">.706</div> </td> <td> <div align="center">.150</div> </td> </tr> <tr bgcolor="#f2dddc" height="20"> <td height="20"> <div align="center">3B</div> </td> <td> <div align="center">2007</div> </td> <td> <div align="center">7221</div> </td> <td> <div align="center">1717</div> </td> <td> <div align="center">10623</div> </td> <td> <div align="center">.680</div> </td> <td> <div align="center">.162</div> </td> </tr> <tr height="20"> <td height="20"> <div align="center">3B</div> </td> <td> <div align="center">2008</div> </td> <td> <div align="center">4444</div> </td> <td> <div align="center">1003</div> </td> <td> <div align="center">6344</div> </td> <td> <div align="center">.701</div> </td> <td> <div align="center">.158</div> </td> </tr> <tr bgcolor="#f2dddc" height="20"> <td height="20"> <div align="center"><strong>3B</strong></div> </td> <td> <div align="center"><strong>Total</strong></div> </td> <td> <div align="center"><strong>32379</strong></div> </td> <td> <div align="center"><strong>8826</strong></div> </td> <td> <div align="center"><strong>46125</strong></div> </td> <td> <div align="center"><strong>.702</strong></div> </td> 
<td> <div align="center"><strong>.191</strong></div> </td> </tr> <tr height="20"> <td height="20"> <div align="center">CF</div> </td> <td> <div align="center">2004</div> </td> <td> <div align="center">9478</div> </td> <td> <div align="center">2034</div> </td> <td> <div align="center">11905</div> </td> <td> <div align="center">.796</div> </td> <td> <div align="center">.171</div> </td> </tr> <tr bgcolor="#f2dddc" height="20"> <td height="20"> <div align="center">CF</div> </td> <td> <div align="center">2005</div> </td> <td> <div align="center">10266</div> </td> <td> <div align="center">1963</div> </td> <td> <div align="center">12590</div> </td> <td> <div align="center">.815</div> </td> <td> <div align="center">.156</div> </td> </tr> <tr height="20"> <td height="20"> <div align="center">CF</div> </td> <td> <div align="center">2006</div> </td> <td> <div align="center">10316</div> </td> <td> <div align="center">2002</div> </td> <td> <div align="center">11534</div> </td> <td> <div align="center">.894</div> </td> <td> <div align="center">.174</div> </td> </tr> <tr height="20"> <td bgcolor="#f2dddc" height="20"> <div align="center">CF</div> </td> <td bgcolor="#f2dddc"> <div align="center">2007</div> </td> <td bgcolor="#f2dddc"> <div align="center">10886</div> </td> <td bgcolor="#f2dddc"> <div align="center">1944</div> </td> <td bgcolor="#f2dddc"> <div align="center">12264</div> </td> <td bgcolor="#f2dddc"> <div align="center">.888</div> </td> <td bgcolor="#f2dddc"> <div align="center">.159</div> </td> </tr> <tr height="20"> <td height="20"> <div align="center">CF</div> </td> <td> <div align="center">2008</div> </td> <td> <div align="center">5922</div> </td> <td> <div align="center">1583</div> </td> <td> <div align="center">6468</div> </td> <td> <div align="center">.916</div> </td> <td> <div align="center">.245</div> </td> </tr> <tr bgcolor="#f2dddc" height="20"> <td height="20"> <div align="center"><strong>CF</strong></div> </td> <td> <div 
align="center"><strong>Total</strong></div> </td> <td> <div align="center"><strong>46868</strong></div> </td> <td> <div align="center"><strong>9526</strong></div> </td> <td> <div align="center"><strong>54761</strong></div> </td> <td> <div align="center"><strong>.856</strong></div> </td> <td> <div align="center"><strong>.174</strong></div> </td> </tr> <tr height="20"> <td height="20"> <div align="center">LF</div> </td> <td> <div align="center">2004</div> </td> <td> <div align="center">7710</div> </td> <td> <div align="center">847</div> </td> <td> <div align="center">12242</div> </td> <td> <div align="center">.630</div> </td> <td> <div align="center">.069</div> </td> </tr> <tr bgcolor="#f2dddc" height="20"> <td height="20"> <div align="center">LF</div> </td> <td> <div align="center">2005</div> </td> <td> <div align="center">8686</div> </td> <td> <div align="center">718</div> </td> <td> <div align="center">13712</div> </td> <td> <div align="center">.633</div> </td> <td> <div align="center">.052</div> </td> </tr> <tr height="20"> <td height="20"> <div align="center">LF</div> </td> <td> <div align="center">2006</div> </td> <td> <div align="center">7723</div> </td> <td> <div align="center">1634</div> </td> <td> <div align="center">8971</div> </td> <td> <div align="center">.861</div> </td> <td> <div align="center">.182</div> </td> </tr> <tr bgcolor="#f2dddc" height="20"> <td height="20"> <div align="center">LF</div> </td> <td> <div align="center">2007</div> </td> <td> <div align="center">8014</div> </td> <td> <div align="center">1614</div> </td> <td> <div align="center">9373</div> </td> <td> <div align="center">.855</div> </td> <td> <div align="center">.172</div> </td> </tr> <tr height="20"> <td height="20"> <div align="center">LF</div> </td> <td> <div align="center">2008</div> </td> <td> <div align="center">4475</div> </td> <td> <div align="center">1076</div> </td> <td> <div align="center">5060</div> </td> <td> <div align="center">.884</div> </td> <td> <div 
align="center">.213</div> </td> </tr> <tr bgcolor="#f2dddc" height="20"> <td height="20"> <div align="center"><strong>LF</strong></div> </td> <td> <div align="center"><strong>Total</strong></div> </td> <td> <div align="center"><strong>36608</strong></div> </td> <td> <div align="center"><strong>5889</strong></div> </td> <td> <div align="center"><strong>49358</strong></div> </td> <td> <div align="center"><strong>.742</strong></div> </td> <td> <div align="center"><strong>.119</strong></div> </td> </tr> <tr height="20"> <td height="20"> <div align="center">RF</div> </td> <td> <div align="center">2004</div> </td> <td> <div align="center">8736</div> </td> <td> <div align="center">781</div> </td> <td> <div align="center">13442</div> </td> <td> <div align="center">.650</div> </td> <td> <div align="center">.058</div> </td> </tr> <tr bgcolor="#f2dddc" height="20"> <td height="20"> <div align="center">RF</div> </td> <td> <div align="center">2005</div> </td> <td> <div align="center">9181</div> </td> <td> <div align="center">695</div> </td> <td> <div align="center">14161</div> </td> <td> <div align="center">.648</div> </td> <td> <div align="center">.049</div> </td> </tr> <tr height="20"> <td height="20"> <div align="center">RF</div> </td> <td> <div align="center">2006</div> </td> <td> <div align="center">8376</div> </td> <td> <div align="center">1686</div> </td> <td> <div align="center">9436</div> </td> <td> <div align="center">.888</div> </td> <td> <div align="center">.179</div> </td> </tr> <tr bgcolor="#f2dddc" height="20"> <td height="20"> <div align="center">RF</div> </td> <td> <div align="center">2007</div> </td> <td> <div align="center">8418</div> </td> <td> <div align="center">1575</div> </td> <td> <div align="center">9597</div> </td> <td> <div align="center">.877</div> </td> <td> <div align="center">.164</div> </td> </tr> <tr height="20"> <td height="20"> <div align="center">RF</div> </td> <td> <div align="center">2008</div> </td> <td> <div align="center">4802</div> 
</td> <td> <div align="center">1205</div> </td> <td> <div align="center">5321</div> </td> <td> <div align="center">.902</div> </td> <td> <div align="center">.226</div> </td> </tr> <tr bgcolor="#f2dddc" height="20"> <td height="20"> <div align="center"><strong>RF</strong></div> </td> <td> <div align="center"><strong>Total</strong></div> </td> <td> <div align="center"><strong>39513</strong></div> </td> <td> <div align="center"><strong>5942</strong></div> </td> <td> <div align="center"><strong>51957</strong></div> </td> <td> <div align="center"><strong>.760</strong></div> </td> <td> <div align="center"><strong>.114</strong></div> </td> </tr> </tbody></table> <p>(2008 numbers will be slightly different from <a href="http://www.hardballtimes.com/main/blog_article/rzr-averages/">Studes’ numbers</a>, as these are a few days old.) For infielders, projections from these numbers are doable. But, as it stands, the outfield numbers, taken by themselves, are a horror show: left and right field RZR jumps from about .63-.65 in 2004 and 2005 to .86-.89 from 2006 on, while out-of-zone plays roughly double.</p> <p>So before we can make projections from RZR data, we first need to normalize it. 
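</p>
<p>Here is a minimal Python sketch of that normalization, under a couple of assumptions: "the totals for that season" are taken to be the league totals at the fielder's position (the LF rows of the table above), and the five seasons are averaged at face value, partial 2008 included. The function names are made up for illustration. Each season's plays, OOZ and BIZ are divided by that season's totals and multiplied by the five-year average totals.</p>

```python
# League-wide LF totals as (plays, OOZ, BIZ) per season, copied from the table above.
LF_TOTALS = {
    2004: (7710, 847, 12242),
    2005: (8686, 718, 13712),
    2006: (7723, 1634, 8971),
    2007: (8014, 1614, 9373),
    2008: (4475, 1076, 5060),
}


def average_totals(totals):
    """Average (plays, OOZ, BIZ) per season across every season in `totals`."""
    n = len(totals)
    return tuple(sum(row[i] for row in totals.values()) / n for i in range(3))


def normalize(counts, season, totals):
    """Divide a fielder's (plays, OOZ, BIZ) by that season's totals,
    then multiply by the multi-season average totals."""
    avg = average_totals(totals)
    return tuple(c / s * a for c, s, a in zip(counts, totals[season], avg))


def rzr(plays, biz):
    """Revised zone rating: plays made on balls in the zone / balls in zone."""
    return plays / biz


# A hypothetical left fielder's 2005 line: 200 plays, 20 OOZ, 300 BIZ (raw RZR .667).
plays, ooz, biz = normalize((200, 20, 300), 2005, LF_TOTALS)
print(f"normalized RZR: {rzr(plays, biz):.3f}")
```

<p>One property worth noticing: normalizing a season's league totals simply returns the five-year averages, so every season ends up with the same league RZR (about .742 for left field), which is how this removes the league-wide jump between 2004-05 and 2006-08.</p>
<p>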
I’m sure there are better ways to normalize than the one I’m using, but I don’t think mine is the worst, and it’s very expedient for my needs.</p> <p>What I’m doing is dividing each player’s Plays, OOZ and BIZ by the totals for that position and season, and then multiplying by the average season totals of all five years.</p> <p>And, since I was rather short with the explanation last time out, I’ll go ahead and spell out what I’m doing in full:</p> <ol> <li>First, as above, every player’s performance is “normalized” to an average of the past five seasons.</li> <li>Then, a weighted average of his past four seasons (2005-08) is taken, with the most recent season given a weight of 5, then 4, then 3, then 2.</li> <li>Two seasons’ worth (a weight of 2) of league-average defensive performance at the position is added as a regression-to-the-mean component.</li> <li>5 + 4 + 3 + 2 + 2 = 16, so the weighted totals get divided by 16. The resulting chance count isn’t exactly a playing time projection, but it’s a rough guide to how much playing time a player might be expected to receive.</li> <li>Plays and runs above average are figured <em>for a full season’s performance,</em> given the number of chances of the average player at that position from 2004 through 2008.</li> </ol> <p>And… <a href="http://www.editgrid.com/user/cwyers/2008_asb_rzr_projection">here are the projections</a>. You can compare them to the <a href="http://www.editgrid.com/user/cwyers/2008_ASB_ZR_Projections">STATS ZR projections</a>, if you’d like.</p> <p>(Note: Currently only players with a Baseball Databank ID who have appeared in 2008 are included in either projection set. The next step is to take the rest of the players in the RZR set, map them to the appropriate STATS ID, and run both projections side by side for all players who have played in 2008, and maybe some who haven’t yet but could.)</p> <p>So what’s next? 
As I said before, these projections would really benefit from aging curves. (While I’m on the topic, Jon Shepherd over at Camden Depot has published <a href="http://camdendepot.com/analysis_infield_age_curves.html">RZR aging curves</a> that are worth a look. I have my own <a href="http://www.editgrid.com/user/cwyers/zr_age_curve_second_pass">ZR aging curves</a>, which I really should get straightened out.) I should also run “projections” for past seasons and see how they match up with what actually happened.</p> <p>I also want to work on combining data from multiple positions. I’ve done some <a href="http://www.editgrid.com/user/cwyers/moving_between_positions">comparisons of players who have played multiple positions</a>, and my feeling from looking at the data is that, for projecting a player’s zone rating, there isn’t much difference in difficulty among the three outfield positions: it’s not much harder to catch fly balls in center field than anywhere else, but there are a lot more fly balls to catch there, so a good fielder is worth a lot more. That deserves more exploration, though, and there are some noteworthy sampling issues in that data; for example, I find it hard to believe that center fielders are below-average defensive first basemen. I should rerun this query on the RZR dataset soon and see what it shows.</p> Colin Wyershttp://www.blogger.com/profile/17189081667691281016noreply@blogger.com0tag:blogger.com,1999:blog-8801122366476963814.post-81083973990701464802008-07-17T23:53:00.001-07:002008-07-17T23:53:39.950-07:00Projecting zone rating<p>So, you want to talk about a player’s defense?</p> <p>Remember: a good sabermetrician is like a good hunter cleaning his kill: he throws away as little as possible and uses as much of the animal as he can. 
We have decades of information about players; why should we ever use only three and a half months’ worth of data in evaluating a player?</p> <p>My process is based heavily on Tango’s <a href="http://www.tangotiger.net/archives/stud0346.shtml">Marcels</a> forecasting system; that said, he had nothing to do with this, and any screwups in it are mine, not his. (For background on how a projection system works, <a href="http://otherfifteen.blogspot.com/2008/02/how-projection-systems-work-part-i.html">here’s a decent writeup</a>, if I do say so myself.)</p> <p>Before going any further, I should note that I made this in about two hours. And I also made dinner in those two hours. And I had a side dish. So don’t expect anything on the order of PECOTA as far as complexity goes.</p> <p>Here’s how it works. Every player’s zone rating data from 2005-2008 (yep, including everything before this year’s All-Star break) is thrown into a mixer and weighted. I used a 5/4/3/2 weighting, with the most recent season weighted heaviest; I have no empirical basis for these weights beyond the fact that they extend Marcel’s 5/4/3 pattern back one more season. Then throw in two seasons’ worth of the league average for the position. There’s your regression to the mean.</p> <p>Aging curves are… forthcoming. Maybe. I’m still hashing out the details. (I’ve started work on <a href="http://www.editgrid.com/user/cwyers/zr_age_curve_second_pass">zone-rating-based aging curves for fielders</a>, but there are questions about how accurate they are, and before they can be used in a projection system they need to be smoothed out a bit more.)</p> <p>So, <a href="http://www.editgrid.com/user/cwyers/2008_ASB_ZR_Projections">data</a>. Plays and runs above or below average are figured using <a href="http://www.baseballthinkfactory.org/files/dialed_in/discussion/dr_strangeglove_or_how_i_learned_to_stop_worrying_and_love_zone_rating1/">the Dial method</a>. 
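</p>
<p>As a rough illustration of the weighting and the Dial-style conversion, here is a Python sketch. It assumes the 5/4/3/2 weights apply to plays and chances separately, that the regression adds two full league-average seasons, and that the conversion is plays above average over a full season of league-average chances times a per-play run value; those assumptions, the hypothetical player and every name in the code are mine, not part of Marcel or of Dial's write-up. The league figures (LG_ZR .843, 532 chances, .753 runs per play) are the shortstop values from the Dial-based position table further down this page.</p>

```python
from dataclasses import dataclass


@dataclass
class Season:
    plays: float    # plays made on balls in the zone
    chances: float  # zone rating chances


SEASON_WEIGHTS = (5, 4, 3, 2)  # most recent season first
REGRESSION_WEIGHT = 2          # two seasons of league-average fielding


def project_zr(seasons, lg_zr, lg_chances):
    """Weighted, regressed zone rating plus a rough chances estimate.

    `seasons` is ordered most recent first; missing seasons simply drop out.
    """
    plays = chances = 0.0
    for weight, season in zip(SEASON_WEIGHTS, seasons):
        plays += weight * season.plays
        chances += weight * season.chances
    # Regression to the mean: add two full seasons of a league-average fielder.
    plays += REGRESSION_WEIGHT * lg_zr * lg_chances
    chances += REGRESSION_WEIGHT * lg_chances
    total_weight = sum(SEASON_WEIGHTS) + REGRESSION_WEIGHT  # 5+4+3+2+2 = 16
    return plays / chances, chances / total_weight


def plays_and_runs_above_average(proj_zr, lg_zr, lg_chances, runs_per_play):
    """Full-season plays and runs above average (assumed Dial-style conversion)."""
    plays = (proj_zr - lg_zr) * lg_chances
    return plays, plays * runs_per_play


# Hypothetical shortstop, most recent season first.
history = [Season(270, 300), Season(440, 500), Season(430, 500), Season(420, 500)]
zr, est_chances = project_zr(history, 0.843, 532)
plays, runs = plays_and_runs_above_average(zr, 0.843, 532, 0.753)
print(f"ZR {zr:.3f}, chances {est_chances:.1f}, plays {plays:+.1f}, runs {runs:+.1f}")
```

<p>Because the divisor is fixed at 16, a player with fewer than four seasons gets a smaller chance estimate as well as heavier regression toward the league average.</p>
<p>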
For the Dial calculation, each player is assumed to have a full season’s worth of chances at the position, not the number of chances used to compute his zone rating.</p> <p>The next step beyond aging curves would probably be to incorporate at least <a href="http://www.hardballtimes.com/main/article/speed-and-defense/">some measure of speed</a> into the projection. But I was hungry, and so instead you have the best projection system I could make in two hours, while still making dinner. It’s a start, at least.</p> <p>(Also, lemme take this chance to plug my <a href="http://www.goatriders.org/taking-stock-at-the-break-part-i">hitter</a> and <a href="http://www.goatriders.org/taking-stock-part-ii">pitcher</a> evaluations on GROTA, if you have an interest in such things regarding the Cubs. Hitter and pitcher projections are next on my plate.)</p> Colin Wyershttp://www.blogger.com/profile/17189081667691281016noreply@blogger.com3tag:blogger.com,1999:blog-8801122366476963814.post-39229173988822271242008-07-11T22:23:00.001-07:002008-07-11T22:23:23.383-07:00A little experiment<p>I'm going to try a little experiment here. I wish my problem were writer's block; instead, there are so many ideas running around in my head that I'm having a hard time prioritizing them. And with the All-Star Break coming up, there'll be time to catch up on some things.</p> <p>So, if you'd like, you can help me figure out what I'll be writing about during the All-Star Break, with this handy widget here:</p> <div><iframe src="http://skribit.com/widget.html?blogid=1aad6d04b9e2eb8c88625e69350d30f3" frameborder="0" width="220" scrolling="no" height="330"></iframe><center><a href="http://skribit.com">Skribit: Social Suggestions</a></center></div> <p>Isn't it exciting? (It’s also available in the right-hand sidebar.) 
<a href="http://skribit.com/blogs/other-fifteen">Here's a link to my Skribit page</a>, if for some reason the widget doesn't do it for you. I may start seeding it with some topics of my own, but for right now I’m simply interested in seeing what – if anything – y’all are interested in. Obviously, I'd like as much participation as possible; that said, questions like "Why do you hate Christmas, Santa Claus and Ryan Theriot?" aren't likely to be answered immediately. In fact, non-Ryan Theriot suggestions are probably your best bet.</p> Colin Wyershttp://www.blogger.com/profile/17189081667691281016noreply@blogger.com0tag:blogger.com,1999:blog-8801122366476963814.post-78817878419795177252008-06-22T14:15:00.001-07:002008-06-24T21:31:03.646-07:00What’s a shortstop made of?<p><em><strong>UPDATE:</strong> Please to disregard this for now. I've discovered an issue with the IDs in the zone rating database. I'm working on a fix, but until then, these results are fraught with problems. My sincerest apologies for the error.</em></p><p>Tango’s <a href="http://www.tangotiger.net/scouting/">Fan Scouting Report</a> tells us something that we can’t tell from the stats alone – what physical tools and skills a player has. But does that tell us anything meaningful? I joined the Fan Scouting Report to a database of STATS, Inc. zone ratings. (Shamefully, I eliminated all players with fewer than 30 in-zone chances.)</p><p>I want to note right off the bat that I'm puttering here, just poking the data with a stick. 
Do not take this as gospel.</p><p>Here's a set of scatterplots of each individual tool against zone rating:</p><p><a href="http://lh3.ggpht.com/pontifexexmachina/SGCgbokYT9I/AAAAAAAAAOU/KozHlPS2ltQ/s1600-h/shortstop_zr_fsr%5B9%5D.png"><img title="shortstop_zr_fsr" style="BORDER-TOP-WIDTH: 0px; BORDER-LEFT-WIDTH: 0px; BORDER-BOTTOM-WIDTH: 0px; BORDER-RIGHT-WIDTH: 0px" height="578" alt="shortstop_zr_fsr" src="http://lh5.ggpht.com/pontifexexmachina/SGCgcRMHvhI/AAAAAAAAAOY/kELPYg-PPyc/shortstop_zr_fsr_thumb%5B5%5D.png?imgmax=800" width="579" border="0" /></a> </p><p>I’ll also present the correlations (the dotted lines, if you were wondering) as a table:</p><table cellspacing="3" cellpadding="3"><tbody><tr bgcolor="#ff3333" height="20"><td width="89" height="20"><div align="center"></div></td><td width="89"><div align="center"><strong>OVERALL</strong></div></td><td width="89"><div align="center"><strong>INSTINCTS</strong></div></td><td width="89"><div align="center"><strong>FIRSTSTEP</strong></div></td><td width="89"><div align="center"><strong>SPEED</strong></div></td><td width="89"><div align="center"><strong>HANDS</strong></div></td><td width="89"><div align="center"><strong>RELEASE</strong></div></td><td width="89"><div align="center"><strong>ACCURACY</strong></div></td><td width="89"><div align="center"><strong>STRENGTH</strong></div></td><td width="89"><div align="center"><strong>ZR</strong></div></td></tr><tr height="20"><td bgcolor="#ff3333" height="20"><div align="center"><strong>OVERALL</strong></div></td><td bgcolor="#ff9933"><div align="center">1.000</div></td><td><div align="center">0.953</div></td><td><div align="center">0.898</div></td><td><div align="center">0.706</div></td><td><div align="center">0.885</div></td><td><div align="center">0.905</div></td><td><div align="center">0.817</div></td><td><div align="center">0.651</div></td><td><div align="center">0.280</div></td></tr><tr height="20"><td bgcolor="#ff3333" height="20"><div 
align="center"><strong>INSTINCTS</strong></div></td><td><div align="center">0.953</div></td><td bgcolor="#ff9933"><div align="center">1.000</div></td><td><div align="center">0.856</div></td><td><div align="center">0.626</div></td><td><div align="center">0.838</div></td><td><div align="center">0.850</div></td><td><div align="center">0.741</div></td><td><div align="center">0.580</div></td><td><div align="center">0.288</div></td></tr><tr height="20"><td bgcolor="#ff3333" height="20"><div align="center"><strong>FIRSTSTEP</strong></div></td><td><div align="center">0.898</div></td><td><div align="center">0.856</div></td><td bgcolor="#ff9933"><div align="center">1.000</div></td><td><div align="center">0.846</div></td><td><div align="center">0.660</div></td><td><div align="center">0.674</div></td><td><div align="center">0.551</div></td><td><div align="center">0.576</div></td><td><div align="center">0.250</div></td></tr><tr height="20"><td bgcolor="#ff3333" height="20"><div align="center"><strong>SPEED</strong></div></td><td><div align="center">0.706</div></td><td><div align="center">0.626</div></td><td><div align="center">0.846</div></td><td bgcolor="#ff9933"><div align="center">1.000</div></td><td><div align="center">0.402</div></td><td><div align="center">0.424</div></td><td><div align="center">0.306</div></td><td><div align="center">0.522</div></td><td><div align="center">0.137</div></td></tr><tr height="20"><td bgcolor="#ff3333" height="20"><div align="center"><strong>HANDS</strong></div></td><td><div align="center">0.885</div></td><td><div align="center">0.838</div></td><td><div align="center">0.660</div></td><td><div align="center">0.402</div></td><td bgcolor="#ff9933"><div align="center">1.000</div></td><td><div align="center">0.931</div></td><td><div align="center">0.869</div></td><td><div align="center">0.411</div></td><td><div align="center">0.286</div></td></tr><tr height="20"><td bgcolor="#ff3333" height="20"><div 
align="center"><strong>RELEASE</strong></div></td><td><div align="center">0.905</div></td><td><div align="center">0.850</div></td><td><div align="center">0.674</div></td><td><div align="center">0.424</div></td><td><div align="center">0.931</div></td><td bgcolor="#ff9933"><div align="center">1.000</div></td><td><div align="center">0.920</div></td><td><div align="center">0.468</div></td><td><div align="center">0.299</div></td></tr><tr height="20"><td bgcolor="#ff3333" height="20"><div align="center"><strong>ACCURACY</strong></div></td><td><div align="center">0.817</div></td><td><div align="center">0.741</div></td><td><div align="center">0.551</div></td><td><div align="center">0.306</div></td><td><div align="center">0.869</div></td><td><div align="center">0.920</div></td><td bgcolor="#ff9933"><div align="center">1.000</div></td><td><div align="center">0.424</div></td><td><div align="center">0.289</div></td></tr><tr height="20"><td bgcolor="#ff3333" height="20"><div align="center"><strong>STRENGTH</strong></div></td><td><div align="center">0.651</div></td><td><div align="center">0.580</div></td><td><div align="center">0.576</div></td><td><div align="center">0.522</div></td><td><div align="center">0.411</div></td><td><div align="center">0.468</div></td><td><div align="center">0.424</div></td><td bgcolor="#ff9933"><div align="center">1.000</div></td><td><div align="center">0.024</div></td></tr><tr height="20"><td bgcolor="#ff3333" height="20"><div align="center"><strong>ZR</strong></div></td><td><div align="center">0.280</div></td><td><div align="center">0.288</div></td><td><div align="center">0.250</div></td><td><div align="center">0.137</div></td><td><div align="center">0.286</div></td><td><div align="center">0.299</div></td><td><div align="center">0.289</div></td><td><div align="center">0.024</div></td><td bgcolor="#ff9933"><div align="center">1.000</div></td></tr></tbody></table><p>Arm strength doesn’t seem to correlate well with zone rating – in fact it has the 
lowest correlation of any of the tools – which surprised me at first. It doesn’t seem to make sense that a better arm wouldn’t make you a better shortstop. But before you go throwing out the analysis as a piece of lunacy, give me a second.</p><p>In the center is a histogram for each tool. Let’s zoom in for detail. First, overall (the average of all tools):</p><p><a href="http://lh5.ggpht.com/pontifexexmachina/SGCgc9Dhx5I/AAAAAAAAAOc/GMFA0RKgXtk/s1600-h/OVERALL_HISTOGRAM%5B3%5D.png"><img title="OVERALL_HISTOGRAM" style="BORDER-RIGHT: 0px; BORDER-TOP: 0px; BORDER-LEFT: 0px; BORDER-BOTTOM: 0px" height="578" alt="OVERALL_HISTOGRAM" src="http://lh3.ggpht.com/pontifexexmachina/SGCgdOQuOvI/AAAAAAAAAOg/vWf0rtCNTog/OVERALL_HISTOGRAM_thumb%5B1%5D.png?imgmax=800" width="579" border="0" /></a> </p><p>It tilts just a shade to the right, but for our sample it fits a standard “bell curve” shape pretty well. Now, look at the histogram for arm strength:</p><p><a href="http://lh3.ggpht.com/pontifexexmachina/SGCgdrnoG1I/AAAAAAAAAOk/YZLbgpFeS0I/s1600-h/STRENGTH_HISTOGRAM%5B3%5D.png"><img title="STRENGTH_HISTOGRAM" style="BORDER-RIGHT: 0px; BORDER-TOP: 0px; BORDER-LEFT: 0px; BORDER-BOTTOM: 0px" height="578" alt="STRENGTH_HISTOGRAM" src="http://lh4.ggpht.com/pontifexexmachina/SGCgd7BMdkI/AAAAAAAAAOo/RLQqfIiufzc/STRENGTH_HISTOGRAM_thumb%5B1%5D.png?imgmax=800" width="579" border="0" /></a> </p><p>See how it bunches together? (Unfortunately, R treats the number of bins in a histogram as a suggestion rather than a hard-and-fast rule, or the graphs would be a little easier to compare.)</p><p>If we were to look at how well all baseball players did defensively at shortstop, we’d likely find that arm strength matters a great deal. 
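</p>
<p>The contrast between all players and players who have been selected for their arms is a textbook case of restriction of range. A toy Python simulation makes the point; the data are entirely made up (not Fan Scouting Report ratings), and the 0.6/0.8 coefficients and the one-standard-deviation cutoff are arbitrary assumptions.</p>

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def simulate(n=5000, cutoff=1.0, seed=42):
    """Toy model: arm strength truly helps zone rating, but only players
    whose arm clears `cutoff` get to play shortstop."""
    rng = random.Random(seed)
    arm = [rng.gauss(0, 1) for _ in range(n)]
    # Made-up model: zone rating = some arm + a lot of everything else.
    zr = [0.6 * a + 0.8 * rng.gauss(0, 1) for a in arm]
    everyone = pearson(arm, zr)
    kept = [(a, z) for a, z in zip(arm, zr) if a > cutoff]
    selected = pearson([a for a, _ in kept], [z for _, z in kept])
    return everyone, selected, len(kept)


everyone, selected, n_kept = simulate()
print(f"all players: r = {everyone:.2f}; {n_kept} selected shortstops: r = {selected:.2f}")
```

<p>In this toy model, arm strength matters exactly as much for the selected group as for everyone else, yet its correlation with zone rating comes out much weaker once the weak arms are filtered out, simply because the surviving arms vary so much less.</p>
<p>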
But looking at players selected to play shortstop, it’s a different story; anyone who doesn’t have the arm for the position has been weeded out, and knowing a player’s arm strength gives us little additional information.</p><p>Of course, we’re barely scratching the surface with this. More reading is available <a href="http://www.insidethebook.com/ee/index.php/site/comments/2007_fans_scouting_report_results/">here</a> and <a href="http://www.insidethebook.com/ee/index.php/site/comments/how_reliable_are_fans/">here</a>, for starters.</p>Colin Wyershttp://www.blogger.com/profile/17189081667691281016noreply@blogger.com0tag:blogger.com,1999:blog-8801122366476963814.post-21415728010640381792008-06-20T10:54:00.001-07:002008-06-24T21:30:43.518-07:00How good of a shortstop is a second baseman?<p><em><strong>UPDATE:</strong> Please to disregard this for now. I've discovered an issue with the IDs in the zone rating database. I'm working on a fix, but until then, these results are fraught with problems. My sincerest apologies for the error.</em></p><p>I’ve been doing a lot of thinking about <a href="http://www.insidethebook.com/ee/index.php/site/comments/even_more_about_replacement_level/">replacement level and positional adjustments</a> recently; about the only thing I learned from that was that Gary Gaetti was the biggest free-agent bargain of the past 20 years. That’s not very useful, I know.</p><p>But it got me thinking about the relative value of different positions. 
Consider this table, <a href="http://www.baseballthinkfactory.org/files/dialed_in/discussion/dr_strangeglove_or_how_i_learned_to_stop_worrying_and_love_zone_rating1/">based on Chris Dial’s excellent work</a>:</p><table cellspacing="0" cellpadding="0"><colgroup><col span="span" width="64"></colgroup><tbody><tr height="20"><td width="64" height="20">POS</td><td width="64">LG_ZR</td><td width="64">ZR_CH</td><td width="64">ZR_RUNS</td></tr><tr height="20"><td height="20">1B</td><td align="right">.844</td><td align="right">281</td><td align="right">.798</td></tr><tr height="20"><td height="20">2B</td><td align="right">.825</td><td align="right">507</td><td align="right">.754</td></tr><tr height="20"><td height="20">3B</td><td align="right">.753</td><td align="right">430</td><td align="right">.800</td></tr><tr height="20"><td height="20">SS</td><td align="right">.843</td><td align="right">532</td><td align="right">.753</td></tr><tr height="20"><td height="20">LF</td><td align="right">.868</td><td align="right">348</td><td align="right">.831</td></tr><tr height="20"><td height="20">CF</td><td align="right">.884</td><td align="right">462</td><td align="right">.842</td></tr><tr height="20"><td height="20">RF</td><td align="right">.872</td><td align="right">365</td><td align="right">.843</td></tr></tbody></table><p>Lemme ’splain – LG_ZR is the average <a href="http://www.baseballthinkfactory.org/files/dialed_in/discussion/what_is_zone_rating/">zone rating</a> (plays divided by chances) at the position. [I derived those values from <a href="http://www.replacementlevel.com/">SG’s</a> zone rating database; everything else in the table is straight from Dial.] ZR_CH is the average number of chances in a season at the position. ZR_RUNS is the average run value of a play at the position.</p><p>Each of these measures something different: zone rating itself measures the difficulty of making a play, while chances and runs measure the leverage of that skill. 
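</p>
<p>To put a number on that leverage, here is a short Python sketch that prices the same .010 edge in zone rating at every position, using the table's values. It assumes ZR_RUNS is the run value of each play made above average and ZR_CH is a full season of chances; the function name is illustrative.</p>

```python
# (LG_ZR, ZR_CH, ZR_RUNS) by position, copied from the table above.
DIAL_TABLE = {
    "1B": (0.844, 281, 0.798),
    "2B": (0.825, 507, 0.754),
    "3B": (0.753, 430, 0.800),
    "SS": (0.843, 532, 0.753),
    "LF": (0.868, 348, 0.831),
    "CF": (0.884, 462, 0.842),
    "RF": (0.872, 365, 0.843),
}


def runs_for_zr_edge(position, zr_edge=0.010):
    """Runs over a full season from beating the league zone rating by `zr_edge`."""
    _lg_zr, chances, runs_per_play = DIAL_TABLE[position]
    extra_plays = zr_edge * chances
    return extra_plays * runs_per_play


for pos in DIAL_TABLE:
    print(f"{pos}: {runs_for_zr_edge(pos):.1f} runs")
```

<p>Under those assumptions, a .010 edge is worth about 4 runs a season at shortstop but only about 2 at first base.</p>
<p>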
Tango has done <a href="http://www.insidethebook.com/ee/index.php/site/article/uzr_positional_adjustments/">some great work in this area in the past</a>, but at the runs level. That's fine for measuring value, but I'm going to look at it at the level of plays per chance instead.</p><p>Right now, all I have is data for players who were at both second and short for the same team in the same season. (Well, at least, that's all the data I've looked at and processed.) <a href="http://www.editgrid.com/user/cwyers/SS_2B_Fielding_87-07">Here’s the data, if you're interested</a>.</p><p>Here's what I did: I took a weighted average of each player's numbers (plays made and chances) at each position, using the number of chances as the weight. For each player, the weight is his number of chances at whichever position he had fewer chances. Then, I calculated zone rating from the weighted averages. (From here on out, when I say average I'm referring to a weighted average, except for the league average.)</p><p>The average zone rating of a shortstop in our sample pool is .836, compared to .843 for the league. So the players who play both positions tend to be below-average shortstops. Makes sense, right? For second basemen, it's .827, compared to .825 for the league. So second basemen who play shortstop are (slightly) above-average second basemen, according to our sample. Again, this makes sense, but the difference is small enough that it's probably not worth considering.</p><p>Now, here's the conclusion that doesn't seem to make sense on the face of it: players in our sample pool make more plays per opportunity at shortstop than at second base. 
All this means is that, <span style="FONT-STYLE: italic">for a shortstop</span>, it's easier to make a play on a ball at short than it is to make a play on a ball at second.</p><p>There is a selection bias issue here: the players in our pool, by and large, have the physical tools to play shortstop; there are many second basemen who don't possess those tools, which largely comes down to arm strength. As far as I can tell, it's impossible to identify those players from any of the available data (zone rating, Retrosheet play-by-play, etc.). [Tango's <a href="http://www.tangotiger.net/scouting/">Fan Scouting Report</a> does have that data, which makes it the next obvious avenue for studying this issue.]</p><p>I also split my sample into two groups: players who played mostly at shortstop and players who played mostly at second base. I didn't notice a large difference between the two groups.</p><p>The difference between the average second baseman playing shortstop (who is pretty close to the average second baseman overall) and the league-average shortstop is about four plays, or three runs, over a full season. Again, this is for players whom teams selected based on their ability to play shortstop. The difference between our shortstops-at-second and the league-average second baseman was only one play; bear in mind that these are below-average shortstops to begin with.</p><p>My hunch is that the best way to build a conversion factor for a player, assuming he's able to play shortstop at all, is to use a multiplier to convert plays per chance between positions. 
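</p>
<p>Here is a Python sketch of the weighting and of the multiplier idea, under my own reading of both: each player's zone rating at each position is weighted by his chances at whichever position he had fewer chances, and the multiplier is simply the pooled shortstop zone rating divided by the pooled second base zone rating. The class and function names are hypothetical, and this is a sketch of the hunch, not a tested conversion method.</p>

```python
from dataclasses import dataclass


@dataclass
class DualPositionPlayer:
    ss_plays: int
    ss_chances: int
    b2_plays: int    # second base
    b2_chances: int


def pooled_zr(players):
    """Return (SS zone rating, 2B zone rating) for players who played both.

    Each player's zone rating at each position is weighted by his chances at
    whichever position he had fewer chances, so he counts equally at both.
    """
    total_w = ss = b2 = 0.0
    for p in players:
        w = min(p.ss_chances, p.b2_chances)
        total_w += w
        ss += w * p.ss_plays / p.ss_chances
        b2 += w * p.b2_plays / p.b2_chances
    return ss / total_w, b2 / total_w


def convert_zr(zr, multiplier):
    """Hypothetical conversion: scale plays per chance by a position multiplier."""
    return zr * multiplier


# Using the pooled figures from this post (.836 at SS, .827 at 2B):
ss_over_2b = 0.836 / 0.827
print(f"multiplier {ss_over_2b:.3f}; a .810 second baseman -> {convert_zr(0.810, ss_over_2b):.3f} at SS")
```

<p>With those pooled figures, the multiplier is about 1.011, so a .810 second baseman would project to roughly .819 at shortstop, assuming he has the arm for the position in the first place.</p>
<p>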
I'll have to look into that. And, again, the next step is to look at physical tools using the Fan Scouting Report.<br /></p>Colin Wyershttp://www.blogger.com/profile/17189081667691281016noreply@blogger.com4tag:blogger.com,1999:blog-8801122366476963814.post-90849549499408406302008-06-08T01:14:00.001-07:002008-06-08T01:14:38.242-07:00Team Defense in the NL, Season to Date<p>I've been away for a while, mostly working on stuff for <a href="http://www.goatriders.org">GROTA</a>. <a href="http://www.goatriders.org/node/2195">I did a lot of work with Zone Rating in a post there recently</a>, mostly having to do with how the Cubs line up defensively. But to get there, I had to figure out +/- ratings for every player in the NL. Funny how that works out, huh. <a href="http://www.editgrid.com/user/cwyers/nl_fielding_6-8-08">You can find that data on EditGrid, as usual</a>.</p> <p>But, as an added bonus, here are the team totals for the whole NL:</p> <table cellspacing="0" cellpadding="0"><colgroup><col span="span" width="64" /></colgroup><tbody> <tr height="20"> <td height="20"><strong>Team</strong></td> <td align="right"><strong>+/-</strong></td> </tr> <tr height="20"> <td width="64" height="20">STL</td> <td align="right" width="64">32.93151</td> </tr> <tr height="20"> <td height="20">ATL</td> <td align="right">17.72968</td> </tr> <tr height="20"> <td height="20">SD</td> <td align="right">16.24524</td> </tr> <tr height="20"> <td height="20">HOU</td> <td align="right">14.94331</td> </tr> <tr height="20"> <td height="20">LAN</td> <td align="right">13.56041</td> </tr> <tr height="20"> <td height="20">PHI</td> <td align="right">9.173859</td> </tr> <tr height="20"> <td height="20">CHN</td> <td align="right">6.625689</td> </tr> <tr height="20"> <td height="20">WAS</td> <td align="right">1.453572</td> </tr> <tr height="20"> <td height="20">NYN</td> <td align="right">-1.26417</td> </tr> <tr height="20"> <td height="20">MIL</td> <td align="right">-3.81383</td> </tr> <tr height="20"> <td height="20">ARI</td> <td align="right">-5.69194</td> </tr> <tr height="20"> <td 
height="20">COL</td> <td align="right">-8.97945</td> </tr> <tr height="20"> <td height="20">CIN</td> <td align="right">-10.5495</td> </tr> <tr height="20"> <td height="20">SF</td> <td align="right">-20.3379</td> </tr> <tr height="20"> <td height="20">FLA</td> <td align="right">-30.6143</td> </tr> <tr height="20"> <td height="20">PIT</td> <td align="right">-31.4122</td> </tr> </tbody></table> <p>If you're wondering where St. Louis' out-of-nowhere pitching performance has come from, well, there you go. The Brewers really are an upgraded team on defense. The Reds are not. I'll confess I find the Marlins surprising, given that Hanley Ramirez picked this season to start being an average defensive shortstop. (We'll see how long that lasts, won't we.)</p> Colin Wyershttp://www.blogger.com/profile/17189081667691281016noreply@blogger.com0tag:blogger.com,1999:blog-8801122366476963814.post-41934035244602846632008-05-07T00:09:00.001-07:002008-05-07T00:09:55.433-07:00Outfield defense graphs<p>I'm way behind on working with the Enhanced Gameday hit location data - I'm working on adding the 2005, 2006 and 2008 seasons to the database, as well as trying to figure out if I can parse some extra data out of the 2008 data.</p> <p>I'm also trying to create a defensive rating metric based upon the data, similar to the <a href="http://stat.wharton.upenn.edu/~stjensen/research/safe.html">SAFE metric</a>. I'm pretty comfortable with converting everything to vectors at this point, and have a rough idea about how to handle the smoothing function. Mostly at this point it's learning enough about R to actually do the implementation.</p> <p>For those wondering about the accuracy of this data, it's not as good as the datasets from STATS, Inc. or BIS. 
I found an article <a href="http://www.hardballtimes.com/main/article/is-seeing-believing/">comparing the various hit location datasets</a>.</p> <p>In the meantime, <a href="http://www.goatriders.org/how-good-is-alfonso-sorianos-defensive-range">I've started doing some work with outfielders for the other site I blog at</a>. The good news is that the routines I use to make the plots are much more automated than they were in the past, so I can crank out plots a lot quicker now. If anyone has a particular fielder they're interested in - infield or outfield - let me know.</p> Colin Wyershttp://www.blogger.com/profile/17189081667691281016noreply@blogger.com4tag:blogger.com,1999:blog-8801122366476963814.post-22869085465309502752008-04-30T22:36:00.001-07:002008-04-30T22:36:06.280-07:00Looking for defensive shifts, part II<p>Last night, <a href="http://otherfifteen.blogspot.com/2008/04/looking-for-defensive-shifts.html">I went looking for defensive shifts</a>. Well, here they are, broken down by base/out state and batter handedness:</p> <p><a href="http://lh4.ggpht.com/pontifexexmachina/SBlWwwK72HI/AAAAAAAAAN8/Lg2p6IyprgA/s1600-h/positioning%5B4%5D.png"><img style="border-top-width: 0px; border-left-width: 0px; border-bottom-width: 0px; border-right-width: 0px" height="574" alt="positioning" src="http://lh6.ggpht.com/pontifexexmachina/SBlWxQK72II/AAAAAAAAAOE/2yZQQe96eS4/positioning_thumb%5B2%5D.png?imgmax=800" width="575" border="0" /></a></p> <p>There may be special cases, like the wishbone shift, but I think that's pretty representative of how fielders position themselves. (If you want the underlying data, <a href="http://www.editgrid.com/user/cwyers/by_situation_hit_locations">here it is</a>.)</p> <p>It's hard to read the graph without looking at the accompanying data - the fielders don't always shift in the same direction on any given shift. 
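For anyone who wants to reproduce this kind of breakdown, here is a minimal sketch of the arithmetic behind it: average the (x, y) location of fielded ground balls for each position within each situation, then measure how far that average moved from the all-situations reference. The record layout and the sample coordinates are hypothetical, and this is not the code used to make the graph above.

```python
import math
from collections import defaultdict
from statistics import mean

# Hypothetical record layout: (batter_hand, base_state, position, x, y),
# one row per fielded ground ball, in hit-location image coordinates.


def average_locations(records):
    """Mean (x, y) of fielded grounders, keyed by (hand, bases, position)."""
    groups = defaultdict(list)
    for hand, bases, pos, x, y in records:
        groups[(hand, bases, pos)].append((x, y))
    return {
        key: (mean(p[0] for p in pts), mean(p[1] for p in pts))
        for key, pts in groups.items()
    }


def reference_locations(records):
    """Mean (x, y) per position over every situation (the reference graph)."""
    groups = defaultdict(list)
    for _, _, pos, x, y in records:
        groups[pos].append((x, y))
    return {
        pos: (mean(p[0] for p in pts), mean(p[1] for p in pts))
        for pos, pts in groups.items()
    }


def shifts(records):
    """(dx, dy, distance) of each situational average from the reference."""
    ref = reference_locations(records)
    out = {}
    for (hand, bases, pos), (x, y) in average_locations(records).items():
        rx, ry = ref[pos]
        dx, dy = x - rx, y - ry
        out[(hand, bases, pos)] = (dx, dy, math.hypot(dx, dy))
    return out


if __name__ == "__main__":
    sample = [
        ("R", "empty", "3B", 90.0, 150.0),
        ("R", "empty", "3B", 92.0, 152.0),
        ("L", "empty", "3B", 98.0, 160.0),
    ]
    for key, (dx, dy, dist) in sorted(shifts(sample).items()):
        print(key, f"dx={dx:+.1f} dy={dy:+.1f} dist={dist:.1f}")
```

Reporting the distance separately from the direction matches how the shifts are discussed here: some situations move a fielder in or back, others move him laterally.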
Random thoughts based on a cursory reading of the data:</p> <ol> <li>Corner infielders tend to shift more than middle infielders. (As a measure of distance, that is.)</li> <li>For corner infielders, most shifts involve depth; most middle infielder shifts involve lateral movement.</li> <li>There's no small amount of noise to that graph; if I were going to use this sort of data for a project (say, determining responsibility for a ground ball in a zone rating type of system), I'd want to smooth it out into maybe three or four different positionings per position. (Just eyeballing it real quick, for first base there seem to be three basic positionings - bases empty, runner on first, and playing the bunt. Same for third base.)</li> <li>A lot of people were asking if having a good defensive third baseman could "cut into" a shortstop's graph, making his range look smaller than it really was. I have a hunch that we might be able to find out exactly how much third basemen are cutting into the shortstop's area of responsibility if we look at how much the shortstop's range grows when the third baseman is playing in.</li> </ol> <p>My next project is going to be to figure out a way to estimate how many "chances" a player has at a position from the hit location data. (I haven't really done anything with it yet, but I have hit location data for balls hit into the outfield as well.) It's far less a conceptual problem than it is a judgement and labor problem - I have a good idea of how I want to delineate the zones of responsibility. 
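To make the chance-counting idea concrete, here is a rough sketch of one possible approach: convert each ground ball's hit location into a spray angle from home plate and give the chance to whichever fielder owns that wedge of the infield. Everything here is an assumption for illustration: the home-plate coordinates, the ±45° foul lines, and especially the wedge boundaries, which would need to be tuned against real data.

```python
import math
from bisect import bisect_right
from collections import Counter

# Assumed home-plate location in hit-location image coordinates; x grows
# toward the right-field side and y grows toward home plate.
HOME_X, HOME_Y = 125.0, 200.0

# Hypothetical wedge boundaries in degrees (negative = third-base side).
ZONE_EDGES = [-30.0, -5.0, 20.0]
ZONE_OWNERS = ["3B", "SS", "2B", "1B"]


def spray_angle(x, y, home_x=HOME_X, home_y=HOME_Y):
    """Angle in degrees from straightaway center field."""
    return math.degrees(math.atan2(x - home_x, home_y - y))


def responsible_fielder(angle):
    """Position whose wedge contains `angle`, or None if the ball is foul."""
    if abs(angle) > 45.0:
        return None
    return ZONE_OWNERS[bisect_right(ZONE_EDGES, angle)]


def count_chances(ground_balls):
    """Count chances per position for an iterable of (x, y) locations."""
    chances = Counter()
    for x, y in ground_balls:
        owner = responsible_fielder(spray_angle(x, y))
        if owner is not None:
            chances[owner] += 1
    return chances


if __name__ == "__main__":
    balls = [(75.0, 140.0), (110.0, 130.0), (140.0, 120.0), (170.0, 150.0)]
    print(count_chances(balls))
```

Angle-only wedges ignore depth entirely, so a real version would probably also need distance from home plate and the situational positioning discussed above.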
The question now is how big to make the zones; then I have to actually go and write the code.</p> Colin Wyershttp://www.blogger.com/profile/17189081667691281016noreply@blogger.com1tag:blogger.com,1999:blog-8801122366476963814.post-56366226926304909882008-04-29T23:14:00.001-07:002008-04-29T23:14:55.397-07:00Looking for defensive shifts<p>Continuing with <a href="http://otherfifteen.blogspot.com/2008/04/can-we-measure-fielder-range.html">making defensive graphs</a>, I thought it might be illuminating to try to see exactly how much defensive positioning impacts where ground balls are fielded. The following are graphs of the average location of a fielded ground ball, color-coded by position.</p> <p>First off, every ball in the data set:</p> <p><a href="http://lh3.ggpht.com/pontifexexmachina/SBgOSwK715I/AAAAAAAAAMM/kN36T7_bonw/s1600-h/standard_defensive_alignment%5B2%5D.png"><img style="border-right: 0px; border-top: 0px; border-left: 0px; border-bottom: 0px" height="244" alt="standard_defensive_alignment" src="http://lh5.ggpht.com/pontifexexmachina/SBgOTQK716I/AAAAAAAAAMU/gabJ517dUaE/standard_defensive_alignment_thumb.png?imgmax=800" width="244" border="0" /></a> </p> <p>Makes sense, I suppose. We'll call that our reference graph.</p> <p>Now, let's take a look at just right-handed batters with the bases empty:</p> <p><a href="http://lh3.ggpht.com/pontifexexmachina/SBgOTwK717I/AAAAAAAAAMc/Q0TZY9Thhbs/s1600-h/R%20bases_empty%5B2%5D.png"><img style="border-right: 0px; border-top: 0px; border-left: 0px; border-bottom: 0px" height="244" alt="R bases_empty" src="http://lh4.ggpht.com/pontifexexmachina/SBgOUAK718I/AAAAAAAAAMk/PtISzWqSKZo/R%20bases_empty_thumb.png?imgmax=800" width="244" border="0" /></a> </p> <p>Virtually identical - everybody plays "straight-up" in that situation. 
But now let's look at the same situation with a left-handed batter:</p> <p><a href="http://lh5.ggpht.com/pontifexexmachina/SBgOUQK719I/AAAAAAAAAMs/jAS1ssBbAgE/s1600-h/L%20bases_empty%5B2%5D.png"><img style="border-right: 0px; border-top: 0px; border-left: 0px; border-bottom: 0px" height="244" alt="L bases_empty" src="http://lh6.ggpht.com/pontifexexmachina/SBgOUgK71-I/AAAAAAAAAM0/ZrUhjQvm5O0/L%20bases_empty_thumb.png?imgmax=800" width="244" border="0" /></a> </p> <p>It's a subtle shift, but a shift nonetheless. The middle infielders seem to "cheat" a bit to their left, and the third baseman seems to be doing a heck of a lot of cheating, not only over but shallow.</p> <p>Now, with a runner on first (second and third empty):</p> <p><a href="http://lh3.ggpht.com/pontifexexmachina/SBgOUwK71_I/AAAAAAAAAM8/Rnna21OqpPE/s1600-h/runner%20on%20first%5B2%5D.png"><img style="border-right: 0px; border-top: 0px; border-left: 0px; border-bottom: 0px" height="244" alt="runner on first" src="http://lh5.ggpht.com/pontifexexmachina/SBgOVQK72AI/AAAAAAAAANE/WErx8MM9qFI/runner%20on%20first_thumb.png?imgmax=800" width="244" border="0" /></a></p> <p>Everybody but the first baseman seems to play straight up - the first baseman plays a lot shallower, though. With runners on first and second:</p> <p><a href="http://lh6.ggpht.com/pontifexexmachina/SBgOVgK72BI/AAAAAAAAANM/9hS_JLD2KK4/s1600-h/first%20and%20second%5B5%5D.png"><img style="border-right: 0px; border-top: 0px; border-left: 0px; border-bottom: 0px" height="244" alt="first and second" src="http://lh3.ggpht.com/pontifexexmachina/SBgOVwK72CI/AAAAAAAAANU/EE_NDEIIzN8/first%20and%20second_thumb%5B1%5D.png?imgmax=800" width="244" border="0" /></a> </p> <p>That looks pretty much like our reference graph - maybe the third baseman's playing a little in, but other than that everyone is playing straight up. 
With the bases loaded:</p> <p><a href="http://lh5.ggpht.com/pontifexexmachina/SBgOWQK72DI/AAAAAAAAANc/sdHu0TpKTdA/s1600-h/bases%20loaded%5B2%5D.png"><img style="border-right: 0px; border-top: 0px; border-left: 0px; border-bottom: 0px" height="244" alt="bases loaded" src="http://lh6.ggpht.com/pontifexexmachina/SBgOWgK72EI/AAAAAAAAANk/ubbXvy3kThE/bases%20loaded_thumb.png?imgmax=800" width="244" border="0" /></a> </p> <p>Looks like everyone is playing straight up. Now, let's take a look at how fielders position themselves when a handful of lefty sluggers (Thome, Ortiz, Fielder, Giambi and Hafner) are at bat:</p> <p><a href="http://lh4.ggpht.com/pontifexexmachina/SBgOXAK72FI/AAAAAAAAANs/G0tQrZmq90o/s1600-h/shift%5B2%5D.png"><img style="border-right: 0px; border-top: 0px; border-left: 0px; border-bottom: 0px" height="244" alt="shift" src="http://lh5.ggpht.com/pontifexexmachina/SBgOXQK72GI/AAAAAAAAAN0/XYqrzly_ejE/shift_thumb.png?imgmax=800" width="244" border="0" /></a> </p> <p>That's the wishbone shift, and I'm really pleased that the graph is capturing it so well.</p> <p>The next step is to look at the number of outs in the inning and see how that affects defensive positioning. (Unless something else shiny distracts me first.)</p> Colin Wyershttp://www.blogger.com/profile/17189081667691281016noreply@blogger.com0tag:blogger.com,1999:blog-8801122366476963814.post-44999424315788237912008-04-25T23:32:00.001-07:002008-04-25T23:32:09.493-07:00Derek Jeter vs. 
Troy Tulowitzki<p>An extension of <a href="http://otherfifteen.blogspot.com/2008/04/can-we-measure-fielder-range.html">last night's fun with graphing</a>.</p> <p><a href="http://lh5.ggpht.com/pontifexexmachina/SBLMXwK713I/AAAAAAAAAL4/BJBW6p3MRyo/s1600-h/jetertulo%5B3%5D.png"><img style="border-right: 0px; border-top: 0px; border-left: 0px; border-bottom: 0px" height="324" alt="jetertulo" src="http://lh5.ggpht.com/pontifexexmachina/SBLMZwK714I/AAAAAAAAAME/RkA132yHpuo/jetertulo_thumb%5B1%5D.png?imgmax=800" width="540" border="0" /></a> </p> <p>I'm not saying. I'm just saying.</p> Colin Wyershttp://www.blogger.com/profile/17189081667691281016noreply@blogger.com8tag:blogger.com,1999:blog-8801122366476963814.post-5038214133878048982008-04-24T23:45:00.001-07:002008-04-24T23:45:52.573-07:00Can we measure a fielder's range?<p>First of all, let's define range as a fielder's ability to get to baseballs. For the purposes of this analysis, we don't care what a fielder does with the ball when he gets to it. We're only interested in how far he can go to get to a baseball.</p> <p>Pretty much any defensive metric you see that attempts to measure range is really an approximation based upon the data available. The more data we have, the better (play-by-play data is better than seasonal totals; zone rating is even better), but the best thing would be to know:</p> <ol> <li>Where the ball was fielded, and</li> <li>Where the fielder was standing when the ball was hit.</li> </ol> <p>I am unaware of anyone who keeps track of #2, but there are several outlets that keep track of #1. MLB.com keeps track of it for their Enhanced Gameday functionality. MLB Advanced Media, at least currently, makes that data available to researchers. 
Most commonly it's used in PitchF/X analysis, but for right now I'm just using the hit location data.</p> <p>Rather than parse the remainder of the data I needed from the verbatim description (which is something I'd love to start doing, but let's just say that regular expressions scare me), I merged the hit location data with the Retrosheet play-by-play data. (The boilerplate: The information used here was obtained free of charge from and is copyrighted by Retrosheet. Interested parties may contact Retrosheet at 20 Sunset Rd., Newark, DE 19711.) Dan Turkenkopf <a href="http://blog.stealingfirst.com/2008/03/07/how-to-link-pitchfx-to-retrosheet/">has a great parser that links the Gameday XML files up with the appropriate Retrosheet IDs.</a> A few SQL queries later, and bingo, data! I filtered for all ground balls and divided them up by which position was responsible for fielding them. The result of the play (hit, error or out) was also recorded, although not used in the presentation here.</p> <p>But what to do with the data? I decided to try plotting it with <a href="http://www.r-project.org/">GNU R</a>, a free statistics and graphing programming tool. The basics of this are covered in Baseball Hacks, by Joseph Adler. 
Additional functionality is provided by the <a href="http://www.bioconductor.org/repository/release1.5/package/html/geneplotter.html">Geneplotter library.</a></p> <p>Okay, here we go - plots!</p> <p>First base:</p> <p><a href="http://lh6.ggpht.com/pontifexexmachina/SBF-CQK71nI/AAAAAAAAAJ4/WtvGiXmzd4E/s1600-h/firstbase%5B2%5D.png"><img style="border-right: 0px; border-top: 0px; border-left: 0px; border-bottom: 0px" height="244" alt="firstbase" src="http://lh3.ggpht.com/pontifexexmachina/SBF-CgK71oI/AAAAAAAAAKA/PxbU8YUW3fY/firstbase_thumb.png?imgmax=800" width="244" border="0" /></a> </p> <p>Second base:</p> <p><a href="http://lh5.ggpht.com/pontifexexmachina/SBF-DAK71pI/AAAAAAAAAKI/aL2u8-5IfWA/s1600-h/secondbase%5B2%5D.png"><img style="border-right: 0px; border-top: 0px; border-left: 0px; border-bottom: 0px" height="244" alt="secondbase" src="http://lh6.ggpht.com/pontifexexmachina/SBF-DQK71qI/AAAAAAAAAKQ/C7VvTMFn8HI/secondbase_thumb.png?imgmax=800" width="244" border="0" /></a> </p> <p>Shortstop:</p> <p><a href="http://lh4.ggpht.com/pontifexexmachina/SBF-DwK71rI/AAAAAAAAAKY/yiYG_4bWzeo/s1600-h/shortstop%5B2%5D.png"><img style="border-right: 0px; border-top: 0px; border-left: 0px; border-bottom: 0px" height="244" alt="shortstop" src="http://lh5.ggpht.com/pontifexexmachina/SBF-EAK71sI/AAAAAAAAAKg/608z7Ze2r_Q/shortstop_thumb.png?imgmax=800" width="244" border="0" /></a> </p> <p>Third base:</p> <p><a href="http://lh3.ggpht.com/pontifexexmachina/SBF-EgK71tI/AAAAAAAAAKo/7j_Q3bywSXs/s1600-h/thirdbase%5B2%5D.png"><img style="border-right: 0px; border-top: 0px; border-left: 0px; border-bottom: 0px" height="244" alt="thirdbase" src="http://lh4.ggpht.com/pontifexexmachina/SBF-EwK71uI/AAAAAAAAAKw/S32Jj0hjdOQ/thirdbase_thumb.png?imgmax=800" width="244" border="0" /></a> </p> <p>Those are our reference images - all fielders from 2007 are included. 
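The data preparation described earlier in this post (join the Gameday hit locations to Retrosheet events, keep the ground balls, split by fielding position and then by individual fielder) can be sketched in a few lines. This is a hypothetical, in-memory stand-in for what the post did with SQL: the field names, the (game_id, event_id) join key, and the "G" ground-ball code are all assumptions, and in practice the two sources would be linked through a parser like Dan Turkenkopf's.

```python
from collections import defaultdict

# Hypothetical layouts:
#   hit location row: {"game_id", "event_id", "x", "y"}
#   play-by-play row: {"game_id", "event_id", "batted_ball", "pos",
#                      "fielder_id", "result"}


def merge_ground_balls(hit_locations, events):
    """Join the two sources on (game_id, event_id) and keep ground balls.

    Returns {position: {fielder_id: [(x, y, result), ...]}}.
    """
    events_by_key = {(e["game_id"], e["event_id"]): e for e in events}
    grouped = defaultdict(lambda: defaultdict(list))
    for h in hit_locations:
        e = events_by_key.get((h["game_id"], h["event_id"]))
        if e is None or e["batted_ball"] != "G":
            continue  # unmatched row, or not a ground ball
        grouped[e["pos"]][e["fielder_id"]].append((h["x"], h["y"], e["result"]))
    return grouped


def locations_for(grouped, pos, fielder_id=None):
    """All (x, y) points at a position, or only those for one fielder."""
    fielders = grouped.get(pos, {})
    if fielder_id is not None:
        return [(x, y) for x, y, _ in fielders.get(fielder_id, [])]
    return [(x, y) for plays in fielders.values() for x, y, _ in plays]
```

Calling `locations_for(grouped, "SS")` would give the league-wide reference points for a position, and passing a fielder ID would give the points for a single shortstop; the play result is kept alongside each location even though the plots here don't use it.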
Now, let's take a look at a handful of shortstops, just to see how a few different players look.</p> <p>Troy Tulowitzki:</p> <p><a href="http://lh6.ggpht.com/pontifexexmachina/SBF-FQK71vI/AAAAAAAAAK4/VTvPWqO62Co/s1600-h/tulowitzki%5B2%5D.png"><img style="border-right: 0px; border-top: 0px; border-left: 0px; border-bottom: 0px" height="244" alt="tulowitzki" src="http://lh4.ggpht.com/pontifexexmachina/SBF-FwK71wI/AAAAAAAAALA/BVWSChKRVrw/tulowitzki_thumb.png?imgmax=800" width="244" border="0" /></a> </p> <p>Adam Everett:</p> <p><a href="http://lh5.ggpht.com/pontifexexmachina/SBF-GAK71xI/AAAAAAAAALI/vPlrooxTShE/s1600-h/everett%5B2%5D.png"><img style="border-right: 0px; border-top: 0px; border-left: 0px; border-bottom: 0px" height="244" alt="everett" src="http://lh3.ggpht.com/pontifexexmachina/SBF-GgK71yI/AAAAAAAAALQ/AzR-eQXuEm8/everett_thumb.png?imgmax=800" width="244" border="0" /></a> </p> <p>Derek Jeter:</p> <p><a href="http://lh4.ggpht.com/pontifexexmachina/SBF-GwK71zI/AAAAAAAAALY/_DWywtrMtP0/s1600-h/jeter%5B2%5D.png"><img style="border-right: 0px; border-top: 0px; border-left: 0px; border-bottom: 0px" height="244" alt="jeter" src="http://lh5.ggpht.com/pontifexexmachina/SBF-HAK710I/AAAAAAAAALg/El9nbzY1pM0/jeter_thumb.png?imgmax=800" width="244" border="0" /></a> </p> <p>Ryan Theriot:</p> <p><a href="http://lh3.ggpht.com/pontifexexmachina/SBF-HgK711I/AAAAAAAAALo/YYl-xkFWnGE/s1600-h/theriot%5B5%5D.png"><img style="border-right: 0px; border-top: 0px; border-left: 0px; border-bottom: 0px" height="244" alt="theriot" src="http://lh4.ggpht.com/pontifexexmachina/SBF-HwK712I/AAAAAAAAALw/4X_9rGz-AGg/theriot_thumb%5B1%5D.png?imgmax=800" width="244" border="0" /></a> </p> <p>I'll be the first to admit - if it weren't nearly 2 in the morning, I would spend a little more time cleaning up these graphs. The points right on the edges seem a bit more difficult to "read," for one. 
And I don't really need the entire field like that for just infielders - we can do the same thing with outfielders, I just haven't yet.</p> <p>I'm just "doodling" with the data for now - if there is quantitative analysis to be done with this data, it'll have to wait at least until morning. I would love to put that Tulo graph next to that Jeter graph and show it to a bunch of Yankees fans, see what they have to say about it. Suggestions and requests are welcome.</p> Colin Wyershttp://www.blogger.com/profile/17189081667691281016noreply@blogger.com0tag:blogger.com,1999:blog-8801122366476963814.post-9630994081441726122008-04-21T11:31:00.001-07:002008-04-21T11:31:54.367-07:00With and without Aramis Ramirez<p>Consider this a lark with data; I wouldn't consider the conclusions definitive, or even necessarily meaningful. It's at least thought-provoking.</p> <p>First, I cobbled together a facsimile of a zone rating system <a href="http://mvn.com/mlb-stats/2007/04/23/totalzone-a-new-defensive-measure/">based upon the work of Sean Smith on TotalZone</a>. Let me be clear here: my zone rating system is probably the worst zone rating system in existence. If you want a Zone Rating system based on Retrosheet data, <a href="http://www.hardballtimes.com/main/article/measuring-defense-for-players-back-to-1956-part-2/">TotalZone</a> or <a href="http://www.baseballprospectus.com/article.php?articleid=7072">SFR</a> are vastly superior; <a href="http://www.insidethebook.com/ee/index.php/site/comments/uzr_2007_complete_list/">UZR</a> and <a href="http://www.baseballmusings.com/archives/cat_probabilistic_model_of_range.php">PMR</a> are better still. And my system - let's call it SZR, for "Stupid Zone Rating" - only rates shortstops, making it even more, well, stupid. 
Or "special," if you're worried about hurting its feelings.</p> <p>Here's how it works:</p> <ol> <li>Shortstops are given credit every time they record an out or fielder's choice on a ground ball.</li> <li>An "opportunity" to make a play is assigned for errors, and half of all ground ball singles hit to left and center field.</li> </ol> <p>Stupid Zone Rating is simply Outs divided by Outs plus Opportunities.</p> <p>So why invent the worst possible zone rating system? Because it lets me play around with the data a bit. In this specific case, what I wanted to know was simple. Aramis Ramirez had easily the best defensive season of his career last season. He also missed no small amount of playing time, which gives us a healthy amount of "non-Aramis" opportunities to compare to.</p> <p>What I was curious about was, did A-Ram's big defensive season have an effect on the Cubs shortstops?</p> <p>The average SZR of a shortstop from 2004-2007 was .766. During that time, the average SZR of Cubs shortstops when Ramirez didn't play was .801; when Ramirez played, the average SZR of Cubs shortstops was .761.</p> <p>Now, we'll drill down to 2007. When Ramirez played in 2007, Cubs shortstops averaged a .762 SZR; when Ramirez was out of the lineup, the Cubs averaged a .797 SZR.</p> <p>I have to go to my actual paying job now, so I'll leave figuring out what that data means as an exercise to the reader. <a href="http://www.editgrid.com/user/cwyers/SZR_splits%2C_2004-2007">The full spreadsheet is available to peruse.</a></p> <p>(The information used here was obtained free of charge from and is copyrighted by Retrosheet. 
Interested parties may contact Retrosheet at 20 Sunset Rd., Newark, DE 19711.)</p> Colin Wyershttp://www.blogger.com/profile/17189081667691281016noreply@blogger.com1tag:blogger.com,1999:blog-8801122366476963814.post-1776471310076879132008-04-16T10:02:00.001-07:002008-04-16T10:02:30.651-07:00Moments in transition<p>I am now a member of the <a href="http://www.goatriders.org/new-goat-writer">Goat Riders of the Apocalypse</a>.</p> <p>What does that mean? As of yet, I'm not 100% certain. What I do know is that this is obviously going to cut down on the amount of activity here - anything Cubs-specific, especially anything non-mathy, is going to be over at GROTA.</p> <p>As for The Other Fifteen? For right now, I'm planning to keep this running as a less-Cuby, more SABRy blog. But I'm not sure exactly how often I'll be posting here in the future.</p> <p>So expect things to be tentative for the next week or so. I'm unsure of quite a bit going forward - how many Babylon 5 references and Excel spreadsheets the GROTA audience is clamoring for, how productive I can be writing for multiple outlets, how many jokes I can make about Ryan Theriot.</p> <p>In the meantime, <a href="http://www.goatriders.org/how-much-better-is-soriano-than-murton">here's a look at how a potential Soriano trip to the DL could affect us.</a> Enjoy.</p> Colin Wyershttp://www.blogger.com/profile/17189081667691281016noreply@blogger.com1tag:blogger.com,1999:blog-8801122366476963814.post-21888709626747641272008-04-15T21:25:00.001-07:002008-04-15T21:25:20.655-07:00How best to handle Soriano's injury?<p>Let's assume for a moment that he's only going to be out a few days. That assumption means no call-ups. How best to deal with Soriano's injury? (When I say wins, I mean WAR over a full season.)</p> <p><strong>DeRosa in left, Fontenot at second?</strong></p> <p>This should come as no surprise to you: DeRosa is less valuable as a left fielder than as a second baseman, by about half a win. 
Fontenot, meanwhile, is somewhere between 1 and 1.5 wins worse than DeRosa at second.</p> <p><strong>Ward in left?</strong></p> <p>Ward is below replacement as a left fielder, given his atrocious defense in the outfield. In fact, the more I think about it, the more I don't like having Ward on the team. He makes the bench seem a lot shorter than it really is.</p> <p><strong>Johnson in left?</strong></p> <p>Probably the best option; that's, after all, why teams carry fourth outfielders. That means keeping Pie in the lineup, however, which the team has seemed reluctant to do so far this season.</p> Colin Wyershttp://www.blogger.com/profile/17189081667691281016noreply@blogger.com4tag:blogger.com,1999:blog-8801122366476963814.post-22954091428953741802008-04-14T21:50:00.001-07:002008-04-14T21:50:54.174-07:00Unsafe at any speed<p>Felix Pie is being freed from the bench for at least one afternoon of baseball, <a href="http://chicagosports.chicagotribune.com/sports/baseball/cubs/cs-080414-felix-pie-chicago-cubs,1,6070847.story">according to Paul Sullivan</a>. Not for good reasons like "defense" or "needing at-bats to develop." Nope. Because he's speedy!</p> <blockquote> <p>Manager Lou Piniella said Pie sat so much because the Cubs faced four left-handers on the trip, and he expects him back in the lineup Tuesday night against Cincinnati's Aaron Harang. Piniella said he wants more speed in the lineup and to be more aggressive on the basepaths.</p> <p>"The possibility there is to put as much speed as we can and force the action a little," Piniella said. 
"We've done that in a few games, but basically I've stayed with a set lineup."</p> <p>The speediest Cubs lineup would include Pie in center and Ronny Cedeno at second, giving the Cubs four legitimate base-stealing threats, along with Ryan Theriot and Alfonso Soriano.</p> <p>Of course, Pie and Soriano have to get on base to make use of their speed, and if Cedeno starts, Mark DeRosa would have to come off the bench.</p> </blockquote> <p>I have no idea whether Sullivan's remarks about Cedeno reflect Lou's thinking at all; the goal surely isn't to have Theriot <strong>and</strong> Cedeno in the lineup, so I hope he's just idly musing about such things.</p> Colin Wyershttp://www.blogger.com/profile/17189081667691281016noreply@blogger.com2tag:blogger.com,1999:blog-8801122366476963814.post-43303526394805926562008-04-14T11:53:00.000-07:002008-04-14T12:09:11.393-07:00Taking things a bit too farThere are a lot of things you can criticize Dusty Baker for, and there's even a kernel of truth to <a href="http://chicagosports.chicagotribune.com/sports/baseball/cubs/cs-080413-dusty-baker-chicago-cubs-cincinnati-reds,1,6678638.story">this</a>, but still - huh?<br /><br /><blockquote>With the Cubs out of the race in '06, Baker often played veteran Neifi Perez over rookie Ryan Theriot, leading to criticism he slowed Theriot's development.<br /><br />"I don't think that's a fair criticism," Theriot said. "Neifi was a great player, and a proven veteran, a guy that did a lot of good things for a long time. There's a flip side you never hear about. He could've thrown me to the wolves and it could've turned out bad. I learned a lot just sitting back watching. There wasn't very much pressure, so just sit there and understand the big-league life. I learned from guys like Neifi and Todd [Walker]. They taught me a lot. 
One thing Dusty did do, when he started seeing some confidence, a couple good games in a row, he kept throwing you back out there."</blockquote><br /><br />Yep, it's Dusty's fault that... that what? That we missed out on a half-season of The Riot? That Theriot isn't very good? What are you even saying here?Colin Wyershttp://www.blogger.com/profile/17189081667691281016noreply@blogger.com3tag:blogger.com,1999:blog-8801122366476963814.post-89275135837604764712008-04-13T08:22:00.001-07:002008-04-13T08:52:18.132-07:00Sunday game notes<p>Let's take a look at the accumulated press reports and see if we can't divine what Lou's plans are for the day, shall we?</p> <ul> <li><a href="http://chicago.cubs.mlb.com/news/article.jsp?ymd=20080412&content_id=2520774&vkey=news_chc&fext=.jsp&c_id=chc">Henry Blanco will start</a>. Soto's a young guy, but he hardly leads the team in days off at this point, and it's a long season. I'd prefer it to be J.D. Closser as the backup, but what do I know, right? </li> <li><a href="http://www.dailyherald.com/story/?id=172119">Hill is going to pitch in relief today</a>. I imagine this will change if Marquis is having a complete-game no hitter or something, but other than that the idea is to get a few innings out of Hill and hopefully see an improvement in his ability to pitch in the zone. </li> <li>Gordon Wittenmeyer hints that last night's odd arrangement (putting Fukudome in <s>left</s> center and DeRosa in right) <a href="http://www.suntimes.com/sports/baseball/cubs/892261,CST-SPT-cubnt13.article">might become more common</a>, as the Cubs try to work Mike Fontenot into the lineup so that we can have two lefties. (May I suggest Eric Patterson?) 
[<em>Sorry about the mix-up - I typed left and meant center for Fukudome.</em> -CW]</li> <li>Expect to see either Johnson or Cedeno or both today; <a href="http://www.baseball-reference.com/previews/2008/PHI200804130.shtml">both have hit Moyer very well in their careers.</a> (Which is something I don't care about but Lou does.) Johnson was probably a given anyway. The other Cubs who have hit Moyer well are D-Lee and Rich Hill. (Maybe that's why he's starting today.)</li> </ul> Colin Wyershttp://www.blogger.com/profile/17189081667691281016noreply@blogger.com4tag:blogger.com,1999:blog-8801122366476963814.post-7525119024903930622008-04-12T14:14:00.001-07:002008-04-12T14:14:52.826-07:00How much does admiring one's homers hurt a team?<p>I'll admit that I don't know, <a href="http://www.bleedcubbieblue.com/2008/4/12/392189/how-much-does-gazing-at-ho">but I'm going to try to find out</a>. Hopefully the fans at BCB (who, I'll admit, generally pay more attention to how much someone hustles than I do) will help me track this as well as possible.</p> <p>I'm also going to throw this open to fans of other teams - if you're willing to keep track of this closely enough for me to analyze the data at season's end, I'll go ahead and look at it for you as well. If a bunch of Red Sox fans want to know how much it hurts to have Manny being Manny, let me know how often he does it and I'll figure it out for you.</p> Colin Wyershttp://www.blogger.com/profile/17189081667691281016noreply@blogger.com4tag:blogger.com,1999:blog-8801122366476963814.post-45668544525482481902008-04-12T13:45:00.001-07:002008-04-12T13:45:09.884-07:00Fun with regression to the mean<p>Let's talk a bit about sample size and regression to the mean, shall we? 
I feel like I owe some actual examples of what I'm talking about given my <a href="http://otherfifteen.blogspot.com/2008/04/sermon-on-mound.html">recent screed on the topic</a>.</p> <p>Let's go ahead and do a throwdown-to-showdown between Cubs center field options Felix Pie and Reed Johnson. (It's a <a href="http://www.bleedcubbieblue.com/2008/4/11/391929/the-great-pie-debate">hot topic of discussion among Cubs fans</a> who aren't too busy <a href="http://www.anothercubsblog.net/2008/04/12/rich-hill-to-the-bullpen#comments">pining after Ronny Cedeno</a>.) And let's use OBP as our stat of choice for comparing them for the moment. How much more often will Johnson get on base than Pie?</p> <p>We're simply going to focus on 2008 production for just a minute, because people seem to be much more excited about 2008 production to date than they are about the latest build of the ZiPS projections, for instance.</p> <p>According to Baseball-Reference.com, Reed Johnson has a .478 OBP in 23 plate appearances; Pie has a .217 OBP in 23 plate appearances. Obviously Johnson is a better choice as a center fielder, just based upon OBP, right?</p> <p>At 23 plate appearances apiece, we know no such thing. What we do know, on the other hand, is that both of them are major league baseball players, and the central tendency of OBP talent among major league players is roughly .330. That's the mean; now we can regress to the mean. (A <a href="http://www.athleticsnation.com/2008/3/8/252165/staturday-small-sample-siz">great primer</a> on how to do that is available from Sal at Athletics Nation.)</p> <p>Once we go ahead and regress Reed Johnson's OBP to the mean, we end up with a .338 OBP. 
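Here is a minimal sketch of the regression step, using the simple "add k plate appearances of league-average performance" approach that primers like Sal's describe. Several things are assumptions: the times-on-base counts (11 and 5) are inferred from the quoted .478 and .217 OBPs over 23 plate appearances, plate appearances stand in for the true OBP denominator, and k = 400 is a hypothetical constant, so the outputs only roughly track the figures in this post. The standard-error helper shows why 23 plate appearances says so little: for a true .330 hitter, the standard error of OBP over 23 trials is about .098.

```python
import math

LEAGUE_OBP = 0.330  # central tendency quoted in the post


def regress_obp(times_on_base, pa, league=LEAGUE_OBP, k=400):
    """Blend observed OBP with `k` PA of league-average OBP (k is assumed)."""
    return (times_on_base + league * k) / (pa + k)


def obp_standard_error(true_obp, pa):
    """Binomial standard error of an observed OBP over `pa` trials."""
    return math.sqrt(true_obp * (1.0 - true_obp) / pa)


if __name__ == "__main__":
    for name, on_base in (("Johnson", 11), ("Pie", 5)):
        print(f"{name}: raw {on_base / 23:.3f}, "
              f"regressed {regress_obp(on_base, 23):.3f}")
    print(f"standard error at 23 PA: {obp_standard_error(LEAGUE_OBP, 23):.3f}")
```

With this k, Johnson lands at about .338 and Pie at about .324, so a 261-point gap in raw OBP shrinks to roughly 14 points; a larger k would shrink it further.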
For Pie, once we regress to the mean, we end up with a .321 OBP.</p> <p>Fun, right?</p> Colin Wyershttp://www.blogger.com/profile/17189081667691281016noreply@blogger.com0tag:blogger.com,1999:blog-8801122366476963814.post-78406840854539742872008-04-12T10:31:00.001-07:002008-04-12T10:31:19.707-07:00Attention, Chicago baseball writers!<p>It has to be hard to be a beat writer for the Cubs. Piniella mocks you in the post-game press conferences. The fans mock you on your own blog (unless you're Mariotti, in which case you can hide in your Hall of Doom while your lackeys do the dirty work). Sometimes you have to feel underappreciated.</p> <p>Well, here's your opportunity to make up for that, in cold hard cash!</p> <p><a href="http://lh4.ggpht.com/pontifexexmachina/SADx5KJX9aI/AAAAAAAAAJo/uvWdEqhO3E8/s1600-h/166432-525-220%5B2%5D.jpg"><img style="border-right: 0px; border-top: 0px; border-left: 0px; border-bottom: 0px" height="105" alt="166432-525-220" src="http://lh5.ggpht.com/pontifexexmachina/SADx5aJX9bI/AAAAAAAAAJw/gLH_OPlajl0/166432-525-220_thumb.jpg?imgmax=800" width="244" border="0" /></a> </p> <p>See this? This is the face of the President who freed the slaves and reunited the Union. He's valid legal tender, and he's all yours... IF you can do this simple favor for me.</p> <p>Next time you're in a room with Lou in some sort of a Q&A forum, ask him if he's concerned about Theriot's slow start. Pie and Hill have been demoted for less, after all.</p> Colin Wyershttp://www.blogger.com/profile/17189081667691281016noreply@blogger.com2tag:blogger.com,1999:blog-8801122366476963814.post-15426033537816662992008-04-10T23:03:00.001-07:002008-04-10T23:03:19.618-07:00The Sermon on the Mound<p>Okay, it's church-going time. Strictly nondenominational, so long as you are willing to worship at the Church of Baseball.</p> <p>Now, when you come to worship at the Church of Baseball, there are a few things that you need to know. 
If you know nothing else, then you must at least know this. Study it. Repeat it to yourself. Say it after every inning. Make it a part of yourself. Ready? Okay, here goes the Sabermetrician's Creed:</p> <blockquote> <p><font color="#444444">I cannot learn anything meaningful about a batter in fewer than 50 plate appearances.</font></p> <p><font color="#444444">I cannot learn anything meaningful about a pitcher in fewer than 20 innings pitched.</font></p> <p><font color="#444444">I cannot learn anything meaningful about a team in fewer than 500 plate appearances.</font></p> </blockquote> <p>This is not negotiable; this isn't something that you can argue over or find exceptions to. You absolutely cannot draw conclusions about a player's or team's performance in ten games or fewer; you can not.</p> <p>This is not to say that you can draw conclusions about a team's performance in twenty games; you probably can't do that, either. But there is absolutely no way you can do it in fewer than ten.</p> <p>Again: this is not a matter of opinion. Anyone who tells you otherwise is lying to you, and that's the truth.</p> <p>[I will add one caveat: doctors are an exception to this rule. So you can learn that Pedro Martinez has a bum hamstring. I'm sure that really surprised you.]</p> <p>Right now, we see amazing things all around us: the Orioles are first in the AL East! The Tigers have the worst record in baseball! Jason Kendall leads the NL in batting average! Kyle Lohse leads the NL in ERA!</p> <p>And people are nearly desperate to ascribe meaning to all of those things. But <em>they don't matter</em>. They're meaningless.
They literally have no meaning, none whatsoever.</p> <p><em>But no team has ever gone 0-7 and made the playoffs!</em></p> <p>I don't care.</p> <p><em>Jason Kendall had eye surgery!</em></p> <p>I don't care.</p> <p><em>Dave Duncan is the greatest pitching coach ever!</em></p> <p>I don't care.</p> <p>Do you know why I don't care?</p> <p>Because you absolutely cannot draw conclusions about a player's or team's performance in ten games or fewer; you can not.</p> <p>Now go forth, and sin no more. Grace be with those who understand and appreciate the great importance of sample size in all things; amen.</p> Colin Wyershttp://www.blogger.com/profile/17189081667691281016noreply@blogger.com4
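<p>To put some numbers behind the Creed and behind the Pie/Johnson regression, here's a minimal Python sketch. It treats each plate appearance as an independent on-base-or-out trial (a simplifying assumption, not a claim about how any projection system works), computes the binomial standard error of OBP at a few sample sizes, and shrinks an observed OBP toward the .330 league mean by adding a fixed number of league-average plate appearances. The function names and the regression constant of 400 PA are hypothetical, chosen only for illustration; the posts don't say which constant was used, so this won't necessarily reproduce the .338 and .321 figures exactly.</p>

```python
import math

# League-average OBP, taken from the ".330" figure in the Pie/Johnson post.
LEAGUE_OBP = 0.330


def obp_standard_error(p: float, pa: int) -> float:
    """Binomial standard error of an on-base rate p observed over pa plate appearances.

    Assumes every plate appearance is an independent trial with the same
    on-base probability, which is a simplification.
    """
    return math.sqrt(p * (1.0 - p) / pa)


def regress_to_mean(obp: float, pa: int, mean: float, k: float) -> float:
    """Shrink an observed OBP toward `mean` by adding k plate appearances
    of exactly league-average performance.

    k is a hypothetical regression constant, not one stated in the posts.
    """
    times_on_base = obp * pa
    return (times_on_base + mean * k) / (pa + k)


if __name__ == "__main__":
    # How noisy is a league-average OBP at various sample sizes?
    for pa in (23, 50, 500):
        se = obp_standard_error(LEAGUE_OBP, pa)
        print(f"{pa:>3} PA: one standard error = {se:.3f}")

    # Hypothetical constant of 400 PA, for illustration only.
    k = 400
    for name, obp in (("Johnson", 0.478), ("Pie", 0.217)):
        regressed = regress_to_mean(obp, 23, LEAGUE_OBP, k)
        print(f"{name}: {obp:.3f} raw -> {regressed:.3f} regressed")
```

<p>At 23 plate appearances, one standard error on a .330 OBP is close to 100 points, which is why a .478 or a .217 says so little on its own. A larger k pulls both players harder toward .330; a smaller one trusts the 23 plate appearances more.</p>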