How best to handle Soriano's injury?
4 Comments Published by Colin Wyers on Tuesday, April 15, 2008 at 9:25 PM.
Let's assume for a moment that he's only going to be out a few days. That assumption means no callups. How best to deal with Soriano's injury? (When I say wins, I mean WAR over a full season.)
DeRosa in left, Fontenot at second?
This should come as no surprise to you - DeRosa is less valuable as a left fielder than as a second baseman - by about half a win. Fontenot, meanwhile, is somewhere between 1 and 1.5 wins worse than DeRosa at second.
Ward in left?
Ward is below replacement as a left fielder, given his atrocious defense in the outfield. In fact, the more I think about it, the more I don't like having Ward on the team. He makes the bench seem a lot shorter than it really is.
Johnson in left?
Probably the best option; it is, after all, why teams carry fourth outfielders. That means keeping Pie in the lineup, however, which the team has seemed reluctant to do so far this season.
Labels: Alfonso Soriano, Baseball, Chicago Cubs, Felix Pie, Mark DeRosa, Reed Johnson, WAR
How much does admiring one's homers hurt a team?
4 Comments Published by Colin Wyers on Saturday, April 12, 2008 at 2:14 PM.
I'll admit that I don't know, but I'm going to try to find out. Hopefully the fans at BCB (who, I'll admit, generally pay more attention to how much someone hustles than I do) will help me track this as well as possible.
I'm also going to throw this open to fans of other teams - if you're willing to keep track of this in enough detail for me to work with at season's end, I'll go ahead and look at it for you as well. If a bunch of Red Sox fans want to know how much it hurts to have Manny being Manny, let me know how often he does it and I'll figure it out for you.
Labels: Alfonso Soriano, Aramis Ramirez, Baseball, Chicago Cubs
I'll go ahead and be honest; baserunning is something I don't pay a lot of attention to. If a guy can hit the ball and field the ball, that's usually good enough for me. I mean, usually a guy has to be on the bases to run them, right?
But something that sticks in my craw is when people tell me that baserunning is one of the things "your stats can't measure." First of all, they're not my stats - I don't know that I've contributed one thing to the larger body of knowledge about baseball. Second - my damned stats sure as hell can measure your baserunning! And so that's what we're going to do here.
I should note that what I said above still holds true - I rely greatly upon the work and ideas of other, better people. I am not a sabermetrician, just someone that writes about sabermetrics. So it behooves me to say that I'm standing on the shoulders of giants here.
First of all, none of this would be possible without a database of play-by-play data. Therefore, the required boilerplate text:
"The information used here was obtained free of charge from and is copyrighted by Retrosheet. Interested parties may contact Retrosheet at 20 Sunset Rd., Newark, DE 19711."
Truly, Retrosheet is gold-plated awesome.
Second, many acknowledgements to Dan Fox; you can read more about his (far superior) EqBRR stats on his blog or at Baseball Prospectus. Also of great help was Lee Panas of Tiger Tales; I highly recommend his excellent series on baserunning metrics. And in case you haven't noticed already, I rely very heavily on the research of Tom Tango.
Parsing the Retrosheet logs, I took a look at twelve different common baserunning events. Only the lead runner was considered; it's not fair to judge a runner based on how well the guy on base in front of him is running. (Dusty Baker infamously referred to this as "clogging the bases.")
Event Code | Normal outcome | Extra-base (XB) outcome |
1B_2B | Runner on first advances to second on a single | Runner on first advances to third on a single |
1B_3B | Runner on first advances to third on a double | Runner on first advances to home on a double |
1B_GB | Runner on first stays on first after a groundout | Runner on first advances to second after a groundout |
1B_FB | Runner on first stays on first after a flyout | Runner on first advances to second after a flyout |
1B_NOTINPLAY | Runner on first stays on first on a ball not in play | Runner on first advances to second on a ball not in play |
2B_3B | Runner on second advances to third on a single | Runner on second advances to home on a single |
2B_GB | Runner on second stays on second after a groundout | Runner on second advances to third after a groundout |
2B_FB | Runner on second stays on second after a flyout | Runner on second advances to third after a flyout |
2B_NOTINPLAY | Runner on second stays on second on a ball not in play | Runner on second advances to third on a ball not in play |
3B_GB | Runner on third stays on third after a groundout | Runner on third advances to home after a groundout |
3B_FB | Runner on third stays on third after a flyout | Runner on third advances to home after a flyout |
3B_NOTINPLAY | Runner on third stays on third on a ball not in play | Runner on third advances to home on a ball not in play |
Every time one of those events occurred, several things got recorded into a table. First, I noted one of three outcomes: the "normal" outcome, the XB, or "extra base," outcome, or the runner being thrown out on the play. I also noted how many outs there were in the inning - the correct decision on whether or not to run is dependent upon the number of outs in the inning, and heads-up baserunners will change their baserunning strategy accordingly. All of those things are then assigned to a baserunner and totaled up. [More accurately, there's a set of SQL queries and an Excel spreadsheet that do most of the work.] Balls not in play include stolen bases, wild pitches and passed balls.
Every play is then assigned a run value based upon its run expectancy. We'll use the scenario Tango uses to explain this. Let's say you have a runner on first base with one out. According to Tango's run expectancy chart, an average of .573 runs score in an inning in that situation. Suppose that the next batter hits a single. (That would be an event code 1B_2B in my system.)
If the runner on first advances to second, then the run expectancy increases to .971. A better baserunner will, depending on the situation, advance to third instead; the run expectancy with runners at the corners and one out is 1.243. Sometimes, however, a baserunner will get thrown out advancing to third; with a runner at first and two outs, the run expectancy drops to .251.
[Note that we assume that the trail runner stays on first in either the extra base or thrown out scenarios; that's a convenient abstraction, one that introduces a small amount of inaccuracy in exchange for avoiding a large computational headache.]
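To make the arithmetic concrete, here is a minimal Python sketch of the run values for that one play. The four run expectancy figures are the ones quoted above; the dictionary layout and the `run_value` helper are illustrative names only, not the actual SQL queries or spreadsheet.

```python
# Run expectancy (RE) figures quoted in the text for the 1B_2B example.
RE_BEFORE = 0.573          # runner on first, one out
RE_AFTER = {
    "normal": 0.971,       # runners on first and second, one out
    "xb": 1.243,           # runners at the corners, one out
    "out": 0.251,          # lead runner thrown out: runner on first, two outs
}

def run_value(outcome: str) -> float:
    """Change in run expectancy for one outcome of the play."""
    return round(RE_AFTER[outcome] - RE_BEFORE, 3)

values = {outcome: run_value(outcome) for outcome in RE_AFTER}
print(values)
```

Put another way, taking the extra base gains about .27 runs over stopping at second, while getting thrown out costs about .72 runs relative to stopping at second.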
So, using the run expectancy formula, we assign run values to all of our event outcomes, based on the number of outs in the inning. For each runner, we multiply his outcomes by their run expectancy values, and sum up. Then, for a final step, we take a look at how the average baserunner would have performed given the same opportunities, and subtract that number from the sum. That gives us our +/- rating of runs above/below average.
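The steps in that paragraph can be sketched roughly as follows. This is a simplified Python illustration under stated assumptions, not the real pipeline: the tallies are made up, only the one situation worked through above has run values filled in, and "average baserunner" is taken to mean the league's average run value per opportunity in the same event and out state.

```python
from collections import Counter

# Run values per (event, outs, outcome). Only the case worked through in
# the text is filled in; a full table would cover all twelve events.
RUN_VALUES = {
    ("1B_2B", 1, "normal"): 0.398,
    ("1B_2B", 1, "xb"): 0.670,
    ("1B_2B", 1, "out"): -0.322,
}

def total_runs(outcomes: Counter) -> float:
    """Sum of run values over a tally of (event, outs, outcome) counts."""
    return sum(RUN_VALUES[key] * n for key, n in outcomes.items())

def plus_minus(runner: Counter, league: Counter) -> float:
    """Runner's run total minus what an average runner would have produced
    in the same opportunities, grouped by (event, outs)."""
    league_runs, league_opps, runner_opps = Counter(), Counter(), Counter()
    for (event, outs, outcome), n in league.items():
        league_runs[(event, outs)] += RUN_VALUES[(event, outs, outcome)] * n
        league_opps[(event, outs)] += n
    for (event, outs, _), n in runner.items():
        runner_opps[(event, outs)] += n
    expected = sum(
        league_runs[situation] / league_opps[situation] * n
        for situation, n in runner_opps.items()
    )
    return total_runs(runner) - expected

# Made-up tallies, purely for illustration.
league = Counter({
    ("1B_2B", 1, "normal"): 700,
    ("1B_2B", 1, "xb"): 280,
    ("1B_2B", 1, "out"): 20,
})
runner = Counter({("1B_2B", 1, "normal"): 5, ("1B_2B", 1, "xb"): 5})
print(round(plus_minus(runner, league), 3))
```

Grouping by event and outs is what keeps the comparison fair: a runner is only measured against what the league did with the same kind of chance.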
Since I'm a miserable computer programmer, all of my SQL queries take about forever to run, so I limited myself to the years 2004-2007. Fair warning: single season performances are subject to sample size issues, especially with things like baserunning.
Okay, let's take a look at the top 10 best and worst baserunning seasons:
Batter | Name | year | Opp. | +/- |
furcr001 | Rafael Furcal | 2004 | 298 | 12.46229 |
barfj003 | Josh Barfield | 2006 | 246 | 11.80434 |
pierj002 | Juan Pierre | 2007 | 368 | 11.08171 |
carrj001 | Jamey Carroll | 2006 | 295 | 10.95289 |
rollj001 | Jimmy Rollins | 2005 | 364 | 10.93031 |
milea001 | Aaron Miles | 2004 | 281 | 10.5188 |
crawc002 | Carl Crawford | 2004 | 400 | 10.2553 |
rollj001 | Jimmy Rollins | 2004 | 321 | 9.949088 |
durhr001 | Ray Durham | 2004 | 218 | 9.554747 |
milea001 | Aaron Miles | 2007 | 192 | 9.416688 |
Batter | Name | year | Opp. | +/- |
lawtm002 | Matt Lawton | 2005 | 245 | -7.7626 |
lee-d002 | Derrek Lee | 2007 | 221 | -7.92601 |
ortid001 | David Ortiz | 2005 | 157 | -8.12203 |
konep001 | Paul Konerko | 2004 | 161 | -8.21513 |
lowem001 | Mike Lowell | 2007 | 163 | -8.24073 |
sosas001 | Sammy Sosa | 2004 | 141 | -8.24097 |
sancf001 | Freddy Sanchez | 2005 | 170 | -8.45961 |
mattg002 | Gary Matthews, Jr. | 2006 | 282 | -9.21264 |
garkr001 | Ryan Garko | 2007 | 163 | -10.2327 |
berkl001 | Lance Berkman | 2004 | 213 | -10.3233 |
Makes sense, yes? Speedy middle infielders and outfielders run better than slow corner infielders and outfielders. Lee is a surprising entry on our trailers; he has a reputation for being a very good basestealer for a first baseman. But on the whole it doesn't seem very controversial.
What we also get here are the run values of being a good or poor baserunner. The difference between the best and worst seasons over this four-year period was about 23 runs, or roughly two wins. Now two wins is nothing to scoff at, but it pales in comparison to defense or offense; nobody would look at this chart and say, "Gee, I think that the Sox should try and see if they can trade David Ortiz for Aaron Miles."
And, since this is ostensibly a Cubs blog - your 2007 Chicago Cubs!
Name | Batter | Opp. | +/- |
Derrek Lee | lee-d002 | 221 | -7.92601 |
Michael Barrett | barrm003 | 106 | -7.43955 |
Mark DeRosa | derom001 | 153 | -1.8259 |
Aramis Ramirez | ramia001 | 154 | -7.01669 |
Alfonso Soriano | soria001 | 206 | -3.55499 |
Ryan Theriot | therr001 | 219 | -3.70184 |
Jacque Jones | jonej003 | 149 | -0.68092 |
Cliff Floyd | floyc001 | 75 | 0.153609 |
Matt Murton | murtm001 | 84 | 1.03073 |
Mike Fontenot | fontm001 | 94 | -0.82379 |
Cesar Izturis | iztuc001 | 152 | 0.30366 |
Felix Pie | pie-f001 | 71 | 1.143167 |
Jason Kendall | kendj001 | 161 | -1.61157 |
Angel Pagan | pagaa001 | 74 | 2.719269 |
Daryle Ward | wardd002 | 44 | -0.61667 |
Koyie Hill | hillk002 | 15 | 0.13925 |
Ronny Cedeno | ceder002 | 11 | -0.14754 |
Geovany Soto | sotog001 | 17 | -1.63394 |
This will come as a surprise to nobody who watched the Cubs last season, but good Lord the Cubs sucked at running the bases. Daryle Ward was saved from himself on the basepaths by aggressive pinch-running by Lou Piniella; the same couldn't be said for Lee, Ramirez and Barrett. Oh, and that number for Soto in just 17 opportunities? Ah, catchers and their appreciable lack of knees.
The numbers on Soriano and Lee surprise me. Soriano was below average in 2006 as well (possibly a byproduct of an excessive amount of caught stealings that occurred chasing the mythical 40/40), but was very solid the previous two seasons. Lee, on the other hand, has been a below-average baserunner the past four years.
Oh, and I just have to add - Ryan Theriot is a below-average baserunner. That is all.
Spreadsheet available for download.
The next step is to work out a Marcels-like projection system using this data, so that I can incorporate it into my WAR depth chart.
Labels: Alfonso Soriano, Baseball, Chicago Cubs, Derrek Lee, Ryan Theriot
Another look at Alfonso Soriano's splits
0 Comments Published by Colin Wyers on Wednesday, March 19, 2008 at 12:38 AM.
The other day I took a look at Soriano's splits with men on base and with the bases empty. If you want to follow along with this, it's advised that you read that piece first.
You back? Good.
This is eight seasons of data, 2000-2007. (1999 isn't available from Retrosheet, and so I decided that made a reasonable cutoff point for the time being.) First, let's take a look at the average over that timespan:
Overall: .301 BABIP, 14.9% LD
Men On: .306 BABIP, 14.7% LD
Bases Empty: .298 BABIP, 15.1% LD
Line drive rate and BABIP seem to be very consistent between the two states - there's a small variance, but nothing extraordinary.
Alfonso Soriano:
Overall: .317 BABIP, 16.8% LD
Men On: .307 BABIP, 16.0% LD
Bases Empty: .322 BABIP, 17.2% LD
The line drive rate swings seem a bit more pronounced than the average, but the shape of the distribution is about the same. The BABIP shift, while tracking the LD% pretty well, is counter to the league norms. In fact, with men on base Soriano's BABIP is essentially league average, despite a much better LD% and (presumably) better speed than the average hitter.
It seems like Soriano has been very unlucky with runners ahead of him during his career. I don't see any reason to think that BABIP differential is sustainable other than the fact that he's sustained it this long; I can't say for certain but I'm pretty confident it's not an issue for him to hit with runners on base ahead of him.
Which is why it's really too bad that Lou is committing to bat him behind the team's worst hitter.
UPDATE: I plan on seeing what other little nuggets the data contains at some later point; that said, I have a lot of plans for different things to look into at a later date, and time is a finite resource. In the meantime, my toys are your toys. The file is too large for EditGrid, so it's provided as an Excel spreadsheet. Enjoy.
Labels: Alfonso Soriano, Baseball, Chicago Cubs
Does Soriano hit better with the bases empty?
8 Comments Published by Colin Wyers on Sunday, March 16, 2008 at 1:37 AM.
This came up for discussion on Tango's blog a while back, and it's been sort of sticking in the back of my mind.
Here's Tango:
His wOBA are: .379 with bases empty and .344 with men on base. IIRC, the difference for the average player is a 5 point drop or so. I’m sure someone can correct me. But, he’s got a 35 point difference here (based on almost 3000 PA with bases empty and 2000 with men on base). One standard deviation is roughly a 15 point difference, so we see here a difference of around 2 standard deviations.
While that doesn’t necessarily mean that Soriano definitely prefers to bat with bases empty, it points very strongly toward that.
MGL's reply:
That is dead wrong. Sort of. We don’t really care how many SD’s someone is off. By the time someone points something out to us, OF COURSE it is going to be unusual. That is the classic example of selective sampling or cherry picking. Take any distribution that is completely random (no skill whatsoever). Within that distribution, 5% will be off the mean by more than 2 SD’s. Well, someone can and will point those 5% (or 1%, or .1% if the sample is large enough) players out, and say, “Well, these guys are way off the mean - something must be going on!”
Until we figure out how much, if any, skill, there is, those SD’s mean NOTHING (because they are cherry picked, as is Soriano’s)!
If there is little or no skill for players batting with runners and and without, which I suspect is the case, as with most ‘splits’, then the # of SD from the mean means nothing, since everyone gets regressed 100% (or near 100%). Even if there is a little skill, most of that 2 SD will get regressed toward the mean.
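MGL's cherry-picking point is easy to check with a quick simulation. The Python sketch below is an illustration only, not anything from the original thread: it gives every player a "split" that is pure noise and counts how many land more than 2 SD from the mean anyway. The player count and the seed are arbitrary assumptions.

```python
import random

random.seed(2008)  # fixed seed so the run is repeatable

# Tango's figures: a 35-point gap against a typical 5-point drop, with one
# standard deviation of roughly 15 points.
soriano_z = (35 - 5) / 15

N_PLAYERS = 10_000

# Every simulated split is pure noise drawn from a standard normal, so each
# value is already expressed in standard deviations from the mean.
splits = [random.gauss(0.0, 1.0) for _ in range(N_PLAYERS)]

outliers = [z for z in splits if abs(z) > 2.0]
share = len(outliers) / N_PLAYERS
print(f"Soriano is {soriano_z:.1f} SD from the mean")
print(f"{share:.1%} of no-skill players are more than 2 SD from the mean")
```

With thousands of hitters to choose from, someone will always look like Soriano, which is why the SD count alone says nothing about skill.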
I didn't really have anything to add to the discussion at that point, but it stuck with me. I've never been fully convinced by the argument that Soriano doesn't perform well in RBI situations based upon some true talent level, but I've never really had an argument against the idea, and the benefit derived from moving Soriano down in the lineup didn't seem worth the risk.
Then Dave Pinto chimed in:
Since 2000, which represents all but a few PA of his career, Soriano is ever so slightly worse with a man on first than with the bases empty. So the idea that a man on first bothers him doesn't really hold water. However, I believe most batters do better with a man on first. In the National League in 2007, a man on first added sixteen points to a player's batting average, twelve points to a player's on base average and eighteen points to his slugging percentage.
Pretty much the same split data; Tango used the Retrosheet data on Baseball Reference, and I think Pinto uses BIS data, but other than that it's pretty much all the same.
Then Chuck gets in on the act:
This suggests that something else is up with Soriano. It's not sample size that is the problem as Soriano has 1,676 at bats with runners on base. What it suggests is that his concentration is bothered when men are on base.
In other words, if the situation isn't all about him, he's not the same player.
Statistical evidence of selfishness.
Statistical evidence of selfishness. Really?
Or, put another way - what do Soriano's splits really tell us?
That's the same data Pinto used, but with some additional stats added. They are Walks per PA, Strikeouts per PA, Isolated Slugging (Slugging minus batting average) and Batting Average on Balls in Play (H-HR)/(AB-K-HR+SF).
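For anyone who wants to reproduce those columns, here is a small Python sketch of the four rate stats as defined above. The function and parameter names are assumptions for illustration, and the sample stat line is hypothetical, not Soriano's actual numbers.

```python
def rate_stats(pa, ab, h, hr, bb, k, sf, total_bases):
    """Return BB/PA, K/PA, ISO and BABIP from raw counting stats."""
    avg = h / ab
    slg = total_bases / ab
    return {
        "bb_per_pa": bb / pa,
        "k_per_pa": k / pa,
        "iso": slg - avg,                         # slugging minus batting average
        "babip": (h - hr) / (ab - k - hr + sf),   # (H-HR)/(AB-K-HR+SF)
    }

# Hypothetical stat line, purely for illustration.
line = rate_stats(pa=600, ab=550, h=165, hr=30, bb=40, k=120, sf=5, total_bases=300)
for name, value in line.items():
    print(f"{name}: {value:.3f}")
```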
So what do we see here? Soriano's strikeouts go up a bit, his walks increase by more than a bit, and his power numbers go down by a bit. It's the walk rate that interests me - if he's walking more, then he should be getting on base more. But he's not, because his batting average drops.
Batting average is subject to a lot more randomness than a lot of other things we look at in baseball. That's because, when a player walks or strikes out, he's only interacting with the pitcher; when he gets a base hit, he's interacting with the defense as well, and that introduces a lot of variables.
It's the drop in BABIP that interests me the most; BABIP is subject to even more randomness than batting average; we remove strikeouts and home runs, the two components to batting average a player has the most control over. So what's causing the big change in BABIP for Soriano?
For that, batted ball data is the next place I looked. The following is Retrosheet data, 2007 only:
There's a lot going on here - we've added a few new stat categories, for one. FB, GB, LD and Pop are flyball, groundball, line drive and popup; they're followed by percentages.
The first thing is that these numbers seem to take on about the same shape as Soriano's career number - small spike in K rate, larger spike in walk rate, big change in BABIP.
Soriano's line drive percentage remains very consistent, which is very informative; line drive rate is probably the best predictor of future BABIP that we have. In fact, using our expected BABIP formula (.120 plus LD%) it looks like Soriano's change in BABIP rates between the splits is nothing but a fluke.
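The simple expected BABIP rule is a one-liner, sketched here in Python. The function name is an assumption, and the two line drive rates are hypothetical inputs chosen only to show how little expected BABIP moves when LD% barely changes.

```python
def expected_babip(ld_rate: float) -> float:
    """Rule of thumb from the text: expected BABIP = .120 + LD%."""
    return 0.120 + ld_rate

# Hypothetical line drive rates for two splits, not Soriano's real figures.
men_on = expected_babip(0.180)
bases_empty = expected_babip(0.185)
print(f"men on: {men_on:.3f}, bases empty: {bases_empty:.3f}")
print(f"expected gap: {bases_empty - men_on:.3f}")
```

A half-point difference in line drive rate moves expected BABIP by only five points, so a much larger observed gap has to come from somewhere other than how hard the ball is being hit.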
But Soriano is also a more extreme fly ball hitter when men are on base in front of him - can we use that data? Dave Studeman has a somewhat more technical expected BABIP formula which uses K rate and FB%, which I'll call xBABIP to differentiate between it and the simpler formula. Again, nothing to suggest that such a radical difference in BABIP is attributable to anything Soriano is doing.
My feeling right now - call it a hunch - is that we're seeing two things going on here at once. Pitchers will try to pitch around Soriano when men are on base in front of him, driving up his walk rates and giving him fewer good pitches to hit. At the same time, we have an entirely random variance in BABIP that exaggerates the impact of the former tendency.
I have a ways to go before I can validate that hunch, though. Next steps:
- Get the rest of Soriano's career data from the Retrosheet event files, and see if that makes a difference.
- Take a look at the major league average in those figures, to see how Soriano compares.
- Take a look and see if players like Soriano - high slugging, low OBP - tend to see their walk rates go up with men on base ahead of them.
- Take a look at pitch-by-pitch data and see if Soriano sees more balls and fewer strikes with men on base.
That's a tall order, admittedly. And my SQL-fu isn't what I'd like it to be, so expect the going to be slow at best.
[An aside: I took a look at Soriano’s top comps from his 2007 PECOTA (I don’t have 2008), and three of them exhibited the same tendency with their walk rates in their careers. (Andre Dawson, Jeff Kent, and Leon Wagner - Paul Blair didn’t, but he looks like a real odd comp to me.) Consider it anecdotal evidence for #3, if you'd like.]
Labels: Alfonso Soriano, Baseball, Chicago Cubs
At the risk of sounding like a broken record, spring training statistics and results are meaningless. People mostly use them to justify opinions they already held, so far as I can tell.
And like I said yesterday, Lou wouldn't be looking at this crazy new lineup of his unless he saw something in it already. And, sure enough, the early returns have come up rosy:
That lineup experiment with Ryan Theriot batting leadoff and Alfonso Soriano batting second? It just may have passed the experimental stage after Cubs manager Lou Piniella used what might be his Opening Day lineup Thursday.
"We're going to stay in that configuration for a bit and take a good look at it," Piniella said.
And he later drove home the point: "The weather's cold in Chicago when we play the first six weeks or so of the season. And to get [Soriano] to the point where he might have to run [batting first], it's just taking chances."
Which... whatever. He got to see the game today and I was watching Gameday, but I really don't know what convinced him that today's lineup was working. Okay, I do know: he liked the idea of the lineup going into the game, and getting the W didn't hurt the idea. It's like pouring lead or reading tea leaves; you get out of it largely what you put into it. Like I said, spring training results are basically meaningless, so if Lou decides to start acting like some mystic when it comes to justifying his decisions - whatever, they're the decisions he wanted to make anyway, so I don't know what would stop him exactly.
[An aside - Lou has convinced himself that it was a lack of small-ball ability that caused the team's struggles early on last season. I'm more convinced that it was a small sample size effect that worked itself out as the season progressed. Lou seems less likely than I to accept that there are things out of Lou Piniella's control, which I suppose I understand but I am growing more and more frustrated with overall.]
Wittenmyer chimes in, calling this a prelude to a Roberts deal. I'm growing rather calloused to Roberts trade talks; I refuse to believe in any such thing until Roberts is actually traded, if that ever comes to pass.
UPDATE: Bruce Miles is my frakkin' hero. Just... go read it all.
Labels: Alfonso Soriano, Baseball, Chicago Cubs, Lou Piniella, Ryan Theriot
Putting a price on stupidity
1 Comments Published by Colin Wyers on Wednesday, March 12, 2008 at 6:42 PM.
Piniella said he plans to move Alfonso Soriano out of the leadoff spot and bat him second, with Ryan Theriot leading off. If it looks good in camp, that could be the order he settles on for the regular season, he said before today’s game against the Texas Rangers.
Which, whatever. I am Jack's raging spleen, but there's absolutely nothing I can do about it. So, instead of ranting about it (I fully plan to later, when there's time), let's step back and take a cold, calculated, rational look at the matter with a lineup simulator. I'm using the average forecasts of several systems, helpfully supplied by Harry Pavlidis. And I'm presuming the lineup is:
- Theriot
- Soriano
- Lee
- Ramirez
- Fukudome
- DeRosa
- Soto
- Pie
- Pitcher
Maybe I should give Lou the benefit of the doubt and move Soto up and DeRosa down, but at this point Lou can get the benefit of the doubt back when he does something to earn it, as far as I'm concerned.
Lou's brilliantly conceived lineup scores 4.953 runs per game; the optimal lineup scores 5.224 runs per game. Let's look at how many runs that will cost the team over 138 games - basically, 85% of the season. We'll leave some room for days off in our analysis.
The Lou Lineup scores roughly 684 runs; the computer-generated lineup scores roughly 721 runs, for a difference of about 37 runs or about 3.5 wins.
3.5 wins.
Can I go back to the whole "rage" thing again?
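The arithmetic behind those totals, as a short Python sketch. The two runs-per-game figures and the 138-game window come straight from the simulator output above; the variable names are just for illustration.

```python
GAMES = 138            # about 85% of a 162-game season
LOU_RPG = 4.953        # simulated runs per game, Piniella's lineup
OPTIMAL_RPG = 5.224    # simulated runs per game, optimal lineup

lou_runs = LOU_RPG * GAMES
optimal_runs = OPTIMAL_RPG * GAMES
runs_lost = optimal_runs - lou_runs

print(round(lou_runs), round(optimal_runs), round(runs_lost, 1))
```

Unrounded, the gap comes to about 37.4 runs. At a rule of thumb of somewhere around 10 to 11 runs per win (an assumption, since the post doesn't state the conversion it used), that works out to roughly 3.4 to 3.7 wins.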
UPDATE: I've run some more lineup numbers and go over them at BCB. Consider it "Colin in syndication."
Upon further reflection, once you have Ryan Theriot in your lineup, arguing over where to bat him is probably a question of what caliber bullet you want to shoot yourself in the foot with. [I'm cribbing from myself here.] So a lot of this is just a few months' frustrations with Lou and the team's overall philosophy coming to a head.
Lou is obsessed with getting some speed at the top of the order. I just want to get some people at the top of the order who will get on base in front of our big bats. It's starting to look like there are irreconcilable differences here.
Labels: Alfonso Soriano, Baseball, Chicago Cubs, Projections, Ryan Theriot
Brother, can you paradigm?
4 Comments Published by Colin Wyers on Monday, February 11, 2008 at 10:50 AM.
But... but I just have to comment on this:
Alfonso Soriano, LF
Does he have true leadership skills, as in the kind of ability to lead off for a legitimate pennant contender? That's doubtful, even with 100 percent health in his speedy legs.
Um. Last year as a leadoff hitter he and Curtis Granderson were virtually indistinguishable. Nobody sits there talking about how the Tigers aren't a legitimate playoff contender because Granderson doesn't steal enough bases.
UPDATE: Here's another fun one, from the Northern Indiana Post-Tribune, entitled "Table-setter Theriot has speed Cubs need." The lead paragraph is just too awesome: "Baseball is a contact sport with shortstop Ryan Theriot." I have no idea what that even means. Who did he "contact" last season? The writer does acknowledge that there's a chance "Ron Cedeno," who is apparently an "infield predator," beats him out for the starter's job in spring training.
Labels: Alfonso Soriano, Baseball, Chicago Cubs, Infield Predator, Media, Ryan Theriot