I'll go ahead and be honest; baserunning is something I don't pay a lot of attention to. If a guy can hit the ball and field the ball, that's usually good enough for me. I mean, usually a guy has to be on the bases to run them, right?
But something that sticks in my craw is when people tell me that baserunning is one of the things "your stats can't measure." First of all, they're not my stats - I don't know that I've contributed one thing to the larger body of knowledge about baseball. Second - my damned stats sure as hell can measure your baserunning! And so that's what we're going to do here.
I should note that what I said above still holds true - I rely greatly upon the work and ideas of other, better people. I am not a sabermetrician, just someone that writes about sabermetrics. So it behooves me to say that I'm standing on the shoulders of giants here.
First of all, none of this would be possible without a database of play-by-play data. Therefore, the required boilerplate text:
"The information used here was obtained free of charge from and is copyrighted by Retrosheet. Interested parties may contact Retrosheet at 20 Sunset Rd., Newark, DE 19711."
Truly, Retrosheet is gold-plated awesome.
Second, many acknowledgements to Dan Fox; you can read more about his (far superior) EqBRR stats on his blog or at Baseball Prospectus. Also of great help was Lee Panas of Tiger Tales; I highly recommend his excellent series on baserunning metrics. And in case you haven't noticed already, I rely very heavily on the research of Tom Tango.
Parsing the Retrosheet logs, I took a look at twelve different common baserunning events. Only the lead runner was considered; it's not fair to judge a runner based on how well the guy on base in front of him is running. (Dusty Baker infamously referred to this as "clogging the bases.")
Event Code | Normal outcome | Extra-base (XB) outcome |
1B_2B | Runner on first advances to second on a single | Runner on first advances to third on a single |
1B_3B | Runner on first advances to third on a double | Runner on first advances to home on a double |
1B_GB | Runner on first stays on first after a groundout | Runner on first advances to second after a groundout |
1B_FB | Runner on first stays on first after a flyout | Runner on first advances to second after a flyout |
1B_NOTINPLAY | Runner on first stays on first on a ball not in play | Runner on first advances to second on a ball not in play |
2B_3B | Runner on second advances to third on a single | Runner on second advances to home on a single |
2B_GB | Runner on second stays on second after a groundout | Runner on second advances to third after a groundout |
2B_FB | Runner on second stays on second after a flyout | Runner on second advances to third after a flyout |
2B_NOTINPLAY | Runner on second stays on second on a ball not in play | Runner on second advances to third on a ball not in play |
3B_GB | Runner on third stays on third after a groundout | Runner on third advances to home after a groundout |
3B_FB | Runner on third stays on third after a flyout | Runner on third advances to home after a flyout |
3B_NOTINPLAY | Runner on third stays on third on a ball not in play | Runner on third advances to home on a ball not in play |
Every time one of those events occurred, several things got recorded into a table. First, I noted one of three outcomes: the "normal" outcome, the XB, or "extra base," outcome, or the runner being thrown out on the play. I also noted how many outs there were in the inning - the correct decision on whether or not to run is dependent upon the number of outs in the inning, and heads-up baserunners will change their baserunning strategy accordingly. All of those things are then assigned to a baserunner and totaled up. [More accurately, there's a set of SQL queries and an Excel spreadsheet that does most of the work.] Balls not in play include stolen bases, wild pitches and passed balls.
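As a rough illustration of the bookkeeping described above (the actual pipeline is SQL plus Excel and isn't shown here), this is a minimal Python sketch. The record layout, the `tally` function, and the sample rows are all hypothetical, made up for illustration rather than taken from Retrosheet's file format.

```python
from collections import Counter

# Hypothetical, hand-made records: (runner_id, event_code, outs, outcome).
# outcome is one of "normal", "xb" (extra base), or "out" (thrown out).
SAMPLE_PLAYS = [
    ("therr001", "1B_2B", 1, "normal"),
    ("therr001", "1B_2B", 1, "xb"),
    ("therr001", "2B_GB", 0, "out"),
    ("lee-d002", "1B_2B", 1, "normal"),
]

def tally(plays):
    """Count how often each (runner, event code, outs, outcome) combination occurs."""
    counts = Counter()
    for runner, event, outs, outcome in plays:
        counts[(runner, event, outs, outcome)] += 1
    return counts

counts = tally(SAMPLE_PLAYS)
for key, n in sorted(counts.items()):
    print(key, n)
```

Keeping the number of outs in the key matters because, as noted above, the same decision is worth a different amount with zero, one, or two outs.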
Every play is then assigned a run value based upon its run expectancy. We'll use the scenario Tango uses to explain this. Let's say you have a runner on first base with one out. According to Tango's run expectancy chart, an average of .573 runs score in an inning in that situation. Suppose that the next batter hits a single. (That would be an event code 1B_2B in my system.)
If the runner on first advances to second, then the run expectancy increases to .971. A better baserunner will, depending on the situation, advance to third instead; the run expectancy with runners at the corners and one out is 1.243. Sometimes, however, a baserunner will get thrown out advancing to third; with a runner at first and two outs, the run expectancy drops to .251.
[Note that we assume that the trail runner stays on first in either the extra base or thrown out scenarios; that's a convenient abstraction, one that introduces a small amount of inaccuracy in exchange for avoiding a large computational headache.]
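To make the arithmetic in that example concrete, here is a small Python sketch using only the four run expectancy figures quoted above. It assumes the run value of an outcome is simply the run expectancy after the play minus the run expectancy before it (no runs score on the play in any of these three outcomes); the post doesn't spell out the exact formula, so treat that as an assumption. The state labels and the `run_value` helper are made up for illustration.

```python
# Run expectancy figures quoted in the text, keyed by (runners on base, outs).
# Only the four states from the example are included.
RUN_EXPECTANCY = {
    ("1st", 1): 0.573,      # before the single
    ("1st+2nd", 1): 0.971,  # normal outcome: lead runner stops at second
    ("1st+3rd", 1): 1.243,  # extra-base outcome: lead runner takes third
    ("1st", 2): 0.251,      # lead runner thrown out at third
}

def run_value(before, after):
    """Assumed definition: change in run expectancy caused by the play."""
    return round(RUN_EXPECTANCY[after] - RUN_EXPECTANCY[before], 3)

start = ("1st", 1)
values = {
    "normal": run_value(start, ("1st+2nd", 1)),
    "xb": run_value(start, ("1st+3rd", 1)),
    "out": run_value(start, ("1st", 2)),
}
print(values)
```

Under that assumption the normal outcome is worth about +.398 runs, the extra base about +.670, and getting thrown out about -.322. Put another way, taking third gains .272 runs over stopping at second, while being thrown out costs .720.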
So, using the run expectancy formula, we assign run values to all of our event outcomes, based on the number of outs in the inning. For each runner, we multiply his outcomes by their run expectancy values, and sum up. Then, for a final step, we take a look at how the average baserunner would have performed given the same opportunities, and subtract that number from the sum. That gives us our +/- rating of runs above/below average.
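The steps in that paragraph can be sketched roughly as follows. This is not the actual SQL/Excel implementation, just one plausible reading of it: within each (event code, outs) bucket, compare the runner's total run value to what the league-average run value per opportunity would have produced over the same number of opportunities. The function name, the data layout, and the league and player counts are hypothetical; the run values reuse the figures from the worked example, under the same assumption about how they're defined.

```python
def plus_minus(player_counts, league_counts, run_values):
    """Runs above/below average for one runner.

    All arguments are dicts keyed by (event_code, outs). The first two map
    each bucket to {outcome: count}; the third maps it to {outcome: run value}.
    """
    total = 0.0
    for bucket, outcomes in player_counts.items():
        values = run_values[bucket]
        league = league_counts[bucket]
        # League-average run value per opportunity in this bucket.
        league_avg = sum(values[o] * n for o, n in league.items()) / sum(league.values())
        opportunities = sum(outcomes.values())
        actual = sum(values[o] * n for o, n in outcomes.items())
        total += actual - league_avg * opportunities
    return total

bucket = ("1B_2B", 1)
run_values = {bucket: {"normal": 0.398, "xb": 0.670, "out": -0.322}}
league = {bucket: {"normal": 70, "xb": 27, "out": 3}}  # invented league totals
player = {bucket: {"normal": 5, "xb": 5, "out": 0}}    # invented runner

result = plus_minus(player, league, run_values)
print(round(result, 4))
```

One property of this setup is that a runner whose outcome mix exactly matches the league's comes out at zero, whichever baseline the raw run values are measured from.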
Since I'm a miserable computer programmer, all of my SQL queries take about forever to run, so I limited myself to the years 2004-2007. Fair warning: single season performances are subject to sample size issues, especially with things like baserunning.
Okay, let's take a look at the top 10 best and worst baserunning seasons:
Batter | Name | year | Opp. | +/- |
furcr001 | Rafael Furcal | 2004 | 298 | 12.46229 |
barfj003 | Josh Barfield | 2006 | 246 | 11.80434 |
pierj002 | Juan Pierre | 2007 | 368 | 11.08171 |
carrj001 | Jamey Carroll | 2006 | 295 | 10.95289 |
rollj001 | Jimmy Rollins | 2005 | 364 | 10.93031 |
milea001 | Aaron Miles | 2004 | 281 | 10.5188 |
crawc002 | Carl Crawford | 2004 | 400 | 10.2553 |
rollj001 | Jimmy Rollins | 2004 | 321 | 9.949088 |
durhr001 | Ray Durham | 2004 | 218 | 9.554747 |
milea001 | Aaron Miles | 2007 | 192 | 9.416688 |
Batter | Name | year | Opp. | +/- |
lawtm002 | Matt Lawton | 2005 | 245 | -7.7626 |
lee-d002 | Derrek Lee | 2007 | 221 | -7.92601 |
ortid001 | David Ortiz | 2005 | 157 | -8.12203 |
konep001 | Paul Konerko | 2004 | 161 | -8.21513 |
lowem001 | Mike Lowell | 2007 | 163 | -8.24073 |
sosas001 | Sammy Sosa | 2004 | 141 | -8.24097 |
sancf001 | Freddy Sanchez | 2005 | 170 | -8.45961 |
mattg002 | Gary Matthews, Jr. | 2006 | 282 | -9.21264 |
garkr001 | Ryan Garko | 2007 | 163 | -10.2327 |
berkl001 | Lance Berkman | 2004 | 213 | -10.3233 |
Makes sense, yes? Speedy middle infielders and outfielders run better than slow corner infielders and outfielders. Lee is a surprising entry on our trailers; he has a reputation for being a very good basestealer for a first baseman. But on the whole it doesn't seem very controversial.
What we also get here are the run values of being a good or poor baserunner. The difference between the best and worst seasons over this four-year period was about 23 runs, or roughly two wins. Now two wins is nothing to scoff at, but it pales in comparison to defense or offense; nobody would look at this chart and say, "Gee, I think that the Sox should try and see if they can trade David Ortiz for Aaron Miles."
And, since this is ostensibly a Cubs blog - your 2007 Chicago Cubs!
Name | Batter | Opp. | +/- |
Derrek Lee | lee-d002 | 221 | -7.92601 |
Michael Barrett | barrm003 | 106 | -7.43955 |
Mark DeRosa | derom001 | 153 | -1.8259 |
Aramis Ramirez | ramia001 | 154 | -7.01669 |
Alfonso Soriano | soria001 | 206 | -3.55499 |
Ryan Theriot | therr001 | 219 | -3.70184 |
Jacque Jones | jonej003 | 149 | -0.68092 |
Cliff Floyd | floyc001 | 75 | 0.153609 |
Matt Murton | murtm001 | 84 | 1.03073 |
Mike Fontenot | fontm001 | 94 | -0.82379 |
Cesar Izturis | iztuc001 | 152 | 0.30366 |
Felix Pie | pie-f001 | 71 | 1.143167 |
Jason Kendall | kendj001 | 161 | -1.61157 |
Angel Pagan | pagaa001 | 74 | 2.719269 |
Daryle Ward | wardd002 | 44 | -0.61667 |
Koyie Hill | hillk002 | 15 | 0.13925 |
Ronny Cedeno | ceder002 | 11 | -0.14754 |
Geovany Soto | sotog001 | 17 | -1.63394 |
This will come as a surprise to nobody who watched the Cubs last season, but good Lord the Cubs sucked at running the bases. Daryle Ward was saved from himself on the basepaths by aggressive pinch-running by Lou Piniella; the same couldn't be said for Lee, Ramirez and Barrett. Oh, and that number for Soto in just 17 opportunities? Ah, catchers and their appreciable lack of knees.
The numbers on Soriano and Lee surprise me. Soriano was below average in 2006 as well (possibly a byproduct of an excessive amount of caught stealings that occurred chasing the mythical 40/40), but was very solid the previous two seasons. Lee, on the other hand, has been a below-average baserunner the past four years.
Oh, and I just have to add - Ryan Theriot is a below-average baserunner. That is all.
Spreadsheet available for download.
The next step is to work out a Marcels-like projection system using this data, so that I can incorporate it into my WAR depth chart.
It seemed like you alluded to this, but I'll go ahead and make the (probably redundant) point. It seems to me that a player like Lee, or Ramirez, for that matter, while they might be -7 on the basepaths, they more than offset that -7 with plus defense and run production. OTOH, a player like Theriot, who offers only average defense and very little run production cannot possibly offset even a -3.5.
I just love the list of top 3 baserunning Cubs:
1. Angel Pagan
2. Matt Murton
3. Felix Pie
To think that Matt Murton was in the top 3 on anything involving the Cubs last year, particularly something he's not really supposed to be that good at, is entertaining. Of course there are sample size issues, but it's still funny.
Is there any way to get running totals for all of the seasons? That would be pretty useful, but having downloaded the spreadsheet, I'm pretty sure I'm looking at SFR to add the totals, but I just wanted to check that I was looking at it correctly.
Lee's 2007 is an aberration. He was above average in the +/- system by James, which is very similar to this, in 2005 and 2006. His 2007 was absolutely terrible. He made 5 outs on the bases (excluding caught stealing), which tied him with Ryan Theriot and a few others for 2nd most BR outs in baseball. He made only 1 BR out in 2005 and 2006 combined.
Like Sam said, there are sample size issues and frankly, the fact that Murton shows up above average is a huge red flag that something is flawed. We're talking about a guy who didn't score from 3rd base in the same at-bat on a passed ball, a wild pitch, and ground ball to 2nd base with less than 2 outs.
Good stuff, Colin. Keep it coming.
Would it be possible to separate the events into categories that would allow stolen base attempts to be measured separately from advancing on batted balls?
Do the Cubs still have that crazy third base coach who sent everyone home and ran the Cubs into a ton of outs? If not, how about running a few seasons when he was the 3B coach comparing him to the new 3B coach in situations he could affect?
I liked Angel Pagan as an extra outfielder. Why colitis somehow derailed his Cub career is a mystifier to me (laughs). Oh well. If I'm Felix Pie I'd be mighty upset that my name is the third best on that list.
pmayo - To be honest, the main reason I did this was to show that Ryan Theriot's superior baserunning wasn't offsetting his liabilities on offense. But I didn't expect him to be below average!
The Murton figures do seem a bit odd, but he's been above-average for three years according to these rankings. If you can figure out what events I'm missing that turn Murton into a poor baserunner I'll add them to the events list, but I'm inclined to buy on the general conclusion. It's really a "smallest midget" sort of thing, really - he comes off looking good but hardly excellent. (Which is pretty much the story of Matt Murton's life; just like Ryan Theriot is the player I use as a living embodiment of replacement player, Matt Murton is my example of a perfectly league-average ballplayer.)
Retrosheet stores coaching data separately from the event files, so it would take some work to get that integrated with the baserunning metrics. "Wavin'" Wendell Kim left the Cubs several years ago, anyway, after 2004. It might be a fun thing to do a WOWY on; I know Dan Fox has done something similar before.
A running total would certainly be possible, though. And I could separate out the stolen bases from the rest of the events, but for what I'm using these for, I don't know if I see the value in doing so. It's probably the great debate about baserunning metrics, and I'd say I understand the arguments for doing so but I really don't.
Oh, and on the topic of Derrek Lee - I know it's certainly an outlier, but I'm not sure it's an aberration. Lee's stock in range-based defensive ratings dropped across the board in 2007. It's possible that Lee is losing a step, both in the field and on the bases. You'd regress to the mean, obviously, but it's an area of (mild) concern.
Yeah, Lee's lost a step, or whatever you want to call it. When others were calling Lee an athlete, I always saw a guy who could trip over his own shoes 3 or 4 times per day. He's always seemed clumsy to me and his defense is wildly overrated at 1st. Age caught up with him, in my opinion, getting rid of any youth he might have displayed prior to 2007. Now he's just an old, tall, clumsy player. I expect his baserunning will be just as poor in the future. I shouldn't have used the word aberration though.
As for Theriot, he's just dumb. He has the ability at the plate to work the count in his favor and doesn't. People who want to call Theriot a smart ballplayer are ignoring a shitload of evidence that suggests otherwise.
The same could be said for Matt Murton. If I had a nickel for every time a Cubs fan said how smart a ballplayer he was I'd be filthy rich.
I think both he and Theriot are incredibly stupid ballplayers. Neither seems to have half a clue what the hell they're doing half the time.
You want an example of how stupid Theriot is? When there's a runner on first and he hits a groundball that's probably going to be turned into two outs, he runs down the line while shaking his shoulders in and out. Just watch next time. If he didn't do that stupid nonsense he'd probably ground into half as many double plays as he does.
Funny thing about these numbers, Colin. It isn't going to change a single person's mind about Ryan Theriot. Go show TheHawk your numbers and he'll say you're missing something and incapable of understanding baseball. Theriot fans are Theriot fans for life.
Colin,
Question: Does this data incorporate "Cedeno incidents" (i.e., getting picked off)?
Also, could you please, pretty please explain to me why Lou thinks that DeRo isn't a good enough baserunner to hit at the top of the order despite his high OBP, but Riot is a great baserunner that needs to be at the top of the order despite his low OBP?
Any insight you could give me into Lou's thinking would be greatly appreciated.
DMH
DRMH: Stolen bases.
And I'd have to double check, but runners getting picked off should be included in the balls not in play category.
as i am excel-less on my mac (too cheap, i guess), what do your statistics say about moises alou? especially '04ish.
god he was terrible. or did it just seem that way?
He was pretty bad in 2004 - about five runs below average. The numbers have declined steadily since then as he gets on base less.
If you don't want to fork out the cash for Excel, you could download OpenOffice for free.
If you can figure out what events I'm missing that turn Murton into a poor baserunner I'll add them to the events list
I understand this is simply anecdotal, but there was a game in Atlanta early last season where Murton was on third with nobody out and the Cubs trailing by two runs. On a chopper by Jock Jones hit to Chipper Jones, Murton froze along the third base line in spite of the fact that Chipper never looked back, conceding a run that was not the tying run. Chipper threw to first and was surprised to see Murton still on third. That was certainly not the first time Murton appeared to display a real lack of fluidity on the basepaths, although it jumps to mind as it was at a critical juncture of the game.
Murton always struck me as someone who lacked smart baserunning instincts. He's not necessarily slow, but he certainly always seemed to be lacking lucid awareness. OTOH I suppose he doesn't get dinged too badly because this can be construed as simply being "conservative" and thus may have more value than some moron blindly taking an extra base when it's not warranted. I like Murton, and while I don't doubt the validity of this study, it doesn't convince me that Murton is anything less than a lousy baserunner.
Good job Colin. It's nice to see somebody else looking at base running.
Lee