I'll go ahead and be honest; baserunning is something I don't pay a lot of attention to. If a guy can hit the ball and field the ball, that's usually good enough for me. I mean, usually a guy has to be on the bases to run them, right?
But something that sticks in my craw is when people tell me that baserunning is one of the things "your stats can't measure." First of all, they're not my stats - I don't know that I've contributed one thing to the larger body of knowledge about baseball. Second - my damned stats sure as hell can measure your baserunning! And so that's what we're going to do here.
I should note that what I said above still holds true - I rely greatly upon the work and ideas of other, better people. I am not a sabermetrician, just someone who writes about sabermetrics. So it behooves me to say that I'm standing on the shoulders of giants here.
First of all, none of this would be possible without a database of play-by-play data. Therefore, the required boilerplate text:
"The information used here was obtained free of charge from and is copyrighted by Retrosheet. Interested parties may contact Retrosheet at 20 Sunset Rd., Newark, DE 19711."
Truly, Retrosheet is gold-plated awesome.
Second, many acknowledgements to Dan Fox; you can read more about his (far superior) EqBRR stats on his blog or at Baseball Prospectus. Also of great help was Lee Panas of Tiger Tales; I highly recommend his excellent series on baserunning metrics. And in case you haven't noticed already, I rely very heavily on the research of Tom Tango.
Parsing the Retrosheet logs, I took a look at twelve different common baserunning events. Only the lead runner was considered; it's not fair to judge a runner based on how well the guy on base in front of him is running. (Dusty Baker infamously referred to this as "clogging the bases.")
|Event Code||Normal outcome||Extra-base (XB) outcome|
|1B_2B||Runner on first advances to second on a single||Runner on first advances to third on a single|
|1B_3B||Runner on first advances to third on a double||Runner on first advances to home on a double|
|1B_GB||Runner on first stays on first after a groundout||Runner on first advances to second after a groundout|
|1B_FB||Runner on first stays on first after a flyout||Runner on first advances to second after a flyout|
|1B_NOTINPLAY||Runner on first stays on first on a ball not in play||Runner on first advances to second on a ball not in play|
|2B_3B||Runner on second advances to third on a single||Runner on second advances to home on a single|
|2B_GB||Runner on second stays on second after a groundout||Runner on second advances to third after a groundout|
|2B_FB||Runner on second stays on second after a flyout||Runner on second advances to third after a flyout|
|2B_NOTINPLAY||Runner on second stays on second on a ball not in play||Runner on second advances to third on a ball not in play|
|3B_GB||Runner on third stays on third after a groundout||Runner on third advances to home after a groundout|
|3B_FB||Runner on third stays on third after a flyout||Runner on third advances to home after a flyout|
|3B_NOTINPLAY||Runner on third stays on third on a ball not in play||Runner on third advances to home on a ball not in play|
Every time one of those events occurred, several things got recorded into a table. First, I noted one of three outcomes: the "normal" outcome, the XB (or "extra base") outcome, or the runner being thrown out on the play. I also noted how many outs there were in the inning - the correct decision on whether or not to run is dependent upon the number of outs, and heads-up baserunners will change their baserunning strategy accordingly. All of those things are then assigned to a baserunner and totaled up. [More accurately, there's a set of SQL queries and an Excel spreadsheet that do most of the work.] Balls not in play include stolen bases, wild pitches and passed balls.
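For the curious, here's roughly what that bookkeeping looks like as a minimal Python sketch. This is not my actual SQL; the `plays` list, the runner IDs and the `tally` function are all made up for illustration, and it assumes the Retrosheet logs have already been parsed down to one row per baserunning event.

```python
from collections import Counter

# Hypothetical parsed plays: (runner_id, event_code, outs, outcome).
# outcome is one of "normal", "xb" (extra base) or "out" (thrown out).
# The rows below are invented for illustration, not real Retrosheet data.
plays = [
    ("runner_a", "1B_2B", 1, "normal"),
    ("runner_a", "1B_2B", 1, "xb"),
    ("runner_a", "2B_GB", 0, "normal"),
    ("runner_b", "1B_2B", 1, "out"),
    ("runner_b", "1B_2B", 1, "normal"),
]


def tally(plays):
    """Count outcomes per (runner, event code, outs, outcome)."""
    counts = Counter()
    for runner, event, outs, outcome in plays:
        counts[(runner, event, outs, outcome)] += 1
    return counts


counts = tally(plays)
for key, n in sorted(counts.items()):
    print(key, n)
```

Keying on the number of outs as well as the event code matters, because the same decision is worth a different number of runs depending on the outs.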
Every play is then assigned a run value based upon its run expectancy. We'll use the scenario Tango uses to explain this. Let's say you have a runner on first base with one out. According to Tango's run expectancy chart, an average of .573 runs score in an inning from that situation. Suppose that the next batter hits a single. (That would be an event code 1B_2B in my system.)
If the runner on first advances to second, then the run expectancy increases to .971. A better baserunner will, depending on the situation, advance to third instead; the run expectancy with runners at the corners and one out is 1.243. Sometimes, however, a baserunner will get thrown out advancing to third; with a runner at first and two outs, the run expectancy drops to .251.
[Note that we assume that the trail runner stays on first in either the extra base or thrown out scenarios; that's a convenient abstraction, one that introduces a small amount of inaccuracy in exchange for avoiding a large computational headache.]
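To make the arithmetic concrete, here's a small Python sketch using only the run expectancy numbers quoted above. One assumption on my part: I'm treating the run value of each outcome as the change in run expectancy from the starting state. That lumps in the value of the single itself, but since that piece is the same for every outcome it washes out once you compare against the average runner. The names `RE_BEFORE`, `RE_AFTER` and `run_values` are just for illustration.

```python
# Run expectancies quoted in the text (from Tango's chart).
RE_BEFORE = 0.573  # runner on first, one out
RE_AFTER = {
    "normal": 0.971,  # runners on first and second, one out
    "xb": 1.243,      # runners at the corners, one out
    "out": 0.251,     # runner on first, two outs
}


def run_values(re_before, re_after):
    """Run value of each outcome: change in run expectancy from the start state."""
    return {outcome: round(re - re_before, 3) for outcome, re in re_after.items()}


values = run_values(RE_BEFORE, RE_AFTER)
print(values)
```

So taking the extra base is worth about .27 runs more than stopping at second (.670 versus .398), while getting thrown out costs about .72 runs relative to stopping (-.322 versus .398). That works out to a break-even success rate of roughly 73% for trying for third in this spot.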
So, using the run expectancy formula, we assign run values to all of our event outcomes, based on the number of outs in the inning. For each runner, we multiply his outcomes by their run expectancy values, and sum up. Then, for a final step, we take a look at how the average baserunner would have performed given the same opportunities, and subtract that number from the sum. That gives us our +/- rating of runs above/below average.
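Here's a hedged Python sketch of that last step. Again, this is not the SQL and Excel I actually used. It assumes the "average baserunner" is defined by the league-wide mix of outcomes in each (event code, outs) bucket, and the league counts below are invented purely so the example runs. The `rating` function and its argument names are hypothetical.

```python
def rating(runner_counts, league_counts, values):
    """Runs above/below average for one runner.

    runner_counts, league_counts: {(event, outs): {outcome: count}}
    values: {(event, outs): {outcome: run value}}
    """
    total = 0.0
    for bucket, outcomes in runner_counts.items():
        opportunities = sum(outcomes.values())
        # What the runner actually produced in this bucket.
        actual = sum(n * values[bucket][o] for o, n in outcomes.items())
        # Average run value per opportunity across the league in this bucket.
        league = league_counts[bucket]
        league_avg = sum(n * values[bucket][o] for o, n in league.items()) / sum(league.values())
        total += actual - opportunities * league_avg
    return total


# Run values for 1B_2B with one out, derived from the figures in the text.
values = {("1B_2B", 1): {"normal": 0.398, "xb": 0.670, "out": -0.322}}

# Made-up counts for illustration only.
league_counts = {("1B_2B", 1): {"normal": 70, "xb": 28, "out": 2}}
runner_counts = {("1B_2B", 1): {"normal": 5, "xb": 5}}

example = rating(runner_counts, league_counts, values)
print(round(example, 4))
```

By construction, a runner whose outcome mix matches the league's in every bucket comes out at exactly zero, which is what you want from a +/- number.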
Since I'm a miserable computer programmer, all of my SQL queries take about forever to run, so I limited myself to the years 2004-2007. Fair warning: single season performances are subject to sample size issues, especially with things like baserunning.
Okay, let's take a look at the top 10 best and worst baserunning seasons:
|mattg002||Gary Matthews, Jr.||2006||282||-9.21264|
Makes sense, yes? Speedy middle infielders and outfielders run better than slow corner infielders and outfielders. Lee is a surprising entry on our trailers; he has a reputation for being a very good basestealer for a first baseman. But on the whole it doesn't seem very controversial.
What we also get here are the run values of being a good or poor baserunner. The difference between the best and worst over this four-year period was about 23 runs, or roughly two wins. Now two wins is nothing to scoff at, but it pales in comparison to defense or offense; nobody would look at this chart and say, "Gee, I think that the Sox should try and see if they can trade David Ortiz for Aaron Miles."
And, since this is ostensibly a Cubs blog - your 2007 Chicago Cubs!
This will come as a surprise to nobody who watched the Cubs last season, but good Lord the Cubs sucked at running the bases. Daryle Ward was saved from himself on the basepaths by aggressive pinch-running by Lou Piniella; the same couldn't be said for Lee, Ramirez and Barrett. Oh, and that number for Soto in just 17 opportunities? Ah, catchers and their appreciable lack of knees.
The numbers on Soriano and Lee surprise me. Soriano was below average in 2006 as well (possibly a byproduct of an excessive amount of caught stealings that occurred chasing the mythical 40/40), but was very solid the previous two seasons. Lee, on the other hand, has been a below-average baserunner the past four years.
Oh, and I just have to add - Ryan Theriot is a below-average baserunner. That is all.
The next step is to work out a Marcels-like projection system using this data, so that I can incorporate it into my WAR depth chart.