The Other Fifteen

Eighty-five percent of the f---in' world is working. The other fifteen come out here.

How important is the narrative?

If you think back to your high school lit classes, you may dimly recall something called the narrative arc:

  1. Exposition
  2. Complication
  3. Climax
  4. Resolution

It might seem more familiar if I go ahead and diagram it out for you:


That's why it's called the narrative arc - you have your rising action and your falling action. It's not quite a bell curve - the slope on the left is more gradual than the slope on the right.

This has been around pretty much since Aristotle; it's an abstraction but a useful one nonetheless.

Let's apply this to Casablanca real quick, just to make sure we understand how it's supposed to work. Exposition? Explain the political situation in France and its territories, the existence of Rick's bar (and Rick), and then set up the bit about the letters of transit. Complication? The old love interest reappears in his life, looking for those letters of transit. Climax? He puts her on the plane. Resolution? "Louis, I think this is the beginning of a beautiful friendship."

Yep, we've got it.

Well, baseball is an endless source of narrative arcs. An individual plate appearance can be looked at as a narrative arc, the battle of pitcher and hitter; if the hitter starts fouling off pitches and really working the count, a single at-bat can take on an epic quality, especially if the game situation adds a bit of drama to the outcome.

You can find narrative arcs in entire seasons - the 2007 Mets "choke job" is the first that comes to mind - or even the entire history of a franchise - 100 years of futility, anyone?

But by far the most common narrative arc of baseball is a single game, and its most frequent chronicler is the beat writer. Let's take a look at today's (or yesterday's, I guess) Cubs/Brewers tilt, first from the Chicago Tribune:

When Kosuke Fukudome hit a three-run homer off Milwaukee closer Eric Gagne to tie the season opener in the bottom of the ninth inning Monday, fans all over Wrigley Field held up professionally made signs with English words on one side and Japanese on the other.

It was meant to be a two-sided version of the phrase "It's Gonna Happen." But something got lost in translation, and the Japanese side read: "It's An Accident."
Fukudome's heroics were no accident, but they wound up going to waste when Tony Gwynn Jr. hit a sacrifice fly off Bob Howry in the 10th to lift the Brewers to a 4-3 victory on a long, soggy afternoon.

"It was a good ballgame," manager Lou Piniella said. "It was well-played, tough conditions. But somebody had to win, somebody had to lose, and they won the ballgame."

Fukudome went 3-for-3 with a walk. He doubled to the center-field wall on the first major-league pitch he saw and earned salaams from fans in the right-field bleachers for his stunning debut.

But the day was a total downer for Carlos Zambrano, who remains winless in four Opening Day starts and left in the seventh inning with forearm cramps. And for Kerry Wood, who allowed three runs in the ninth in his debut as the Cubs' closer.

It's the narrative arc of a tragedy; Fukudome carries the team to a great height, only to witness a greater fall.

Let's take a look at things from the other side, in the Milwaukee Journal-Sentinel:

When Tony Gwynn Jr. arrived in spring training more than six weeks ago, he set an ambitious goal for himself.

Upon learning he'd be in the Milwaukee Brewers' starting lineup at the start of the season, Gwynn set another lofty goal.

"Goal No. 1 was to be in the opening day lineup," said Gwynn, who earned the right to start most days in center field while Mike Cameron is serving a 25-game suspension.

"Goal No. 2 is to be productive with the time I get, while 'Cam's' out."

The 25-year-old son of a Hall of Famer took a nice first step in that direction Monday in helping the Brewers pull out 4-3, rain-interrupted victory in 10 innings over the Chicago Cubs in the season opener at Wrigley Field.

It's more of a melodrama: largely unsung bench player makes the most of his opportunities, becoming the surprise hero of the Brewers' great victory over evil.

In baseball, there are ways of representing the narrative arc other than simply writing it out. Scorecards and other play-by-play accounts are becoming more common, but newspapers gave us the most iconic representations of baseball as numbers and codes: the linescore and boxscore.

Now, when we discuss baseball statistics, what we're really talking about is an aggregation of these records; we take the events and collect them. That's true about the old-school triple crown stats - batting average, home runs and runs batted in - or the "new age" VORP and Win Shares. Baseball statistics are, at their heart, simply a summary of what occurred on the field of play.

But when we collect statistics in that way, we tend to do so in a way that dissociates them from their narrative context. A player's VORP counts hits against a hated division rival in the midst of a tight race exactly the same as hits against a 30-year-old journeyman pitcher playing out the string for a basement dweller in September. Baseball statistics don't seem to have any understanding of the fact that Yankees players are attractive, handsome stars, and that Kansas City Royals players... aren't. There isn't a baseball stat that measures how athletic and, really, balletic Derek Jeter looks when he does that mid-air throw or dives for a ground ball.

And yet all of those things are important, if not essential, in forming a narrative of a baseball season. All of those things add a sense of excitement and drama to baseball. And they're the things that first attracted most of us to baseball - yes, even the Dread Sabermetricians.

If you really want to drop some chum into the water, simply mention the word "clutch" and watch the feeding frenzy begin. I've been up to my eyeballs in clutch recently - all the way from NSBB to BTF to Tango's site. I'm pretty sure that the sacking of Carthage occurred over less than what happened in the Baseball Think Factory thread.

It's an argument that I don't think will ever fully be resolved, and I'm starting to think it's because of a very narrative point of view on the part of people who are the most ardent supporters of the idea that clutch hitting has to exist.

Because in the narrative, clutch hitting does have to exist. If you're trying to boil a ballgame down to a simple narrative - and it has to be a simple narrative; newspaper stories about baseball games aren't particularly long or in-depth, and highlight reels are even less detailed - then you are boiling the game down to a handful of plays, and thus making those plays stand in for the entire game. You create heroes and villains, successes and failures, tragedies and comedies.

In the narrative, you have to have clutch. Have to.

And you need the narrative - there's a reason that you find more recreational baseball analysis than you do recreational analysis of the derivatives market. But in the end, the narrative is an intensely personal thing - Cubs fans and Brewers fans hardly share the same narrative about tonight's game, even if they witnessed the exact same events. The narrative can't hold a larger meaning, because the narrative is not an objective fact; the narrative is a subjective expression of an individual or collective - but NOT universal - experience.

The narrative is also the product of two conveniences - if I was being uncharitable I would call them lies, and I'm open to being uncharitable if it will help explain the point.

Let's take a look at today's ballgames according to Fangraphs. What we're looking at are graphs of Win Probability, the closest thing sabermetrics has to a truly narrative statistic. What you'll note is that none of those graphs seem to match the shape of our narrative arc graph from earlier - for one, all of them are much more jagged. The narrative arc is neat, tidy - a real baseball game is a much more varied experience than a simple arc; the phrase "roller coaster" is cliched, but there really is no better explanation.

But the narrative conveniences of the ways we experience the game - whether it's the AP recap in the morning paper, or the highlight reel on SportsCenter - don't have room for all of those nuances, all of the twists and turns and red herrings that actually happen. On the other hand, those conveniences exist because most of us simply don't have time to watch each and every ballgame; they're shortcuts, stand-ins for the reality underneath.

So the first lie is that you can sum up a ballgame in four or five plays. Did the Cubs lose because Bob Howry allowed one run in the tenth inning, or because for eight innings the Cubs didn't score any runs at all? Howry's charged with the loss in the boxscore, but I suspect a lot more of the "blame" should be shouldered by the lack of production pretty much all up and down the lineup.

The other lie is the emphasis on driving in runs. You will sometimes see token acknowledgement of Runs as the counterpart to Runs Batted In - especially for leadoff hitters with added entertainment value as stolen base threats - but you'll rarely, if ever, see an MVP trophy awarded for it.

Absent the thrill of the stolen base, setting the table is a lot less exciting narratively than clearing it - an RBI single is a lot more fun to watch than the walk that preceded it. But it's not any more valuable to the team; one does not exist without the other. And yet fame, money and awards seem far more drawn to players who drive runs in rather than players who score them.

I don't want to come off as underrating the importance of the narrative point of view - I am probably the least rational person possible when it comes to actually watching a baseball game. (I think I'm still traumatized by the memory of Roberto Novoa, just to name an instance.) But it's foolish and dangerous to use it as the ONLY context through which you view baseball.

This isn't a cry for fusion, or balance, or peaceful coexistence. The world wouldn't be a better place if newspaper articles all read "Today the Cubs and Brewers recorded 27 outs apiece in a contest at Wrigley Field, which revealed almost nothing about the two teams due to the small sample size involved." Nor would the world be a better place if VORP started including Steely-Eyed Resolve as one of its components.

What I am asking for is a simple truce: believers in clutch, I as a student of sabermetrics will stop telling you that clutch doesn't exist, or is insignificant, or what have you, if you will stop insisting that its existence in any way, shape or form has an impact on impartial evaluations of player performance. Do we have a deal?


This in no way undermines the dignity of the sport.

Watching Dodgers-Red Sox at the Coliseum (available on NESN if you get it; it's not blacked out, for you fellow satellite owners.)

Well, Tim Wakefield's personal catcher, Kevin Cash, hit a home run to put the Red Sox up by two. As I'm sure you know, the Coliseum is not exactly set up like a baseball stadium. Fence distance out to left is about 200 feet down the foul line. So I watched the replay a few times, did some measuring, and came up with this hit location:


Beautiful. Did I mention that Chan Ho Park is now pitching for the Dodgers? This is high comedy.


2008 Cubs Opening Day Roster, by WAR, Part II

This is really rushed - suffice it to say that I don't think a lot of people will be pleased with what I'm going to call the Kerry Wood Issue, and I don't have a good answer for you right now. Otherwise, I like what I'm seeing with our pitching staff.


This is park adjusted. Thanks to Sam Larson for some tweaks to the relief pitcher calculations.

I'm really not married to any of these forecasts, and hope to publish a revised set of both charts this weekend. I just know that if I didn't publish something, I would forget to do so entirely.


2008 Cubs Opening Day Roster, by WAR

Hitters only. Let's get straight to the bidness:


Someday I'll look into actually putting these charts together in a more interactive format, but Excel outputs such awful HTML, and I really don't feel like messing with it myself at this point. (EditGrid and Google Docs don't support the fancier formatting, either.)

If you're late to the party, I explain WAR in a previous post. WAR stands for Wins Above Replacement, and it measures a player's overall contribution to team wins. WAR_650 is a new stat that prorates WAR out to 650 plate appearances. Offensive production is now park adjusted as well. [An updated version of the WAR calculator should be ready for release here soon, incorporating these refinements.]
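For the curious, prorating is just a ratio. Here is a minimal sketch in Python, assuming WAR_650 is a straight linear scaling of WAR to 650 plate appearances. The function name and the sample player are made up for illustration and are not pulled from the chart.

```python
def war_650(war, plate_appearances, target_pa=650):
    """Scale WAR linearly to a fixed number of plate appearances.

    Assumes simple linear proration; the actual spreadsheet may differ.
    """
    if plate_appearances <= 0:
        raise ValueError("plate_appearances must be positive")
    return war * target_pa / plate_appearances


# Hypothetical player: 2.0 WAR projected over 500 plate appearances.
example = war_650(2.0, 500)
print(round(example, 2))
```

The point of the rate version is to compare a part-timer and a regular on the same footing; it says nothing about whether the part-timer would hold up over a full season.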

These are all based on projected stats. For offense and wOBA, I made a composite of:

Defense is based on Sean Smith's defensive projections. Baserunning projections are very crude, and involve a bit of Kentucky windage; they're based on my baserunning metrics.

Things to draw from the chart:

  • The Cubs look to have a very fine offense this season.
  • They look to have a very good defense again.
  • They're miserable at running the bases.
  • I love to color-code things.
  • Ryan Theriot sucks.

So, basically, nothing you didn't already know. I hope to get around to the pitchers by opening day, but no promises.


A look at baserunning

I'll go ahead and be honest; baserunning is something I don't pay a lot of attention to. If a guy can hit the ball and field the ball, that's usually good enough for me. I mean, usually a guy has to be on the bases to run them, right?

But something that sticks in my craw is when people tell me that baserunning is one of the things "your stats can't measure." First of all, they're not my stats - I don't know that I've contributed one thing to the larger body of knowledge about baseball. Second - my damned stats sure as hell can measure your baserunning! And so that's what we're going to do here.

I should note that what I said above still holds true - I rely greatly upon the work and ideas of other, better people. I am not a sabermetrician, just someone that writes about sabermetrics. So it behooves me to say that I'm standing on the shoulders of giants here.

First of all, none of this would be possible without a database of play-by-play data. Therefore, the required boilerplate text:

"The information used here was obtained free of charge from and is copyrighted by Retrosheet. Interested parties may contact Retrosheet at 20 Sunset Rd., Newark, DE 19711."

Truly, Retrosheet is gold-plated awesome.

Second, many acknowledgements to Dan Fox; you can read more about his (far superior) EqBRR stats on his blog or at Baseball Prospectus. Also of great help was Lee Panas of Tiger Tales; I highly recommend his excellent series on baserunning metrics. And in case you haven't noticed already, I rely very heavily on the research of Tom Tango.

Parsing the Retrosheet logs, I took a look at twelve different common baserunning events. Only the lead runner was considered; it's not fair to judge a runner based on how well the guy on base in front of him is running. (Dusty Baker infamously referred to this as "clogging the bases.")

Event Code | Normal outcome | XB
1B_2B | Runner on first advances to second on a single | Runner on first advances to third on a single
1B_3B | Runner on first advances to third on a double | Runner on first advances to home on a double
1B_GB | Runner on first stays on first after a groundout | Runner on first advances to second after a groundout
1B_FB | Runner on first stays on first after a flyout | Runner on first advances to second after a flyout
1B_NOTINPLAY | Runner on first stays on first on a ball not in play | Runner on first advances to second on a ball not in play
2B_3B | Runner on second advances to third on a single | Runner on second advances to home on a single
2B_GB | Runner on second stays on second after a groundout | Runner on second advances to third after a groundout
2B_FB | Runner on second stays on second after a flyout | Runner on second advances to third after a flyout
2B_NOTINPLAY | Runner on second stays on second on a ball not in play | Runner on second advances to third on a ball not in play
3B_GB | Runner on third stays on third after a groundout | Runner on third advances to home after a groundout
3B_FB | Runner on third stays on third after a flyout | Runner on third advances to home after a flyout
3B_NOTINPLAY | Runner on third stays on third on a ball not in play | Runner on third advances to home on a ball not in play

Every time one of those events occurred, several things got recorded into a table. First, I noted one of three outcomes: the "normal" outcome, the XB, or "extra base," outcome, or the runner being thrown out on the play. I also noted how many outs there were in the inning - the correct decision on whether or not to run is dependent upon the number of outs in the inning, and heads-up baserunners will change their baserunning strategy accordingly. All of those things are then assigned to a baserunner and totaled up. [More accurately, there's a set of SQL queries and an Excel spreadsheet that do most of the work.] Balls not in play include stolen bases, wild pitches and passed balls.
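If it helps to picture the bookkeeping, here is a rough Python sketch of the tallying step. This is not the actual SQL; the tuple layout, the runner IDs and the sample plays are all hypothetical, and only the event codes and the three outcomes come from the description above.

```python
from collections import Counter

# Each play: (runner_id, event_code, outs_before_play, outcome).
# outcome is one of "normal", "extra_base", "thrown_out".
# These sample plays are invented for illustration.
plays = [
    ("runner_a", "1B_2B", 1, "normal"),
    ("runner_a", "1B_2B", 1, "extra_base"),
    ("runner_a", "1B_2B", 1, "extra_base"),
    ("runner_b", "2B_FB", 0, "thrown_out"),
]


def tally(plays):
    """Count how often each (runner, event, outs, outcome) combination occurs."""
    return Counter(plays)


totals = tally(plays)
for key, count in sorted(totals.items()):
    print(key, count)
```

Keeping the number of outs in the key matters, because the same outcome is worth a different number of runs depending on the outs.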

Every play is then assigned a run value based upon its run expectancy. We'll use the scenario Tango uses to explain this. Let's say you have a runner on first base with one out. According to Tango's run expectancy chart, an average of .573 runs score in an inning in that situation. Suppose that the next batter hits a single. (That would be an event code 1B_2B in my system.)

If the runner on first advances to second, then the run expectancy increases to .971. A better baserunner will, depending on the situation, advance to third instead; the run expectancy with runners at the corners and one out is 1.243. Sometimes, however, a baserunner will get thrown out advancing to third; with a runner at first and two outs, the run expectancy drops to .251.

[Note that we assume that the trail runner stays on first in either the extra base or thrown out scenarios; that's a convenient abstraction, one that introduces a small amount of inaccuracy in exchange for avoiding a large computational headache.]

So, using the run expectancy formula, we assign run values to all of our event outcomes, based on the number of outs in the inning. For each runner, we multiply his outcomes by their run expectancy values, and sum up. Then, for a final step, we take a look at how the average baserunner would have performed given the same opportunities, and subtract that number from the sum. That gives us our +/- rating of runs above/below average.
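Here is a minimal Python sketch of that calculation for the single example above (runner on first, one out, batter singles). The three run expectancy figures are the ones quoted from Tango's chart; treating a play's run value as the change in run expectancy is my reading of the method, and the runner's outcome counts and the league-average outcome rates are invented purely to show the arithmetic.

```python
RE_START = 0.573  # runner on first, one out (from the text)

# Run expectancy after each outcome of the play (from the text).
RE_AFTER = {
    "normal": 0.971,      # runners on first and second, one out
    "extra_base": 1.243,  # runners at the corners, one out
    "thrown_out": 0.251,  # runner on first, two outs
}


def run_values(re_start, re_after):
    """Run value of each outcome = run expectancy after minus before."""
    return {outcome: re - re_start for outcome, re in re_after.items()}


def plus_minus(counts, league_rates, values):
    """Runs above/below what an average runner gets in the same chances."""
    opportunities = sum(counts.values())
    actual = sum(counts[o] * values[o] for o in counts)
    average_per_chance = sum(league_rates[o] * values[o] for o in league_rates)
    return actual - opportunities * average_per_chance


VALUES = run_values(RE_START, RE_AFTER)

# Hypothetical runner and hypothetical league-average outcome rates.
runner_counts = {"normal": 20, "extra_base": 10, "thrown_out": 1}
league_rates = {"normal": 0.70, "extra_base": 0.27, "thrown_out": 0.03}

rating = plus_minus(runner_counts, league_rates, VALUES)
print({k: round(v, 3) for k, v in VALUES.items()}, round(rating, 3))
```

A full version would repeat this for every event code and every number of outs and add the results together; this sketch covers one cell of that grid.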

Since I'm a miserable computer programmer, all of my SQL queries take about forever to run, so I limited myself to the years 2004-2007. Fair warning: single season performances are subject to sample size issues, especially with things like baserunning.

Okay, let's take a look at the top 10 best and worst baserunning seasons:

Batter | Name | year | Opp. | +/-
furcr001 | Rafael Furcal | 2004 | 298 | 12.46229
barfj003 | Josh Barfield | 2006 | 246 | 11.80434
pierj002 | Juan Pierre | 2007 | 368 | 11.08171
carrj001 | Jamey Carroll | 2006 | 295 | 10.95289
rollj001 | Jimmy Rollins | 2005 | 364 | 10.93031
milea001 | Aaron Miles | 2004 | 281 | 10.5188
crawc002 | Carl Crawford | 2004 | 400 | 10.2553
rollj001 | Jimmy Rollins | 2004 | 321 | 9.949088
durhr001 | Ray Durham | 2004 | 218 | 9.554747
milea001 | Aaron Miles | 2007 | 192 | 9.416688

Batter | Name | year | Opp. | +/-
lawtm002 | Matt Lawton | 2005 | 245 | -7.7626
lee-d002 | Derrek Lee | 2007 | 221 | -7.92601
ortid001 | David Ortiz | 2005 | 157 | -8.12203
konep001 | Paul Konerko | 2004 | 161 | -8.21513
lowem001 | Mike Lowell | 2007 | 163 | -8.24073
sosas001 | Sammy Sosa | 2004 | 141 | -8.24097
sancf001 | Freddy Sanchez | 2005 | 170 | -8.45961
mattg002 | Gary Matthews, Jr. | 2006 | 282 | -9.21264
garkr001 | Ryan Garko | 2007 | 163 | -10.2327
berkl001 | Lance Berkman | 2004 | 213 | -10.3233

Makes sense, yes? Speedy middle infielders and outfielders run better than slow corner infielders and outfielders. Lee is a surprising entry on our trailers; he has a reputation for being a very good basestealer for a first baseman. But on the whole it doesn't seem very controversial.

What we also get here are the run values of being a good or poor baserunner. The difference between the best and worst over this four-year period was about 23 runs, or roughly two wins. Now two wins is nothing to scoff at, but it pales in comparison to defense or offense; nobody would look at this chart and say, "Gee, I think that the Sox should try and see if they can trade David Ortiz for Aaron Miles."
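The arithmetic behind that spread, as a quick Python check. The two endpoints are the best and worst seasons from the tables above; the ten-runs-per-win conversion is the usual rule of thumb and is an assumption here, not something derived from this data.

```python
best = 12.46229      # Rafael Furcal, 2004
worst = -10.3233     # Lance Berkman, 2004
RUNS_PER_WIN = 10.0  # assumed rule-of-thumb conversion

spread_runs = best - worst
spread_wins = spread_runs / RUNS_PER_WIN
print(round(spread_runs, 1), round(spread_wins, 1))
```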

And, since this is ostensibly a Cubs blog - your 2007 Chicago Cubs!

Name | Batter | Opp. | +/-
Derrek Lee | lee-d002 | 221 | -7.92601
Michael Barrett | barrm003 | 106 | -7.43955
Mark DeRosa | derom001 | 153 | -1.8259
Aramis Ramirez | ramia001 | 154 | -7.01669
Alfonso Soriano | soria001 | 206 | -3.55499
Ryan Theriot | therr001 | 219 | -3.70184
Jacque Jones | jonej003 | 149 | -0.68092
Cliff Floyd | floyc001 | 75 | 0.153609
Matt Murton | murtm001 | 84 | 1.03073
Mike Fontenot | fontm001 | 94 | -0.82379
Cesar Izturis | iztuc001 | 152 | 0.30366
Felix Pie | pie-f001 | 71 | 1.143167
Jason Kendall | kendj001 | 161 | -1.61157
Angel Pagan | pagaa001 | 74 | 2.719269
Daryle Ward | wardd002 | 44 | -0.61667
Koyie Hill | hillk002 | 15 | 0.13925
Ronny Cedeno | ceder002 | 11 | -0.14754
Geovany Soto | sotog001 | 17 | -1.63394

This will come as a surprise to nobody who watched the Cubs last season, but good Lord the Cubs sucked at running the bases. Daryle Ward was saved from himself on the basepaths by aggressive pinch-running by Lou Piniella; the same couldn't be said for Lee, Ramirez and Barrett. Oh, and that number for Soto in just 17 opportunities? Ah, catchers and their appreciable lack of knees.

The numbers on Soriano and Lee surprise me. Soriano was below average in 2006 as well (possibly a byproduct of an excessive amount of caught stealings that occurred chasing the mythical 40/40), but was very solid the previous two seasons. Lee, on the other hand, has been a below-average baserunner the past four years.

Oh, and I just have to add - Ryan Theriot is a below-average baserunner. That is all.

Spreadsheet available for download.

The next step is to work out a Marcels-like projection system using this data, so that I can incorporate it into my WAR depth chart.


The sort of things that actual ballclubs don't do, but should consider

Okay, so there's talk of the Cubs acquiring Felipe Lopez. On the surface, this looks pretty dumb - if the Cubs were going to acquire a guy that plays shortstop, there were a lot of better options available earlier in the offseason. And Lopez provides more value than Theriot according to my new, park-adjusted WAR chart (which I'm still cleaning up, in addition to some other projects) but not to the extent where I think it would show up in the standings - a tenth of a win isn't much to write home about.

But let's think outside of the box here. Lopez had a bad season last year at the plate, but even so still outhit Ryan Theriot after you adjust for park effect. Of the two, Lopez is a better bet to hit well. On the flipside, Ryan Theriot is the better defensive player. If you combined Lopez's hitting with Theriot's fielding, you might have a legitimate major league shortstop.

Well... can we do that?

Let's take a look at some of the Cubs pitching staff. For example, last season, 46.9% of the balls put into play off Carlos Zambrano were ground balls. With Rich Hill, only 36.0% of his balls in play were grounders. Here's the breakdown:

  • Zambrano, 46.9%
  • Lieber, 43.8%
  • Marquis, 49.5%
  • Dempster, 47.1%
  • Lilly, 33.7%
  • Hill, 36.0%

Do you see where I'm headed with this? Give Lopez as much time as possible playing behind Hill and Lilly, give him plenty of time off when any of Lieber, Marquis or Dempster is on the mound, and have the two of them pretty much split time behind Z (whose high strikeout rate lets him overcome defensive inadequacies behind him). You might end up with a halfway-decent shortstop that way.
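Spelled out as a rule, the idea looks something like the Python sketch below. The ground-ball rates are the ones listed above; the 40% cutoff is a hypothetical dividing line that happens to separate Hill and Lilly from the rest, and the "split" treatment for Zambrano is hard-coded from the reasoning above rather than computed from any strikeout data.

```python
# Ground-ball rates quoted above (percent of balls in play).
GB_RATE = {
    "Zambrano": 46.9,
    "Lieber": 43.8,
    "Marquis": 49.5,
    "Dempster": 47.1,
    "Lilly": 33.7,
    "Hill": 36.0,
}

GB_CUTOFF = 40.0               # hypothetical dividing line
HIGH_STRIKEOUT = {"Zambrano"}  # flagged by hand, per the post


def shortstop_for(pitcher):
    """Pick the shortstop for a given starter under the platoon idea."""
    if pitcher in HIGH_STRIKEOUT:
        return "split"
    # Ground-ball pitchers get the better glove, fly-ball pitchers the better bat.
    return "Theriot" if GB_RATE[pitcher] >= GB_CUTOFF else "Lopez"


assignments = {pitcher: shortstop_for(pitcher) for pitcher in GB_RATE}
for pitcher, shortstop in assignments.items():
    print(pitcher, shortstop)
```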


How do we define youth?

The reposting of my Ryan Theriot Sucks screed prompted this comment:

Cesar Izturis sucked. The Bush Administration sucks. The season finale of
Project Runway sucked.

Ryan Theriot is a solid, young player with speed, attitude and
perseverance. What's your beef with him?

I mean, I can see "Ryan Theriot should be a bench player" or whatever, but,
geez, lighten up, Francis.

"Cesar Izturis sucked. ... Ryan Theriot is a solid, young player with speed, attitude and perseverance."

Okay, so let's play a variant on that old carnival game, and have you guess: who is older, Cesar Izturis or Ryan Theriot?

The answer is Ryan Freaking Theriot.

So he's only older by about two months. But do not let his freakish, manchild appearance fool you. He is not young for a baseball player. You look at a standard aging curve, and Theriot is right at peak. There is no youth to him - Ryan Theriot is what he is. Expecting development from him like he was some 22-24 year old rookie is wrong, wrong, wrong.

And once you start looking at the Ryan Theriot that is, not the Ryan Theriot that could be, it's pretty easy to figure out what there is to dislike about him. He's average defensively at shortstop, but hardly spectacular - and the aging curve for shortstop defense peaks earlier than the aging curve for offense, so it's pretty much downhill from here.

Once you look at his hitting, you see that he's pretty mediocre as a hitter. How mediocre am I talking here? The guy basically hits like Omar Vizquel - Vizquel's OPS+ at age 27 was 70, Theriot's OPS+ at that age was 72. It's hard to justify that sort of offensive production unless you field like Vizquel, and Theriot certainly doesn't. I am not exaggerating when I say that Ryan Theriot is one of the worst full-time players in baseball. What makes this so absurd is that Theriot didn't get the job because of an injury; he was handed a full-time job simply based upon who he is as a player. It's utterly baffling.

At this point, I couldn't care less about his speed, attitude, perseverance or his ability to sell T-shirts; I cannot be convinced any of that outweighs his substantial liabilities as a ballplayer. As a bench player I suspect he's fine (or at least not totally objectionable) but that's not what he is at this point.

I feel like I spend way too much time on this topic, but it's a horse that just won't die, so I guess I'll have to just keep beating it until it does.


Serving leftovers

It's late, and I'm really uninspired right now - so it's a good opportunity to reheat some old content and serve it.

Bleed Cubbie Blue, the Cubs site where I got my start writing about the team, has recently switched to a new hosting platform. The upshot - at least for this post - is that all of the "diaries" I posted over there are now available as a blog.

I'll go ahead now and highlight some key posts - ones that I don't entirely grimace to read later.

The Official Ryan Theriot Sucks Diary - "But it is to say that the Cubs would be reckless and stupid to go into 2008 with nothing but Ryan Theriot and a bunch of prayers at shortstop. He's just not very good." As true today as it was then! (Sadly.)

2008 Offseason Guide - Looking over it, I think I did pretty well - Ohman and Jones got traded, Fukudome got signed. I was wrong about pitching - the Cubs got Lieber, and actually it wasn't a bad deal. The market didn't go quite as crazy as I expected.

...yeah, not a lot of output there. In good news, the huge archive of morning headlines should help me whenever I think "Hey, I remember an article that said..."


That's some reason


Hendry, see, has to produce a regular after all this blather about a farm system. Pie has gotten the most hype, although catcher Geovany Soto looks to be most ready and Ryan Theriot seems to earn Hendry no credit for some reason.

Um... because Ryan Theriot is bad at baseball.

You're welcome.

Another look at Alfonso Soriano's splits

The other day I took a look at Soriano's splits with men on base and with the bases empty. If you want to follow along with this, it's advised that you read that piece first.

You back? Good.

This is eight seasons of data, 2000-2007. (1999 isn't available from Retrosheet, and so I decided that made a reasonable cutoff point for the time being.) First, let's take a look at the average over that timespan:

Overall: .301 BABIP, 14.9% LD

Men On: .306 BABIP, 14.7% LD

Bases Empty: .298 BABIP, 15.1% LD

Line drive rate and BABIP seem to be very consistent between the two states - there's a small variance, but nothing extraordinary.
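For reference, here is a sketch of how these two rates are usually computed. The BABIP formula is the standard one; the line-drive rate is taken here as line drives over all batted balls, which is an assumption about the denominator used in the spreadsheet, and the counts are invented rather than taken from the Retrosheet data.

```python
def babip(hits, home_runs, at_bats, strikeouts, sac_flies):
    """Batting average on balls in play: (H - HR) / (AB - K - HR + SF)."""
    balls_in_play = at_bats - strikeouts - home_runs + sac_flies
    if balls_in_play <= 0:
        raise ValueError("no balls in play")
    return (hits - home_runs) / balls_in_play


def line_drive_rate(line_drives, batted_balls):
    """Share of batted balls classified as line drives (assumed definition)."""
    if batted_balls <= 0:
        raise ValueError("no batted balls")
    return line_drives / batted_balls


# Hypothetical season line, for illustration only.
sample_babip = babip(hits=180, home_runs=30, at_bats=600, strikeouts=120, sac_flies=5)
sample_ld = line_drive_rate(line_drives=80, batted_balls=480)
print(round(sample_babip, 3), round(sample_ld, 3))
```

Splitting by base state just means running the same two functions on the "men on" rows and the "bases empty" rows separately.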

Alfonso Soriano:

Overall: .317 BABIP, 16.8% LD

Men On: .307 BABIP, 16.0% LD

Bases Empty: .322 BABIP, 17.2% LD

The line drive rate swings seem a bit more pronounced than the average, but the shape of the distribution is about the same. The BABIP shift, while tracking the LD% pretty well, is counter to the league norms. In fact, with men on base Soriano's BABIP is essentially league average, despite a much better LD% and (presumably) better speed than the average hitter.

It seems like Soriano has been very unlucky with runners ahead of him during his career. I don't see any reason to think that BABIP differential is sustainable other than the fact that he's sustained it this long; I can't say for certain but I'm pretty confident it's not an issue for him to hit with runners on base ahead of him.

Which is why it's really too bad that Lou is committing to bat him behind the team's worst hitter.

UPDATE: I plan on seeing what other little nuggets the data contains at some later point; that said, I have a lot of plans for different things to look into at a later date, and time is a finite resource. In the meantime, my toys are your toys. The file is too large for EditGrid, so it's provided as an Excel spreadsheet. Enjoy.


Legging out grounders

First off, a caveat: I have no idea what exactly this data means. I don't know that it means anything, per se. But I think it's interesting nonetheless, and in lieu of sitting on it until I can make something of it, I'm going to share it as is - if any of you think you know what it means, please feel free to let me know.

Basically, I figured out how often a player got a single, double or triple off of a ground ball. And then I did H/GB - how many times a player got a hit off a ground ball. Full results available on EditGrid. 2007 only.
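The calculation itself is nothing fancy. A minimal Python sketch, not the actual SQL: the function name is made up, and the one real data point is Felix Pie's line (12 hits on 68 ground balls), which comes up further down in this post.

```python
def hits_per_ground_ball(ground_ball_hits, ground_balls):
    """H/GB: singles, doubles and triples on grounders, divided by grounders."""
    if ground_balls <= 0:
        raise ValueError("ground_balls must be positive")
    return ground_ball_hits / ground_balls


# Felix Pie, 2007: 12 hits on 68 ground balls (figures from this post).
pie = hits_per_ground_ball(12, 68)
print(round(pie, 3))
```

Home runs don't enter into it, since a ground ball essentially never leaves the park.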

First off, a frame of reference - the league average batting average on ground balls is .243. So, now, some leaders and trailers.

Top five, minimum 100 plate appearances:

  1. Matt Kemp, .442
  2. Jeff Salazar, .407
  3. Willy Taveras, .398
  4. Ichiro Suzuki, .377
  5. Omar Infante, .375

Bottom five, minimum 100 plate appearances:

  1. Jason Phillips, .083
  2. Ramon Martinez, .094
  3. Koyie Hill, .100
  4. Chad Tracy, .101
  5. Barry Bonds, .123

The first list is dominated by speedy outfielders and middle infielders; the second by catchers and corner infielders. I'm sure this surprises nobody.

Worst non-Koyie Hill Cub? Daryle Ward - shocking, I know - hitting only .171 on grounders. After that is... Felix Pie? That surprises the hell out of me, for one. Pie hit 68 ground balls, and only got hits on 12 of them. Seriously doubt that's sustainable.

Best Cub? Cliff Floyd at .327 - again, surprising the hell out of me. Next most? How about Matt Murton, at .295.

And, since I know everyone is wondering - Theriot hit .249 on ground balls. Aramis Ramirez hit .234 on ground balls last season, if you're interested in comparing that sort of thing. Soriano hit .279.

If I go ahead and expand my net a bit, we see Geovany Soto and his absurd .471 batting average on grounders. Probably not sustainable.

I want to just go ahead and remind everyone that this doesn't mean anything - at least, not without a lot more context than I've provided you with so far. (Any statistic that has Barry Bonds and Koyie Hill that close to each other on a leaderboard needs to be taken with a huge grain of salt.)

Simply glancing at the list, it seems to be roughly correlated with speed, which makes intuitive sense. (To actually figure out what the correlation is, we'd need a measure of speed - and nobody has published a list of 40-yard-dash times with Retrosheet IDs, sadly. I know about Speed Score, but quite frankly just looking at the description of how to calculate it hurts my brain.) I'm also pretty certain that there's a lot of noise in that list.

To be honest, I learned more about SQL than I did about baseball doing this - which makes sense, as this was mostly a learning exercise to get more comfortable working with Retrosheet data.

From that perspective, it was largely a success - I learned how to automate a lot of things I've been stupidly doing by hand, which is great. I am starting to run up against limitations in my current setup, however - I only have a handful of fields from one year of Retrosheet data. Tomorrow I'm going to play around with a nifty script I've found to hopefully give myself a better sandbox to play with.


A better FIP

This seems pretty cool. Maybe even a better DIPS, since it uses batted ball data. Don't have much time to look into it right now, sadly, but I hope to come back to it later.


A few links for your reading pleasure

Just how important is blocking pitches in the dirt? Some very cool work by Dan Turkenkopf. Some discussion over at Tango's blog.

Pitch type leaderboards at Fangraphs. Through 2002. If you're like me, you're sitting there going, "Do extreme curveball pitchers have a notable effect on their BABIP?" Add that to my growing list of projects.

Micah Hoffpauir is probably this year's Jake Fox - a guy who makes a splash in spring training despite the fact that he doesn't hit well enough for the corners, and can't play anywhere but the corners. Makes you wonder what we have Daryle Ward for, I guess.

Also, Fukudome doesn't mind playing center:

"If he tells me to play center field, I'll play center field. I have played center field a full season in Japan, so I'm very comfortable. I have played more in right field, but I can play either position. It shouldn't be a problem."

Hmm. New corner outfielder signing saying he doesn't mind playing center. Where have I heard that before?

Does Soriano hit better with the bases empty?

This came up for discussion on Tango's blog a while back, and it's been sort of sticking in the back of my mind.

Here's Tango:

His wOBA are: .379 with bases empty and .344 with men on base.  IIRC, the difference for the average player is a 5 point drop or so.  I’m sure someone can correct me.  But, he’s got a 35 point difference here (based on almost 3000 PA with bases empty and 2000 with men on base).  One standard deviation is roughly a 15 point difference, so we see here a difference of around 2 standard deviations. 

While that doesn’t necessarily mean that Soriano definitely prefers to bat with bases empty, it points very strongly toward that.

MGL's reply:

That is dead wrong. Sort of.  We don’t really care how many SD’s someone is off. By the time someone points something out to us, OF COURSE it is going to be unusual.  That is the classic example of selective sampling or cherry picking.  Take any distribution that is completely random (no skill whatsoever).  Within that distribution, 5% will be off the mean by more than 2 SD’s.  Well, someone can and will point those 5% (or 1%, or .1% if the sample is large enough) players out, and say, “Well, these guys are way off the mean - something must be going on!”

Until we figure out how much, if any, skill, there is, those SD’s mean NOTHING (because they are cherry picked, as is Soriano’s)!

If there is little or no skill for players batting with runners and and without, which I suspect is the case, as with most ‘splits’, then the # of SD from the mean means nothing, since everyone gets regressed 100% (or near 100%).  Even if there is a little skill, most of that 2 SD will get regressed toward the mean.
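MGL's cherry-picking point is easy to see in a quick simulation. This is only a sketch under a loud assumption: every simulated "player" has zero true skill in the split, and his observed split is pure normally distributed noise measured in standard deviations. The function name and player count are mine, not anything from the discussion.

```python
import random


def share_beyond_two_sd(n_players=10000, seed=0):
    """Fraction of no-skill players whose split is more than 2 SD from the mean."""
    rng = random.Random(seed)
    # Every player has identical true talent; the observed split is noise only,
    # expressed in units of standard deviations.
    splits = [rng.gauss(0.0, 1.0) for _ in range(n_players)]
    outliers = [s for s in splits if abs(s) > 2.0]
    return len(outliers) / n_players


share = share_beyond_two_sd()
print(f"{share:.1%} of no-skill players are more than 2 SD from the mean")
```

Run it and you get a share in the neighborhood of 5%, which is exactly the group somebody will eventually point at and say "something must be going on."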

I didn't really have anything to add to the discussion at that point, but it stuck with me. I've never been fully convinced by the argument that Soriano doesn't perform well in RBI situations based upon some true talent level, but I've never really had an argument against the idea, and the benefit derived from moving Soriano down in the lineup didn't seem worth the risk.

Then Dave Pinto chimed in:

Since 2000, which represents all but a few PA of his career, Soriano is ever so slightly worse with a man on first than with the bases empty. So the idea that a man on first bothers him doesn't really hold water. However, I believe most batters do better with a man on first. In the National League in 2007, a man on first added sixteen points to a player's batting average, twelve points to a player's on base average and eighteen points to his slugging percentage.

Pretty much the same split data; Tango used the Retrosheet data on Baseball Reference, and I think Pinto uses BIS data, but other than that it's pretty much all the same.

Then Chuck gets in on the act:

This suggests that something else is up with Soriano. It's not sample size that is the problem as Soriano has 1,676 at bats with runners on base. What it suggests is that his concentration is bothered when men are on base.

In other words, if the situation isn't all about him, he's not the same player.

Statistical evidence of selfishness.

Statistical evidence of selfishness. Really?

Or, put another way - what do Soriano's splits really tell us?

That's the same data Pinto used, but with some additional stats added. They are Walks per PA, Strikeouts per PA, Isolated Slugging (Slugging minus batting average) and Batting Average on Balls in Play (H-HR)/(AB-K-HR+SF).
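For reference, here's a small Python sketch of those four rate stats, using the formulas as written above. The stat line fed into it is invented for illustration and is not Soriano's; the function and argument names are my own.

```python
def rate_stats(pa, ab, h, hr, bb, k, sf, total_bases):
    """Compute BB/PA, K/PA, isolated slugging and BABIP from counting stats."""
    avg = h / ab            # batting average
    slg = total_bases / ab  # slugging percentage
    return {
        "bb_per_pa": bb / pa,
        "k_per_pa": k / pa,
        "iso": slg - avg,                        # slugging minus batting average
        "babip": (h - hr) / (ab - k - hr + sf),  # (H-HR)/(AB-K-HR+SF)
    }


# Invented stat line, for illustration only.
line = rate_stats(pa=600, ab=550, h=160, hr=30, bb=40, k=120, sf=5, total_bases=290)
```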

So what do we see here? Soriano's strikeouts go up a bit, his walks increase by more than a bit, and his power numbers go down by a bit. It's the walk rate that interests me - if he's walking more, then he should be getting on base more. But he's not, because his batting average drops.

Batting average is subject to a lot more randomness than a lot of other things we look at in baseball. That's because, when a player walks or strikes out, he's only interacting with the pitcher; when he gets a base hit, he's interacting with the defense as well, and that introduces a lot of variables.

It's the drop in BABIP that interests me the most; BABIP is subject to even more randomness than batting average; we remove strikeouts and home runs, the two components to batting average a player has the most control over. So what's causing the big change in BABIP for Soriano?

For that, batted ball data is the next place I looked. The following is Retrosheet data, 2007 only:

There's a lot going on here - we've added a few new stat categories, for one. FB, GB, LD and Pop are flyball, groundball, line drive and popup; they're followed by percentages.

The first thing is that these numbers seem to take on about the same shape as Soriano's career number - small spike in K rate, larger spike in walk rate, big change in BABIP.

Soriano's line drive percentage remains very consistent, which is very informative; line drive rate is probably the best predictor of future BABIP that we have. In fact, using our expected BABIP formula (.120 plus LD%) it looks like Soriano's change in BABIP rates between the splits is nothing but a fluke.
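The simple expected BABIP formula is short enough to sketch in a few lines of Python. I'm assuming here that LD% means line drives divided by all batted balls (fly balls, grounders, liners and popups); the counts below are made up and are not Soriano's.

```python
def expected_babip(line_drives, batted_balls):
    """Simple expected BABIP: .120 plus line drive rate (as a fraction)."""
    ld_rate = line_drives / batted_balls
    return 0.120 + ld_rate


# Hypothetical splits with nearly identical line drive rates.
bases_empty = expected_babip(line_drives=36, batted_balls=200)
men_on = expected_babip(line_drives=27, batted_balls=150)
```

Since the formula only looks at line drive rate, two splits with the same LD% get the same expected BABIP no matter how far apart the actual BABIP figures are.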

But Soriano is also a more extreme fly ball hitter when men are on base in front of him - can we use that data? Dave Studeman has a somewhat more technical expected BABIP formula which uses K rate and FB%, which I'll call xBABIP to differentiate between it and the simpler formula. Again, nothing to suggest that such a radical difference in BABIP is attributable to anything Soriano is doing.

My feeling right now - call it a hunch - is that we're seeing two things going on here at once. Pitchers will try to pitch around Soriano when men are on base in front of him, driving up his walk rates and giving him fewer good pitches to hit. At the same time, we have an entirely random variance in BABIP that exaggerates the impact of the former tendency.

I have a ways to go before I can validate that hunch, though. Next steps:

  1. Get the rest of Soriano's career data from the Retrosheet event files, and see if that makes a difference.
  2. Take a look at the major league average in those figures, to see how Soriano compares.
  3. Take a look and see if players like Soriano - high slugging, low OBP - tend to see their walk rates go up with men on base ahead of them.
  4. Take a look at pitch-by-pitch data and see if Soriano sees more balls and fewer strikes with men on base.

That's a tall order, admittedly. And my SQL-fu isn't what I'd like it to be, so expect the going to be slow at best.

[An aside: I took a look at Soriano’s top comps from his 2007 PECOTA (I don’t have 2008), and three of them exhibited the same tendency with their walk rates in their careers. (Andre Dawson, Jeff Kent, and Leon Wagner - Paul Blair didn’t, but he looks like a real odd comp to me.) Consider it anecdotal evidence for #3, if you'd like.]

This is what I think of your Brian Roberts rumors

[image: funny picture]

Yep, that about says it.

Pouring lead

At the risk of sounding like a broken record, spring training statistics and results are meaningless. People mostly use them to justify opinions they already held, so far as I can tell.

And like I said yesterday, Lou wouldn't be looking at this crazy new lineup of his unless he saw something in it already. And, sure enough, the early returns have come up rosy:

That lineup experiment with Ryan Theriot batting leadoff and Alfonso Soriano batting second? It just may have passed the experimental stage after Cubs manager Lou Piniella used what might be his Opening Day lineup Thursday.

"We're going to stay in that configuration for a bit and take a good look at it," Piniella said.

And he later drove home the point: "The weather's cold in Chicago when we play the first six weeks or so of the season. And to get [Soriano] to the point where he might have to run [batting first], it's just taking chances."

Which... whatever. He got to see the game today and I was watching Gameday, but I really don't know what convinced him that today's lineup was working. Okay, I do know: he liked the idea of the lineup going into the game, and getting the W didn't hurt the idea. It's like pouring lead or reading tea leaves; you get out of it largely what you put into it. Like I said, spring training results are basically meaningless, so if Lou decides to start acting like some mystic when it comes to justifying his decisions - whatever, they're the decisions he wanted to make anyway, so I don't know what would stop him exactly.

[An aside - Lou has convinced himself that it was a lack of small-ball ability that caused the team's struggles early on last season. I'm more convinced that it was a small sample size effect that worked itself out as the season progressed. Lou seems less likely than I to accept that there are things out of Lou Piniella's control, which I suppose I understand but I am growing more and more frustrated with overall.]

Wittenmyer chimes in, calling this a prelude to a Roberts deal. I'm growing rather calloused to Roberts trade talks; I refuse to believe in any such thing until Roberts is actually traded, if that ever comes to pass.

UPDATE: Bruce Miles is my frakkin' hero. Just... go read it all.

Putting a price on stupidity

Lou is offering us a sneak peek at what it's going to feel like when he's acting like an idiot during the regular season:

Piniella said he plans to move Alfonso Soriano out of the leadoff spot and bat him second, with Ryan Theriot leading off. If it looks good in camp, that could be the order he settles on for the regular season, he said before today’s game against the Texas Rangers.

Which, whatever. I am Jack's raging spleen, but there's absolutely nothing I can do about it. So, instead of ranting about it (I fully plan to later, when there's time), let's step back and take a cold, calculated, rational look at the matter with a lineup simulator. I'm using the average forecasts of several systems, helpfully supplied by Harry Pavlidis. And I'm presuming the lineup is:

  1. Theriot
  2. Soriano
  3. Lee
  4. Ramirez
  5. Fukudome
  6. DeRosa
  7. Soto
  8. Pie
  9. Pitcher

Maybe I should give Lou the benefit of the doubt and move Soto up and DeRosa down, but at this point Lou can get the benefit of the doubt back when he does something to earn it, as far as I'm concerned.

Lou's brilliantly conceived lineup scores 4.953 runs per game; the optimal lineup scores 5.224 runs per game. Let's look at how many runs that will cost the team over 138 games - basically, 85% of the season. We'll leave some room for days off in our analysis.

The Lou Lineup scores roughly 684 runs; the computer-generated lineup scores roughly 721 runs, for a difference of about 37 runs or about 3.5 wins.

3.5 wins.

Can I go back to the whole "rage" thing again?
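If you want to check the arithmetic yourself, here's a quick Python sketch. The runs-per-game figures and the 138 games come straight from above; the 10 runs per win conversion is my assumption (the usual rule of thumb), not something the lineup simulator spits out.

```python
LOU_RPG = 4.953       # runs per game, Lou's lineup
OPTIMAL_RPG = 5.224   # runs per game, simulator's optimal lineup
GAMES = 138           # roughly 85% of a 162-game season
RUNS_PER_WIN = 10.0   # assumption: common rule of thumb, not from the simulator

lou_runs = LOU_RPG * GAMES
optimal_runs = OPTIMAL_RPG * GAMES
run_gap = optimal_runs - lou_runs
win_gap = run_gap / RUNS_PER_WIN

print(f"Lou: {lou_runs:.0f} runs, optimal: {optimal_runs:.0f} runs")
print(f"Gap: {run_gap:.1f} runs, or about {win_gap:.1f} wins")
```

At exactly 10 runs per win the gap works out a touch higher than 3.5 wins; a slightly higher runs-per-win figure brings it right back down. Either way it's the same neighborhood.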

UPDATE: I've run some more lineup numbers and go over them at BCB. Consider it "Colin in syndication."

Upon further reflection, once you have Ryan Theriot in your lineup, arguing over where to bat him is probably a question of what caliber bullet you want to shoot yourself in the foot with. [I'm cribbing from myself here.] So a lot of this is just a few months' frustrations with Lou and the team's overall philosophy coming to a head.

Lou is obsessed with getting some speed at the top of the order. I just want to get some people at the top of the order who will get on base in front of our big bats. It's starting to look like there are irreconcilable differences here.

Baseball sucks because of stats geeks no one likes

At least it's funnier than most of what makes it to SNL these days. (Hat tip: Yankees Chick)

NL Central Projected Standings Chart

Like the title says. Results from here, here and here.

Team       Vegas  CHONE  CAIRO  PECOTA  Average
Cubs        87.5     87   89.2      91     88.7
Brewers     84.5     84   85.6      88     85.5
Reds        79.5     78   78.9      79     78.9
Cardinals   78.5     75   77.2      72     75.7
Astros      72.5     75   77.6      74     74.8
Pirates     68.5     75   68.2      70     70.4

Currently it's sorted by my own inclinations as to how the division will stack up, which agrees pretty well with the average forecast. If I've done my work right, it should be sortable. Just click on the column header to sort by the various forecasting systems.
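If you'd rather slice the numbers yourself, here's a small Python sketch that rebuilds the Average column and sorts by any system. The win totals are copied from the chart; the dictionary layout and function names are just my own scaffolding.

```python
# Projected wins by system, copied from the chart above.
forecasts = {
    "Cubs":      {"Vegas": 87.5, "CHONE": 87, "CAIRO": 89.2, "PECOTA": 91},
    "Brewers":   {"Vegas": 84.5, "CHONE": 84, "CAIRO": 85.6, "PECOTA": 88},
    "Reds":      {"Vegas": 79.5, "CHONE": 78, "CAIRO": 78.9, "PECOTA": 79},
    "Cardinals": {"Vegas": 78.5, "CHONE": 75, "CAIRO": 77.2, "PECOTA": 72},
    "Astros":    {"Vegas": 72.5, "CHONE": 75, "CAIRO": 77.6, "PECOTA": 74},
    "Pirates":   {"Vegas": 68.5, "CHONE": 75, "CAIRO": 68.2, "PECOTA": 70},
}


def average_wins(team_forecasts):
    """Plain unweighted mean of the four systems."""
    return sum(team_forecasts.values()) / len(team_forecasts)


def sort_by(system):
    """Teams ordered from most to fewest projected wins under one system."""
    return sorted(forecasts, key=lambda team: forecasts[team][system], reverse=True)


averages = {team: average_wins(f) for team, f in forecasts.items()}
by_average = sorted(averages, key=averages.get, reverse=True)
by_pecota = sort_by("PECOTA")
```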

Lou is acting stupid

Again. His latest comments:

"I'm getting a little concerned about a couple of our guys," Piniella said. "Look, we don't have much longer. I'm going to have to start getting them in there a little more often. Forget these days off."

I haven't been one of those fans that thinks that every decision Lou makes is right, because Lou is the one making it - but I have been pretty pleased with him as a manager as a whole so far. I'm starting to question that line of thinking.

Greg Couch's column goes downhill after five words.

It starts off well:

Spring training doesn’t predict anything.

Everything after that is pretty much an utter vortex of suck. Mostly because he ignores those five words, and goes on to talk about how he isn't convinced about Rich Hill.

The best part:

Rich Hill could be a problem for the Cubs this year, not because he’s bad but because he’s being counted on as a legit No. 3 starter. Are we sure he’s that good?

The Cubs have left themselves one front-end-of-the-rotation starter short. They’re counting on Hill developing into that starter, and he might. But at this point, why has he been guaranteed a spot in the rotation at all?

Ohboy. "[W]hy has he been guaranteed a spot in the rotation at all?" Because he was one of the top-ten strikeout pitchers in the National League last season!

I keep telling myself that I shouldn't let myself get so excited about these things; after all, these guys have to write something about spring training, and I'm sure that you don't sell a lot of newspapers by referring to everything as "meaningless" for over a month straight. But sometimes I can't help it.

If you have to say it, it's not true

There are some phrases in the English language that are automatically lies; the only occasions on which you could truthfully utter them are occasions when nobody would bother to say otherwise. They include:

  • There's absolutely nothing to worry about.
  • I didn't say anything.
  • My little Johnny would never do that!
  • No, really, it's not important.
  • Everything is under control.

I'll confess to not following the Bulls much, but as soon as I saw the headline "Bulls coach Boylan: 'I'm in control'", I knew that Jim Boylan had lost the team. Let's look at the full quote:

"It doesn't really matter what other people say," Boylan said. "I just know that what I do in the locker room and with this team, I'm in control. I run this team. We have guys with strong character. We've had a couple of situations where people used poor judgment. I don't think it reflects entirely on the team.

"We have some guys who are very serious about their craft. Some of the other guys need to emulate what they do. If we do that, we'll be fine. If not, there will be other consequences."

Loose translation:

They're coming through, repeat, they're coming through. This is Earth Alliance Station Babylon 5. They're all over the place! They're killing us. Can anyone hear me? They're killing us!

So, yeah, the situation is desperate. John Paxson has sent Jim Boylan off to die, and to die alone. It's basically like Gunga Din without the uplifting bits. Only the continuing saga of the Knicks keeps the Bulls from being the laughingstock of all basketball. Very sad.

Mark Cuban is either an idiot, or he thinks you're one

Ok, so new media mogul/billionaire egomaniac (and would-be owner of the Cubs) Mark Cuban has banned bloggers from the locker room at Mavericks games. There are all sorts of weird levels to this, and I'll leave it to others to analyze the intricacies of it. (Or you could just read Deadspin and get a purely emotional take on the matter. Sold!)

But there are three points I want to address here.

First, it absolutely amazes me the amount of control that big-time professional sports leagues want - and end up having - over media outlets. It's astounding, is what it is.

Let me backfill you on my point of reference here. I served in the Marine Corps for five years as a Public Affairs Specialist, which means I've done my fair share of media relations work. For things like the Iraq War, y'know, stuff like the 24/7 live coverage of the initial assault of Iraq, and the first Marine forces to enter Fallujah after the deaths of the Blackwater contractors - so allow me to just say that I understand logistical hurdles to facilitating media coverage.

But we were much more permissive when it came to allowing reporters latitude to report on operations in zone - even while embedded with our units - than organizations like the Mavericks are in allowing reporters to cover their sporting teams. I can damn well assure you that nobody's life has ever been endangered by the Dallas Morning News' coverage of the Dallas Mavericks, too.

It's pretty par for the course these days - teams aren't fond of criticisms, and like to be able to control their image. They also like to monetize their media coverage; witness the NFL Network and the forthcoming MLB network. That doesn't make it any less disappointing.

Second, even if Cuban's motives in this matter are purely what he claims they are - trying to level the playing field for bloggers, so that blogs attached to mainstream media outlets aren't given preferential treatment - is he so tone-deaf as to not see this line of criticism coming? Nobody in the Mavericks media relations department realized that this would be viewed in a negative light? If that's the case, then Cuban either needs to fire his media relations advisors or fire himself from handling media relations. This is basic stuff here, folks.

Third, either Mark Cuban is an idiot or he thinks you're one. Here's his justification for the change:

Should bloggers be allowed in the Mavs locker room ? Conceptually its not a big deal. A blogger, a beat writer, a columnists. The medium they use to deliver their content should be irrelevant. No question about it.


Right now we have a situation where a blogger that works for the Dallas Morning News would like continued access to the locker room. Prior to last week, I had no idea this person's primary job at the Morning News is to blog. I hadn't seen or read it. He was just one of the 4 or 5 people from the Morning News in the locker room post game. When it was brought to my attention I immediately made it an issue. Why ?

Not because I don't want this blogger in the locker room doing interviews. What I didn't like was that the Morning News was getting a competitive advantage simply because they were the Dallas Morning News. I am of the opinion that a blogger for one of the local newspapers is no better or worse than the blogger from the local high school, from the local huge Mavs fan, from an out of town blogger. I want to treat them all the same.

Unfortunately, there isn't enough room to allow any and all bloggers in the locker room. There also are no standards that I have been able to come up with that differentiate between bloggers to the point where I should or should not credential one versus the other. My experience in reading blogs has favored bloggers not affiliated with major media companies, but that could be my unique bias.


If he is correct and blogging is part of the base job of being a beat reporter, thats a sad commentary on beat reporters. They get 500 words in a story about a game or event, if readers are lucky. If there is excess time, I would imagine that time could be spent offering indepth analysis and access rather than throwing up hundred word commentary on a blog. If there isn't space in the paper, then in depth analysis that takes advantage of the minimal marginal cost of publishing feature stories, IMHO, would be a far better use of a beatwriters time and serve as a far stronger differentiation that would attract readers.


As far singling out mr MacMahon, I havent read what he has written, so that is not the case. its an issue of fairness. As a blogger, and someone very familiar with bloggers and the blogosphere, I recognize that a fair policy would apply to all bloggers. There is nothing superior about a blog produced bysomeone in the employ of The Belo Corporation. So there is no reason to give them preferential treatment. Where there is physical room to fairly credential any and all bloggers, Mr MacMahon is welcome. Where we can not accomodate all bloggers, he will be excluded.

Lemme go ahead and whittle that down for you a bit further:

Right now we have a situation where a blogger that works for the Dallas Morning News would like continued access to the locker room. Prior to last week, I had no idea this person's primary job at the Morning News is to blog. I hadn't seen or read it. ... When it was brought to my attention I immediately made it an issue. ... My experience in reading blogs has favored bloggers not affiliated with major media companies, but that could be my unique bias. ... If he is correct and blogging is part of the base job of being a beat reporter, thats a sad commentary on beat reporters. They get 500 words in a story about a game or event, if readers are lucky. If there is excess time, I would imagine that time could be spent offering indepth analysis and access rather than throwing up hundred word commentary on a blog. ... As far singling out mr MacMahon, I havent read what he has written, so that is not the case.

Ok, so let me parse this for you:

  1. Mark Cuban's "experience in reading blogs" does not extend to reading what is probably one of the most widely read blogs covering his team.
  2. Mark Cuban is, in fact, so wildly out of touch with such things that he didn't even know that the DMN had hired a writer for this express purpose.
  3. Upon learning of that writer's existence, Mark Cuban banned him from the locker room, still without reading his blog.
  4. Without reading this man's body of work, Cuban is still entirely confident that the blog is part of a "sad commentary on beat reporters."
  5. He is also apparently confident that the blog does not offer indepth analysis and access, again without having read a word of it.

Contrast this to how Cuban thinks a startup company should handle media relations:

NEVER EVER EVER hire a PR firm. A PR firm will call or email people in the publications, shows and websites you already watch, listen to and read. Those people publish their emails. Whenever you consume any information related to your field, get the email of the person publishing it and send them an email introducing yourself and the company. Their job is to find new stuff. They will welcome hearing from the founder instead of some PR flack. Once you establish communications with that person, make yourself available to answer their questions about the industry and be a source for them. If you are smart, they will use you.

Doesn't that seem a thousand miles away from what Cuban is doing in this situation? Maybe Cuban's policies are well-intentioned and as fair-minded as he claims; I'm not a mindreader and so I can't tell you one way or another. I do know that he's handling this in a very ham-fisted and tin-eared fashion, and so he's not making it any easier for those who want to believe those things.

But I do know this: Cuban wants you to believe that he, the new-media titan, the billionaire blogger... nay, CHAMPION of blogging, is so out of touch with the blogging experiences of the dedicated Mavericks fans.

Either he's an idiot, or he thinks you are.

Tribune waffles a bit on the whole "math" thing

Rick Morrissey wants you to know that even if that math stuff may have worked in the past, he wants no part of it.

But they don't run the world, yet, which means we can still type in our credit card numbers online without worrying that all our money is being sucked into a fund earmarked for global dominance by a dastardly computer.

Computers have no use for heart, or least they can't quantify it. They can't analyze what's inside an athlete, for example. They can't tell you who has the heart of a lion or the backbone of an earthworm.

Computers can't tell you that White Sox first baseman Paul Konerko is upset with how he played last season. All they can tell you is that he hit .259 in 2007, that he just turned 32 and, therefore, he must be on the downside of his career because that's what the model says is supposed to happen to him.

Right. Computers can only tell you about the relevant facts. God, I would love to see Morrissey cover the financial markets once:

Computers have no use for heart, or least they can't quantify it. They can't analyze what's inside a mid-level sales associate, for example. They can't tell you who has the heart of a lion or the backbone of an earthworm.

Computers can't tell you that Countrywide mortgage specialist John Doe is upset with how he performed last quarter. All they can tell you is that the housing market is in a decline, his company is facing bankruptcy and, therefore, his sales and commissions are likely to continue to decline next quarter.

Just once!

The best part, and the part I want to address without mocking, is this:

That the Sox dropped from 90 victories in 2006 to 72 games last season was one of the shocks of the baseball season. But not to Baseball Prospectus, and the people who run it deserve their props. They chalk up a lot of what happened on the South Side last season to the inevitability of time catching up with older athletes. I chalk it up to a number of players having down years at the same time.

Isn't there room for a number of Sox to have good years at the same time? Say, in 2008? If Jim Thome stays healthy, he could have an excellent season. It's a big "if," of course, but not like wondering if, say, the rain can hold off in Seattle for a month or two.

It's possible that a number of Sox players could have good years (that is, play above their expected talent level) at once. I know this to be true, because I've seen it happen; that was the year they won the World Series.

So it's possible. But, and I'm going to try and emphasize this as much as possible:

Projections, in baseball or anything else, are simply the best estimate we have given the data available of the most likely future performance.

When PECOTA (or anything else, for that matter), projects the White Sox to win 77 games, there's a six-win standard deviation on that forecast. Absolutely has to be; in 162 games you cannot get any more accurate than that. So the Sox could win anywhere from 71 to 83 games and the forecast would be on target.
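Where does a six-win standard deviation come from? Here's a back-of-the-envelope Python sketch. The big assumption, and it is mine rather than anything about how PECOTA works, is that each of the 162 games is an independent coin flip with a fixed win probability, so the win total follows a simple binomial model.

```python
import math

GAMES = 162


def win_total_sd(true_win_pct, games=GAMES):
    """Standard deviation of season wins under a simple binomial model."""
    return math.sqrt(games * true_win_pct * (1 - true_win_pct))


projected_wins = 77
sd = win_total_sd(projected_wins / GAMES)
low, high = projected_wins - sd, projected_wins + sd

print(f"SD of about {sd:.1f} wins; one-SD range of {low:.0f} to {high:.0f} wins")
```

Even if you knew a team's true talent exactly, luck alone would move its win total around by about six games in either direction.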

Could the White Sox exceed their forecast by another standard deviation? Sure. All of the aging players on that team could simultaneously "defy" their aging curve, some of their younger players could have unforeseen breakout seasons... a lot could happen. But it's not likely.

The improbable is possible. But there's absolutely no reason to project the improbable.

How meaningless is spring ERA?

This is not a study. I am not proving anything. This is an illustration, meant to showcase just how meaningless this stuff is. (There isn't a robust database of spring training results, as there is for the regular season, so the data set is limited to well below a meaningful sample size by my limited ambition to talk about this topic. So all of the figures and conclusions below are to be taken with a grain of salt; they simply illustrate the concept. They're examples.)

I took the spring training ERA and FIP-ERA from everyone who pitched in Cubs camp last year, and took a look at the weighted average error of those ERAs compared to what those pitchers did in the regular season.

ERA: 1.74

FIP-ERA: 1.94

So, a 4.00 ERA in spring training could mean a guy is a 2.26 ERA pitcher, or a 5.74 ERA pitcher. That's pretty much the difference between being the Cy Young winner and being designated for assignment. It's so absurdly large as to be absolutely meaningless.
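Here's roughly what a weighted average error calculation looks like in Python. Two things in this sketch are assumptions for illustration rather than a record of what I did: the error is the absolute difference between spring and regular season ERA, and each pitcher is weighted by his spring innings. The three pitchers are made up.

```python
def weighted_average_error(pitchers):
    """Innings-weighted mean absolute gap between spring and season ERA."""
    total_weight = sum(p["ip"] for p in pitchers)
    weighted_gaps = sum(
        abs(p["spring_era"] - p["season_era"]) * p["ip"] for p in pitchers
    )
    return weighted_gaps / total_weight


# Invented pitchers, for illustration only.
sample = [
    {"name": "pitcher_a", "ip": 10.0, "spring_era": 2.00, "season_era": 4.00},
    {"name": "pitcher_b", "ip": 20.0, "spring_era": 5.50, "season_era": 4.00},
    {"name": "pitcher_c", "ip": 10.0, "spring_era": 4.00, "season_era": 4.00},
]

error = weighted_average_error(sample)

# A spring ERA plus or minus the typical error gives the plausible range.
spring_era = 4.00
plausible_range = (spring_era - error, spring_era + error)
```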

So when you read things like this:

Jon Lieber's four shutout innings Saturday against Arizona put him in good position for the Cubs' fifth starter's spot.

But Ryan Dempster and Jason Marquis also have pitched well, making it a tough call for manager Lou Piniella.

"That's what competition for spots does," Piniella said.

Take it with a pinch of salt. Do not hyperventilate or overanalyze. It's all meaningless; Lieber's four shutout innings are an utterly meaningless indicator of his future performance. Now, the way he pitched those innings could indicate his future performance; I have no worries that the Cubs have scouts who have at least some ability to divine differences in performance that ERA can't capture at this point.

But we armchair analysts know pretty much nothing more than what we knew yesterday, which is basically what we knew the day before.

I will continue to state this come April. In fact, excluding learning new ways of looking at the data I already have (and I won't pretend that's not a possibility), the absolute earliest I will know something new about the abilities of any individual ballplayer is probably May. Maybe June. Just so's everyone's aware.

The Trib (Dave van Dyck, I should clarify) sits down with Nate Silver and talks to him about PECOTA's projections for the Cubs and White Sox. They did the same thing last year, and had great fun with the reactions of the White Sox to the rather dour forecast PECOTA had for them. Welllll...

Just ask the 2007 White Sox, who were coming off a World Series championship and then a 90-victory season only to be picked to finish 72-90 by the modern-day baseball folks who feed historical information into computers.

And those 2007 Sox finished … 72-90, much to the amazement of those who dismissed the data spit out by the Player Empirical Comparison and Optimization Test Algorithm, better known as PECOTA.

What is ahead for Chicago's baseball teams in 2008? The predictions:

  • The Sox, because of a series of off-season dealings, will rebound to 77-85 and third place in the American League Central, trailing the dominance of Cleveland and Detroit.
  • The Cubs will follow their 85-victory NL Central championship with another division title, this time with a 91-71 record, second best in the National League but not enough to break their 100-year drought of winning the World Series.

I'm pretty sure those forecasts will seem reasonable (and perhaps familiar) to anyone who frequents this site; I'd buy on both of those. I should note that the preseason forecasts are pretty reliable when it comes to predicting division winners; the wild card is probably a little trickier and the playoffs are an utter crapshoot. The Mets may, as he says, be the favorites to represent the NL in the Series, but that's a lot less reliable of a forecast than the one that has the Mets winning the East.

There are still some "mainstream press taking on a complex topic" sort of mistakes; for example:

He helped design PECOTA and put out a fascinating bible for baseball statisticians ("Baseball Prospectus 2008," available at bookstores) that includes thousands of computerized calculations on players and teams. In the past, most have proved amazingly accurate.

Silver is probably deeply involved in the publication of BP2008, but it's not a solo effort like Bill James' Baseball Abstracts were; if there's an individual mostly responsible for the BP annual it's probably Christina Kahrl. I'm nitpicking, of course; nothing's precisely wrong about that graf, but it doesn't exactly communicate the whole truth about the matter. It's just the sort of thing that I notice reflexively; spent way too many years doing press clippings, I guess.

Anyways, back to the analysis:

Who is the closer to take over from Ryan Dempster, who has moved to the rotation and whom Silver says will "be fine as a No. 4 starter"? Silver factored in Kerry Wood, Bob Howry and Carlos Marmol.

"All these guys, we think, will be fine," Silver said. "None [is a] top closer. Dempster was not that good a closer, so there's not that much to lose."

One thing that would help the Cubs is trading for Baltimore second baseman Brian Roberts, who would lead off.

"My guess is that he would add another two or three wins to the bottom line," Silver said. "Leadoff hitter has been kind of an Achilles' heel for the Cubs. We like guys who get on base. The [rumored] trade makes a lot of sense."

The thing about Dempster would be interesting to hear Silver justify further; run Dempster's PECOTA-projected ERA through Silver's own "quick-and-dirty" starter-to-reliever conversion and it jumps to a very unsightly 5.30 ERA, far worse than what ZiPS or CHONE are projecting for him.

And I don't see how Roberts adds two to three wins to this team; or perhaps I should say I don't see how the Roberts trade adds two to three wins to this team. Losing Gallagher is a huge blow to the team's rotation depth, and losing Cedeno kills any hope of getting out from under the impending Ryan Theriot Menace. And including Pie or Colvin (as the O's are reportedly asking for) further skews the deal toward irresponsible.

Maybe I'm overattached to Cubs prospects, and maybe Silver doesn't keep up with the daily Roberts rumor mill. Who knows.


Brian Roberts, not a Cub

But he is an Infield Predator!


A quick note on the business side of things

It's getting really late and I should be getting to bed soon, so I'm going to just hotlink this and deal with this tomorrow.

But TribCo VP/general counsel Crane Kenney, who's apparently been the Shadow Government of the Cubs for some time now, has sat down with the beat writers and gone over the myriad business issues that face the club.

  1. Tribune
  2. Sun-Times
  3. Daily Herald

I'll be honest - I haven't had time to read all of these yet, just skim them. Here's what sticks out at me at first glance:

  1. I still am far from thrilled with the angle the Sun-Times is taking on this; it's very tabloidy and sensational. Which, I suppose, is the Sun-Times for you.
  2. Apparently payroll is "north of $120 million;" I want to see how that matches up against what's been published. (I haven't updated my salary chart since mid-December, when the Cubs were right around $117 million.)
  3. The Cubs seem to be aggressively trying to find new revenue streams; they seem to want to push into Spanish-language radio broadcasts, for example.
  4. Kenney claims the Cubs are now actually more independent of TribCo than before, which seems to contradict the surface impression but on second blush seems like a reasonable enough assertion.

All of this is going to bear more examination; hopefully I'll return to the topic tomorrow.


So... an update, in the Brian Roberts hostage crisis:

The Orioles and Chicago Cubs have not had any recent talks about All-Star second baseman Brian Roberts, pushing the potential deal toward Opening Day or beyond.

Despite engaging in trade talks for more than three months, the two teams still have not agreed to a package that both find suitable. According to two baseball sources, the Cubs have offered infielder Ronny Cedeno, pitching prospects Sean Gallagher and Donald Veal, and one other player for Roberts, but the Orioles are holding out for a better offer.


The sticking point of the deal appears to be the inclusion of a fourth – or a possible fifth – player in the deal. The Orioles have inquired about several of the Cubs' prospects, including reliever Jose Ceda and outfielders Felix Pie and Tyler Colvin.

All I have to say is:

And maybe:

Really? The Cubs are offering up Veal and Gallagher? And the Orioles are holding out for Ceda, Pie or Colvin? Is there something about Brian Roberts that I really don't know? Does he cure cancer? Can he immanentize the eschaton? Is he Cleared Theta Clear? What?!? Could someone please clue me in here?


It's a typical situation, in these typical times

Dave Matthews Band will be playing in Busch Stadium this year.

I've been to a DMB concert before, and it's probably one of the more inspiring live shows I've had a chance to go to. (Of course, the fact that I spent a good part of high school downloading bootlegged DMB concerts probably had a bit to do with that.) Just... if you're in St. Louis around that time and you need pot, well, now you know where to find it. There will be pot.

And as a division rival, it pains me to see the Cardinals doing something like this. After all, the extra revenue from this will easily subsidize their big free agent acquisitions. Like... Cesar Izturis. And Matt Clement.

Oh. Right.

Well at least they got to keep their big departing free agents, like Troy Percival and fan favorite David Eckstein.

Oh. Right.

Have fun, Cardinals fans!


I like to call it the "Sharper Image effect" - if you charge more for utter crap, some people won't realize it's utter crap. (By the way: what utter scuzzbuckets, huh?) Well, apparently it's true for drugs, as well.

Sabernomics says that this could explain why athletes take HGH, despite the evidence that it doesn't work. I know it tends to work for milk. Your experience may vary, but I've checked every kind of milk there is at Hy-Vee (our local grocery chain), and you know what? Every single one of them is bottled at the exact same place. You slap a brand-name on it, and voila, it costs more!

(What makes this especially galling is the fact that Hy-Vee has two generic brands, and one costs more than the other. It always amazes me that people are willing to pay a premium for the same milk just so that it says Hy-Vee rather than County Fair.)


Fantasy baseball help requested.

Okay, I'll admit it - it's been something of a busy week, and tomorrow is my fantasy baseball draft for my work league and I'm feeling a little underprepared. The Infield Predators are drafting third (that's me).

It's a Yahoo! draft, not auction, rotisserie. Categories:


Roster positions:

C, 1B, 2B, 3B, SS, CI, MI, OF, OF, OF, Util, SP, SP, RP, RP, RP, P, P, BN, BN, BN, BN, BN, DL, DL

Ten teams, both leagues. Any of you who aren't Buckley that want to offer advice, you have my eternal gratitude. (Don't worry, Buckley, you draft fourth. I'm sure whoever it is you want will still be there when your turn comes.)


It's TRUE! (Except, of course, when it isn't.)

The easiest way to prove a point is to discard any data in the sample that doesn't support your conclusion.

The best part:
Now realize we are talking about a total of 897 spring training games and 4,860 regular season games. So that’s not just a fluke.
Sure, that sounds convincing. But: it's a fluke.

I don't want to spend too much time on this, because I could probably expend months simply going over everything that's wrong with this article and quite frankly, I have better ways to waste my time.

But even ignoring the complete disregard for sample size and, well, not cherry-picking your sample, this study doesn't even prove what it purports to. For example, let's completely make up some numbers here and take a look. We'll first list our made-up spring training win percentage, and then our made-up regular season win percentage.

2003: .400, .500

2004: .500, .600

2005: .600, .400

So, over three years, we have a .500 win percentage in spring training, and a .500 win percentage in the regular season. Obviously spring training results predict the regular season!

But only in one season did spring training correctly "predict" the team's future success, and then only weakly. It's plausible, I suppose, that over a five year period, spring training results can act as a gauge of team quality. But that wasn't the question asked. Over a single season, spring training results are meaningless as a predictor of future quality.
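To put the made-up numbers through their paces, here is a rough sketch. The win percentages are the invented ones from above, not real data, and the choice of a plain Pearson correlation as the season-by-season check is mine, not the original article's.

```python
# Made-up win percentages from the example above (not real data).
seasons = [2003, 2004, 2005]
spring = [0.400, 0.500, 0.600]
regular = [0.500, 0.600, 0.400]


def mean(values):
    return sum(values) / len(values)


def pearson(xs, ys):
    """Plain Pearson correlation, written out by hand to stay dependency-free."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


# Pooled over all three years, the two averages line up.
print(f"spring avg:  {mean(spring):.3f}")
print(f"regular avg: {mean(regular):.3f}")

# Season by season, they do not.
for year, s, r in zip(seasons, spring, regular):
    print(f"{year}: spring {s:.3f}, regular {r:.3f}, miss {r - s:+.3f}")

r_value = pearson(spring, regular)
print(f"season-by-season correlation: {r_value:.2f}")
```

The pooled averages match, but the season-by-season correlation in this toy sample is actually negative. Matching totals tell you nothing about whether one spring predicts the season that follows it.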