Last night, I went looking for defensive shifts. Well, here they are, broken down by base/out state and batter handedness:
There may be special cases, like the wishbone shift, but I think that's pretty representative of how fielders position themselves. (If you want the underlying data, here it is.)
It's hard to read the graph without looking at the accompanying data - the fielders don't always shift in the same direction on any given shift. Random thoughts based on a cursory reading of the data:
- Corner infielders tend to shift more than middle infielders. (As a measure of distance, that is.)
- For corner infielders, most shifts involve depth; most middle infielder shifts involve lateral movement.
- There's no small amount of noise to that graph; if I were going to use this sort of data for a project (say, determining responsibility for a ground ball in a zone rating type of system) I'd want to smooth it out into, say, three or four different positionings per position. (Just eyeballing it real quick, for first base there seem to be three basic positionings - bases empty, runner on first, and playing the bunt. Same for third base.)
- A lot of people were asking if having a good defensive third baseman could "cut into" a shortstop's graph, making his range look smaller than it really was. I have a hunch that we might be able to find out exactly how much third basemen are cutting into the shortstop's area of responsibility if we look at how much the shortstop's range grows when the third baseman is playing in.
My next project is going to be to figure out a way to estimate how many "chances" a player has at a position from the hit location data. (I haven't really done anything with it yet, but I have hit location data for balls hit into the outfield as well.) It's far less a conceptual problem than it is a judgement and labor problem - I have a good idea of how I want to delineate the zones of responsibility. The question now is how big to make the zones; then I have to actually go and write the code.
Continuing with making defensive graphs, I thought it might be illuminating to try and see exactly how much defensive positioning impacts where ground balls are fielded. The following are graphs of the average location of a fielded ground ball, color-coded by position.
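The averaging behind these graphs is simple enough to sketch. The original plots were made in R; the Python below is only an illustration, and the `GroundBall` record (a position number plus an x/y fielding coordinate) is a hypothetical stand-in for the merged hit-location rows.

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple, Tuple


class GroundBall(NamedTuple):
    # Hypothetical record: scoring position (1-9) of the fielder who
    # handled the ball, plus the (x, y) hit-location coordinates.
    position: int
    x: float
    y: float


def average_location_by_position(
    balls: Iterable[GroundBall],
) -> Dict[int, Tuple[float, float]]:
    """Return the mean (x, y) fielding location for each position."""
    totals = defaultdict(lambda: [0.0, 0.0, 0])  # sum_x, sum_y, count
    for ball in balls:
        t = totals[ball.position]
        t[0] += ball.x
        t[1] += ball.y
        t[2] += 1
    return {pos: (sx / n, sy / n) for pos, (sx, sy, n) in totals.items()}
```

The situational graphs amount to filtering the records (by batter handedness, base state, or a short list of batters) before averaging; real rows would need those extra fields, which this sketch leaves out.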
First off, every ball in the data set:
Makes sense, I suppose. We'll call that our reference graph.
Now, let's take a look at just right-handed batters with the bases empty:
Virtually identical - everybody plays "straight-up" in that situation. But now let's look at the same situation with a left-handed batter:
It's a subtle shift, but a shift nonetheless. The middle infielders seem to "cheat" a bit to their left, and the third baseman seems to be doing a heck of a lot of cheating, not only over but shallow.
Now, with a runner on first (second and third empty):
Everybody but the first baseman seems to play straight up - the first baseman plays a lot shallower, though. With runners on first and second:
That looks pretty much like our reference graph - maybe the third baseman's playing a little in, but other than that everyone is playing straight up. With the bases loaded:
Looks like everyone is playing straight up. Now, let's take a look at how fielders position themselves when a handful of lefty sluggers (Thome, Ortiz, Fielder, Giambi and Hafner) are at bat:
That's the wishbone shift, and I'm really pleased that the graph is capturing it so well.
The next step is to look at the number of outs in the inning and see how that affects defensive positioning. (Unless something else shiny distracts me first.)
First of all, let's define range as a fielder's ability to get to baseballs. For the purposes of this analysis, we don't care what a fielder does with the ball when he gets to it. We're only interested in how far he can go to get to a baseball.
Pretty much any defensive metric you see that attempts to measure range is really an approximation based upon the data available. The more data we have, the better (play-by-play data is better than seasonal totals; zone rating is even better) but the best thing would be to know:
- Where the ball was fielded, and
- Where the fielder was standing when the ball was hit.
I am unaware of anyone who keeps track of the second, but there are several outlets that keep track of the first. MLB.com keeps track of it for their Enhanced Gameday functionality. MLB Advanced Media, at least currently, makes that data available to researchers. Most commonly it's used in PitchF/X analysis, but for right now I'm just using the hit location data.
Rather than parse the remainder of the data I needed from the verbatim description (which is something I'd love to start doing, but let's just say that regular expressions scare me) I merged the hit location data with the Retrosheet play-by-play data. (The boilerplate: The information used here was obtained free of charge from and is copyrighted by Retrosheet. Interested parties may contact Retrosheet at 20 Sunset Rd., Newark, DE 19711.) Dan Turkenkopf has a great parser that links the Gameday XML files up with the appropriate Retrosheet IDs. A few SQL queries later, and bingo, data! I filtered for all ground balls and divided them up by what position was responsible for fielding them. The result of the play (hit, error or out) was also recorded, although not used in the presentation here.
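In SQL this boils down to a join, a filter, and a GROUP BY. A rough Python equivalent, purely as a sketch: it assumes both sources have already been linked to a common game/event key (the job Dan's parser does), and every field name in it is hypothetical.

```python
from collections import defaultdict


def ground_balls_by_fielder(hit_locations, retro_events):
    """Join hit-location rows to Retrosheet events, keep ground balls,
    and group them by the fielder responsible.

    Both inputs are lists of dicts. The field names ("game_id",
    "event_id", "batted_ball", "fielder", "result", "x", "y") are
    hypothetical stand-ins for whatever the linking step produces.
    """
    # Index Retrosheet events by the (hypothetical) shared key.
    events = {(e["game_id"], e["event_id"]): e for e in retro_events}
    grouped = defaultdict(list)
    for loc in hit_locations:
        event = events.get((loc["game_id"], loc["event_id"]))
        if event is None or event["batted_ball"] != "G":
            continue  # no matching event, or not a ground ball
        # Keep the result (hit, error or out) alongside the location.
        grouped[event["fielder"]].append(
            {"x": loc["x"], "y": loc["y"], "result": event["result"]}
        )
    return dict(grouped)
```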
But what to do with the data? I decided to try plotting it with GNU R, a free statistics and graphing programming tool. The basics of this are covered in Baseball Hacks, by Joseph Adler. Additional functionality is provided by the Geneplotter library.
Okay, here we go - plots!
Those are our reference images - all fielders from 2007 are included. Now, let's take a look at a handful of shortstops, just to see how a few different players look.
I'll be the first to admit - if it weren't nearly 2 in the morning, I would spend a little more time on cleaning up these graphs. The points right on the edges seem a bit more difficult to "read," for one. And I don't really need the entire field like that for just infielders - we can do the same thing with outfielders, I just haven't yet.
I'm just "doodling" with the data for now - if there is quantitative analysis to be done with this data, it'll have to wait at least until morning. I would love to put that Tulo graph next to that Jeter graph and show it to a bunch of Yankees fans, see what they have to say about it. Suggestions and requests are welcome.
Consider this a lark with data; I wouldn't consider the conclusions definitive, or even necessarily meaningful. It's at least thought provoking.
First, I cobbled together a facsimile of a zone rating system based upon the work of Sean Smith on TotalZone. Let me be clear here: my zone rating system is probably the worst zone rating system in existence. If you want a zone rating system based on Retrosheet data, TotalZone or SFR are vastly superior; UZR and PMR are better still. And my system - let's call it SZR, for "Stupid Zone Rating" - only rates shortstops, making it even more, well, stupid. Or "special," if you're worried about hurting its feelings.
Here's how it works:
- Shortstops are given credit every time they record an out or fielder's choice on a ground ball.
- An "opportunity" to make a play is assigned for errors, and half of all ground ball singles hit to left and center field.
Stupid Zone Rating is simply Outs divided by Outs plus Opportunities.
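The rules above fit in a few lines of Python. This is a sketch of the formula only; the counts would come from the Retrosheet queries, and the numbers in the example are made up for illustration.

```python
def stupid_zone_rating(outs: int, errors: int, gb_singles_lf_cf: int) -> float:
    """SZR = Outs / (Outs + Opportunities).

    outs: ground-ball outs plus fielder's choices credited to the shortstop.
    errors: errors charged to the shortstop.
    gb_singles_lf_cf: ground-ball singles to left and center field; half
        of them count as missed opportunities.
    """
    opportunities = errors + 0.5 * gb_singles_lf_cf
    total = outs + opportunities
    if total == 0:
        raise ValueError("no chances recorded")
    return outs / total
```

For a hypothetical shortstop with 300 outs, 10 errors and 80 ground-ball singles to left and center, `stupid_zone_rating(300, 10, 80)` works out to 300 / 350, about .857.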
So why invent the worst possible zone rating system? Because it lets me play around with the data a bit. In this specific case, what I wanted to know was simple. Aramis Ramirez had easily the best defensive season of his career last season. He also missed no small amount of playing time, which gives us a healthy amount of "non-Aramis" opportunities to compare to.
What I was curious about was, did A-Ram's big defensive season have an effect on the Cubs shortstops?
The average SZR of a shortstop from 2004-2007 was .766. During that time, the average SZR of Cubs shortstops when Ramirez didn't play was .801; when Ramirez played, the average SZR of Cubs shortstops was .761.
Now, we'll drill down to 2007. When Ramirez played in 2007, Cubs shortstops averaged a .762 SZR; when Ramirez was out of the lineup, the Cubs averaged a .797 SZR.
I have to go to my actual paying job now, so I'll leave figuring out what that data means as an exercise to the reader. The full spreadsheet is available to peruse.
(The information used here was obtained free of charge from and is copyrighted by Retrosheet. Interested parties may contact Retrosheet at 20 Sunset Rd., Newark, DE 19711.)
I am now a member of the Goat Riders of the Apocalypse.
What does that mean? As of yet, I'm not 100% certain. What I do know is that this is obviously going to cut down on the amount of activity here - anything Cubs-specific, especially anything non-mathy, is going to be over at GROTA.
As for The Other Fifteen? For now, I'm planning to keep this running as a less-Cuby, more SABRy blog. But I'm not sure exactly how often I'll be posting here in the future.
So expect things to be tentative for the next week or so. I'm unsure of quite a bit going forward - how many Babylon 5 references and Excel spreadsheets the GROTA audience is clamoring for, how productive I can be writing for multiple outlets, how many jokes I can make about Ryan Theriot.
In the meantime, here's a look at how a potential Soriano trip to the DL could affect us. Enjoy.
Let's assume for a moment that he's only going to be out a few days. That assumption means no callups. How best to deal with Soriano's injury? (When I say wins, I mean WAR over a full season.)
DeRosa in left, Fontenot at second?
This should come as no surprise to you - DeRosa is less valuable as a left fielder than as a second baseman, by about half a win. Fontenot, meanwhile, is somewhere between 1 and 1.5 wins worse than DeRosa at second.
Ward in left?
Ward is below replacement as a left fielder, given his atrocious defense in the outfield. In fact, the more I think about it, the more I don't like having Ward on the team. He makes the bench seem a lot shorter than it really is.
Johnson in left?
Probably the best option; it's, after all, why teams carry fourth outfielders. That means keeping Pie in the lineup, however, which the team has seemed reluctant to do so far this season.
Felix Pie is being freed from the bench for at least one afternoon of baseball, according to Paul Sullivan. Not for good reasons like "defense" or "needing at-bats to develop." Nope. Because he's speedy!
Manager Lou Piniella said Pie sat so much because the Cubs faced four left-handers on the trip, and he expects him back in the lineup Tuesday night against Cincinnati's Aaron Harang. Piniella said he wants more speed in the lineup and to be more aggressive on the basepaths.
"The possibility there is to put as much speed as we can and force the action a little," Piniella said. "We've done that in a few games, but basically I've stayed with a set lineup."
The speediest Cubs lineup would include Pie in center and Ronny Cedeno at second, giving the Cubs four legitimate base-stealing threats, along with Ryan Theriot and Alfonso Soriano.
Of course, Pie and Soriano have to get on base to make use of their speed, and if Cedeno starts, Mark DeRosa would have to come off the bench.
I have no idea if Sullivan's remarks about Cedeno are at all indicative of Lou's thinking; the goal should certainly not be to have both Theriot and Cedeno in the lineup, so I hope he's just idly musing about such things.
With the Cubs out of the race in '06, Baker often played veteran Neifi Perez over rookie Ryan Theriot, leading to criticism he slowed Theriot's development.
"I don't think that's a fair criticism," Theriot said. "Neifi was a great player, and a proven veteran, a guy that did a lot of good things for a long time. There's a flip side you never hear about. He could've thrown me to the wolves and it could've turned out bad. I learned a lot just sitting back watching. There wasn't very much pressure, so just sit there and understand the big-league life. I learned from guys like Neifi and Todd [Walker]. They taught me a lot. One thing Dusty did do, when he started seeing some confidence, a couple good games in a row, he kept throwing you back out there."
Yep, it's Dusty's fault that... that what? That we missed out on a half-season of The Riot? That Theriot isn't very good? What are you even saying here?
Let's take a look at the accumulated press reports and see if we can't divine what Lou's plans are for the day, shall we?
- Henry Blanco will start. Soto's a young guy, but he hardly leads the team in days off at this point, and it's a long season. I'd prefer it to be J.D. Closser as the backup, but what do I know, right?
- Hill is going to pitch in relief today. I imagine this will change if Marquis is throwing a complete-game no-hitter or something, but otherwise the idea is to get a few innings out of Hill and hopefully see an improvement in his ability to pitch in the zone.
- Gordon Wittenmyer hints that last night's odd arrangement (putting Fukudome in center and DeRosa in right) might become more common, as the Cubs try to work Mike Fontenot into the lineup so that we can have two lefties. (May I suggest Eric Patterson?) [Sorry about the mixup - I typed left and meant center for Fukudome. -CW]
- Expect to see either Johnson or Cedeno or both today; both have hit Moyer very well in their careers. (Which is something I don't care about but Lou does.) Johnson was probably a given anyways. The other Cubs who have hit Moyer well are D-Lee and Rich Hill. (Maybe that's why he's starting today.)
I'll admit that I don't know, but I'm going to try to find out. Hopefully the fans at BCB (who, I'll admit, generally pay more attention to how much someone hustles than I do) will help me track this as well as possible.
I'm also going to throw this open to fans of other teams - if you're willing to keep track of this carefully enough to give me usable data at season's end, I'll go ahead and look at it for you as well. If a bunch of Red Sox fans want to know how much it hurts to have Manny being Manny, let me know how often he does it and I'll figure it out for you.
Let's talk a bit about sample size and regression to the mean, shall we? I feel like I owe some actual examples of what I'm talking about given my recent screed on the topic.
Let's go ahead and do a throwdown-to-showdown, between Cubs centerfield options Felix Pie and Reed Johnson. (It's a hot topic of discussion among Cubs fans who aren't too busy pining after Ronny Cedeno.) And let's use OBP as our stat of choice for comparing them for the moment. How much more often will Johnson get on base than Pie?
We're simply going to focus on 2008 production for just a minute, because people seem to be much more excited about 2008 production to date than they are about the latest build of the ZiPS projections, for instance.
According to Baseball-Reference.com, Reed Johnson has a .478 OBP in 23 plate appearances; Pie has a .217 OBP in 23 plate appearances. Obviously Johnson is a better choice as a center fielder, just based upon OBP, right?
At 23 plate appearances apiece, we know no such thing. What we do know, on the other hand, is that both of them are major league baseball players, and the central tendency of OBP talent among major league players is roughly .330. That's the mean; now we can regress to the mean. (A great primer on how to do that is available from Sal at Athletics Nation.)
Once we go ahead and regress Reed Johnson's OBP to the mean, we end up with a .338 OBP. For Pie, once we regress to the mean, we end up with a .321 OBP.
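A common shortcut for regressing to the mean is to add a fixed number of league-average plate appearances to each player's line. The sketch below does that; the .330 mean comes from above, but k = 400 is an assumption, and Sal's primer may derive its constant differently.

```python
def regress_obp(obp: float, pa: int, mean: float = 0.330, k: float = 400.0) -> float:
    """Shrink an observed OBP toward the league mean.

    Equivalent to adding k plate appearances of league-average OBP to the
    player's actual line. k (assumed to be 400 here) controls how hard
    small samples get pulled toward the mean.
    """
    return (obp * pa + mean * k) / (pa + k)
```

With these assumed values, Johnson's .478 in 23 PA comes out to about .338 and Pie's .217 to about .324: close to the figures above, though not identical, since the constant is a guess. Either way, a 261-point gap in raw OBP shrinks to under 20 points.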
It has to be hard to be a beat writer for the Cubs. Piniella mocks you in the post-game press conferences. The fans mock you on your own blog (unless you're Mariotti, in which case you can hide in your Hall of Doom while your lackeys do the dirty work). Sometimes you have to feel underappreciated.
Well, here's your opportunity to make up for that, in cold hard cash!
See this? This is the face of the President who freed the slaves and reunited the Union. He's valid legal tender, and he's all yours... IF you can do this simple favor for me.
Next time you're in a room with Lou in some sort of a Q&A forum, ask him if he's concerned about Theriot's slow start. Pie and Hill have been demoted for less, after all.
Okay, it's church-going time. Strictly nondenominational, so long as you are willing to worship at the Church of Baseball.
Now, when you come to worship at the Church of Baseball, there are a few things that you need to know. If you know nothing else, then you must at least know this. Study it. Repeat it to yourself. Say it after every inning. Make it a part of yourself. Ready? Okay, here goes the Sabermetrician's Creed:
I cannot learn anything meaningful about a batter in less than 50 plate appearances.
I cannot learn anything meaningful about a pitcher in less than 20 innings pitched.
I cannot learn anything meaningful about a team in less than 500 plate appearances.
This is not negotiable - this isn't something that you can argue over or find exceptions to. You absolutely cannot draw conclusions about a player or team's performance in ten games or less; you can not.
This is not to say that you can draw conclusions about a team's performance in twenty games - you probably can't do that, either. But there is absolutely no way you can do it in less than ten.
Again: this is not a matter of opinion. Anyone that tells you otherwise is lying to you, and that's the truth.
[I will add one caveat - doctors are an exception to this rule. So you can learn that Pedro Martinez has a bum hamstring. I'm sure that really surprised you.]
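One rough way to see why small samples say so little: treat each plate appearance as an independent trial with a fixed on-base probability (a simplification) and compute the binomial standard error of the observed OBP.

```python
import math


def obp_standard_error(true_obp: float, pa: int) -> float:
    """Binomial standard error of an observed OBP over `pa` plate
    appearances, assuming every PA is an independent trial with the
    same on-base probability."""
    return math.sqrt(true_obp * (1 - true_obp) / pa)


for pa in (23, 50, 500):
    print(pa, round(obp_standard_error(0.330, pa), 3))
```

For a true .330 hitter that is roughly .098 at 23 PA, .066 at 50 and .021 at 500. At 50 PA, that hitter's observed OBP lands somewhere between about .265 and .395 only about two-thirds of the time.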
Right now, we see amazing things all around us: the Orioles are first in the AL East! The Tigers have the worst record in baseball! Jason Kendall leads the NL in batting average! Kyle Lohse leads the NL in ERA!
And people are nearly desperate to ascribe meaning to all of those things. But they don't matter. They're meaningless. They literally have no meaning, none, whatsoever.
But no team has ever gone 0-7 and made the playoffs!
I don't care.
Jason Kendall had eye surgery!
I don't care.
Dave Duncan is the greatest pitching coach ever!
I don't care.
Do you know why I don't care?
Because you absolutely cannot draw conclusions about a player's or team's performance in ten games or less; you can not.
Now go forth, and sin no more. Grace be with those who understand and appreciate the great importance of sample size in all things; amen.
The Cubs have had a wild ride the past few days, somewhat masking the fact that the Cubs have a four-game winning streak under their belts. They haven't been winning pretty, but they've been winning.
And while Lou calls it simply "playing his bench," I don't know how many teams are sitting two starters on back-to-back days this early in the season. Ryan Theriot and Felix Pie have been sitting in favor of Ronny Cedeno and Reed Johnson. Both Cedeno and Johnson have been taking advantage of their opportunities; Johnson is hitting .375/.474/.438 so far, for an OPS of .912, while Cedeno hits .333/.400/.444, for an OPS of .844.
Meanwhile, Pie has a .200/.238/.200 line to his name, and Theriot is at an amazing .207/.281/.241 batting line on the season so far.
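As a quick check on those OPS figures: OPS is just OBP plus SLG. A small Python sketch (the helper names are illustrative):

```python
from typing import Tuple


def parse_slash_line(line: str) -> Tuple[float, float, float]:
    """Split an "AVG/OBP/SLG" string like ".375/.474/.438" into floats."""
    avg, obp, slg = (float(part) for part in line.split("/"))
    return avg, obp, slg


def ops(line: str) -> float:
    """OPS is on-base percentage plus slugging percentage."""
    _, obp, slg = parse_slash_line(line)
    return round(obp + slg, 3)
```

Run on the four lines above, it gives .912 for Johnson, .844 for Cedeno, .438 for Pie and .522 for Theriot.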
I don't think either Pie or Theriot has precisely lost his job yet - Pie is probably in a platoon situation for the time being, with Johnson eyeing more of Pie's playing time. Johnson's game-winning hit probably helped his cause some.
Theriot, on the other hand, needs to seriously keep his eyes open for some infield predation at this point. Pie was a top prospect who was held out of a lot of trading talks by the Cubs this winter; Theriot's no better than a lot of minor league journeymen at this point and is lucky to have a spot on a 25-man roster.
[Yes, there may be some wishful thinking in that paragraph. When June rolls around and Pie is either traded or at AAA, while Theriot is still our starting shortstop, I plan on eating my own arm off for fun.]
Buster Olney chats, I criticize. Deal? Deal!
Gray (Chicago): Buster, with 12 Ks and only 1 BB thus far on the season, is this the new, mature Carlos Zambrano we have been waiting for? After that Tejada HR on Sunday, the old Zambrano would have walked the next batter and hit the one after that.
Buster Olney: Gray: Yes, this is the new and improved Big Z, the real deal. His emotional progress is the reason why I think they'll win the division; he's a great anchor to that staff.
Yep, emotional maturity. That's the only way you can explain a decline in a player's walk rate over two starts.
I understand it has to be difficult to talk about baseball for pay at the beginning of the season. It's probably not endearing to one's editors to say "I don't know what this means" 30 or 40 times in response to questions from paying customers during a chat.
But armchair psychology annoys the crap out of me. I only try to tell you what a player's talent level is based upon his performance, and even then I add in a qualifier now and again. I hardly think it's possible to tell what a player's emotional state is based upon his walk rate.
Brendan, NY: Whats your take on Sorianos proper batting order slot? Any chance they still get Roberts b/c they might resolve a lot of those issues and DeRosas looked pretty bad at second so far.
Buster Olney: Brendan: There is no perfect place to hit him, other than No. 7. he strikes out way too much to hit anywhere from 3-6, and you can't hit him eighth; he'd get less than nothing to hit. You can't hit him seventh, because the Cubs are paying him way too much money to stick him in that slot in the lineup, and he's made it very clear -- in Washington, and with the Cubs -- that he is most comfortable leading off. So Lou basically has to grit his teeth and write in Soriano at No. 1 until the Cubs get Roberts.
I'm not going to rehash Soriano's splits; I've done that already here and here, and don't plan on revisiting that unless something new comes to mind to say on the topic. Olney really puts too much weight on how often a hitter strikes out, though.
I love how Olney acts like it's still inevitable for the Cubs to get Roberts - no need to eat crow on that one!
Mike (cleveland): Buster, Just finished "three nights in August"...the epilogue was very interesting. Buster where do you stand on the current pseudo-standoff between pure stat analysis and traditional scouting and player development? I see the value in both...but I have to say that experience is the best teacher in most pursuits, and I dont see how a statistics degree from MIT should add any level of expertise to scouting. You learn the game by playing it and/or managing no?
Buster Olney: Mike: I think there's a great mix to be found between the two approaches, a middle ground. Some scouts don't pay enough attention to the numbers, and some stats guys don't acknowledge that personality can and does play a role in what happens (for example, the long-held belief that a lot of relievers are interchangeable). The Indians and Padres are the best teams, I think, at combining the two schools of thought...
Bob, Chicago: Soriano had the third best OPS+ on the team. How can you say he can't bat anywhere from 3-6?
Buster Olney: Bob: This is a classic example of the whole scouting vs. numbers thing I just mentioned. The numbers say one thing, but if you've been around Soriano and watched his hitting with RISP, he just is not good in big spots, against good pitchers; he just destroys rallies...
And then, about face! It's not his strikeouts, it's his personality!
Here's the thing: we keep score with numbers! If it's about winning and losing, then the numbers are what matters. You want to end the game with more of the right numbers than the other team.
I really hate criticizing Olney for analysis because I don't think it's central to who and what Olney is. But... Olney is bad at analysis. As a clearinghouse for information and sourcing he's good, but he's no Rosenthal.
So much for Lou sticking with Theriot. At least for today, Ronny Cedeno is playing shortstop. This demands that I play his theme music:
(For those of you just joining us - I call Cedeno the Great Destroyer because of his incredible .189 ISO last season, even - or especially - because it accompanied a .203/.231/.392 batting line. He doesn't walk enough to be a real Three True Outcomes kind of hitter, but I think he could be the closest a shortstop has come to that ideal in a while.)
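For reference, ISO (isolated power) is slugging minus batting average, which is where that .189 comes from. A one-line Python sketch:

```python
def isolated_power(avg: float, slg: float) -> float:
    """ISO = SLG - AVG: extra bases per at-bat beyond singles."""
    return round(slg - avg, 3)
```

`isolated_power(0.203, 0.392)` returns 0.189, matching Cedeno's figure.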
I'm pressed for time, but this occasion really demands some Infield Predation. I'll see what I can do about that today.
Now, Ryan Theriot has a lot of qualities that make him likable, but he's also not very good at baseball. I think that what's happening here is that a lot of fans remember how poor he was last September, are combining it with his slow start and coming to the conclusion that whatever magic Theriot once possessed, he no longer has. It's not necessarily the correct conclusion, even if it agrees with the correct conclusion - Theriot never had any magic to begin with.
So I'm torn - part of me agrees with T.S. Eliot: "The last temptation is the greatest treason: to do the right deed for the wrong reason." On the other hand, there's much to like in what Winston Churchill said: "If Hitler invaded hell, I would make at least a favorable reference to the Devil in the House of Commons."
But the only opinion that really matters is that of the Cubs braintrust, specifically Lou Piniella and Jim Hendry. And I've seen no indication that they're souring on Theriot just yet. That makes sense - they're more likely to try and sort through these things rationally than fans, who by-and-large tend to experience these things emotionally first and foremost. And obviously the Cubs organization thinks they know something about Ryan Theriot's true talent level that I don't. But this bears watching nonetheless.
- Kansas City Royals
- Washington Nationals
- Milwaukee Brewers
- San Diego Padres
The four least winningest:
- Detroit Tigers
- Atlanta Braves
- Chicago Cubs
- Oakland A's
- Minnesota Twins
- Colorado Rockies
- San Francisco Giants
There is actually a six-way tie for second-worst team in the majors right now.
So, riddle me this: what do the top teams have in common? What do the bottom teams have in common?
Derrek Lee, .222/.222/.444
Aramis Ramirez, .154/.313/.385
Alfonso Soriano, .059/.059/.111
If you think that there's a chance that our Big Three hitters will play like that the rest of the season, then go ahead and panic. If you think that, presuming they're able to play, there is almost no way in hell that the three of them hit like Ryan Theriot for the rest of the season, then relax.
[Before I catch hell for that comment: Lee's OPS currently is .666. Ramirez's is .698. Theriot's OPS last season was .672. Just saying.]