The Other Fifteen

Eighty-five percent of the f---in' world is working. The other fifteen come out here.


Jeff Samardzija in Pitch F/X

If you want actual, well, good analysis, go over to Harry’s and take a look. He’s been doing this pitch ID stuff a lot longer than I have.

But I think I was able to duplicate one of the graphs from Harry’s page, or at least come close.

I used Mat Kovach’s parser to download data from MLB’s servers. (It seems to work fine for me, but it’s “pre-alpha” and not documented as of yet, so caveat emptor. Also, I Am Not A Programmer, so all code samples that follow are to be taken with more than a hint of salt.)

Then, in MySQL, I ran the following query against the data:

-- Join each pitch to its at-bat, keeping only the pitcher with ID 502188
SELECT a.*, p.*
FROM gameday_atbat a
JOIN gameday_pitch p
    ON a.gameid = p.gameid
    AND a.num = p.atbat_num
WHERE a.pitcher = 502188;

Not the prettiest SQL I’ve ever written, and it returns more data than I need, but that’s fine. Then I exported the data to a CSV file. There was one pitchout in the dataset, which I removed.

Well, now what? I use GNU R, personally, for all my graphing and K-means clustering needs. Code:

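Samardzija <- read.table("C:/Retrosheet/saved queries/pitchfx/Samardzija first start.csv", header=TRUE, sep=",")  # load the exported query results
cl <- KMeans(model.matrix(~-1 + pfx_x + pfx_z, Samardzija), centers = 3, iter.max = 10, num.seeds = 10)  # KMeans() comes from Rcmdr (RcmdrMisc in newer releases), loaded beforehand
plot(Samardzija$pfx_z~Samardzija$pfx_x, col=cl$cluster, xlim=c(-20,20), ylim=c(-20,20))  # vertical vs. horizontal movement, colored by cluster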

Which produces the following graph:

[Graph: samardzija_072508]

In fairness to Harry, I cheated – in the second line of the program, I tell the clustering algorithm how many “centers” to look for – in this case, how many pitch types I want it to find. I told it three. Why? Because that’s what Harry’s graph shows. I don’t really know how to determine the “right” number of centers as of yet.
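For what it's worth, one common rule of thumb for picking the number of centers is the "elbow" method: run the clustering for several values of k and see where the total within-cluster sum of squares stops dropping sharply. What follows is only a rough sketch of that idea, not what Harry or I actually did. It uses base R's kmeans() rather than KMeans(), and the data is a made-up stand-in (the make_blob helper and all of its numbers are hypothetical), not real Pitch F/X output.

```r
# Synthetic stand-in for pitch movement data (NOT real Pitch F/X values):
# three made-up pitch types, each a blob in (pfx_x, pfx_z) space.
set.seed(42)
make_blob <- function(n, cx, cz) {
  data.frame(pfx_x = rnorm(n, mean = cx, sd = 1.5),
             pfx_z = rnorm(n, mean = cz, sd = 1.5))
}
pitches <- rbind(make_blob(40, -8, 10), make_blob(25, 3, 2), make_blob(20, -2, -4))

# Total within-cluster sum of squares for k = 1..6, using base R's kmeans().
# nstart = 10 tries ten random starts per k and keeps the best one.
ks  <- 1:6
wss <- sapply(ks, function(k) kmeans(pitches, centers = k, nstart = 10)$tot.withinss)

# Look for the "elbow": the k after which adding clusters stops helping much.
plot(ks, wss, type = "b",
     xlab = "Number of clusters (k)",
     ylab = "Total within-cluster sum of squares")
```

With real data the elbow is often a lot less obvious than it is with tidy fake blobs, so this is a starting point for eyeballing rather than an answer.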

Even so, I have one pitch that differs from his – I think he changed that ID manually, but I’m not sure. I can tell you that one cluster is green and one is black, but as far as calling one a splitter and one a slider, that’s something I have to work on.

(That graph, by the way, is ugly, and I know it’s ugly. I know I can make it look better, but in this case it’s a question of how much time I really want to invest in prettying up Pitch F/X graphs before I figure out what it is I’m actually doing with them. It’s called premature optimization.)


2008 Cubs Opening Day Roster, by WAR, Part II

This is really rushed - suffice it to say that I don't think a lot of people will be pleased with what I'm going to call the Kerry Wood Issue, and I don't have a good answer for you right now. Otherwise, I like what I'm seeing with our pitching staff.

[Chart: war_chart_03272008p]

This is park adjusted. Thanks to Sam Larson for some tweaks to the relief pitcher calculations.

I'm really not married to any of these forecasts, and hope to publish a revised set of both charts this weekend. I just know that if I didn't publish something, I would forget to do so entirely.


Greg Couch's column goes downhill after five words.

It starts off well:

Spring training doesn’t predict anything.

Everything after that is pretty much an utter vortex of suck. Mostly because he ignores those five words, and goes on to talk about how he isn't convinced about Rich Hill.

The best part:

Rich Hill could be a problem for the Cubs this year, not because he’s bad but because he’s being counted on as a legit No. 3 starter. Are we sure he’s that good?

The Cubs have left themselves one front-end-of-the-rotation starter short. They’re counting on Hill developing into that starter, and he might. But at this point, why has he been guaranteed a spot in the rotation at all?

Ohboy. "[W]hy has he been guaranteed a spot in the rotation at all?" Because he was one of the top-ten strikeout pitchers in the National League last season!

I keep telling myself that I shouldn't let myself get so excited about these things; after all, these guys have to write something about spring training, and I'm sure that you don't sell a lot of newspapers by referring to everything as "meaningless" for over a month straight. But sometimes I can't help it.


How meaningless is spring ERA?

This is not a study. I am not proving anything. This is an illustration, meant to showcase just how meaningless this stuff is. (There isn't a robust database of spring training results, as there is for the regular season, so the data set is limited by my own ambition for this topic and falls well below a meaningful sample size. So all of the figures and conclusions below are to be taken with a grain of salt; they simply illustrate the concept. They're examples.)

I took the spring training ERA and FIP-ERA from everyone who pitched in Cubs camp last year, and took a look at the weighted average error of those ERAs compared to what those pitchers did in the regular season.

ERA: 1.74

FIP-ERA: 1.94

So, a 4.00 ERA in spring training could mean a guy is a 2.26 ERA pitcher, or a 5.74 ERA pitcher. That's pretty much the difference between being the Cy Young winner and being designated for assignment. It's so absurdly large as to be absolutely meaningless.
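For anyone who wants to see the arithmetic, here is a rough sketch of how a weighted average error like this could be computed in R. The three pitchers and their numbers are hypothetical, made up purely for illustration, and both the use of absolute error and the weighting by regular-season innings are assumptions on the sketch's part. The last line just reproduces the 2.26-to-5.74 band from the 1.74 figure above.

```r
# Hypothetical numbers for illustration only. This is not the actual Cubs camp data.
pitchers <- data.frame(
  spring_era = c(2.50, 6.00, 4.00),
  season_era = c(4.50, 4.00, 4.25),
  season_ip  = c(200, 60, 120)   # weights: regular-season innings (an assumption)
)

# Weighted average of the absolute gap between spring and regular-season ERA.
weighted_abs_error <- function(spring, season, weights) {
  weighted.mean(abs(spring - season), w = weights)
}

err <- weighted_abs_error(pitchers$spring_era, pitchers$season_era, pitchers$season_ip)

# The band implied by an error of 1.74 around a 4.00 spring ERA, as in the text.
band <- 4.00 + c(-1, 1) * 1.74
print(err)
print(band)
```

Swapping in a different weight (spring innings, say) only means changing the third argument.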

So when you read things like this:

Jon Lieber's four shutout innings Saturday against Arizona put him in good position for the Cubs' fifth starter's spot.

But Ryan Dempster and Jason Marquis also have pitched well, making it a tough call for manager Lou Piniella.

"That's what competition for spots does," Piniella said.

Take it with a pinch of salt. Do not hyperventilate or overanalyze. It's all meaningless; Lieber's four shutout innings are an utterly meaningless indicator of his future performance. Now, the way he pitched those innings could indicate his future performance; I have no doubt that the Cubs have scouts who have at least some ability to divine differences in performance that ERA can't capture at this point.

But we armchair analysts know pretty much nothing more than what we knew yesterday, which is basically what we knew the day before.

I will continue to state this come April. In fact, excluding learning new ways of looking at the data I already have (and I won't pretend that's not a possibility), the absolute earliest I will know something new about the abilities of any individual ballplayer is probably May. Maybe June. Just so's everyone's aware.


A sea change

There is a blog post waiting to be written, flaming the Cubs and Piniella for refusing to consider the youngsters for the rotation:

"We're going to pitch them here in spring training, but this is not a year for this," Piniella said when asked specifically about right-hander Sean Gallagher and, by extension, other young starters in camp. "I hate to say that, but we've got veteran pitchers here to consider; they're going to get every chance, first and foremost.

"We like Gallagher. I like (Jeff) Samardzija. Good young starters with good young arms, but truthfully not coming out of camp. I'd be lying if I said that something like that would happen.

"We've got seven veteran starters. They're all healthy. They're all throwing the ball well, and they're all going to compete. Unless we have a streak of bad luck where we have some injuries, this is not the spring for (young pitchers)."

This won't be that post. Maybe in the morning, but not tonight.

I just want to note - the Cubs used to be a franchise that was focused on developing young pitching. And for a while we were very good at it. We still seem to be developing some nice, young starting pitching.

So it's very unnerving, for me at least, to see the Cubs so thoroughly abandoning that approach. Maybe it's for the better, but I don't have to like it.


The other interesting bit to that narrative:

Piniella has bumped right-hander Ryan Dempster up to the No. 3 spot in the rotation behind Carlos Zambrano and left-hander Ted Lilly.

The Sun-Times puts it a bit less strongly:

For now, right-hander Carlos Zambrano and left-hander Ted Lilly are locked into the 1-2 spots, with lefty Rich Hill taking No. 4 and Piniella looking for a righty among Marquis, Lieber and Ryan Dempster for the No. 3 spot.

If Dempster, who looks like the early favorite for the third spot, instead becomes the odd veteran out, he could return to the bullpen.

And the Trib is also a little more tentative:

Converted closer Ryan Dempster is the early front-runner to grab the No. 3 spot behind Carlos Zambrano and Ted Lilly, while Rich Hill has already sealed the No. 4 spot. The main battle is expected to be between Jon Lieber and Jason Marquis for the fifth spot, with Sean Marshall as the dark-horse candidate.

But where there's smoke, there's probably fire.

Now, the Cubs employ a great many scouts who are much better at these things than I am, and who are actually in Mesa. So if they say that Dempster looks impressive, well... then he probably looks impressive, at least. There's an information asymmetry here, so for now I guess I have to take their word for it.

And PECOTA, for one, sees a lot of upside to Dempster, as well as a lot of downside. And, well, I love upside risk. Absolutely love the hell out of upside risk. So I'm willing to ride this out for a while and see where it goes. Just so long as everyone understands that this could end very badly.

And this is all very premature - remember who won the fifth starter competition last season between Mark Prior, Neal Cotts and Wade Miller?

But let's presume that Dempster is our number three starter to start the season and run with it. That leaves Marquis and Lieber "fighting" for the fifth starter's job. Now, let's put on our intuitive thinking caps and figure this out:

  1. The Cubs knew that Dempster was getting a shot at the rotation.
  2. They signed Jon Lieber anyway.
  3. Lieber signed with the Cubs despite other offers from teams without so much starting pitching depth.

So... who here thinks that Jason Marquis is the odd man out?
