The Other Fifteen

Eighty-five percent of the f---in' world is working. The other fifteen come out here.


People are still smarter than monkeys

I don't know about the rest of you, but mostly this means we need to find smarter monkeys.

For those of you wondering what the hell I'm talking about, Tango published the full results of his 2007 Community Forecasts for hitters and pitchers. The community beats the Marcels the Monkey forecasts. For hitters, it looks like the community forecast has the advantage over mighty PECOTA, as well.

First, just so everyone understands me: we are not talking about a substantial difference here. The basic concepts behind projecting future performance for baseball players aren't trade secrets at all: use more than one year of data, account for aging, regress to the mean. Those three steps account for most of what's going on in any projection system, and they work just as well whether people or computerized monkeys are doing them.

So, what's going on here, exactly? Well, if you don't mind, we can discuss this "out of school" for a second, and just think about why this might be the case. I can't prove any of these ideas, and quite frankly I don't know how I would.

But I think the difference has something to do with information asymmetry. Marcels has three years of playing time data, the league mean, and an aging curve. (I go over how Marcels works here.) That's all the information that Marcels has. And that information seems to get you most of the way; Marcels routinely hangs in with the other, more advanced projection systems. But if you want to beat the Marcels, you need more information than the monkey has.
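Here's a rough sketch of what the monkey does with that information, in Python. The 5/4/3 season weights, the 1200 plate appearances of league-average performance, and the aging factors are the values usually quoted for Marcel's hitter projections, but treat them as assumptions here rather than the official recipe. The function name and its arguments are made up for illustration.

```python
def marcel_style_rate(rates, pas, league_rate, age,
                      weights=(5, 4, 3), regress_pa=1200, peak_age=29):
    """Project a rate stat (e.g. OBP) from up to three seasons.

    rates and pas are listed most recent season first.
    All default parameter values are assumptions, not taken from the post.
    """
    # 1. Use more than one year of data, weighting recent seasons more heavily.
    weighted_pa = sum(w * pa for w, pa in zip(weights, pas))
    weighted_events = sum(w * pa * r for w, pa, r in zip(weights, pas, rates))

    # 2. Regress to the mean by mixing in regress_pa plate appearances
    #    of league-average performance.
    regressed = (weighted_events + regress_pa * league_rate) / (weighted_pa + regress_pa)

    # 3. Account for aging: a small bump for players younger than peak_age,
    #    a smaller penalty per year for players older than it.
    if age > peak_age:
        age_adjustment = (peak_age - age) * 0.003
    else:
        age_adjustment = (peak_age - age) * 0.006

    return regressed * (1 + age_adjustment)


# Hypothetical hitter: three seasons of OBP and plate appearances, age 31.
projection = marcel_style_rate(
    rates=[0.380, 0.365, 0.390],
    pas=[620, 580, 450],
    league_rate=0.335,
    age=31,
)
print(round(projection, 3))
```

Notice what isn't in the argument list: nothing about injuries, nothing about what role the team has in mind. Performance, playing time, league average, and age are all the function ever sees.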

So, what information does the Community have that Marcels doesn't? Well... quite a lot, and quite a lot of that is reflected in systems like PECOTA. So, what does the Community know that we can't account for (yet) in computerized projection systems? Here are some ideas:

  1. Health. Two injured pitchers with similar performance will probably look the same to a projection system; fans, on the other hand, know that one had shoulder issues that are likely to linger into next season, while the other had a leg injury that's fully healed and shouldn't significantly impact his performance.
  2. Utilization. The computer doesn't know that the Royals want Zack Greinke to start; the fans do.

I should go ahead and add that there is a flaw with the Community Forecasts: they seem to be routinely too optimistic about how players will perform. The bias is so consistent that you can easily work around it; simply rebaselining the forecasts so that their overall average matches the league average takes care of that.
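For the curious, here's a minimal sketch of that kind of correction in Python. It assumes the optimism is a roughly constant offset, so it just shifts every forecast by the same amount; scaling instead of shifting, or weighting by playing time, would be equally defensible. The function name, the player names, and the numbers are all hypothetical.

```python
def rebaseline(forecasts, target_mean):
    """Shift every forecast by the same amount so the group average
    equals target_mean (for example, the league average).

    Assumes a simple unweighted, additive correction.
    """
    current_mean = sum(forecasts.values()) / len(forecasts)
    offset = target_mean - current_mean
    return {name: value + offset for name, value in forecasts.items()}


# Hypothetical community OBP forecasts that run a little hot.
community = {"Player A": 0.360, "Player B": 0.340, "Player C": 0.350}
adjusted = rebaseline(community, target_mean=0.335)
for name, value in adjusted.items():
    print(name, round(value, 3))
```

The ordering of the players doesn't change, which is the point: the fans' opinion of who is better than whom stays intact, and only the overall level gets pulled back down.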

So, in addition to the challenge of beating the monkey, advanced forecasting metrics have the added challenge of trying to beat the fans. I don't envy Sean Smith or Nate Silver or Dan Szymborski one bit. (Okay, well maybe a little. Okay, so maybe a lot. Still, they've got a tough row to hoe.)
