April Numbers and You: Care and Feeding of Early Season Statistics

Okay, you shouldn’t feel bad, but you should probably reevaluate your methods and reassess your conclusions.

If you’re an analytically inclined baseball fan, April truly is the cruelest month. Okay so yeah, I admit that T.S. Eliot wasn’t actually talking about the month-long (or longer) battle of wills between people quoting month-long (sometimes only weeks-long) data samples as if they mean something and the people screaming “small sample size!” into the yawning void. That said, it can be difficult to know when the “roots that clutch” are solid enough to start trusting what the statistics are telling you, but we (and some much smarter people from other parts of the internets) are here to help.

There are a few key things to understand about advanced metrics, especially early in the year.

The first, and this is really a general key to using and understanding statistics, is that the data is only as good as the person analyzing it. The vast majority of the time, when statistics seem to be telling us something nonsensical, it’s not that the data is wrong. It’s that whichever of us is looking at the data doesn’t clearly understand what it is really measuring, or someone is willfully misinterpreting it to prove a point. This is simple when dealing with basic counting stats like home runs, because there’s not a lot to analyze or interpret. Someone hits a lot of home runs or they don’t, and while you can get into things like park factor, you’re talking about reasonably minor tweaks to the raw data, not wholesale redefining of what a number or set of numbers means.

Advanced metrics, unfortunately, have a barrier to entry in that what they’re really measuring isn’t immediately apparent without some research. The key is to remember that if a stat just doesn’t make sense, either you need to look harder at what the stat is really measuring, or what you’re looking at is an anomaly that will even out over a larger sample.

Small sample size is something you’ll see people shouting a lot this time of year, and for good reason. It’s so easy to be lured in by the shine of favorable statistics for players you want to have big years (Hunter Pence has a 143 OPS+!) or to needlessly worry about players who are starting out slow (Buster Posey only has four extra-base hits and NO HOME RUNS OH MAYS SAVE US ALL!).

The thing is, though, neither of them has had 60 at bats. Posey hasn’t even had 50 plate appearances yet, and as Voros’ Axiom (or Voros’ Law, if you prefer) states, “anybody can hit just about anything in 60 at bats.”

Take, for example, how well Joaquin Arias hit when he took over while Pablo Sandoval was on the DL last year. It seemed like he was on base every game and he was great with RISP, but this was a guy who had never put up an OPS+ that was even league average over an extended sample at any point in his career. Don’t get me wrong, it was a hugely positive development for the Giants that Arias hit well enough to hold down the fort while Sandoval was out, and given Arias’ injury history early in his career it’s certainly not hard to believe that Arias is better than the 60 OPS+ / 56 wRC+ he put up in 2010, but he’s fundamentally an average hitter with limited power. Useful to have, absolutely, but not Babe Arias like he was for that random short stretch last summer.

But anybody can hit just about anything in 60 at bats. In fact, most hitting statistics take 100 at bats or more to stabilize, and pitching statistics take even longer.
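If you want to see why 60 at bats tells you so little, here is a minimal Python sketch. It leans on a big simplifying assumption: every at bat is an independent coin flip with a fixed chance of a hit, which real baseball is not. The .270 "true talent" hitter and the .350 hot-streak threshold are made-up numbers for illustration, not anyone's actual line, and the function names are mine.

```python
from math import ceil, comb


def binom_pmf(hits, at_bats, true_avg):
    """Probability of exactly `hits` hits in `at_bats` independent at bats."""
    return comb(at_bats, hits) * true_avg**hits * (1 - true_avg) ** (at_bats - hits)


def prob_avg_at_least(avg, at_bats, true_avg):
    """Chance a hitter with talent `true_avg` shows an average of `avg` or better."""
    # Smallest hit total that reaches `avg`; the tiny offset guards against float fuzz.
    min_hits = max(0, ceil(avg * at_bats - 1e-9))
    return sum(binom_pmf(k, at_bats, true_avg) for k in range(min_hits, at_bats + 1))


# Hypothetical .270 true-talent hitter (an assumed number, not any real player).
for at_bats in (60, 600):
    p = prob_avg_at_least(0.350, at_bats, 0.270)
    print(f"{at_bats} AB: chance of hitting .350 or better = {p:.4f}")
```

Under those assumptions, the chance of a perfectly ordinary hitter looking like a star is far higher over 60 at bats than over 600, which is the whole point of the axiom.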

When in doubt, I suggest two things. 1) Consult this handy reference guide for the number of plate appearances or batters faced it takes for various statistics to stabilize. 2) Consult career statistics where available. Larger sample sizes are more reliable, unless some extenuating factor makes them irrelevant (before and after a major injury, for example), and over time most players tend to fall in line with their long-term trends. It’s obviously more difficult with players with limited playing time, but when you’re talking about cases of a big star off to a slow start or a journeyman off to a white-hot one, the career numbers will guide you.
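One rough way to put those two suggestions together is to blend the early-season number with the career number, giving the early number more weight as the sample approaches the stabilization point. The sketch below is my own simplification, not a method from the reference guide: the function name, the formula's choice of career rate as the anchor, and every number in the usage lines are assumptions for illustration only.

```python
def blend_with_career(observed_rate, sample_size, career_rate, stabilization_point):
    """Weight an early-season rate against a career rate.

    When sample_size equals stabilization_point, the two rates count equally;
    with a tiny sample, the result stays close to the career rate.
    """
    weight = sample_size / (sample_size + stabilization_point)
    return weight * observed_rate + (1 - weight) * career_rate


# All values below are hypothetical placeholders, not real player data.
# Look up the real stabilization point for whichever stat you care about.
slow_start_obp = 0.280       # assumed April on-base percentage
career_obp = 0.370           # assumed career on-base percentage
plate_appearances = 50       # assumed early-season sample
assumed_stabilization = 500  # placeholder stabilization point, in plate appearances

estimate = blend_with_career(
    slow_start_obp, plate_appearances, career_obp, assumed_stabilization
)
print(f"Blended estimate: {estimate:.3f}")
```

With a sample that small, the blended estimate barely moves off the career number, which is a more formal way of saying the big star off to a slow start is probably still a big star.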

And when in doubt, repeat to yourself over and over that “xFIP is the mind killer, Matt Cain fears no xFIP”, and remember that Buster ain’t havin’ it. And in this case, “it” is a bad offensive year.
