Forum: base
Page 18971
Subject: A lengthy review of Covering the Bases

Posted by: Sludge - [16109168] Tue, Dec 19, 2006, 14:39

I would ask that comments, arguments, criticisms, etc. be placed in another thread separate from this one. Obviously I cannot control who does what, but I’m asking for this one small courtesy.

First off, the fine print:

I am not at all familiar with Bill James and his work except in the most passing of ways. The only words written by Bill James that I have read are the quotes contained in the present work. As such, I have no idea about the context of the quotes contained therein. For all I know, the majority of them may be taken completely out of context, or none of them. Thus, I cannot comment on Bill James and his work except in general overall terms about what his body of work is attempting to accomplish. Furthermore, I can’t comment exhaustively on the validity of the criticisms of Walsh and McFall. Fortunately, however, one need not be intimately familiar with James’s work to spot the glaring holes in Walsh and McFall’s arguments.

Unless specifically stated otherwise, all quotes contained herein will be from Walsh and McFall (2006), Covering the Bases: Making Sense of Bill James’ Statistical Nonsense, iUniverse, Lincoln, NE.

I will also be going through and commenting on the book in sequential order. There are many themes which will be repeated, but it is easier on my limited time to do it in this manner rather than group things together by the types of errors they make.

I am also going to apologize in advance for the length of the quoted sections of the book. In most cases, I feel they are entirely necessary to maintain the correct context.

Now, without any further ado…

The Good, the Bad and the Sabermetric

I’ll start with the following quote to set the tone. They constantly deride James for his ego and arrogance, but they apparently suffer from no shortage of it themselves.

With this in mind, we will now perform a “literary surgery” on the works of Bill James. We will pick them apart until we have exposed his musings for what they truly are: works of fiction. We will peel back the epidermis of statistical baseball knowledge which he hides behind and reveal a man who actually knows little about the real game of baseball. We will dissect his words and find out if there is any substance to what he has to say. We will place his formulas and methods squarely under the microscope of truth. We will look at some of the more pompous and ridiculous comments he has made and sever them with logic and reason. The many works of the so-called “guru of baseball statistics” are now on the operating table and we are going to cut until they bleed the truth. (pg. 8)

Earlier in the chapter, they make the following statement concerning OPS (On-base percentage Plus Slugging percentage) and its inventor Pete Palmer, who they deride as a member of the “James Gang”.

Palmer developed OPS in an attempt to determine which players were not only reaching base but advancing as well. On-base plus slugging is currently baseball’s hippest stat and is constantly referred to by analysts and announcers alike. What these people fail to comprehend is the false logic the statistic is rooted in. We need only look at its two components, on-base percentage and slugging percentage, to properly illustrate the lack of value behind OPS. On-base percentage is figured by dividing times on base by plate appearances. Slugging percentage is calculated by dividing total bases by at-bats. Therefore, since the two stats don’t share a common denominator, they can not be expected to yield a telling number when added together. For example, if you have 1/4 of an apple and we have 1/4 of an orange, combining our cache of fruit does not yield 1/2 of some bizarre apple/orange hybrid. However, if we each have 1/3 of an apple, combining our resources will ultimately yield 2/3 of an apple. Like adding apples and oranges, the integration of plate appearances and at-bats through the OPS formula yields nothing more than a coalescence of numbers with no true meaning. Amazingly, little more than basic logic and simple mathematics have managed to expose Palmer’s most popular stat as irrelevant. (pp. 4-5)

Their criticisms of OPS are way off-base (no pun intended). They dismiss it as “irrelevant” because the denominators of the two stats are different. The problems with this tack are two-fold. Firstly, as we’ll see, the authors seem to despise any formula which is overly complex. So I think it is well within reason to suppose that had Palmer come up with a statistic that combines the two measures in such a way that the denominators were the same (e.g. on-base percentage + slugging percentage * at-bats / plate appearances), they would have criticized the complexity of the formula and the validity of the term on the right-hand side of the sum. Palmer used traditional statistics in a simple manner.

Secondly, and most importantly, they do nothing to show that the statistic does not measure what it is purported to measure, which is really all that matters when dismissing something as “irrelevant”. As stated by them, it was developed “to determine which players were not only reaching base but advancing as well.” (pg. 4) Does it do that? Is there a correlation between OPS and the trait being measured? Of course there is. Players who reach base often in any fashion will have the largest share of their OPS from the on-base percentage, and the OPS will increase the more often they reach base. Players who do not walk or get hit by pitches often, but hit well for power, will have the largest share of their OPS from the slugging percentage, and the OPS will increase the more doubles and triples they hit (holding their batting average fairly constant). There is no underlying number being estimated here, but the fact remains that there will be a correlation between OPS and a player’s ability to not only reach base, but advance past first by hitting a double, triple, or home run.

A player who rarely reaches base will have a low OPS, while a player who hits for average and takes walks will have an OPS comparable to that of a player who hits for power and strikes out a lot. Both will be lower than that of a player who hits for average and power and rarely strikes out.
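
To put rough numbers on the comparison above, here is a small Python sketch. It uses the simplified definitions the book itself gives (times on base over plate appearances, total bases over at-bats). The four season lines are invented purely for illustration and the function names are my own; none of this comes from Palmer or from the book.

```python
def on_base_pct(times_on_base, plate_appearances):
    # Simplified definition, as the book states it: times on base / plate appearances.
    return times_on_base / plate_appearances


def slugging_pct(total_bases, at_bats):
    # Total bases / at-bats.
    return total_bases / at_bats


def ops(times_on_base, plate_appearances, total_bases, at_bats):
    # Plain OPS: the two rates are simply added, different denominators and all.
    return (on_base_pct(times_on_base, plate_appearances)
            + slugging_pct(total_bases, at_bats))


def ops_common_denominator(times_on_base, plate_appearances, total_bases, at_bats):
    # The variant mentioned above: OBP + SLG * AB / PA,
    # which puts both terms on a per-plate-appearance basis.
    obp = on_base_pct(times_on_base, plate_appearances)
    slg = slugging_pct(total_bases, at_bats)
    return obp + slg * at_bats / plate_appearances


# Made-up season lines: (times on base, plate appearances, total bases, at-bats).
hitters = {
    "rarely on base": (160, 600, 180, 560),
    "average and walks": (240, 600, 200, 500),
    "power, many strikeouts": (180, 600, 280, 560),
    "average and power": (250, 600, 300, 500),
}

for label, line in hitters.items():
    print(f"{label:24s} OPS={ops(*line):.3f}  "
          f"common-denominator={ops_common_denominator(*line):.3f}")
```

With these invented lines, both versions put the hitter who rarely reaches base last and the hitter with average and power first, which is all the statistic is being asked to do.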

Is OPS perfect? Of course not. How could it be? It’s measuring a trait that’s somewhat hard to pin down. Have Walsh and McFall shown it to be “irrelevant”? Nowhere close. Are there better statistics to measure the trait of interest? Probably. But the existence of a better estimate of an underlying trait or parameter does not show that the worse estimate is invalid. It is simply not as precise, but both are still measuring the same thing. (“Parameter” is the term used to describe a numerical feature of a population. For example, the true average height of all students here at Texas Tech would be the parameter. If a sample of 100 students is collected, and I compute the average height of those 100 students, that is my estimate of the parameter of interest.)

Oh, and their apples and oranges analogy is horrible.

Sludge
ID: 45541422
Thu, Dec 21, 2006, 00:54

Let’s Create Some Runs

Is Player A better than Player B? How good would Player C have been if he played with the Yankees instead of Pittsburgh? Obviously, both are oft-asked questions along with many other similar questions. Equally obvious is that we will really never know. There are often clear divisions between players, but those are never interesting. For the interesting cases, the divisions begin to blur. So what if we were to try and take an educated guess at these questions? What if we were to try and estimate how Player C would have performed for the Yankees? Is there anything wrong with that? What if we try and estimate how Players A and B would have performed on an average baseball team, to try and level their playing fields a bit?

As best as I can tell, that’s what James’s Runs Created formula attempts to do, in part. We all know that it is easy to figure out how many runs a player actually had a hand in. We have their runs scored (R) and runs batted in (RBI) totals available to us with just a couple of mouse clicks. But if the games were to be played over again and the player was to put up the same offensive statistics, what is the average number of runs the player would have had a hand in? Won’t there occasionally be players with sub-par offensive numbers that produce above-par R and RBI simply due to their circumstances? Won’t the reverse also be true?

Again, I’m not trying to defend the runs created formula. But the basic idea of trying to remove the effects of those circumstances is not a bad one.

I’ll begin at the end, and quote a block from near the end of the chapter:

Part of the problem lies in James’ belief that the hard, countable numbers are lying to us. In his 2003 bestseller Moneyball, Michael Lewis reprinted one of James’ most popular quotes on this subject reminding us that Bill believes that, “baseball statistics are not pure accomplishments of men against other men…they are accomplishments of men in combination with their circumstances.” The point James is trying to convey is that a man who hits a double with runners on second and third is no better than a man who hits a double with the bases empty; he is simply profiting from the luck of his circumstances. With his runs created formula James has attempted to factor out all of the accomplishments of a player’s teammates, and thus the “luck” involved therein, and merely look at the individual’s achievements. This is why, throughout his works, James is constantly referring to the average fan’s obsession with RBI as “infantile”. He believes that RBIs are not based on the accomplishments of the individual hitter, but those of his teammates, and therefore he discounts them. The idea behind this theory is that a home run is a home run. The difference between a grand slam and a solo shot relies on the ability of the hitter’s teammates. While this is partially true, James misses the point that, without RBI-men hitting those homers runs would not be scored. By the same token, no runs would be batted in without men on base to score them. James must not be familiar with Sir Isaac Newton’s third law (for every RBI there must be a run scored, or something like that). (pp. 21-22)

The last sentence is unnecessary, but I couldn’t resist including it. I feel they’re flat-out wrong here. James has not discounted the presence of teammates. He has attempted to replace the teammates’ actual accomplishments with an average set of accomplishments. (Whether he’s done that successfully or not, I don’t know. But that’s not what the authors debate here.) The basic argument they are making is that it’s illogical and futile to try and estimate a player’s run production when we already know what it is. We simply have to look at their R and RBI, and there we have it! With that attitude, however, we would never be able to place Player A and B on a more level playing field to better compare them. We would never be able to take an educated guess as to how Player C would have done with the Yankees.

Back to the beginning, the authors seem to think that it’s possible to create an estimator that nails the parameter being estimated 100% of the time, and anything less is a dismal failure.

According to Bill James the runs created formula works 90% of the time and “…it will almost always give you a total within 5% (James 2003)” of the actual runs scored by a team or individual. Using this formula he claims that “…you can make a pretty good estimate of how many runs a hitter has created (James 2003).” Umm…pretty good? Within 5%? Almost always? 90% of the time? What about the other 10% of the time? Are we supposed to guess how many runs a player or team created in those instances? Why couldn’t Bill James create a formula which was right on the money 100% of the time instead of one which provides only a rough estimate (within 5%) of the actual numbers? (pp. 9-10)

Anyone who’s taken a basic statistics course should be familiar (perhaps with a bit of review) with the concept of a confidence interval. The three parts of a confidence interval are the estimate, the margin of error, and the level of confidence. What James has stated matches that concept quite well. Yet the authors seem to dismiss the idea of a confidence interval out of hand in this case as being absurd. They’re stuck on this idea that estimating (as described above) is a dumb idea.

And just what about the other 10% of the time? What about those players whose actual runs and estimated runs created fall more than 5% apart? Where the authors cringe and see this as failure, I immediately see this as one of the best potential uses of such an estimate. These players that fall outside the norm are exactly those players who were victims of or who benefited most from their circumstances! Identifying these players would be, in my mind, one of the points of the whole exercise.
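
As a rough illustration of that use, the Python sketch below flags anyone whose actual run total sits more than 5% away from the estimate. The player names and run totals are invented, and measuring the gap relative to the actual total is my own assumption; the book does not say how James computes his 5%.

```python
def flag_outliers(players, tolerance=0.05):
    """Return (name, relative_gap) for players outside the tolerance.

    players: iterable of (name, actual_runs, estimated_runs), actual_runs > 0.
    relative_gap > 0 means the player produced more runs than estimated
    (helped by circumstances); < 0 means fewer (hurt by them).
    """
    flagged = []
    for name, actual_runs, estimated_runs in players:
        relative_gap = (actual_runs - estimated_runs) / actual_runs
        if abs(relative_gap) > tolerance:
            flagged.append((name, relative_gap))
    return flagged


# Hypothetical numbers, for illustration only.
season = [
    ("Player A", 100, 98),   # within 5%: nothing unusual
    ("Player B", 100, 85),   # actual well above estimate
    ("Player C", 80, 95),    # actual well below estimate
]

for name, gap in flag_outliers(season):
    direction = "benefited from" if gap > 0 else "was hurt by"
    print(f"{name} {direction} circumstances ({gap:+.1%})")
```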

James presents a formula which is supposed to estimate how many runs a player has created over a given period of time. (pg. 9)
…

He states that, if “pretty good” is not good enough, “…there are four things you can do to make the estimate more accurate (James 2003).” The key word in that quote is estimate… (pg. 10)

Emphasis not added. In fact, the emphasis they added is the point of the above quotes.

The authors also seem to have a problem with complex formulas, even though the formula they present isn’t really that complex. It is simply a linear combination of the offensive statistics, which I’ll repeat so you can see for yourself.

Therefore, the “B” factor became this God-awful bouillabaisse of numbers:

(1.125 x 1B) + (1.69 x 2B) + (3.02 x 3B) + (3.73 x HR) + .29 x (BB – IBB + HBP) + .492 x (SH + SF + SB) – .04 x K (pg. 13)

The authors should familiarize themselves with the concept of parsimony. In mathematical and statistical modeling, we say that a model is parsimonious if it is only as complex as it needs to be. Is the above model parsimonious? I don’t know, but the authors certainly don’t use that standard when judging the formula overly complex.
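
To show how little is going on in that “bouillabaisse”, here is the quoted “B” factor written as a Python function. The coefficients are copied from the quote on pg. 13; the function name, the argument names, and the default values of zero are mine. This is only the one term the book quotes, not the full runs created formula.

```python
def b_factor(singles=0, doubles=0, triples=0, home_runs=0, walks=0,
             intentional_walks=0, hit_by_pitch=0, sac_hits=0, sac_flies=0,
             stolen_bases=0, strikeouts=0):
    # A weighted sum of counting stats, weights as quoted on pg. 13.
    return (1.125 * singles
            + 1.69 * doubles
            + 3.02 * triples
            + 3.73 * home_runs
            + 0.29 * (walks - intentional_walks + hit_by_pitch)
            + 0.492 * (sac_hits + sac_flies + stolen_bases)
            - 0.04 * strikeouts)


# A made-up line of one home run and two unintentional walks:
print(round(b_factor(home_runs=1, walks=2), 2))
```

Because it is a linear combination, doubling every input doubles the result, and each event contributes independently of the others. That is about as simple as a multi-input formula gets.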

The authors know exactly what James is trying to accomplish with his runs created formula, but they still don’t seem to understand or accept the merits of it:

In his new version of the runs created formula, James attempted to put the hitter into what he calls a “neutral solution” of eight “ordinary hitters.” The idea behind this solution is to see how many runs a “typical team” creates with and without the hitter in question. This scenario purports to place the hitter in a neutral solution which could be applied to any other batter but, as you will see later on in this chapter, that assumption is false. (pg. 13)

The only attempt to show this assumption false that I could find is the following:

However, Pujols would have to go 1 for 2 with a home run and two walks, without grounding into a double play, in order to “create” two runs (actually 2.1085 according to the runs created formula). Does it matter if the home run was a grand slam or a solo shot? Of course not. Runs batted in are a mere triviality in Bill James’ runs created formula. Pujols could have driven in anywhere from one to seven men (had the walks come with the bases loaded and the out been a sac fly) yet, through the runs created formula, he is credited with two runs, regardless of the hard factual numbers. Finally, we have to agree with Bill James. This is illogical. (pg. 17)

To pound it into the ground once again, the formula is not ignoring the “hard factual numbers”. What it is doing is attempting to estimate the average number of runs created for a batter that went “1 for 2 with a home run and two walks, without grounding into a double play”. Frankly, I do not see why the authors have a problem with that exercise.

They also have these mind-boggling sentences:

Next, we subtract the hitter’s expected hits with runners in scoring position (yes, his expected hits, so James is speculating once again) from the product. This is done by multiplying the player’s batting average by his at bats with runners in scoring position. He then subtracts the hitter’s expected (we know, we know…) home runs with men on base. (pg. 14)

If a batter hits 0.300, how many hits would you expect to see (on average) out of 10 at bats with runners in scoring position? Three sound about right? If I flip a coin 10 times, how many times would you expect to see heads come up (on average)? Five sound about right? Why is this concept so foreign to them? Now, one could quibble and say that a player’s average with men on base really is different from his average without men on base. That would be an argument with merit, but the authors don’t make it! Perhaps they do in a roundabout way, but that’s stretching it.
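
For anyone who wants to see the “on average” part play out, here is a short Python sketch. It computes the expected count as rate times opportunities and then checks it against a simulation. The simulation assumes every at bat is an independent trial with the same success rate, which is my simplification and exactly the assumption the quibble above would challenge.

```python
import random


def expected_count(rate, opportunities):
    # Expected number of successes: success rate times number of tries.
    return rate * opportunities


def simulated_average(rate, opportunities, trials=10_000, seed=0):
    # Repeat the set of tries many times and average the success counts.
    # Assumes independent tries with a constant success rate.
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        total += sum(1 for _ in range(opportunities) if rng.random() < rate)
    return total / trials


print(expected_count(0.300, 10), simulated_average(0.300, 10))  # hitter
print(expected_count(0.5, 10), simulated_average(0.5, 10))      # coin
```

Any single run of 10 at bats can give anything from 0 to 10 hits, but the long-run average settles close to three for the hitter and close to five for the coin.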

Then the authors go and criticize the runs created formula because it can’t be used on a game-by-game basis!

If it is the first game of the season, Pujols has no batting average with runners in scoring position. Because of this, there is no way to figure his expected hits in that situation. Similarly, if it is the second game of the season and Albert was 1 for 1 with men on base in the previous game, and in that at bat he hit a homer, he can not be expected to go yard each time that particular situation arises. Yet that is what James’ formula calls for. The fact that runs created can only be figured given a large sampling of data should limit its usage without question. (pg. 17)

They have just said that an estimator that can’t provide accurate estimations for extremely small samples is one that should be tossed in the garbage. I wish to estimate the average height of Texas Tech students. If I can’t find a way to do it that works just as well with a sample of 1-2 students as it does with a sample of 100-200 students, then I need to give up? Absurd!
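
The height example can be made concrete with the textbook standard error of a sample mean, which is the population standard deviation divided by the square root of the sample size (for a simple random sample). In the Python sketch below, the 4-inch standard deviation is a number I made up for illustration.

```python
import math


def standard_error(population_sd, sample_size):
    # Standard error of a sample mean under simple random sampling.
    return population_sd / math.sqrt(sample_size)


ASSUMED_SD_INCHES = 4.0  # hypothetical spread of student heights

for n in (2, 200):
    print(f"n={n:3d}: standard error = {standard_error(ASSUMED_SD_INCHES, n):.2f} inches")
```

Going from 2 students to 200 shrinks the typical error by a factor of ten. Nobody concludes from this that the sample mean is a worthless estimator; they conclude that it needs a reasonable sample.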

I’ll close with what I think is the most absurd statement in the entire chapter:

Fair question: what is missing from this picture? Bill James has actually done a pretty fair job of attempting to incorporate every event which can occur while a hitter is at the plate into his runs created formula. He has worked all of his “run elements” into the equation including hits, walks, at-bats, and sacrifices, among others. However, James is attempting to find runs created, yet he uses neither runs scored nor runs batted in anywhere in his formula. (pg. 18)

I am a wildlife researcher studying black bears in Louisiana. I tranquilize a bear and determine its sex and measure its length, head girth, and chest girth. However, me and Slim Jim can’t really carry around a scale capable of weighing a 200-400 pound bear without putting significant additional stress on the bear. Nonetheless, we still want to determine how much the bear weighs. Our only recourse is to estimate it. Fortunately for us, there’s a formula that I can plug the measurements taken above into to get just such an estimate. (Of course, some bears had to be actually weighed and measured to come up with the formula, but we didn’t have to do that. Researchers before us did all that work.)

So here’s a question: Should the weight of the bear be one of the measurements I plug into an equation meant to estimate the weight of the bear? According to the block quoted above, the authors indicate they would answer that in the affirmative! Because James’s runs created formula does not include the very things it’s trying to estimate, they hold it up as another reason that it is illogical, invalid, useless, pick your word. Literary surgery, indeed. Severed with logic and reason, indeed.
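
For the curious, the bear example looks something like the Python sketch below. The calibration numbers are entirely made up (and suspiciously tidy), a single predictor stands in for the several measurements, and ordinary least squares is my choice of method; real bear formulas may look nothing like this. The point is only where the weight shows up: in building the formula, never as an input when using it.

```python
def fit_line(xs, ys):
    # Ordinary least squares for one predictor: y = intercept + slope * x.
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical bears that earlier researchers both measured AND weighed.
chest_girths_in = [30, 35, 40, 45, 50]
weights_lb = [150, 200, 250, 300, 350]

slope, intercept = fit_line(chest_girths_in, weights_lb)


def estimate_weight(chest_girth_in):
    # Takes a measurement we can get in the field; weight is the output only.
    return intercept + slope * chest_girth_in


print(estimate_weight(42))
```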
