RotoGuru Baseball Forum

0 Subject: The Baseball Prospectus

Posted by: Steve Biz - [53243810] Wed, Mar 08, 2006, 11:46

Guru mentioned 4 sources that he used for projections this year. All were internet sources and I'm curious if anyone else is using The Baseball Prospectus to help them with their projections this year? Or if you've used it in the past, what are your opinions? I'm trying it out for the first time this year and their projections seem pretty great. Thanks.
1blue hen
      ID: 38135621
      Wed, Mar 08, 2006, 11:55
Not this year specifically, but I have used them in the past. They are great. Really, really great. They also have a book that comes out which I also enjoy, and a premium web site that is also really good.
2Guru
      ID: 330592710
      Wed, Mar 08, 2006, 12:11
I used them last year. Looking back, here are some of the impact players that they projected significantly better or worse than other services:

Better:
Troy Glaus
Adam Dunn
Jim Edmonds
Alfonso Soriano
Derrek Lee
Johnny Damon
Andruw Jones
Scott Podsednik
Randy Johnson
Pedro Martinez
John Smoltz
Dontrelle Willis

Worse:
Tim Hudson
Carlos Delgado
Francisco Rodriguez
Carl Crawford
Ichiro Suzuki
Mariano Rivera
Miguel Tejada
Adrian Beltre
Chipper Jones
Marcus Giles
Joe Nathan
Rich Harden


Some of those differences turned out to be inspired, and some missed the mark. I suspect I could come up with a similar list for any service.
3biliruben
      Leader
      ID: 589301110
      Wed, Mar 08, 2006, 12:40
I've bought their book every year for the last 4, and it is very good (~$12). Their projection system is a bit odd and takes some getting used to, but I love their commentary. They take each team and delve into their apparent plan and vision and shred or applaud past decisions in attempting to put a competitive team on the field.

And back it up with stats. Lots of stats. Yum.

I just drafted my home, live, drink-lots-of-beer-and-make-horrible-decisions league last weekend, and I'm a bit peeved to have not received my book yet. Terrible draft, and I blame their tardy butts!

I've never subscribed to their website, but the free stuff they offer is always good.
4Steve Biz
      ID: 53243810
      Wed, Mar 08, 2006, 14:26
Yeah I got the book and it seems pretty good. How about this prediction for Johan Santana: 16 W, 232 K, 3.00 ERA, 1.04 WHIP. I think those numbers are pretty realistic.
5Guru
      ID: 330592710
      Wed, Mar 08, 2006, 14:48
I just did a quick test of last year's projections.

I ran a correlation of the rankings based on each projection from a year ago against a similar ranking based on full year 2005 actual stats.

Obviously, there are some players who, due to injury, were missed by everyone.

But I tried looking at the top 200 hitters, and the top 150 pitchers, just to see how correlated the results were. Here are the correlation coefficients:

Sample            RotoWire  RotoWorld  RotoTimes  Baseball Prospectus  Average
Top 200 hitters   59.5%     64.6%      64.1%      61.9%                64.7%
Top 150 pitchers  51.4%     56.5%      56.6%      57.6%                58.3%

On the surface, it looks like they were fairly comparable. RotoWire seems to have done a worse job for both pitchers and hitters - maybe meaningfully worse. RotoWorld was the best. And the averages of the four sets were slightly better than any of the individual sets.

I don't know if these differences are statistically meaningful. But it's interesting, nonetheless.
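
For anyone who wants to try the same kind of test, here is a minimal sketch of a correlation between projected and actual ranking values. It assumes a plain Pearson correlation on each player's composite value; the post does not say exactly which formula was used, and the five-player sample below is made up purely for illustration.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and the two sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical composite ranking values for five players (not real data).
projected = [4.1, 2.5, 1.0, -0.3, -1.2]
actual = [3.0, 3.2, -0.5, 0.4, -2.0]
print(f"correlation: {pearson(projected, actual):.1%}")
```

In practice the two lists would hold one entry per player in the sample (say, the top 200 hitters), in the same player order.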

6biliruben
      ID: 531202411
      Wed, Mar 08, 2006, 17:33
Nice to know that rotowire hasn't changed much!

Thanks, Guru.

Anyone heard anything about Fantistics?
7biliruben
      Leader
      ID: 589301110
      Wed, Mar 08, 2006, 18:45
Just received the book in the mail!

Now I can confirm how crappy my draft was.
8Stuck in the 60s
      Dude
      ID: 274132811
      Wed, Mar 08, 2006, 18:55
I use fantistics and their drafting software is outrageous!
I can track any number of drafts and it projects out the top players based on a particular league's settings.

It's my first year, but I'm loving the results so far.

Don
9biliruben
      Leader
      ID: 589301110
      Wed, Mar 08, 2006, 19:01
One of my live-drafters used it in the league I run locally, and he drafted better than he usually does, granted, but there were also some completely off-the-wall picks, because he was trying to maintain a "balanced squad."

Guys that shouldn't have been drafted at all in this shallow league, but they had the right mix of BA and runs or something.
10 Steve Biz
      ID: 47247818
      Wed, Mar 08, 2006, 19:48
Thanks for the awesome data, Guru. That's why I always come here first. How do you go about averaging all of the different projections? That seems like an awfully time-consuming task. For example, how do you have all of that data in a workable form on your computer?
11Guru
      ID: 330592710
      Thu, Mar 09, 2006, 11:04
Because I'm a pack rat.

My general approach is to rank each roto scoring category using a mean and standard deviation. A category ranking of 0 would mean that the player is projected to be at the mean for the category. A ranking of +1 means that the player is projected to be one standard deviation above the mean. I add up these ranking stats for each category to produce a composite ranking for the player. There is a bit of judgement, as the means are calculated only for the universe of players that are expected to be drafted.
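
A rough sketch of that standard-deviation approach, for anyone who wants to reproduce it. The player pool and stat lines are hypothetical, and using the population standard deviation (`pstdev`) rather than the sample version is my assumption; the post does not say which one applies.

```python
from statistics import mean, pstdev

def z_scores(values):
    """Express each value as standard deviations above or below the pool mean."""
    m = mean(values)
    sd = pstdev(values)  # assumption: population SD over the draftable pool
    return [(v - m) / sd for v in values]

def composite(players, categories):
    """players: dict name -> dict category -> projected stat.
    Returns dict name -> sum of per-category z-scores."""
    names = list(players)
    totals = {name: 0.0 for name in names}
    for cat in categories:
        zs = z_scores([players[name][cat] for name in names])
        for name, z in zip(names, zs):
            totals[name] += z
    return totals

# Hypothetical three-player pool with two counting categories.
pool = {
    "Slugger": {"HR": 40, "SB": 5},
    "Speedster": {"HR": 20, "SB": 25},
    "Balanced": {"HR": 30, "SB": 15},
}
for name, score in composite(pool, ["HR", "SB"]).items():
    print(name, round(score, 2))
```

The pool passed in should contain only the players expected to be drafted, since that is the universe the means are meant to describe.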

The calculation for average categories (like ERA, WHIP, and batting average) is more complicated. You can't simply calculate the standard deviation of ERA. I solve this by calculating the marginal impact on a team ERA assuming that I have the player in question plus a roster-full of average players. That is the stat that I calculate the mean and S.D. for.

If I apply this same process to each set of projections, I develop a composite ranking value for each player. I then line up these values in a spreadsheet that allows me to sort by any projection set, or by the average, so that I can easily screen for consensus rankings and also identify those players that have significantly different projections from different sources.
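
The spreadsheet step could be sketched like this, assuming each source's composite values are already in a dictionary keyed by player. The source names, player names, numbers, and the spread threshold are all hypothetical.

```python
from statistics import mean

def consensus(scores_by_source):
    """scores_by_source: dict source -> dict player -> composite value.
    Returns dict player -> (average, spread) for players found in every source,
    where spread is the gap between the highest and lowest source value."""
    sources = list(scores_by_source.values())
    common = set(sources[0])
    for source in sources[1:]:
        common &= set(source)
    result = {}
    for player in common:
        vals = [source[player] for source in sources]
        result[player] = (mean(vals), max(vals) - min(vals))
    return result

# Hypothetical composite values from three unnamed sources.
scores = {
    "source_a": {"Player X": 3.1, "Player Y": 0.4, "Player Z": -1.0},
    "source_b": {"Player X": 2.7, "Player Y": 2.2, "Player Z": -0.8},
    "source_c": {"Player X": 3.0, "Player Y": 0.1, "Player Z": -1.1},
}
table = consensus(scores)
# Sort by the consensus average, best first, and flag large disagreements.
for player, (avg, spread) in sorted(table.items(), key=lambda kv: -kv[1][0]):
    flag = "  <-- sources disagree" if spread > 1.0 else ""
    print(f"{player}: avg {avg:.2f}, spread {spread:.2f}{flag}")
```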

I also ran the same process on last year's actual stats, which gives me a basis for comparing the projected rankings vs. the related actual stats.

On the surface, a correlation coefficient of only 60+% might not sound so hot, but there are always some players who are significantly mis-projected - last year's examples include:

Pitchers who most significantly exceeded projections: Dontrelle Willis, Todd Jones, Andy Pettitte, Chris Carpenter, Jon Garland

Pitchers who most significantly underperformed projections: Curt Schilling, Jason Schmidt, Zack Greinke, Oliver Perez, Joel Pineiro, Eric Gagne

Hitters who most significantly exceeded projections: Grady Sizemore, Derrek Lee, Tony Clark, Felipe Lopez, Chase Utley

Hitters who most significantly underperformed projections: Barry Bonds, Jim Thome, Scott Rolen, Carlos Beltran, Jeff Bagwell, Corey Patterson

On the margin, all of the projections for those players were way off. It would probably be a better test if players who had significant injuries were eliminated from the sample, but that's more work than I wanted to do.
12biliruben
      ID: 531202411
      Thu, Mar 09, 2006, 11:26
It would probably be a better test if players who had significant injuries were eliminated from the sample...

I don't think that's right.

I guess it depends on what question you are trying to answer, and removal would be appropriate if you were asking "given my player stays reasonably healthy, how will he impact my fantasy team?"

But that's generally not the question you are trying to answer.

For instance, PECOTA (BP's projection system) takes into account body type and injury history and a bunch of other things (it's been a while since I read the details) and attempts to come up with a projection based on the performance of other players with similar traits. For example, they project that Nomar will have only 321 plate appearances and 9 dingers. They are trying to answer how much you can count on Nomar producing this year, taking into account that he might get injured.

Perhaps a better way would be to meld Nomar's stats with some "top WW replacement player" (it seems to be Randy Winn the last few years in my shallow league) and come up with the best projection for the impact Nomar will have on your team, which likely includes a few hundred ABs from Randy Winn.
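
Here is a quick sketch of that melding idea for a counting stat like home runs. The 650 PA full-season figure and the replacement player's home run rate are assumptions made up for illustration; only Nomar's 321 PA and 9 HR come from the projection mentioned above.

```python
def blended_counting_stat(player_stat, player_pa, repl_stat_per_pa, full_season_pa=650):
    """Roster-slot total for a counting stat: the player's own projection plus
    what a replacement player adds over the plate appearances the player misses."""
    missing_pa = max(full_season_pa - player_pa, 0)
    return player_stat + repl_stat_per_pa * missing_pa

# Nomar's projection from the post: 9 HR in 321 PA.
# Hypothetical replacement player: 15 HR per 600 PA.
repl_hr_rate = 15 / 600
slot_hr = blended_counting_stat(9, 321, repl_hr_rate)
print(f"HR expected from the roster slot: {slot_hr:.1f}")
```

The same function works for runs, RBI, or steals; rate stats like batting average would need to be blended by weighting hits and at-bats instead.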

Also, I don't really understand why you can't calculate a SD for ERA?
13Guru
      ID: 330592710
      Thu, Mar 09, 2006, 11:51
I don't really understand why you can't calculate a SD for ERA?

You can calculate it, but it's not particularly meaningful, because a pitcher with 250 IP has a much greater impact on a team ERA than a pitcher with 80 IP.

For example, Joe Nathan had an ERA of 2.70 last year. Johan Santana had a 2.87 ERA, and John Smoltz finished at 3.06. But Santana and Smoltz would each have improved an average team's ERA more than Nathan would have.
14blue hen
      ID: 38135621
      Thu, Mar 09, 2006, 11:53
But what if, as in 2005, Randy Winn overachieves?
15Guru
      ID: 330592710
      Thu, Mar 09, 2006, 11:57
BTW, I noticed an error in my table in post 5. The pitcher stats reflect only the top 100 pitchers, not the top 150. The correlations for the top 150 pitchers were all lower, with RotoWire still the worst (41%). RotoWorld was the best (48.6%), followed by RotoTimes (47.5%) and BP (46.6%). The average of the 4 was 48.2%.

Actually, the best correlations seem to be the average of three sets (excluding RotoWire). The RotoWire-less averages have slightly higher correlations than the 4-set averages.
16biliruben
      ID: 531202411
      Thu, Mar 09, 2006, 12:06
I see, Guru. My fix for that is to use a stat weighted using IP, but I realize now that in my rush last week, I didn't incorporate that with my standardized values, and don't really know how I would, now that I think about it. Could you give a bit more detail on your method? I thought I had this figured out last year, when I got fancy and brought it into a stat package. No time for that this year, but I'll go back and look at the code.

BH - I was just using Winn as an example. You could substitute the stats of the best guy at the top of the WW list in August last year if you wanted.

My sim hoops league uses Mark Macon's stats from 20 years ago for a slightly different purpose, but you can use some fictional (not fantasy) player's stats if you think that would be more representative of the player's stats you would have to use after Nomar gets hurt.
17blue hen
      ID: 38135621
      Thu, Mar 09, 2006, 12:10
This is only one year. 2004, anyone?
18Guru
      ID: 330592710
      Thu, Mar 09, 2006, 13:18
I seem to recall that BP actually did a projection comparison for some earlier season (maybe not 2004), which was posted at their site last year. As I recall, they fared well, and RotoWire was again near the bottom.

bili - one way to do the ERA ranking:

Assume that the league average ERA is 4.00. (This number will depend on the number of pitchers who are in play, and/or the IP limit) If you have an IP limit of 1300 innings, then calculate a team average ERA assuming that you have one specific pitcher and the rest of your staff is average.

For example:
Santana had a 2.87 ERA in 232 IP. If you add in 1068 IP with a 4.00 ERA, the team average is 3.80.

Nathan had a 2.70 ERA in 70 IP. Add in 1230 IP at 4.00, the team average is 3.93.

So the ERA ranking stat for Santana is 3.80, and the stat for Nathan is 3.93. Calculate the mean and SD for those.
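
The arithmetic above can be sketched as follows, assuming the rest of the staff fills the remaining innings up to the limit at the league-average ERA. The function name is mine; the 4.00 league ERA and 1300 IP limit are the assumed values from the example.

```python
from statistics import mean, pstdev

def marginal_team_era(pitcher_era, pitcher_ip, league_era=4.00, ip_limit=1300):
    """Team ERA with one specific pitcher plus league-average pitching for
    the remaining innings. ERA is runs per 9 innings, so an innings-weighted
    average of the two ERAs gives the team ERA directly."""
    rest_ip = ip_limit - pitcher_ip
    return (pitcher_era * pitcher_ip + league_era * rest_ip) / ip_limit

santana = marginal_team_era(2.87, 232)
nathan = marginal_team_era(2.70, 70)
print(f"Santana: {santana:.2f}, Nathan: {nathan:.2f}")

# The ranking stat is then standardized across the whole pitcher pool.
# Only two pitchers are shown here, so these two numbers are illustrative only.
pool = [santana, nathan]
print(f"mean {mean(pool):.3f}, SD {pstdev(pool):.3f}")
```

Lower is better for this stat, so the sign of the resulting z-score has to be flipped before it is added to the other categories.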
19Ref
      Donor
      ID: 539581218
      Thu, Mar 09, 2006, 13:45
Do any of these sites use obp%? I always see avg. but I can't ever seem to find one that allows you to input your own league's scoring system that allows you to include obp% over avg.
20Guru
      ID: 330592710
      Thu, Mar 09, 2006, 13:51
All provide raw projected stats, so you can easily calculate OBP.
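
For reference, a small sketch using the standard OBP formula. It assumes the projection includes hit-by-pitch and sacrifice flies; if a source omits them, passing zeros gives a close approximation. The stat line below is hypothetical.

```python
def obp(hits, walks, hbp, at_bats, sac_flies):
    """On-base percentage: (H + BB + HBP) / (AB + BB + HBP + SF)."""
    denom = at_bats + walks + hbp + sac_flies
    return (hits + walks + hbp) / denom if denom else 0.0

# Hypothetical projected line: 150 H, 60 BB, 5 HBP, 500 AB, 5 SF.
print(f"{obp(150, 60, 5, 500, 5):.3f}")
```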
21biliruben
      Leader
      ID: 589301110
      Thu, Mar 09, 2006, 13:56
Thanks, Guru. Seems reasonable.

Here's the BP comparison of projected stats.

Here's last year's discussion.
22Steve Biz
      ID: 35254108
      Fri, Mar 10, 2006, 10:18
Looking at The Baseball Prospectus projections from top to bottom, it occurs to me that their predictions are rather timid. The highest win total for any pitcher is 16 (Santana), and only a handful have more than 14 wins. It seems that most (I did not do a formal statistical analysis) of their ERA and WHIP projections are worse than the actual stats the pitcher posted in '05.

I was thinking of another interesting way to analyze how good these services are: using the projections that a service provides, rank every player at his position. Assign every player a number based on his ranking at his position (with 1 being the best player; A-Rod at third base, for example, would get a 1). Correlate those numbers with each player's actual year-end ranking at his position. The results may be very similar, but I'm not sure.
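
A minimal sketch of that positional-rank idea, assuming each player has a single value per source (higher is better) and one listed position. The players, positions, and values are invented, and ties are broken arbitrarily by name rather than averaged.

```python
from collections import defaultdict
from math import sqrt

def positional_ranks(values, positions):
    """Rank players within their position; 1 is the best at that position."""
    by_pos = defaultdict(list)
    for player, value in values.items():
        by_pos[positions[player]].append((value, player))
    ranks = {}
    for group in by_pos.values():
        for rank, (_, player) in enumerate(sorted(group, reverse=True), start=1):
            ranks[player] = rank
    return ranks

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))

def positional_rank_correlation(projected, actual, positions):
    """Correlate projected vs. actual within-position ranks for shared players."""
    players = sorted(set(projected) & set(actual))
    proj_ranks = positional_ranks({p: projected[p] for p in players}, positions)
    act_ranks = positional_ranks({p: actual[p] for p in players}, positions)
    return pearson([proj_ranks[p] for p in players], [act_ranks[p] for p in players])

# Hypothetical values for six players at two positions.
positions = {"A": "3B", "B": "3B", "C": "3B", "D": "SS", "E": "SS", "F": "SS"}
projected = {"A": 5.0, "B": 3.0, "C": 1.0, "D": 4.0, "E": 2.0, "F": 0.5}
actual = {"A": 4.0, "B": 4.5, "C": 0.0, "D": 3.0, "E": 1.0, "F": 2.0}
print(f"{positional_rank_correlation(projected, actual, positions):.1%}")
```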
23biliruben
      Leader
      ID: 589301110
      Fri, Mar 10, 2006, 12:36
Prospectus does like to err on the side of the conservative projection. Rotowire, for the young players they adore, does the opposite, often giving way too rosy a projection.

Maybe averaging them would be good!

Why do you think ranking is better than looking at the raw stats to assess the predictive ability?