RotoGuru Baseball Forum

0 Subject: Stats Question -- For Experts in stats

Posted by: Toral
- [575542418] Fri, Jan 30, 2009, 12:57

This is a question for statistical experts. Not folks who know what is the record for most consecutive seasons with at least 5 wild pitches by a left-handed pitcher, but for folks who do correlations, regressions, and stuff like that.

The question is this:

Assume that Team A has .560 talent and plays .560 ball consistently throughout the season. Team B has .440 talent and plays .440 ball throughout the season.

After how many games, in a 162-game schedule, could someone who knew nothing in advance about the talents of the teams, but knew that teams play according to their talents, say with 95% certainty that team A was better than Team B?

==================================================

To try to anticipate questions, with a simpler model of .600 and .400 teams, after 10 games, Team A is 6-4, Team B is 4-6. Proves nothing.

After 40 games, Team A is 24-16, Team B is 16-24. In this model there are no injuries that make a team better later than it is earlier. After 40 games...I'd tend to believe that Team A was better than team B, but not anything near certainty.

After 60 games, 36-24 vs. 24-36. That's one I'd be about ready to call, but not quite.

So when?
---------------------------------------------

2. The question is about determining which team is better *NOT* which will have the better record that season. (will explain upon request)

I know there are formulae for things because I've seen them used.

Toral
1 blue hen
      Dude
      ID: 710321114
      Fri, Jan 30, 2009, 13:30
I've only taken basic statistics, but there's a question like this that was addressed.

When you have 30 or more items in your sample, you are dealing with something that is statistically significant. That is, it's much safer to assume that the standard deviation of your sample is the standard deviation of the entire population.

So why does standard deviation matter? Well, it turns out that about 95% of roughly normal data falls within two standard deviations of the mean. If the standard deviation of wins was 4, you could assume that 95 percent of teams would fall within 8 wins of their current level of performance.

One problem here is that it doesn't mean 30 games. It means 30 teams who have had that winning percentage, and you can calculate how they finished to get a standard deviation.

The bigger problem is wins and losses itself. A 4-6 team can have outscored its opponents by 20 runs. You'd get a better measure if you used run-ratio instead of winning percentage. But what about the team that is very unlucky and happens to have left a lot of men on base? Now, you're better off finding a way to estimate runs based on player stats rather than actual runs.

... and so forth. There is no perfect metric to compute the value of a team's performance. The best you can do is estimate. But the more data you have, the better the estimate.

I think that should give you the answer you want, but I'll let Sludge or Madman explain it better.
2 Toral
      ID: 575542418
      Fri, Jan 30, 2009, 13:41
Yeah, I was hoping Sludge would magically show up. Madman would be good to point out holes in the hypothesis but I don't think I've ever seen him run the stats on something like this. Madman was smart. He never argued with a professor of statistics about statistics, with a lawyer about law, with a former soldier about military tactics etc.

Your points about runs scored and runs against being a better measure are well taken, but I tried to deal with them with my assumptions about team talent -- Team A has ".560 talent". That should be reflected in their Runs/RA.

Toral
3 Sludge
      ID: 16109168
      Fri, Jan 30, 2009, 13:49
Let p be the probability that Team A wins. Then (1-p) is the probability that Team B wins. What you're asking for is the power (http://en.wikipedia.org/wiki/Statistical_power) of the test of the (null) hypothesis H0:p = 0.5 vs. the alternative hypothesis H1:p != 0.5. I won't bore you with the calculations (see the Wikipedia link above and the external links it gives as well), but here's your answer assuming the standard alpha=0.05 test and a power of 0.95 (95% chance of rejecting H0):


p      Num Games
----------------
0.51       32481
0.52        8116
0.53        3604
0.54        2025
0.55        1294
0.56         897
0.57         658
0.58         502
0.59         396
0.60         319
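
For anyone who wants to check the table, here is a minimal Python sketch of the usual normal-approximation sample-size formula for a one-sample test of a proportion. The function name and the choice of formula are my assumptions, not necessarily what was used to build the table, though it does reproduce the values above (for example 32481 for p = 0.51 and 319 for p = 0.60).

```python
from math import ceil, sqrt
from statistics import NormalDist

def games_needed_one_sample(p, p0=0.5, alpha=0.05, power=0.95):
    """Games needed for a two-sided z-test of H0: p = p0 to reach the given power.

    Hypothetical helper: uses the textbook normal approximation, with the
    null variance p0*(1-p0) for the critical value and the true variance
    p*(1-p) for the power term.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)           # about 1.645 for 95% power
    numerator = z_alpha * sqrt(p0 * (1 - p0)) + z_beta * sqrt(p * (1 - p))
    return ceil((numerator / (p - p0)) ** 2)

for p in (0.51, 0.55, 0.56, 0.60):
    print(p, games_needed_one_sample(p))
```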
4 Sludge
      ID: 16109168
      Fri, Jan 30, 2009, 13:51
You say my name, and 20 minutes later I have an answer! Voila!

How's everyone been doing? Haven't been around much, obviously! Nice to know I'm still thought of occasionally! :)
5 dpr
      ID: 13443116
      Fri, Jan 30, 2009, 14:02
I think the question doesn't have the teams playing each other, so you can't conclude that the teams' winning percentages are dependent.

Seems the hypothesis would be A=B or A-B=0. In the case of his example A-B equals .120.

I forget too much of my statistics to remember how to calculate the significance though. I am confused about the standard error and how we calculate it.
6 Sludge
      ID: 16109168
      Fri, Jan 30, 2009, 14:04
Ah, right, right... two sample test of proportions. Also just as easily done.

But to answer your immediate question dpr, the standard error of a sample proportion p-hat = # successes / # trials is sqrt(p*(1-p)/n).
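
As a quick illustration of that formula, a small Python sketch (the helper name is made up for this example) for a .560 team after a full 162-game season:

```python
from math import sqrt

def standard_error(p, n):
    """Standard error of a sample proportion: sqrt(p*(1-p)/n)."""
    return sqrt(p * (1 - p) / n)

# A true .560 team over 162 games
se = standard_error(0.56, 162)
print(round(se, 4))  # about 0.039
```

So two standard errors either side of .560 is roughly plus or minus .078, which is why a single season leaves a fair amount of room for luck.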
7 Sludge
      ID: 16109168
      Fri, Jan 30, 2009, 14:13
So now p1 = probability team A wins, and p2 = probability team B wins. With a power of 0.95 and alpha of 0.05, you get the following (Num games is the number of games each team has to play):


p1     p2     Num games
-----------------------
0.51   0.49       16237
0.52   0.48        4055
0.53   0.47        1799
0.54   0.46        1009
0.55   0.45         644
0.56   0.44         445
0.57   0.43         326
0.58   0.42         248
0.59   0.41         195
0.60   0.40         156


Note that p1 and p2 do not have to add to 1, but I did those calculations in the spirit of Toral's question, and just to give a "feel" for what the powers are.

Also, to be a bit more clear (or, to say it another way), these are the number of games that would need to be played to have a 95% chance of being 95% sure that one team is better than the other.
8 dpr
      ID: 13443116
      Fri, Jan 30, 2009, 14:23
So since teams rarely go over .600 or below .400, you are saying that even over the course of a whole season we can't say that one team is better than any other?

Also, when calculating the standard errors, don't the p's have to be equal for each team, since in the null hypothesis we assume that they are the same skill level and should have the same chance of winning?
9 Sludge
      ID: 16109168
      Fri, Jan 30, 2009, 14:25
Oh! And in my usual scatterbrained rush to type, I left out a couple points.

This assumes that:
(a) The probability of a win by Team A and Team B is constant for every game played, and
(b) The games played by A are independent of the games played by B.
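
Under assumptions (a) and (b), each team's season is just a run of independent coin flips with a fixed probability, so it is easy to simulate. The sketch below is only an illustration: the function names, the seed, the number of trials, and the choice of a pooled two-proportion z-test at alpha = 0.05 are all my assumptions. It estimates how often a 162-game season would let you call one team better than the other.

```python
import random
from math import sqrt
from statistics import NormalDist

def detects_difference(wins_a, wins_b, n, alpha=0.05):
    """Two-sided pooled z-test of H0: p1 = p2, with n games per team."""
    phat = (wins_a + wins_b) / (2 * n)       # pooled win proportion
    se = sqrt(phat * (1 - phat) * (2 / n))   # standard error under H0
    if se == 0:
        return False                         # degenerate case: no variation at all
    z = (wins_a - wins_b) / n / se
    return abs(z) > NormalDist().inv_cdf(1 - alpha / 2)

def simulated_power(p1, p2, n=162, trials=2000, seed=1):
    """Fraction of simulated seasons in which the test rejects H0."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # (a) constant win probability, (b) the two teams' games are independent
        wins_a = sum(rng.random() < p1 for _ in range(n))
        wins_b = sum(rng.random() < p2 for _ in range(n))
        hits += detects_difference(wins_a, wins_b, n)
    return hits / trials

print(simulated_power(0.55, 0.45))
```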
10 Sludge
      ID: 16109168
      Fri, Jan 30, 2009, 14:28
Under the null hypothesis H0:p1=p2, so we calculate a pooled estimate of the common proportion as phat = (# wins for A + # wins for B) / (# games for A + # games for B) and use that in the standard error calculation:

s.e. = sqrt(phat*(1-phat) * (1/n1 + 1/n2))

At least that's how the standard error is estimated when calculating the test statistic.
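
A minimal Python sketch of that test statistic, applied to the 10-, 40- and 60-game records from Toral's .600/.400 example. The function name is hypothetical, and the two-sided p-value from the normal approximation is my choice of how to report the result.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_z_test(wins_a, n_a, wins_b, n_b):
    """Pooled two-sample z-test of H0: p1 = p2; returns (z, two-sided p-value)."""
    phat = (wins_a + wins_b) / (n_a + n_b)               # pooled estimate
    se = sqrt(phat * (1 - phat) * (1 / n_a + 1 / n_b))   # s.e. under H0
    z = (wins_a / n_a - wins_b / n_b) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

for games, wins_a, wins_b in ((10, 6, 4), (40, 24, 16), (60, 36, 24)):
    z, p_value = two_proportion_z_test(wins_a, games, wins_b, games)
    print(f"{games} games: z = {z:.2f}, p-value = {p_value:.3f}")
```

On these records the 40-game split falls short of the usual 1.96 cutoff and the 60-game split clears it, which lines up with Toral's gut feeling. That is a statement about those exact records, though, not about how likely a real .600 team is to produce them.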
11 Sludge
      ID: 16109168
      Fri, Jan 30, 2009, 14:36
"So since teams rarely go over .600 or below .400, you are saying that even over the course of a whole season we can't say that one team is better than any other?"

No, it's not saying that at all. What it's saying is that you'd need, e.g., 156 games to have a 95% chance to be 95% sure that one team is better than another when the actuality is that one has a probability of 0.6 of winning and the other has a 0.4 probability of winning.

With 162 games, you'd have a 44% chance of being 95% sure that one team is better than another if p1 is 0.55 and p2 is 0.45, for example. There's still a decent chance that you'll be able to discriminate between the two, but it doesn't rise near the level that Toral is asking for.
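
The 44% figure can be approximated with a short Python sketch. This is a normal-approximation version under my own assumptions (pooled variance at the average of p1 and p2, equal games per team, and ignoring the tiny chance of rejecting in the wrong direction); the function name is made up.

```python
from math import sqrt
from statistics import NormalDist

def approx_power(p1, p2, n, alpha=0.05):
    """Approximate power of the two-sided two-proportion z-test, n games per team."""
    pbar = (p1 + p2) / 2
    se = sqrt(2 * pbar * (1 - pbar) / n)           # s.e. of the difference in win pct
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96
    # Probability the observed z clears the cutoff in the correct direction
    return NormalDist().cdf(abs(p1 - p2) / se - z_alpha)

print(round(approx_power(0.55, 0.45, 162), 2))  # 0.44
```

Plugging in Toral's .560 and .440 teams over 162 games gives a value near 0.58 under the same approximation.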
12 Toral
      ID: 575542418
      Fri, Jan 30, 2009, 14:42
Is this formula something I can save to a spreadsheet myself?

The 2 teams do play against each other, FWIW.

I'm fine, Sludge. How are you?

I think I'm firing that .440 manager (an Earl Weaver clone, in an organization with no talent luck) after a third of a season or so. They need to move on. Haven't won a title in 8 game-season years, which in RL is 40 years. (This is an old Strat league which I proceed with from time to time.)

Toral
13 boikin
      ID: 532592112
      Fri, Jan 30, 2009, 14:43
Could you not increase the power of the test by using an exact test?
14 Sludge
      ID: 16109168
      Fri, Jan 30, 2009, 14:46
Yeah, Toral, it's something you can plug in. I have a defense in 20 minutes, and some other work that needs to be done by then. If you don't mind waiting a couple days, I can probably tell you what you need to compute it in Excel.
15 Sludge
      ID: 16109168
      Fri, Jan 30, 2009, 14:49
boikin - At the sample sizes we're talking about, there is so little difference between the binomial (the exact distribution) and its normal approximation, that it doesn't matter.

However, if you're referring to modelling the nature of the dependencies in the data and the fact that they don't face uniform opposition every game, yes you can get more power in that case. The trick then is in determining a correct model and deriving an appropriate test under that model.
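
On the first point (binomial vs. its normal approximation), here is a rough Python check at a season-sized sample. The example of 91 or more wins in 162 games for a true .500 team, the continuity correction, and the function names are my choices for illustration only.

```python
from math import comb, sqrt
from statistics import NormalDist

def exact_upper_tail(k, n, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p), summed term by term."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

def normal_upper_tail(k, n, p=0.5):
    """Normal approximation to P(X >= k), with a continuity correction."""
    mean = n * p
    sd = sqrt(n * p * (1 - p))
    return 1 - NormalDist(mean, sd).cdf(k - 0.5)

n, k = 162, 91  # 91 wins is about a .560 record over 162 games
print(exact_upper_tail(k, n), normal_upper_tail(k, n))
```

The two printed probabilities should agree closely, which is the sense in which the exact test buys very little at these sample sizes.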
16 boikin
      ID: 532592112
      Fri, Jan 30, 2009, 14:52
I think if my calculations are correct the power will increase if you use odds ratio instead of the probability of winning.
17 Toral
      ID: 575542418
      Fri, Jan 30, 2009, 14:53
Thanx Sludge.

There's no hurry.

Be nice to that aspirant.

Toral
18 Sludge
      ID: 16109168
      Fri, Jan 30, 2009, 14:58
boikin - I'm unsure what you mean by "odds ratio". Are you talking about using a logistic regression with a single binary factor?
19 boikin
      ID: 532592112
      Fri, Jan 30, 2009, 15:02
Just using the log odds ratio with St. Error of root(1/n11+1/n12+1/n21+1/n22).
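
For reference, a small Python sketch of the log odds ratio approach on a 2x2 table of wins and losses, using the standard large-sample standard error sqrt(1/n11 + 1/n12 + 1/n21 + 1/n22). Mapping the four cells to team A wins, team A losses, team B wins and team B losses is my assumption about what was meant, and the function name is made up.

```python
from math import log, sqrt

def log_odds_ratio(wins_a, losses_a, wins_b, losses_b):
    """Log odds ratio of winning (A vs. B) and its large-sample standard error."""
    log_or = log((wins_a * losses_b) / (losses_a * wins_b))
    se = sqrt(1 / wins_a + 1 / losses_a + 1 / wins_b + 1 / losses_b)
    return log_or, se

# Toral's 60-game example: 36-24 vs. 24-36
log_or, se = log_odds_ratio(36, 24, 24, 36)
print(f"log OR = {log_or:.3f}, SE = {se:.3f}, z = {log_or / se:.2f}")
```

For those records the z statistic comes out around 2.18, very close to what the difference in winning percentages gives for the same data, so at least on this example the two approaches land in nearly the same place.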
20 Ref
      Donor
      ID: 539581218
      Fri, Jan 30, 2009, 16:39
Well I just got my fill of NUMB3RS tonight. I swear I felt like I was watching the show while reading Sludge's posts.
21 Sludge
      ID: 16109168
      Mon, Feb 02, 2009, 10:19

     2 * p*(1-p) * (z_beta + z_alpha/2)^2
n = --------------------------------------
                 (p1 - p2)^2


Let p1 and p2 be the true values of the probability of a win for each team, and let n be the number of games each team plays. Then p = (p1+p2)/2 (just the average of the two). z_beta is the z critical value corresponding to beta = the power of the test... this z_beta would be 1.645 for 95% power. z_alpha/2 (z with subscript alpha/2) is the z critical value corresponding to alpha = the level of the test... this z_alpha/2 would be 1.96 for level 0.05 (typical).

Depending where you look, you'll get slightly different numbers depending on whether they used a continuity correction or not and, probably, other factors. But this will get you, pun intended, in the right little league ballpark.
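
A minimal Python sketch of this kind of calculation, for anyone who would rather not build it in a spreadsheet. It uses the common textbook form n = 2*p*(1-p)*(z_beta + z_alpha/2)^2 / (p1-p2)^2 games per team, with p the average of p1 and p2 and no continuity correction; that choice of variant and the function name are my assumptions.

```python
from math import ceil
from statistics import NormalDist

def games_needed_two_sample(p1, p2, alpha=0.05, power=0.95):
    """Games per team for a two-sided two-proportion z-test to reach the given power."""
    p = (p1 + p2) / 2                              # average of the two true win rates
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96
    z_beta = NormalDist().inv_cdf(power)           # about 1.645
    return ceil(2 * p * (1 - p) * (z_alpha + z_beta) ** 2 / (p1 - p2) ** 2)

print(games_needed_two_sample(0.60, 0.40))  # 163
print(games_needed_two_sample(0.56, 0.44))  # 452
```

These come out a few games higher than the 156 and 445 in the earlier table, which is the sort of small discrepancy between methods mentioned above.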
22 Nerfherders
      ID: 347242717
      Tue, Feb 03, 2009, 11:19
ack! math! *runs*