Wednesday, March 11, 2009

Setting Hypotheses for Categorical Data

We looked at setting categorical variables yesterday. Now we look at setting hypotheses for this kind of data.

Suppose our categories are
1 = Those less than 18
2 = Those 18-70
3 = Those 70 or more

and we hypothesize that

p1: 20% are less than 18
p2: 70% are 18-70
p3: 10% are 70 or more

To test this we set
H0 null hypothesis (innocent):
p1: 20% are less than 18
p2: 70% are 18-70
p3: 10% are 70 or more

H1 alternative hypothesis (guilty):
H0 is not true, i.e. at least one of the proportions differs from its hypothesized value

To test this we would take a sample from the population and assess how close the sample proportions are to our hypothesized values, i.e. how many in our sample are under 18, how many are 18-70, and so on.

Then we would calculate the "goodness of fit", which measures how close our sample is to our hypothesis. It is found by:

X^2 = sum over all categories i of [ (n(i) - e(i))^2 / e(i) ]

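Where n(i) is the number observed in category i in our sample, and e(i) is the number expected under our hypothesis (sample size times hypothesized proportion). With a sample of 100 the counts and the percentages are the same numbers.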
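
As a rough sketch of this formula in Python, working with counts (for a sample of 100 these are the same numbers as the percentages). The function name chi_square_statistic and its arguments are made up for illustration and are not from any library:

```python
def chi_square_statistic(observed_counts, hypothesized_props):
    """Sum of (n(i) - e(i))^2 / e(i) over all categories."""
    total = sum(observed_counts)  # sample size
    statistic = 0.0
    for n_i, p_i in zip(observed_counts, hypothesized_props):
        e_i = total * p_i  # expected count in this category under H0
        statistic += (n_i - e_i) ** 2 / e_i
    return statistic
```

A sample that matches the hypothesis exactly, such as counts of 20, 70 and 10 against proportions 0.2, 0.7 and 0.1, gives a statistic of (essentially) zero. The further the sample is from the hypothesis, the larger the statistic.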

example:
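We take a sample of 100 people and find that 25 are under 18 (25%), where we hypothesized 20%, i.e. 20 people.
Thus the under-18 category contributes (25-20)^2 / 20 = 5^2 / 20 = 25/20 = 1.25 to X^2. The terms for the other two categories are worked out the same way and added on.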
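
To finish the sum we need counts for the other two categories, which the example does not give. Here is a small Python sketch with made-up counts (65 and 10 are assumptions, chosen only so the sample adds up to 100):

```python
# Only the 25 under-18s comes from the example; the other two counts
# are hypothetical, picked so that the sample totals 100 people.
observed = [25, 65, 10]
expected = [20, 70, 10]  # 100 people times 20%, 70% and 10%

# One term per category: (n(i) - e(i))^2 / e(i)
terms = [(n - e) ** 2 / e for n, e in zip(observed, expected)]
x2 = sum(terms)

print(terms[0])  # 1.25, the under-18 term
print(x2)        # the full X^2 for these made-up counts
```

With different counts in the other two categories the total would of course change. Only the 1.25 is fixed by the example.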

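From this we get the X^2 test statistic. Under H0 it approximately follows a chi-square distribution with (number of categories - 1) = 2 degrees of freedom, which gives the probability of seeing a sample at least this far from our hypothesis. A small probability is evidence against H0.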
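
For our three categories the chi-square distribution has 3 - 1 = 2 degrees of freedom, and in that special case the tail probability has a simple closed form, exp(-X^2 / 2). A minimal Python sketch (the function name is made up for illustration, and the shortcut only holds for exactly 2 degrees of freedom):

```python
import math

def p_value_three_categories(x2):
    # 3 categories -> 2 degrees of freedom. For 2 degrees of freedom
    # the chi-square upper-tail probability is exactly exp(-x / 2).
    return math.exp(-x2 / 2)

# 5.991 is the usual 5% critical value for 2 degrees of freedom,
# so this prints a value very close to 0.05.
print(p_value_three_categories(5.991))
```

For other numbers of categories this shortcut does not apply. If SciPy is available, scipy.stats.chi2.sf(x2, df) gives the same tail probability for any degrees of freedom.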
