## Problem

A survey was conducted among $320$ families, each with $5$ children. The gender distribution among the children is provided below. Is the data consistent with the hypothesis that male and female births are equally probable?

| No. of boys | $5$ | $4$ | $3$ | $2$ | $1$ | $0$ |
|---|---|---|---|---|---|---|
| No. of girls | $0$ | $1$ | $2$ | $3$ | $4$ | $5$ |
| No. of families | $14$ | $56$ | $110$ | $88$ | $40$ | $12$ |

## Solution

Let $p_b$ be the proportion of children in the population that are boys and $p_g$ the proportion that are girls. The null hypothesis is $H_0$: the number of families in each category follows the binomial distribution with $p_b=p_g=0.5$. The alternative hypothesis is $H_a$: $H_0$ is not true.

### Test Statistic and p-value computation

| Category | Observed ($O_i$) | Expected ($E_i$) |
|---|---|---|
| 5B, 0G | $14$ | $\binom{5}{0}\cdot 0.5^5\cdot 320 = 10$ |
| 4B, 1G | $56$ | $\binom{5}{1}\cdot 0.5^5\cdot 320 = 50$ |
| 3B, 2G | $110$ | $\binom{5}{2}\cdot 0.5^5\cdot 320 = 100$ |
| 2B, 3G | $88$ | $\binom{5}{2}\cdot 0.5^5\cdot 320 = 100$ |
| 1B, 4G | $40$ | $\binom{5}{1}\cdot 0.5^5\cdot 320 = 50$ |
| 0B, 5G | $12$ | $\binom{5}{0}\cdot 0.5^5\cdot 320 = 10$ |

The value of the chi-square test statistic is

$$ \sum_{i=1}^6 \frac{(O_i-E_i)^2}{E_i} = 7.16 $$

The number of degrees of freedom is $6-1=5$. The p-value is $\mathbb{P}(\mathcal{X}_5^2 > 7.16) \approx 0.21$. Since this exceeds $0.05$, we fail to reject $H_0$ at the 5% level of significance: the data are consistent with the hypothesis that male and female births are equally probable.

## The Chi-Square Test of Homogeneity

Suppose that we have independent observations from $J$ multinomial distributions, each of which has $I$ cells, and that we want to test whether the cell probabilities of the multinomials are equal, that is, to test the homogeneity of the multinomial distributions. If the probability of the $i$th category of the $j$th multinomial is denoted $\pi_{ij}$, the null hypothesis to be tested is

$$ H_0: \pi_{i1} = \pi_{i2} = \cdots = \pi_{iJ}, \quad i = 1, \dots, I $$

We may view this as a goodness-of-fit test: does the model prescribed by the null hypothesis fit the data? To test goodness of fit, we will compare observed values with expected values using Pearson’s chi-square statistic. We will assume that the data consist of independent samples from each multinomial distribution, and we will denote the count in the $i$th category of the $j$th multinomial as $n_{ij}$. Under $H_0$, each of the $J$ multinomials has the same probability for the $i$th category, say $\pi_i$.
The following theorem shows that the mle of $\pi_i$ is simply $n_i/n$, which is an obvious estimate. Here, $n_i$ is the total count in the $i$th category, $n$ is the grand total count, and $n_{.j}$ is the total count for the $j$th multinomial.

### Theorem

Under $H_0$, the mle’s of the parameters $\pi_1, \pi_2, \dots, \pi_I$ are

$$ \hat{\pi}_i = \frac{n_i}{n}, \quad i=1,\dots, I $$

where $n_i$ is the total number of responses in the $i$th category and $n$ is the grand total number of responses.

For the $j$th multinomial, the expected count in the $i$th category is the estimated probability of that cell times the total number of observations for the $j$th multinomial, or

$$ E_{ij} = \frac{n_i n_{.j}}{n} $$

Pearson’s chi-square statistic is therefore

$$ \begin{align*} \mathcal{X}^2 &= \sum_{i=1}^I\sum_{j=1}^J \frac{(O_{ij}-E_{ij})^2}{E_{ij}}\\ &= \sum_{i=1}^I\sum_{j=1}^J \frac{(n_{ij}-n_{i}n_{.j}/n)^2}{n_{i}n_{.j}/n} \end{align*} $$

For large sample sizes, the approximate null distribution of this statistic is chi-square. (The usual recommendation concerning the sample size necessary for this approximation to be reasonable is that the expected counts should all be greater than 5.) The degrees of freedom are the number of independent counts minus the number of independent parameters estimated from the data. Each multinomial has $I-1$ independent counts, since the totals are fixed, and $I-1$ independent parameters have been estimated. The degrees of freedom are therefore

$$ df = J(I-1) - (I-1) = (I-1)(J-1) $$

## Problem

A public opinion poll surveyed a simple random sample of $1000$ voters. Respondents were classified by gender (male or female) and by voting preference (Republican, Democrat or Independent). Based on the contingency table below, use an appropriate statistical technique to determine whether the men’s voting preferences differ significantly from the women’s.

| | Republican | Democrat | Independent | Total |
|---|---|---|---|---|
| Male | $213$ | $141$ | $54$ | $408$ |
| Female | $251$ | $299$ | $42$ | $592$ |
| Total | $464$ | $440$ | $96$ | $1000$ |

## Solution

In the above problem, $J=2$, since there are two multinomial distributions, one for males and one for females. There are three categories (Republican, Democrat and Independent), so $I=3$.

We have

$$ \begin{align*}
E_{RM} &= \frac{464 \cdot 408}{1000} \approx 189.3\\
E_{RF} &= \frac{464 \cdot 592}{1000} \approx 274.7\\
E_{DM} &= \frac{440 \cdot 408}{1000} \approx 179.5\\
E_{DF} &= \frac{440 \cdot 592}{1000} \approx 260.5\\
E_{IM} &= \frac{96 \cdot 408}{1000} \approx 39.17\\
E_{IF} &= \frac{96 \cdot 592}{1000} \approx 56.8
\end{align*} $$

The following table gives, for each party, the observed count and, below it, the expected count for both males and females.

| | Male | Female |
|---|---|---|
| Republican Observed | $213$ | $251$ |
| Republican Expected | $189.3$ | $274.7$ |
| Democrat Observed | $141$ | $299$ |
| Democrat Expected | $179.5$ | $260.5$ |
| Independent Observed | $54$ | $42$ |
| Independent Expected | $39.17$ | $56.8$ |
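The expected counts follow mechanically from the row and column totals via $E_{ij} = n_i n_{.j}/n$. As a minimal sketch (the function name `expected_counts` and the list-of-rows layout are illustrative assumptions, not part of the original solution), they can be recomputed in Python:

```python
# Observed counts: rows are the categories (Republican, Democrat, Independent),
# columns are the two multinomials (Male, Female).
observed = [
    [213, 251],
    [141, 299],
    [54, 42],
]


def expected_counts(table):
    """Return E_ij = n_i * n_.j / n for a table given as a list of rows."""
    row_totals = [sum(row) for row in table]        # n_i: total per category
    col_totals = [sum(col) for col in zip(*table)]  # n_.j: total per multinomial
    n = sum(row_totals)                             # grand total
    return [[r * c / n for c in col_totals] for r in row_totals]


expected = expected_counts(observed)
for party, row in zip(["Republican", "Democrat", "Independent"], expected):
    print(party, [round(e, 2) for e in row])
```

The values are kept unrounded here, so any later sum of $(O_{ij}-E_{ij})^2/E_{ij}$ is not affected by rounding of the expected counts.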

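The value of the chi-square statistic, computed from the unrounded expected counts, is

$$ \mathcal{X}^2 = \sum_{i=1}^{3}\sum_{j=1}^{2} \frac{(O_{ij}-E_{ij})^2}{E_{ij}} \approx 2.96 + 2.04 + 8.27 + 5.70 + 5.62 + 3.87 \approx 28.46 $$

The number of degrees of freedom is $(I-1)(J-1) = (3-1)(2-1) = 2$.

$\mathbb{P}(\mathcal{X}_2^2 > 28.46) = e^{-14.23} \approx 6.6 \times 10^{-7}$. Since the p-value is less than the significance level ($0.05$), we reject the null hypothesis that men and women have the same voting preferences.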
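The whole test can also be checked numerically. The following is a minimal sketch using only the Python standard library; the names `chi2_sf_even` and `homogeneity_test` are illustrative assumptions. The tail probability uses the closed form for a chi-square distribution with an even number of degrees of freedom, which covers this problem ($df = 2$). For odd $df$, a library routine such as `scipy.stats.chi2.sf` would be needed instead. Expected counts are kept unrounded, so the statistic can differ in the second decimal place from a hand calculation that rounds them first.

```python
import math


def chi2_sf_even(x, df):
    """P(X > x) for a chi-square variable with an even number of df.

    Uses the identity P(X > x) = exp(-x/2) * sum_{i < df/2} (x/2)^i / i!,
    which holds only when df is even.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("closed form implemented for positive even df only")
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


def homogeneity_test(table):
    """Pearson chi-square test of homogeneity.

    `table` is a list of rows (categories); each column is one multinomial.
    Returns (statistic, degrees_of_freedom, p_value).
    """
    row_totals = [sum(row) for row in table]        # n_i
    col_totals = [sum(col) for col in zip(*table)]  # n_.j
    n = sum(row_totals)                             # grand total

    stat = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n  # E_ij
            stat += (obs - expected) ** 2 / expected

    df = (len(row_totals) - 1) * (len(col_totals) - 1)    # (I - 1)(J - 1)
    return stat, df, chi2_sf_even(stat, df)


# Rows: Republican, Democrat, Independent; columns: Male, Female.
observed = [
    [213, 251],
    [141, 299],
    [54, 42],
]

stat, df, p_value = homogeneity_test(observed)
print(f"chi-square = {stat:.2f}, df = {df}, p-value = {p_value:.2e}")
```

The p-value comes out far below $0.05$, in agreement with the decision to reject the null hypothesis.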