Suppose we have the following data on the relationship between people's race and their belief in life after death (Agresti 2007, An Introduction to Categorical Data Analysis: 206).
(This is a 3 x 2 contingency table written in long format, one row per cell.)
   race belief freq
1 white    yes 1339
2 white     no  300
3 black    yes  260
4 black     no   55
5 other    yes   88
6 other     no   22
Let's enter these data in R:
> race <- factor(rep(c("white","black","other"),c(2,2,2)))
> belief <- factor(rep(c("yes","no"),3) )
> freq <- c(1339,300,260,55,88,22)
> data <- data.frame(race,belief,freq)
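To check the data entry, it can help to turn the long-format data frame back into the usual 3 x 2 table. This is only a small sketch using base R's xtabs() and addmargins(); the name tab is my own choice for illustration.

```r
# Re-enter the data so this snippet runs on its own
race   <- factor(rep(c("white", "black", "other"), c(2, 2, 2)))
belief <- factor(rep(c("yes", "no"), 3))
freq   <- c(1339, 300, 260, 55, 88, 22)
data   <- data.frame(race, belief, freq)

# Cross-tabulate the counts: rows are race, columns are belief
tab <- xtabs(freq ~ race + belief, data = data)
print(tab)

# Add row and column totals as a quick sanity check
print(addmargins(tab))
```

Note that R sorts factor levels alphabetically, so the rows come out as black, other, white and the columns as no, yes.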
Now we can fit a loglinear model to the data. There are two possibilities:
(1) race and belief are not independent, i.e. there is an interaction between them
and
(2) race and belief are independent
For the first possibility, we can fit a model in R like
> fit_1 <- glm(freq~race*belief, family="poisson",data=data)
and you can view the result with the command
> summary(fit_1)
The last column of the output (the p-values) shows that the interaction terms between race and belief are not significant. Thus we'll turn to the second model:
> fit_2 <- glm(freq~race+belief, family="poisson",data=data) # note that "*" in the first model is replaced with "+" here
This is a simple additive model, since it assumes there is no interaction between race and belief.
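From
> summary(fit_2)
we can see that this simpler model still fits the data well: the residual deviance is only about 0.36 on 2 degrees of freedom. (The first model is saturated, so it reproduces the counts exactly; the question is whether dropping the interaction makes the fit noticeably worse, and it does not.) So the data give no evidence against the hypothesis that race and belief are independent.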
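The comparison between the two models can also be done formally with a likelihood-ratio test. Here is a minimal sketch using anova() on the two glm fits; the names lrt, g2, df and p are just my own for illustration.

```r
# Re-enter the data so this snippet runs on its own
race   <- factor(rep(c("white", "black", "other"), c(2, 2, 2)))
belief <- factor(rep(c("yes", "no"), 3))
freq   <- c(1339, 300, 260, 55, 88, 22)
data   <- data.frame(race, belief, freq)

fit_1 <- glm(freq ~ race * belief, family = poisson, data = data)  # with interaction
fit_2 <- glm(freq ~ race + belief, family = poisson, data = data)  # independence

# Likelihood-ratio test: does dropping the interaction worsen the fit?
lrt <- anova(fit_2, fit_1, test = "Chisq")
print(lrt)

# The same test by hand: because fit_1 is saturated, the residual
# deviance of fit_2 is itself the test statistic
g2 <- deviance(fit_2)
df <- df.residual(fit_2)
p  <- pchisq(g2, df, lower.tail = FALSE)
cat("G2 =", g2, " df =", df, " p =", p, "\n")
```

A large p-value here means we cannot reject independence, which is the same conclusion a chi-squared test on the table would give.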
This model tells us more than that. For example, it quantifies how belief is distributed across the races. Here is the answer:
The estimated odds of believing in life after death versus not believing, which under this model are the same for every race (if the first model held, we would get different odds for each race), are:
exp(1.4985) ≈ 4.5 # 1.4985 is the value for beliefyes in the column Estimate
and an approximate 95% confidence interval for these odds is:
exp(1.4985-1.96*0.0570) < odds < exp(1.4985 + 1.96*0.0570) # 0.0570 is the value in the column Std. Error; this works out to roughly 4.0 to 5.0
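Rather than copying the numbers from the printout by hand, the same quantities can be pulled straight out of the fitted model. This is a sketch, assuming the coefficient is named beliefyes (which is what R produces here, since "no" is the alphabetically first level of belief); est, se, odds and ci are names I made up.

```r
# Re-enter the data so this snippet runs on its own
race   <- factor(rep(c("white", "black", "other"), c(2, 2, 2)))
belief <- factor(rep(c("yes", "no"), 3))
freq   <- c(1339, 300, 260, 55, 88, 22)
data   <- data.frame(race, belief, freq)

fit_2 <- glm(freq ~ race + belief, family = poisson, data = data)

# Estimate and standard error for the "yes" vs "no" contrast
coefs <- coef(summary(fit_2))
est   <- coefs["beliefyes", "Estimate"]
se    <- coefs["beliefyes", "Std. Error"]

# Odds of "yes" vs "no" and a Wald 95% confidence interval
odds <- exp(est)
ci   <- exp(est + c(-1, 1) * qnorm(0.975) * se)
cat("odds =", odds, " 95% CI =", ci, "\n")

# Under independence the fitted odds equal the ratio of the column totals
yes_total <- sum(freq[belief == "yes"])  # 1687
no_total  <- sum(freq[belief == "no"])   # 377
cat("marginal odds =", yes_total / no_total, "\n")
```

The last line is a handy check: in the independence model the estimated odds are just the overall "yes" total divided by the overall "no" total.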
It's cool, right? So instead of just reporting a simple chi-squared statistic, we can get much more out of a contingency table by using a loglinear model.
And the same approach extends to 3-way, 4-way ... n-way contingency tables.
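To give an idea of what the 3-way case looks like, here is a rough sketch. It does not use the data above; it assumes R's built-in UCBAdmissions table (admission by gender and department) purely as a convenient 3-way example, and the names ucb, fit_indep and fit_pair are mine.

```r
# UCBAdmissions ships with R as a 2 x 2 x 6 table;
# as.data.frame() gives the long format with a Freq column
ucb <- as.data.frame(UCBAdmissions)  # columns: Admit, Gender, Dept, Freq

# Mutual independence: main effects only
fit_indep <- glm(Freq ~ Admit + Gender + Dept, family = poisson, data = ucb)

# All two-way interactions, but no three-way interaction
fit_pair <- glm(Freq ~ (Admit + Gender + Dept)^2, family = poisson, data = ucb)

# Likelihood-ratio test between the two nested models
print(anova(fit_indep, fit_pair, test = "Chisq"))
```

The workflow is the same as in the 2-way case: fit nested models with and without the interaction terms of interest and compare them.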