R script for Fisher's Exact Test

1. Why Fisher's Exact Test?

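Because the Chi-squared Test relies on a large-sample approximation that becomes unreliable when an expected frequency is less than 5. Fisher's Exact Test computes the p-value exactly, so it stays valid for small counts.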
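To make the small-sample issue concrete, here is a minimal R sketch. The counts are made up for illustration and are not taken from contingency_table.txt. It checks the expected frequencies and then runs both tests on the same table.

```r
# Hypothetical 2 x 2 table with small counts (illustrative only).
tab <- matrix(c(3, 1,
                2, 8), nrow = 2, byrow = TRUE)

# chisq.test() warns when the approximation may be poor, so silence it here.
chi <- suppressWarnings(chisq.test(tab))
print(chi$expected)          # expected frequencies under independence
print(any(chi$expected < 5)) # TRUE means the rule of thumb is violated

# Fisher's Exact Test needs no large-sample approximation.
fis <- fisher.test(tab)
print(c(chi_squared_p = chi$p.value, fisher_p = fis$p.value))
```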

2. How do I use the script compute_fisher.r?

It's very easy. Just copy all the code into R and change the setwd() call on the first line to the directory where you put contingency_table.txt.

3. What's contingency_table.txt?

It's the file that holds your data. It has 5 fields separated by tabs:

(1) Item is the linguistic unit you are studying. It can be a word, an n-gram, a grammatical structure, etc.
(2) O11, O12, O21 and O22 are the observed frequencies in the contingency table. O11 means first row, first column; O12 means first row, second column; and so on.

If you are familiar with Chi-squared Test, there will be no problem understanding this format.
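As a rough illustration of how the four count fields map onto a 2 x 2 table, the sketch below builds the matrix for one item in R. The counts and the row/column labels are hypothetical and only serve to show the O11 to O22 layout.

```r
# Hypothetical counts for a single item (not from the real data file).
counts <- c(O11 = 12, O12 = 988, O21 = 3, O22 = 997)

# byrow = TRUE fills the first row with O11, O12 and the second with O21, O22.
m <- matrix(counts, nrow = 2, byrow = TRUE,
            dimnames = list(c("row1", "row2"), c("col1", "col2")))
print(m)
```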

4. Where's the result and how do I interpret it?

The result is written to fisher_stat.txt.
It contains the original contingency table plus two added fields: odds_ratio and p_value.

The smaller the p_value, the stronger the evidence that the item's frequency differs between the two rows of the table.
If odds_ratio > 1, the linguistic unit is overused in the first row of the contingency table; if odds_ratio < 1, it is underused.
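The workflow described above can be sketched in R roughly as follows. This is not the attached compute_fisher script, only a guess at the general shape under stated assumptions: the two data rows are invented, the helper name run_fisher is hypothetical, and the output goes to a temporary directory rather than your working directory.

```r
# Made-up rows in the 5-field, tab-separated layout (not the real attachment).
tab <- read.delim(
  text = "Item\tO11\tO12\tO21\tO22\nword_a\t12\t988\t3\t997\nword_b\t2\t998\t9\t991",
  stringsAsFactors = FALSE
)

# Hypothetical helper: run fisher.test() on one item's 2 x 2 table.
run_fisher <- function(o11, o12, o21, o22) {
  ft <- fisher.test(matrix(c(o11, o12, o21, o22), nrow = 2, byrow = TRUE))
  c(odds_ratio = unname(ft$estimate), p_value = ft$p.value)
}

# One test per item; mapply() returns a 2 x n matrix, so transpose it.
stats <- t(mapply(run_fisher, tab$O11, tab$O12, tab$O21, tab$O22))
result <- cbind(tab, stats)
print(result)

# Write a tab-separated result file (to a temp directory in this sketch).
out_file <- file.path(tempdir(), "fisher_stat.txt")
write.table(result, out_file, sep = "\t", quote = FALSE, row.names = FALSE)
```

Note that fisher.test() reports a conditional maximum likelihood estimate of the odds ratio, so it can differ slightly from the simple ratio (O11 * O22) / (O12 * O21).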

5. Can this script be used for contingency tables other than 2 x 2?

No. For larger or multi-way contingency tables, consider using Logistic Regression or a Loglinear Model instead.
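For readers who want a starting point, here is a minimal sketch of a loglinear model fitted as a Poisson regression in R. The 2 x 3 table, the variable names corpus and form, and all counts are hypothetical. The comparison tests whether the interaction term, that is the association between the two factors, is needed.

```r
# Hypothetical 2 x 3 table in long format (counts are made up).
d <- data.frame(
  corpus = factor(rep(c("A", "B"), each = 3)),
  form   = factor(rep(c("x", "y", "z"), times = 2)),
  count  = c(12, 30, 8, 25, 20, 15)
)

# Independence model: no association between corpus and form.
indep <- glm(count ~ corpus + form, family = poisson, data = d)

# Saturated model: adds the corpus-by-form interaction.
satur <- glm(count ~ corpus * form, family = poisson, data = d)

# Likelihood-ratio test of the interaction term.
print(anova(indep, satur, test = "Chisq"))
```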


  • contingency_table.txt
    109 bytes · views: 11
  • compute_fisher.txt
    985 bytes · views: 12