**R Language: Fisher's Exact Test Script**

**1. Why Fisher's Exact Test?**

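Because the chi-squared test relies on a large-sample approximation that becomes unreliable when an expected frequency is less than 5. Fisher's exact test computes an exact p-value, so it remains valid for sparse tables.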
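A minimal sketch of the problem, using made-up counts rather than data from the script: it computes the expected frequencies of a sparse 2 x 2 table and then runs `fisher.test` from base R on the same table.

```r
# Hypothetical 2 x 2 table with sparse counts (illustrative numbers only)
tab <- matrix(c(3, 1,
                1, 3),
              nrow = 2, byrow = TRUE,
              dimnames = list(row = c("row1", "row2"),
                              col = c("col1", "col2")))

# Expected frequency of each cell under independence:
# (row total * column total) / grand total
expected <- outer(rowSums(tab), colSums(tab)) / sum(tab)
print(expected)   # every expected frequency here is below 5

# Fisher's exact test does not depend on the large-sample approximation
res <- fisher.test(tab)
print(res$p.value)
```

Running `chisq.test(tab)` on the same table makes R warn that the chi-squared approximation may be incorrect, which is the situation this script is meant to avoid.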

**2. How do I use the script compute_fisher.r?**

It's very easy. Just copy all the code into R and change the first line, **setwd()**, to the directory where you put contingency_table.txt.

**3. What's contingency_table.txt?**

It's the file that holds your data. It has **5 fields separated by tabs**:

(1) **Item** is the linguistic unit you are studying. It can be a word, an n-gram, a grammatical structure, etc.

(2) **O11 - O22** are the four observed frequencies in the contingency table (O11, O12, O21, O22). O11 means first row, first column; O12 means first row, second column; and so on.

If you are familiar with the chi-squared test, you will have no problem understanding this format.
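To make the layout concrete, here is a small sketch that writes an example file and reads it back. The items and counts are invented, and the presence of a header row with the names `Item`, `O11`, `O12`, `O21`, `O22` is an assumption; check it against your copy of compute_fisher.r.

```r
# Hypothetical data in the 5-field layout (one row per linguistic unit)
example <- data.frame(
  Item = c("the", "of course", "however"),
  O11  = c(120, 3, 15),
  O12  = c(880, 997, 985),
  O21  = c(100, 12, 4),
  O22  = c(900, 988, 996)
)

# Write it as a tab-separated file in a temporary directory
path <- file.path(tempdir(), "contingency_table.txt")
write.table(example, path, sep = "\t", quote = FALSE, row.names = FALSE)

# Read it back: tab-separated, first line treated as a header (assumption)
tab <- read.delim(path, stringsAsFactors = FALSE)
str(tab)
```

Each row of the file describes one 2 x 2 table, so a file with many items yields one test per item.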

**4. Where's the result and how do I interpret it?**

The result is written to **fisher_stat.txt**. In it you will find that two fields have been added to the original contingency table: **odds_ratio** and **p_value**.

**The smaller the p_value, the more statistically significant the difference in frequency between the two rows.**

If odds_ratio > 1, the linguistic unit is overused in the first row of the contingency table relative to the second row. If odds_ratio < 1, it is underused.
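The two added fields can be reproduced roughly as follows. This is a sketch of one way to do it with `fisher.test`, not necessarily how compute_fisher.r is written. The data, the helper `fisher_row`, and the `usage` column are hypothetical.

```r
# Hypothetical rows in the contingency_table.txt layout
tab <- data.frame(
  Item = c("however", "of course"),
  O11  = c(15, 3),
  O12  = c(985, 997),
  O21  = c(4, 12),
  O22  = c(996, 988)
)

# Build the 2 x 2 table for one item and run Fisher's exact test on it.
# Note: fisher.test reports the conditional maximum likelihood estimate
# of the odds ratio, which differs slightly from (O11 * O22) / (O12 * O21).
fisher_row <- function(o11, o12, o21, o22) {
  m <- matrix(c(o11, o12, o21, o22), nrow = 2, byrow = TRUE)
  res <- fisher.test(m)
  c(odds_ratio = unname(res$estimate), p_value = res$p.value)
}

# Apply the test row by row and attach the two new fields
stats  <- t(mapply(fisher_row, tab$O11, tab$O12, tab$O21, tab$O22))
result <- cbind(tab, stats)

# Read the direction of the effect off the odds ratio
result$usage <- ifelse(result$odds_ratio > 1,
                       "overused in row 1", "underused in row 1")
print(result)
```

With these invented counts, "however" comes out with an odds ratio above 1 and "of course" with an odds ratio below 1. Whether either difference counts as significant depends on the threshold you choose for p_value.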

**5. Can this script be used for contingency tables other than 2 x 2?**

No. For multivariable contingency tables, consider using logistic regression or a log-linear model instead.
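As a rough illustration of the log-linear alternative, the sketch below fits a Poisson regression to a 2 x 2 x 2 table with base R's `glm`. The variable names (`item`, `corpus`, `register`) and all counts are invented for the example; a real analysis would need a model chosen for your own design.

```r
# Hypothetical cell counts for a 2 x 2 x 2 table (illustrative numbers only)
d <- expand.grid(item     = c("target", "other"),
                 corpus   = c("A", "B"),
                 register = c("spoken", "written"))
d$count <- c(15, 985, 4, 996, 20, 980, 18, 982)

# Log-linear model: Poisson regression on the cell counts.
# fit_with includes the item:corpus association; fit_without drops it.
fit_with    <- glm(count ~ item * corpus + item * register + corpus * register,
                   family = poisson, data = d)
fit_without <- glm(count ~ item * register + corpus * register,
                   family = poisson, data = d)

# Likelihood-ratio test: does the item:corpus association improve the fit?
print(anova(fit_without, fit_with, test = "Chisq"))
```

Comparing the two nested models tests the item-by-corpus association while allowing for the third variable, which a single 2 x 2 Fisher's exact test cannot do.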

