**R: Fisher's Exact Test Script**

**1. Why Fisher's Exact Test?**

Because the chi-squared test becomes unreliable when any expected frequency in the table is less than 5.
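To make the rule of thumb concrete, here is a minimal sketch that computes the expected frequencies for a small 2 x 2 table and then runs Fisher's exact test on it. The counts are invented for illustration and are not taken from any real data set.

```r
# Hypothetical 2 x 2 table with small counts (numbers invented for illustration).
tab <- matrix(c(3, 1,
                7, 9),
              nrow = 2, byrow = TRUE)

# Expected frequencies under independence: row total * column total / grand total.
expected <- outer(rowSums(tab), colSums(tab)) / sum(tab)
print(expected)

# TRUE if at least one expected cell is below 5 (the usual rule of thumb).
small_cells <- any(expected < 5)
print(small_cells)

# Fisher's exact test does not rely on the large-sample approximation.
p <- fisher.test(tab)$p.value
print(p)
```

When `small_cells` is TRUE, the exact test is the safer choice for that table.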

**2. How to use the script compute_fisher.r?**

It's very easy. Just copy all the code into R and change the first line, **setwd()**, to the directory where you put contingency_table.txt.

**3. What's contingency_table.txt?**

It's the file that holds your data. It has **5 fields separated by tabs**:

(1) **Item** is the linguistic unit you are studying. It can be a word, an n-gram, a grammatical structure, etc.

(2) **O11 - O22** are the four observed frequencies in the contingency table (O11, O12, O21, O22). O11 means first row, first column; O12 means first row, second column; and so on.

If you are familiar with Chi-squared Test, there will be no problem understanding this format.
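The layout described above can be sketched as follows. This example assumes the file has a header line with the column names Item, O11, O12, O21 and O22; the two data rows are invented for illustration, and the file is written to a temporary directory rather than your working directory.

```r
# Write a small sample file (rows invented for illustration) to a temporary path.
path <- file.path(tempdir(), "contingency_table.txt")
sample_lines <- c("Item\tO11\tO12\tO21\tO22",
                  "however\t12\t988\t3\t997",
                  "of the\t40\t960\t55\t945")
writeLines(sample_lines, path)

# Read it back as a tab-separated table with a header row.
ct <- read.delim(path, header = TRUE, stringsAsFactors = FALSE, quote = "")

# Rebuild the 2 x 2 table for the first item:
#   O11 O12
#   O21 O22
first <- matrix(c(ct$O11[1], ct$O12[1],
                  ct$O21[1], ct$O22[1]),
                nrow = 2, byrow = TRUE)
print(first)
```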

**4. Where's the result and how to interpret it?**

The result is written to **fisher_stat.txt**. In it you will find that two fields have been added to the original contingency table: **odds_ratio** and **p_value**.

**The smaller the p_value, the stronger the evidence that the frequency distribution in the table is not due to chance.**

If odds_ratio > 1, the linguistic unit is overused in the first row of the contingency table relative to the second row. If odds_ratio < 1, it is underused.
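As a rough illustration of how these two fields could be produced, the sketch below runs R's built-in `fisher.test()` on each row of a table and appends the results. It is not the code of compute_fisher.r itself: the helper name `fisher_row`, the sample counts, and the temporary output path are all assumptions made for this example.

```r
# Assumed input: a data frame with columns Item, O11, O12, O21, O22
# (counts invented for illustration).
ct <- data.frame(Item = c("however", "of the"),
                 O11 = c(12, 40), O12 = c(988, 960),
                 O21 = c(3, 55),  O22 = c(997, 945),
                 stringsAsFactors = FALSE)

# Hypothetical helper: run fisher.test() on one 2 x 2 table
# and return the odds ratio estimate and the p-value.
fisher_row <- function(o11, o12, o21, o22) {
  res <- fisher.test(matrix(c(o11, o12, o21, o22), nrow = 2, byrow = TRUE))
  c(odds_ratio = unname(res$estimate), p_value = res$p.value)
}

# Apply the helper to every row and append the two new fields.
stats <- t(mapply(fisher_row, ct$O11, ct$O12, ct$O21, ct$O22))
result <- cbind(ct, stats)

# Write a tab-separated result file to a temporary location.
out <- file.path(tempdir(), "fisher_stat.txt")
write.table(result, out, sep = "\t", quote = FALSE, row.names = FALSE)
print(result)
```

Note that `fisher.test()` reports the conditional maximum likelihood estimate of the odds ratio, which can differ slightly from the simple cross-product (O11 * O22) / (O12 * O21).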

**5. Can this script be used for contingency tables other than 2 x 2?**

No. The script handles only 2 x 2 tables. For multivariable contingency tables, consider logistic regression or a log-linear model instead.
