I would like to ask for help with extracting passive and active constructions from a corpus tagged with the CLAWS7 tagset.
According to Corpus-Based Language Studies, the search patterns for the passive are the following:
*_VB* *_V?N
*_VB* *_R* *_V?N
*_VB* *_R* *_R* *_V?N
*_VB* *_N* *_V?N
*_VB* *_N* *_R* *_V?N
*_VB* *_N* *_R* *_R* *_V?N
*_VB* *_P* *_V?N
*_VB* *_P* *_R* *_V?N
*_VB* *_P* *_R* *_R* *_V?N
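To run these patterns outside a concordancer, you can translate them into regular expressions. The sketch below is a minimal Python version. It assumes the corpus is CLAWS7 "horizontal" output, where each token is written as word_TAG and tokens are separated by whitespace. It also assumes that `*` matches any run of characters within a token and `?` matches exactly one, which is one reading of the wildcard syntax. The function names are hypothetical.

```python
import re

# The nine passive query patterns listed above (wildcard syntax).
PASSIVE_PATTERNS = [
    "*_VB* *_V?N",
    "*_VB* *_R* *_V?N",
    "*_VB* *_R* *_R* *_V?N",
    "*_VB* *_N* *_V?N",
    "*_VB* *_N* *_R* *_V?N",
    "*_VB* *_N* *_R* *_R* *_V?N",
    "*_VB* *_P* *_V?N",
    "*_VB* *_P* *_R* *_V?N",
    "*_VB* *_P* *_R* *_R* *_V?N",
]


def wildcard_to_regex(pattern: str) -> re.Pattern:
    """Turn a space-separated wildcard query into a regex over word_TAG tokens.

    '*' matches any run of non-space characters, '?' exactly one.
    """
    token_regexes = []
    for token in pattern.split():
        parts = []
        for ch in token:
            if ch == "*":
                parts.append(r"\S*")
            elif ch == "?":
                parts.append(r"\S")
            else:
                parts.append(re.escape(ch))
        token_regexes.append("".join(parts))
    # (?<!\S) and (?!\S) anchor the match to whole tokens; \s+ separates tokens.
    return re.compile(r"(?<!\S)" + r"\s+".join(token_regexes) + r"(?!\S)")


def count_passives(tagged_text: str) -> dict:
    """Return the number of non-overlapping matches for each passive pattern."""
    return {
        p: len(wildcard_to_regex(p).findall(tagged_text)) for p in PASSIVE_PATTERNS
    }


# Hypothetical two-sentence sample in word_TAG format.
sample = (
    "The_AT book_NN1 was_VBDZ given_VVN to_II her_PPHO1 ._. "
    "It_PPH1 was_VBDZ quickly_RR read_VVN ._."
)
counts = count_passives(sample)
print(sum(counts.values()))  # 2 for this sample
```

The sketch is only as complete as the patterns themselves. For example, "was_VBDZ not_XX given_VVN" is not matched because CLAWS7 tags "not" as XX, not as an R* adverb.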
After extracting them, I need to standardize (normalize) the frequencies, since I have to compare results from two corpora.
However, the two corpora are not the same size, so what should I use as the base for standardizing them?
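Whichever base you choose (running words, sentences, or verb phrases), the normalization arithmetic is the same: divide the raw count by the size of the base and multiply by a common unit. A rate per 1,000 words tells you how often the passive occurs in running text. A rate per 100 verb phrases tells you how often the passive is chosen when a verb phrase occurs. The helper name and all figures below are hypothetical, for illustration only.

```python
def normalise(raw_count: int, base_size: int, per: int = 1000) -> float:
    """Scale a raw frequency to a rate per `per` units of the chosen base."""
    if base_size <= 0:
        raise ValueError("base_size must be positive")
    return raw_count * per / base_size


# Hypothetical figures, for illustration only.
print(normalise(300, 150_000))         # 2.0 passives per 1,000 words (corpus A)
print(normalise(180, 60_000))          # 3.0 passives per 1,000 words (corpus B)
print(normalise(300, 2_400, per=100))  # 12.5, i.e. % of verb phrases that are passive
```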
(1) Should I use verbs or sentences as the base?
(2) How can I extract the frequency of sentences in the active voice?
(3) How can I get the total count for both voices (passive and active combined)?
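A sentence can contain both an active and a passive clause, so one workaround is to count verb phrases rather than sentences and treat the active count as a residual: total verb phrases minus passive matches. The sketch below is a crude approximation under two labeled assumptions. First, sentence-final punctuation is tagged with the mark itself ("._.", "?_?", "!_!"). Second, every lexical-verb token (CLAWS7 tags beginning with VV) heads exactly one verb phrase. This ignores be/have/do used as main verbs and passives of "do" or "have" (VDN/VHN), so the numbers should be checked against a manual sample. The function names are hypothetical.

```python
import re

# Assumption: sentence-final punctuation is tagged with the mark itself.
SENTENCE_END = re.compile(r"(?<!\S)\S+_[.?!](?!\S)")

# Assumption (crude): each lexical-verb token (tag starting with VV) heads one verb phrase.
LEXICAL_VERB = re.compile(r"(?<!\S)\S+_VV\S*(?!\S)")


def count_sentences(tagged_text: str) -> int:
    """Count tokens tagged as sentence-final punctuation."""
    return len(SENTENCE_END.findall(tagged_text))


def voice_counts(tagged_text: str, passive_count: int) -> dict:
    """Estimate total, passive and active verb phrases.

    passive_count is the number of passive-pattern matches;
    'active' is simply whatever is left over.
    """
    total = len(LEXICAL_VERB.findall(tagged_text))
    return {
        "total": total,
        "passive": passive_count,
        "active": max(total - passive_count, 0),
    }


# Hypothetical sample: one passive ("was given") and two active verbs.
text = (
    "The_AT book_NN1 was_VBDZ given_VVN to_II her_PPHO1 ._. "
    "She_PPHS1 liked_VVD it_PPH1 ._. Did_VDD he_PPHS1 go_VVI ?_?"
)
print(count_sentences(text))  # 3
print(voice_counts(text, 1))  # {'total': 3, 'passive': 1, 'active': 2}
```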
Your replies would be of great help to me. Thanks.