How to extract the frequency of Passive Voice and Active Voice

I would like to ask for help with extracting passive and active constructions from a corpus tagged with the CLAWS C7 tagset.

According to Corpus-Based Language Studies, the search patterns for the passive are the following:

*_VB* *_V?N
*_VB* *_R* *_V?N
*_VB* *_R* *_R* *_V?N
*_VB* *_N* *_V?N
*_VB* *_N* *_R* *_V?N
*_VB* *_N* *_R* *_R* *_V?N
*_VB* *_P* *_V?N
*_VB* *_P* *_R* *_V?N
*_VB* *_P* *_R* *_R* *_V?N

After extracting them, I need to standardize the counts, since I have to compare results from two corpora.
However, the two corpora are not the same size. What should I take as the base for standardizing them?

(1) Should I use the verb or the sentence as the base?
(2) How can I extract the frequency of sentences in the ACTIVE VOICE?
(3) How can I get the total number of voices (both passive and active)?
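
To make question (1) concrete, here is a minimal sketch of how a raw count could be normalized against different bases. It assumes the corpus is plain text with whitespace-separated word_TAG tokens; the function names and the sample text are hypothetical, and which base is appropriate remains a methodological choice.

```python
def count_tag_prefix(tagged_text, prefix):
    """Count tokens whose tag (the part after the last '_') starts with prefix."""
    total = 0
    for token in tagged_text.split():
        if "_" not in token:
            continue  # skip anything that is not in word_TAG form
        tag = token.rpartition("_")[2]
        if tag.startswith(prefix):
            total += 1
    return total


def per_n(count, base, n=1000):
    """Normalize a raw count to a rate per n units of the chosen base."""
    if base == 0:
        raise ValueError("base must be greater than zero")
    return count * n / base


# Hypothetical sample; a real corpus would be read from file.
sample = "The_AT book_NN1 was_VBDZ written_VVN by_II her_PPHO1 ._."
verb_tokens = count_tag_prefix(sample, "V")   # all verb tags begin with V in C7
all_tokens = len(sample.split())
passives = 1                                  # raw count from the search patterns

print(per_n(passives, verb_tokens))           # passives per 1,000 verb tokens
print(per_n(passives, all_tokens))            # passives per 1,000 tokens
```

Whichever base is used, the same base has to be applied to both corpora for the rates to be comparable.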

Your replies would be a great help to me. Thanks.
 
Re: How to extract the frequency of Passive Voice and Active Voice

I have one more question about the passive.

On the companion website,
http://www.routledge.com/textbooks/0415286239/resources/default.html

click "File-based search patterns for use with the search algorithms (case study 5) (14 k)".

The search patterns for the agentless passive and the by-passive are the same. They are:
*_VB* *_V?N
*_VB* *_R* *_V?N
*_VB* *_R* *_R* *_V?N
*_VB* *_N* *_V?N
*_VB* *_N* *_R* *_V?N
*_VB* *_N* *_R* *_R* *_V?N
*_VB* *_P* *_V?N
*_VB* *_P* *_R* *_V?N
*_VB* *_P* *_R* *_R* *_V?N

Can anyone explain the reason for that?
Does that mean that after the extraction I need to distinguish them manually?
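
One possible way to pre-sort the hits before checking them by hand is sketched below. It is only a heuristic, not the method used in the book: it assumes whitespace-separated word_TAG tokens, assumes that agentive "by" appears as by_II, and uses a hypothetical window of four tokens after the participle. The matcher walks the tokens in the same order as the nine patterns above (a VB* form, an optional N* or P* token, up to two R* tokens, then V?N).

```python
import re


def tag_of(token):
    """Return the tag after the last '_', or '' if the token has no tag."""
    return token.rpartition("_")[2] if "_" in token else ""


def find_passives(tokens):
    """Return (start, end) index pairs for sequences matching the nine patterns."""
    tags = [tag_of(t) for t in tokens]
    spans = []
    for i, tag in enumerate(tags):
        if not tag.startswith("VB"):
            continue
        j = i + 1
        # optional noun or pronoun after the form of "be"
        if j < len(tags) and tags[j][:1] in ("N", "P"):
            j += 1
        # up to two adverbs
        adverbs = 0
        while j < len(tags) and tags[j].startswith("R") and adverbs < 2:
            j += 1
            adverbs += 1
        # past participle: V, any one character, N
        if j < len(tags) and re.fullmatch(r"V.N", tags[j]):
            spans.append((i, j))
    return spans


def classify(tokens, window=4):
    """Label each hit by whether by_II occurs within `window` tokens after it."""
    results = []
    for start, end in find_passives(tokens):
        following = tokens[end + 1:end + 1 + window]
        has_by = any(t.lower() == "by_ii" for t in following)
        label = "by-passive" if has_by else "agentless"
        results.append((" ".join(tokens[start:end + 1]), label))
    return results


# Hypothetical sample text.
sample = ("The_AT book_NN1 was_VBDZ written_VVN by_II her_PPHO1 ._. "
          "It_PPH1 is_VBZ often_RR read_VVN ._.").split()
for hit, label in classify(sample):
    print(label, hit)
```

Because "by" can also introduce non-agent phrases (for example "by Monday"), and the window can cross a sentence boundary, the "by-passive" candidates would still need a manual check.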
 
Re: How to extract the frequency of Passive Voice and Active Voice

To understand these patterns, you will need to look at the CLAWS C7 tagset (http://ucrel.lancs.ac.uk/claws7tags.html) and know how to use wildcards such as * (any sequence of characters) and ? (any single character) in WordSmith Tools.
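
As an illustration of how the two wildcards behave, the sketch below translates the patterns into regular expressions and counts the hits in a small tagged sample. It is not how WordSmith Tools works internally; it assumes whitespace-separated word_TAG tokens, and the function names and sample text are hypothetical.

```python
import re

PASSIVE_PATTERNS = [
    "*_VB* *_V?N",
    "*_VB* *_R* *_V?N",
    "*_VB* *_R* *_R* *_V?N",
    "*_VB* *_N* *_V?N",
    "*_VB* *_N* *_R* *_V?N",
    "*_VB* *_N* *_R* *_R* *_V?N",
    "*_VB* *_P* *_V?N",
    "*_VB* *_P* *_R* *_V?N",
    "*_VB* *_P* *_R* *_R* *_V?N",
]


def token_regex(item):
    """Translate one wildcard item such as '*_V?N' into a regex for one token."""
    parts = []
    for ch in item:
        if ch == "*":
            parts.append(r"\S*")   # '*' = any run of non-space characters
        elif ch == "?":
            parts.append(r"\S")    # '?' = exactly one non-space character
        else:
            parts.append(re.escape(ch))
    return "".join(parts)


def sequence_regex(pattern):
    """Compile a space-separated wildcard pattern that matches whole tokens."""
    body = r"\s+".join(token_regex(item) for item in pattern.split())
    return re.compile(r"(?<!\S)" + body + r"(?!\S)")


def count_passives(tagged_text):
    """Sum the matches of all nine patterns in the tagged text."""
    return sum(len(sequence_regex(p).findall(tagged_text))
               for p in PASSIVE_PATTERNS)


# Hypothetical sample text.
sample = ("The_AT book_NN1 was_VBDZ written_VVN by_II her_PPHO1 ._. "
          "It_PPH1 is_VBZ often_RR read_VVN ._. "
          "She_PPHS1 wrote_VVD it_PPH1 ._.")
print(count_passives(sample))
```

Here *_VB* matches any form of "be" (VB0, VBDZ, VBZ and so on), *_V?N matches any past participle (VVN, VBN, VHN and so on), and the optional *_R*, *_N* and *_P* slots allow adverbs, nouns and pronouns between them.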
 
Re: How to extract the frequency of Passive Voice and Active Voice

To understand these patterns, you will need to look at the CLAWS C7 tagset (http://ucrel.lancs.ac.uk/claws7tags.html) and know how to use wildcards such as * (any sequence of characters) and ? (any single character) in WordSmith Tools.

I understand what these do.

I just want to know how to extract the frequency of the active voice.
How can I get the total frequency and then subtract the passive part?
 