Guardian wordlist for WordSmith 3

xiaoz

Eternal Super Administrator
Staff member
#1
Wordlist of the Guardian newspaper corpus (over 90 million words), for use with WordSmith version 3

[DOWNLOAD]http://www.ling.lancs.ac.uk/corplang/zipfiles/guardian.zip[/DOWNLOAD]
 

xusun575

Senior Member
#2
Thanks a lot for your efforts. I still think this list would have been more helpful if it had been sorted and compiled differently, as lots of the items are not words but rather tokens. How would you classify it: as a wordlist or as a token list?

18 AAA'S 5
19 AAAAA 3
20 AAAAAGH 3
21 AAAAAH 4
22 AAAAH 7
23 AAAARGH 2
24 AAAGH 13
25 AAAH 13
26 AAAHH 5
27 AAAI 4
28 AAARGH 7
29 AAB 2
30 AABB 2
31 AAC 3
32 AACHEN 56
 

xiaoz

Eternal Super Administrator
Staff member
#3
Although these may not be words in the conventional sense (they may be labels), they are nevertheless "words" in that they actually appear in texts. You can of course re-sort the list by frequency, for example using WordSmith Wordlist.
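WordSmith Wordlist is the natural tool for this. If you want to do it outside WordSmith, here is a minimal Python sketch. It assumes the list is plain text with one whitespace-separated "rank item frequency" entry per line, as in the excerpt quoted in post #2; the actual layout of the files inside guardian.zip is not confirmed here, and the function names are hypothetical.

```python
from typing import List, Tuple


def parse_wordlist(text: str) -> List[Tuple[str, int]]:
    """Parse 'rank ITEM frequency' lines into (item, frequency) pairs.

    Assumes three whitespace-separated columns per line (hypothetical
    format based on the excerpt); other lines are skipped.
    """
    entries = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 3 or not parts[2].isdigit():
            continue  # skip blank, header, or malformed lines
        _rank, item, freq = parts
        entries.append((item, int(freq)))
    return entries


def sort_by_frequency(entries: List[Tuple[str, int]]) -> List[Tuple[str, int]]:
    # Highest frequency first; ties broken alphabetically by item
    return sorted(entries, key=lambda e: (-e[1], e[0]))


sample = """\
24 AAAGH 13
25 AAAH 13
32 AACHEN 56
29 AAB 2
"""

for item, freq in sort_by_frequency(parse_wordlist(sample)):
    print(item, freq)
```

With the sample above this prints AACHEN 56 first, then AAAGH 13 and AAAH 13 (tied, so alphabetical), then AAB 2. The original rank column is discarded, since it no longer applies once the list is re-sorted.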
 

xusun575

Senior Member
#4
Re: Guardian wordlist for WordSmith 3

Quoting xiaoz (2005-7-25 0:49:48):
Although these may not be words in the conventional sense (they may be labels), they are nevertheless "words" in that they actually appear in texts. You can of course re-sort the list by frequency, for example using WordSmith Wordlist.
Aha, but yes! As the old saying goes, anything that appears in a text should be taken as a word. :-))
 