[Help] lexical density tools needed

Re: [Help] lexical density tools needed

The following quotes valeriazuo's post of 2005-9-19 21:17:10:
Mr. Xiao, thanks for your link. What a pity! I couldn't get it to work. I opened a text file and applied a tool, Calculate LD (a la Ure/Stubbs), to it, but there was no expected outcome. A dialog box popped up asking me to fill in the number of content words and the corpus size. In fact, I want the tool to count the number of content words and the corpus size by itself, but how can I make it do this job? Thanks a lot for your kind advice.

You used ACWT properly but overestimated its capabilities. It does not
automatically calculate the numbers of function words and content words.
(I am not aware of any lexical tool that does this automatically, and what counts
as a function word or a content word has to be decided by the researcher.)

That being said, it doesn't seem terribly hard to work out the number of
what you consider to be content/function words in your corpus.
Here is a suggestion:

1) Use an English/Chinese POS tagger to tag your corpus first;
2) Use a program to search/calculate the frequencies of the tags (not words)
of the function words in your definition;
3) Use the Ure/Stubbs method in ACWT to calculate the LD value.

The reason for searching function word tags in step 2) is that function words
tend to form a more limited set than the content word classes. But you could
count either the content or the function classes and use the total corpus size
to work out the size of the other class, as in the sketch below.
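
Just to make steps 2) and 3) concrete, here is a rough, untested Python sketch. It assumes the tagger outputs a CLAWS-like word_TAG format (e.g. the_ATI cat_NN) and that the Ure-style LD that ACWT asks about is simply content words divided by total tokens, times 100; the tag set and the file name are only placeholders for whatever you decide counts as a function word.

-----A rough Python sketch-----
import re

# Tags YOU have decided mark function words (step 2); extend to your full list.
FUNCTION_TAGS = {"AT", "ATI", "CC", "DTX", "EX"}

def lexical_density(tagged_text):
    """Ure-style lexical density (%) for a word_TAG annotated text (assumed formula)."""
    tags = re.findall(r"\S+_(\S+)", tagged_text)       # the tag of every token
    total = len(tags)
    function_count = sum(1 for t in tags if t in FUNCTION_TAGS)
    content_count = total - function_count             # the other class, derived from the corpus size
    return 100.0 * content_count / total if total else 0.0

# "my_tagged_corpus.txt" is a placeholder file name.
with open("my_tagged_corpus.txt", encoding="utf-8") as f:
    print(round(lexical_density(f.read()), 1))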
 
Re: [Help] lexical density tools needed

The following quotes Xiaoz's post of 2005-9-19 19:08:47:
Not exactly just the first thousand words of each text. Here is what Mike says about STTR:

"The standardised type/token ratio (STTR) is computed every n words as Wordlist goes through each text file. By default, n = 1,000. In other words the ratio is calculated for the first 1,000 running words, then calculated afresh for the next 1,000, and so on to the end of your text or corpus. A running average is computed, which means that you get an average type/token ratio based on consecutive 1,000-word chunks of text. (Texts with less than 1,000 words (or whatever n is set to) will get a standardised type/token ratio of 0.)"
===========

I see. In that case, I will probably base my calculations on every 100 or 200 words rather than every 1,000 words, as most learner texts are not that long. Thank you!
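
For what it's worth, here is a rough Python sketch of the STTR calculation Mike describes, with the chunk size n made adjustable (say n=100 or 200 for short learner texts). The tokenisation is just a lower-cased whitespace split, so the figures will not match WordSmith's output exactly.

-----A rough Python sketch-----
def sttr(text, n=1000):
    """Standardised type/token ratio (%) over consecutive n-word chunks."""
    tokens = text.lower().split()                 # naive tokenisation
    ratios = []
    for i in range(0, len(tokens) - n + 1, n):    # TTR is computed afresh for each full n-word chunk
        chunk = tokens[i:i + n]
        ratios.append(len(set(chunk)) / n)
    # Running average of the chunk TTRs; a text shorter than n words gets 0, as in WordSmith.
    return 100.0 * sum(ratios) / len(ratios) if ratios else 0.0

# e.g. sttr(open("learner_text.txt", encoding="utf-8").read(), n=200)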

 
Re: [Help] lexical density tools needed

Function words (in any language) are expected to sit at the top of a word frequency list. It should not be very difficult to get a rough estimate of their total frequencies using a POS-tagged corpus.
 
Re: [Help] lexical density tools needed

The following quotes xiaoz's post of 2005-9-20 0:36:41:
Function words (in any language) are expected to sit at the top of a word frequency list. It should not be very difficult to get a rough estimate of their total frequencies using a POS-tagged corpus.


If the user knows the tagset (see a sample of the CLAWS1 tagset below) and
has access to such tools as WordSmith Tools, s/he can use the 'file-based
concordance' feature of WS Tools to search just those tags that mark
'function' words. It shouldn't be too hard a thing to do, I would think, even
though I haven't tried it myself.

Similar results can conceivably be obtained by using RegExp tools (e.g. PowerGrep).

In the worst-case scenario the user just has to search each category and sum
the counts from the individual searches (see the sketch after the tag sample below).

-----Sample Tags for Function Words-----
AT    singular article (a, an, every)
ATI   article (the, ze, no)
CC    co-ordinating conjunction (and, or, but, so, then, yet, only, for)
DTX   determiner/double conjunction (either, neither)
EX    existential THERE

....
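
To make the "search each category and sum" idea concrete, here is a rough regular-expression sketch in Python, in the same spirit as a PowerGrep or file-based concordance search. It again assumes a word_TAG format; the tag list is just the sample above rather than a full function-word inventory, and the file name is a placeholder.

-----A rough Python sketch-----
import re
from collections import Counter

FUNCTION_TAGS = ["AT", "ATI", "CC", "DTX", "EX"]        # extend with your full list

def count_function_tags(tagged_text):
    counts = Counter()
    for tag in FUNCTION_TAGS:
        # one search per category, e.g. every token ending in _CC
        counts[tag] = len(re.findall(rf"\S+_{tag}\b", tagged_text))
    return counts

counts = count_function_tags(open("my_tagged_corpus.txt", encoding="utf-8").read())
print(counts)                    # frequency of each function word tag
print(sum(counts.values()))      # the sum over the individual searches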
 
My goodness. It's a really hard job for me to follow all of your instructions, but I'll have a try anyway. Thanks for all of your considerate advice. Thanks a lot.
 
"My Godness", you appear to know little about POS tagging and about the CLAWS tagset. You will find more info in the section of Corpus Tagging and Annotation on this site.
 
Re: [Help] lexical density tools needed

Following up on this old thread...

Lexical density (LD) is an attractive concept, but can anyone show me
some really nice studies that take advantage of (some version of) LD?
I know there is a big literature on language development that uses LD,
but in corpus linguistics, apart from Stubbs, it seems that very few people
have used this concept.
 
Re: [Help] lexical density tools needed

Requesting these articles:
1. Developmental trends in lexical diversity
2. Investigating accommodation in language proficiency interviews using a new measure of lexical diversity
3. A New Measure of Lexical Diversity
 
Re: [Help] lexical density tools needed

Requesting these articles:
1. Developmental trends in lexical diversity
2. Investigating accommodation in language proficiency interviews using a new measure of lexical diversity
3. A New Measure of Lexical Diversity


Author? Journal/Book? Publisher? Date?
 
Re: [Help] lexical density tools needed

1. Durán, P., Malvern, D., Richards, B., & Chipere, N. (2004). Developmental trends in lexical diversity. Applied Linguistics, 25(2), 220-242.
2. Laufer, B. (1994). The lexical profile of second language writing: Does it change over time? RELC Journal.
 
Re: Re: [Help] lexical density tools needed

Not exactly just the first thousand words of each text. Here is what Mike says about STTR:

"The standardised type/token ratio (STTR) is computed every n words as Wordlist goes through each text file. By default, n = 1,000. In other words the ratio is calculated for the first 1,000 running words, then calculated afresh for the next 1,000, and so on to the end of your text or corpus. A running average is computed, which means that you get an average type/token ratio based on consecutive 1,000-word chunks of text. (Texts with less than 1,000 words (or whatever n is set to) will get a standardised type/token ratio of 0.)"

How to Trace the Growth in Learners' Active Vocabulary? A Corpus-based Study
Author: Agnieszka Leńko-Szymańska
Source: Language and Computers, Teaching and Learning by Doing Corpus Analysis. Proceedings of the Fourth International Conference on Teaching and Language Corpora, Graz 19-24 July, 2000. KETTEMANN, Bernhard and Georg MARKO (Eds.), pp. 217-230
Publisher: Rodopi
In this article, the author mentions the Type/Token Ratio, the Standardised Type/Token Ratio and the Mean Type/Token Ratio. I wonder what the difference is between the Standardised Type/Token Ratio and the Mean Type/Token Ratio, and how to obtain the Mean Type/Token Ratio.
 
Re: Re: [Help] lexical density tools needed

Richard has a good memory.

[attached image]

If we use ACWT to calculate lexical density, we first need to know the number of content words; how do we get it?
 
Re: [Help] lexical density tools needed

A question for the more experienced members here:
If I want to calculate the lexical density of one particular class of words (e.g. connectives) in corpora of different sizes, what methods and software can I use? The total number of connective types and tokens is already known.
If STTR is used for the calculation, what software can do this?
 