how to get an idiom list?

It seems that few software tools today can search a corpus for idioms and phrases and produce an idiom list the way they produce a word list.
 
Try the cluster function of WordSmith. You will get a list of useful formulaic expressions if you set a high threshold for the MI score (or another statistical measure). If you have a very large corpus, also set a high minimum frequency (above 5, for example). See the WST manual for how to do this using Concord or WordList. Note that the procedure differs between WST3 and WST4.
 
I will give it a try. If WS4 can identify idioms or phrases, I assume it must have a very large dictionary of English idioms and phrases. I think this is a very big problem, much like the complexity of parsing Chinese words.
 
No, no dictionary of idioms is used in this kind of work. Such work is based on statistics (co-occurrence frequencies, mutual information, etc.).
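
To make the statistical idea concrete, here is a minimal Python sketch that scores two-word sequences by mutual information and keeps only those above a minimum frequency. It is not how WordSmith computes its lists; the tokenizer, the function names and the default threshold are all assumptions made for illustration, and the MI formula is the common log2(f(w1,w2) * N / (f(w1) * f(w2))) estimate.

```python
import math
import re
from collections import Counter

def tokenize(text):
    # Deliberately simple tokenizer (an assumption): lowercase letters,
    # keeping internal apostrophes so "don'ts" stays one token.
    return re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())

def bigram_mi(tokens, min_freq=2):
    """Return (bigram, frequency, MI) tuples, highest MI first.

    Only bigrams occurring at least min_freq times are scored.
    """
    n = len(tokens)
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    scored = []
    for (w1, w2), freq in bigrams.items():
        if freq < min_freq:
            continue
        # MI = log2( observed co-occurrence / co-occurrence expected by chance )
        mi = math.log2(freq * n / (unigrams[w1] * unigrams[w2]))
        scored.append(((w1, w2), freq, mi))
    # Sort by MI, then by frequency, both descending.
    return sorted(scored, key=lambda item: (-item[2], -item[1]))
```

Raising min_freq matters because MI tends to overrate pairs of rare words that happen to co-occur once or twice, which is why a frequency floor is usually combined with the MI threshold.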
 
Could you please recommend any articles to read in this field?
I don't understand how a computer can identify idioms like "the dos and don'ts", "die like a dog", "make ends meet"...
For example, I have a couple of texts and I want to compute how many idioms/phrases they contain and what those idioms are. Then I can get an idiom list that looks like a word list.

You have already recommended two articles:
Capturing phraseology in an online dictionary for advanced users of English as a second language: a response to user needs

Two quantitative methods of studying phraseology in English

 
Testing your intuitions -

Do you know which three-word sequences are most frequently used in English conversations?

Do you know which three-word sequences are most frequently used in English in general?

You will find answers here shortly...
 
I am sorry, I have no idea. I think the answers will differ depending on the data used.

 
Two quantitative methods of studying phraseology
http://www.corpus4u.com/forum_view.asp?view_id=583&forum_id=34

This paper explains clusters, lexical bundles, phrases, and idioms in the corpus-linguistic sense.
 
The most frequently used trigrams in English conversations. They are not necessarily idioms in the conventional sense, but they are useful pre-fabs...

I DON'T KNOW
I DON'T THINK
DO YOU WANT
A LOT OF
WHAT DO YOU
I MEAN I
DO YOU KNOW
A BIT OF
HAVE YOU GOT
YOU KNOW WHAT
YOU HAVE TO
YOU WANT TO
MM MM MM
YOU KNOW I
AND I SAID
DON'T KNOW WHAT
HAVE A LOOK
YEAH I KNOW
YOU'VE GOT TO
I DON'T WANT
BUT I MEAN
NO NO NO
DO YOU THINK
I SAID TO
BE ABLE TO
I THINK IT'S
A COUPLE OF
IT WAS A
TO DO IT
YOU KNOW THE
NO I DON'T
THAT'S WHAT I
TO HAVE A
ONE TWO THREE
I DON'T LIKE
ONE OF THE
WHAT ARE YOU
AT THE MOMENT
AND HE SAID
I THINK IT
I THINK I
TO GO TO
WHAT I MEAN
I WANT TO
WELL I DON'T
I'VE GOT A
AND IT WAS
I'M GOING TO
ONE OF THEM
THE END OF
I SAID I
IN THE MORNING
A LITTLE BIT
TWO THREE FOUR
AND SHE SAID
DON'T WANT TO
I SAID WELL
IF YOU WANT
I TELL YOU
I MEAN YOU
I USED TO
OH I SEE
IT IN THE
ALL THE TIME
AND I THOUGHT
I'VE GOT TO
I HAVEN'T GOT
KNOW WHAT I
BUT I DON'T
WHAT IS IT
CAN I HAVE
I MEAN IT'S
YOU KNOW AND
YOU KNOW YOU
YOU KNOW THAT
ARE YOU GOING
I THOUGHT IT
THOUGHT IT WAS
I DIDN'T KNOW
WHAT DID YOU
TELL YOU WHAT
THE OTHER ONE
I WAS GONNA
GO AND GET
THERE WAS A
YOU CAN GET
SOMETHING LIKE THAT
GO TO THE
HAVE TO GO
TERMS OF THE
YOU'LL HAVE TO
ED OUCS UPDATED
END USER LICENCE
PART OF THE
AND THEN YOU
I'LL HAVE TO
I KNOW BUT
YOU CAN HAVE
PUT IT IN
I HAD TO
OUT OF THE
THE OTHER DAY
I KNOW I
ONE OF THOSE
I THOUGHT YOU
AND THEN I
TO DO WITH
AND A HALF
YOU'VE GOT A
IN A MINUTE
I HAVE TO
WHEN I WAS
WELL I THINK
LOOK AT THAT
TO GO AND
USED TO BE
LOOK AT THE
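
A list like the one above can be approximated with a plain frequency count of three-word sequences. The following Python sketch shows the idea only; the function name, the tokenizer and the min_freq default are assumptions, and it is not the procedure that produced the list above.

```python
import re
from collections import Counter

def ngram_list(text, n=3, min_freq=2):
    """Return (n-gram, frequency) pairs, most frequent first."""
    # Simple tokenizer (an assumption): lowercase, keep internal apostrophes.
    tokens = re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())
    # Slide a window of n tokens over the text; sentence and speaker
    # boundaries are ignored, so some n-grams will span them.
    grams = Counter(
        " ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)
    )
    return [(gram, freq) for gram, freq in grams.most_common() if freq >= min_freq]

sample = "I don't know. I don't know what you mean. I don't think so."
print(ngram_list(sample))
```

Because nothing here filters out corpus headers or licence text, boilerplate strings in the files will show up in the output alongside genuine conversational sequences.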
 
Thanks, Richard! But what do we use these frequent expressions for? Identifying idioms is not an easy job, I think.
 
Language - especially spoken language - can be learnt not word by word, but pre-fab by pre-fab. That speeds up processing and improves fluency.

In statistically based lists, frequently used idioms are definitely covered (a little bit, a lot of, be able to, etc), but such lists include many more useful items (well I think, oh I see, but I mean, etc).

If you want a list of conventional idioms, a dictionary might be better. But some idioms may gradually fall out of use (rain cats and dogs, hen-pecked husband) while new items become popular: a life cycle.

I found that the lists extracted using Michael Barlow's Collocate are more "idiom-like" than those from WST. Even better is IdomPrinciple, developed at Birmingham, which is for in-house use.
 
Re: how to get an idiom list?

Quoting xiaoz, 2005-8-1 20:39:06:
I found that the lists extracted using Michael Barlow's Collocate are more "idiom-like" than those from WST. Even better is IdomPrinciple, developed at Birmingham, which is for in-house use.

That's interesting. Never compared them in this area. I wonder why
this would happen given that they are both string frequency based
calculations.

Another term for the pre-fabs listed above is 'lexical bundles',
which Biber and his associates use a lot (e.g. Longman Grammar).


 
Re: how to get an idiom list?

Another thing: from a purely mechanical point of view, this is
one of the places where one is better off using tags to generate
a list of interest, if the quick clusters/bundles don't fit the bill.
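
As a rough illustration of the tag-based route, the Python sketch below assumes the corpus has already been part-of-speech tagged into (word, tag) pairs and pulls out every sequence matching a given tag pattern. The tag labels (VERB, NOUN, CONJ), the function name and the sample data are hypothetical; a real tagset and tagger would replace them.

```python
from collections import Counter

def pattern_matches(tagged, pattern, min_freq=1):
    """Return (phrase, frequency) pairs whose tags match the pattern exactly.

    tagged  -- list of (word, tag) pairs from an already tagged corpus
    pattern -- sequence of tags, e.g. ("VERB", "NOUN", "VERB")
    """
    n = len(pattern)
    counts = Counter()
    for i in range(len(tagged) - n + 1):
        window = tagged[i:i + n]
        # Keep the window only if every tag lines up with the pattern.
        if all(tag == want for (_, tag), want in zip(window, pattern)):
            counts[" ".join(word.lower() for word, _ in window)] += 1
    return [(phrase, freq) for phrase, freq in counts.most_common() if freq >= min_freq]

# Hypothetical tagged input, for illustration only.
tagged = [
    ("make", "VERB"), ("ends", "NOUN"), ("meet", "VERB"), ("and", "CONJ"),
    ("make", "VERB"), ("ends", "NOUN"), ("meet", "VERB"),
]
print(pattern_matches(tagged, ("VERB", "NOUN", "VERB")))
```

Filtering by tag pattern narrows the candidates to a chosen grammatical shape, which a raw cluster count cannot do, at the cost of depending on the accuracy of the tagging.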
 
Re: how to get an idiom list?

Quoting xiaoz, 2005-8-1 10:27:37:
Try the cluster function of WordSmith. You will get a list of useful formulaic expressions if you set a high threshold for the MI score (or another statistical measure). If you have a very large corpus, also set a high minimum frequency (above 5, for example). See the WST manual for how to do this using Concord or WordList. Note that the procedure differs between WST3 and WST4.

I know how to search for clusters with a minimum frequency, but how can I get a cluster list by setting an MI score in WordSmith 3?

 