What corpora cannot do? Ten Commandments for Corpus Linguists

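A corpus mainly stores text files. Tables and images are generally left out.
If you really do need to process tables and images, languages such as Perl and Python have plenty of ready-made library modules you can use.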
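One possible approach for tables, sketched under assumptions: if a table arrives as HTML, Python's standard-library `html.parser` module can pull out the cell text so each row becomes a line of plain text that fits a text-only corpus. The class `TableTextExtractor`, the function `table_to_text`, and the sample table (including its numbers) are all hypothetical and invented for illustration. This is not a specific library module the post had in mind.

```python
from html.parser import HTMLParser


class TableTextExtractor(HTMLParser):
    """Hypothetical helper: collect the text of <td>/<th> cells, row by row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows; each row is a list of cell strings
        self._row = None   # cells of the row currently being parsed
        self._cell = None  # text fragments of the cell currently being parsed

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Join the fragments and collapse runs of whitespace.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def table_to_text(html):
    """Flatten every table row into one tab-separated line of plain text."""
    parser = TableTextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join("\t".join(row) for row in parser.rows)


# Made-up sample input, for illustration only.
sample = """
<table>
  <tr><th>Word</th><th>Frequency</th></tr>
  <tr><td>the</td><td>1234</td></tr>
  <tr><td>corpus</td><td> 56 </td></tr>
</table>
"""
print(table_to_text(sample))
```

Tab-separated lines are just one choice. They keep the row and column structure readable while leaving the corpus as plain text. Images would need a different tool entirely, such as OCR, which is outside this sketch.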
 
A good idea, but it seems that every ongoing research project uses a corpus nowadays: no corpus, no research.
 
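That's true. It's just like systemic functional grammar back in its day: once something gets hot, everyone jumps on it. Later it was cognitive linguistics, then things like Learning Autonomy, and so on. Plenty of people chase fashions.
Still, there's no need to worry about this. The streets may be full of pharmacies, but after a few years not every one of them survives, because quack doctors and snake-oil remedies can't fool people forever.
Let things become popular first; the sifting comes later.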
 
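Well said. +1. Whatever you do, you need to settle down and work with a calm mind. Teacher Tao and Teacher Xiao are both excellent role models.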
 
Well said! In fact, many new things have to go through this process.
 
This topic is quite interesting.

According to Kennedy, a corpus is a body of written text or transcribed speech which can serve as a basis for linguistic analysis and description.

Here is a keyword: body, or corpse.

What shall we concentrate on?
Then we come back to a classic question: parole or langue; performance or competence.

The difference between dead and alive, I think, is like the contrast between reading Shakespeare and watching Shakespeare performed.

Now please study the following conversation:
A: I still have a son.
B: Well, that's OK!
A: I still have a dog.
B: Oh, I'm sorry!
(from Yule)

Even though the words are easy, out of their real context we have to guess or work out what the speakers mean.
However, if you watch or hear the conversation in person, or know more about the speakers, it is easy to understand.

Conversations over QQ are even more problematic than ordinary dialogue, and I think you know why.
 
What are starred examples?

"Starred examples" here probably refers to examples that a linguist quotes in a research paper. They are marked with an asterisk (the usual linguistic convention) to show that they are unacceptable, as judged by a native speaker's intuition rather than by evidence from a corpus.
 
Any research approach has its limitations and strengths.
It is an illusion to expect a single approach to give a comprehensive picture of language.
The key point is that we approach language from different directions.
 
My personal feeling is that corpora are the right direction, and NLP may well make a qualitative leap within five years.
 
It is not yet time for us to dwell on the limitations, though we all know about them.
 
Corpus linguistics owes its existence to developments in IT. Either way, it has a bright future!
 
(3) The best information comes from direct data.
This points to the alternative of pure unadulterated texts, devoid of any annotation. Yet we still do not know how to properly handle them. The other alternative, annotation, now offered as the solution, is, to a varying extent, always biased and adulterates both the data input and results obtained. Hence, it should always be viewed as an alternative only.
This point really struck a chord with me when I read it.
 