What does the standard deviation of sentence length tell us?

xiaoz

Forever Super Administrator
Staff member
#2
Standard deviation is the most common measure of statistical dispersion, i.e. how spread out a data set is. It equals the square root of the sum of the squared deviations from the mean, divided by the number of scores in the data set. If the data points are all close to the mean, the standard deviation is close to zero; if many data points are far from the mean, it is large; if all the values are equal, it is exactly zero. When the data are approximately normally distributed, i.e. most of the items cluster around the centre rather than at the lower or higher end of the scale, about 68% of the scores lie within one standard deviation of the mean, about 95% within two, and about 99.7% within three. So once you have the mean sentence length (in words) and its standard deviation, you have a good idea of how sentences of various lengths are distributed in your corpus, to the extent that the lengths are roughly normally distributed.
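To make this concrete, here is a minimal Python sketch of the calculation described above. The sentence splitter (breaking on ., ! and ?) and the sample text are my own simplifying assumptions for illustration, not what any particular corpus tool does; a real corpus would need a proper sentence tokenizer.

```python
import math
import re

def sentence_lengths(text):
    """Return the number of words in each sentence.

    Assumption: sentences end at '.', '!' or '?', and words are
    separated by whitespace. This is a crude splitter for illustration.
    """
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    return [len(s.split()) for s in sentences]

def mean_and_std(lengths):
    """Mean and population standard deviation (divide by N, as in the post)."""
    n = len(lengths)
    mean = sum(lengths) / n
    variance = sum((x - mean) ** 2 for x in lengths) / n
    return mean, math.sqrt(variance)

def share_within(lengths, mean, std, k=1):
    """Fraction of sentences whose length is within k standard deviations of the mean."""
    inside = [x for x in lengths if abs(x - mean) <= k * std]
    return len(inside) / len(lengths)

# Hypothetical sample text, made up for this example.
sample = "The cat sat. It was a very long and tiring day for everyone. Stop!"
lengths = sentence_lengths(sample)
mean, std = mean_and_std(lengths)
share = share_within(lengths, mean, std, k=1)

print("lengths:", lengths)
print("mean: %.2f  std dev: %.2f" % (mean, std))
print("share within 1 std dev: %.2f" % share)
```

Comparing the share within one, two and three standard deviations against 68%, 95% and 99.7% is a quick way to see how far your corpus departs from a normal distribution.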
 