In empirical linguistics (such as corpus and computational linguistics), probability and statistics are a recurring topic that cannot be avoided. To cover it, many books try to combine statistics and language studies in a single textbook. In my opinion, however, it is better to read books dedicated to the essential theories of probability and statistics and to their applications in other fields, which gives a deeper understanding of the subject. The following books might be helpful for this purpose:
For theoretical statistics:
1. Introductory Statistics by Neil A. Weiss. This book includes many interesting examples that keep you reading. It has no complicated formulas or derivations. The fourth edition was published by Higher Education Press, China, but is out of print right now. You may find it in a local library.
2. Probability and Statistics by Morris H. DeGroot. Strong on mathematics. Some formulas require a knowledge of calculus. A few German linguists, such as Stefan Evert, recommend this book in their articles.
For applied statistics:
3. Statistics for the Behavioral Sciences by Frederick J. Gravetter and Larry B. Wallnau. Very comprehensive and interesting. It explains most of the statistics used in SLA and applied linguistics.
4. Introduction to Machine Learning by Ethem Alpaydin. This book covers many topics, such as parametric and non-parametric statistics and Markov and hidden Markov models, along with their applications in machine learning.
All four of these books except the first can be found on the website library.nu. I hope all corpus lovers will list their favorite statistics books here in this thread.