[Corpus Release] Buddhist Sacred-Texts Corpus

BUDDHIST SACRED TEXTS CORPUS

Source: http://www.sacred-texts.com/bud
Compiler: Jiayue Wang
Time: 8 December 2016

The texts were extracted from web pages downloaded from the website. Each line that begins with a hash sign (#) identifies the source web page by its relative path on the website.

The corpus was created in a Linux environment and is encoded in UTF-8 with Unix-style line endings (LF).
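A minimal Python sketch for loading the corpus: it reads the file as UTF-8 and groups lines into (path, text) pairs, one per source page. It assumes that every header line is '#' followed by the relative path and that no body line itself begins with '#'. The function names and sample paths are hypothetical, not part of the distributed corpus.

```python
import io
from typing import Iterable, List, Optional, Tuple


def split_corpus(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Group corpus lines into (page_path, text) pairs.

    Assumption: a header line is '#' followed by the page's relative path,
    and every line up to the next header belongs to that page.
    Any text before the first header is ignored.
    """
    docs: List[Tuple[str, str]] = []
    path: Optional[str] = None
    body: List[str] = []
    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith("#"):
            if path is not None:  # close the previous page
                docs.append((path, "\n".join(body).strip()))
            path = line[1:].strip()  # drop '#' and surrounding spaces
            body = []
        elif path is not None:
            body.append(line)
    if path is not None:  # close the last page
        docs.append((path, "\n".join(body).strip()))
    return docs


def load_corpus(filename: str) -> List[Tuple[str, str]]:
    # The corpus is UTF-8 with LF line endings; read it as UTF-8 explicitly
    # so the result does not depend on the platform's default encoding.
    with open(filename, encoding="utf-8") as f:
        return split_corpus(f)


# Usage with an in-memory sample (hypothetical paths):
sample = io.StringIO("# ami/ami01.htm\nFirst page.\n# ami/ami02.htm\nSecond page.\n")
for path, text in split_corpus(sample):
    print(path, "->", text)
```

Passing the open file object straight to `split_corpus` streams it line by line, so the whole corpus never has to be held as one string.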

Notes:
1. A small portion of the text was extracted from "index" pages and other web pages that are not Buddhist texts, such as the website's own notes and comments.
2. Although text was extracted in filename order (e.g. ami01.txt, then ami02.txt, then ami03.txt), pages may occasionally appear out of order.
3. Use of the corpus data is restricted to non-commercial purposes.
4. The corpus may be freely redistributed, provided that this readme file is kept in the package.
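Notes 1 and 2 can be partly handled when post-processing the corpus. The Python sketch below assumes each page is represented as a (path, text) pair keyed by the relative path from its '#' header line. The file-name heuristic for index pages, the function names, and the sample paths are hypothetical, and ordering is checked only between neighbouring pages in the same directory.

```python
import posixpath
from typing import List, Tuple


def is_index_page(path: str) -> bool:
    # Hypothetical heuristic for note 1: only catches pages whose file name
    # starts with "index"; other non-text pages still need manual review.
    return posixpath.basename(path).lower().startswith("index")


def drop_index_pages(docs: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Remove (path, text) pairs that come from index pages."""
    return [(p, t) for p, t in docs if not is_index_page(p)]


def out_of_order(paths: List[str]) -> List[Tuple[str, str]]:
    """Flag adjacent pages in the same directory whose names are not ascending.

    Plain string comparison matches zero-padded names such as ami01 < ami02,
    but would misjudge unpadded names such as ami9 vs ami10.
    """
    flagged = []
    for prev, cur in zip(paths, paths[1:]):
        if posixpath.dirname(prev) == posixpath.dirname(cur) and cur < prev:
            flagged.append((prev, cur))
    return flagged


paths = ["ami/ami01.htm", "ami/ami03.htm", "ami/ami02.htm"]
print(out_of_order(paths))  # flags the ami03 -> ami02 step
```

Flagged pairs are only candidates for manual inspection; the sketch does not reorder anything.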

----
Jiayue Wang arthur0421[AT]163.com
College of Foreign Studies
Guangxi University for Nationalities
Nanning 530006
China
 

Attachment

  • buddhist_sacred_texts_corpus.zip
    6.6 MB