Named-entity recognition and co-reference resolution using knowledge engineering and association rules for the Indonesian language
Depok: Fakultas Ilmu Komputer UI, 2008
Location: Faculty of Computer Science Library (Perpustakaan Fakultas Ilmu Komputer)
Extracting structured or semi-structured information from free text (the information extraction task) remains a major challenge, especially for a new language, because languages differ in their characteristics. Much research in this area has been conducted for English. In contrast, research on the Indonesian language (Bahasa Indonesia), or simply Indonesian, is lacking. Because English and Indonesian have different characteristics, Indonesian requires different models and techniques. This dissertation presents the results of research on information extraction for Indonesian in two phases. For knowledge engineering, a manually created set of rules is used. For machine learning, association rules are used to obtain the rules from the training corpus. The aim of this research is to evaluate the performance of the two approaches for named-entity recognition and co-reference resolution using a corpus of 525 documents. The corpus is extracted from the online version of a national newspaper, Kompas (www.kompas.com). Experiments on named-entity recognition and co-reference resolution yield similar results: increasing the number of features also increases performance. The experiments show that knowledge engineering yields better results than association rules. The experiments also show that the number of training documents has a positive impact on named-entity recognition results, but the effect differs for co-reference resolution: adding more training documents does not always improve co-reference resolution performance. A comparison with state-of-the-art methods shows that association rules need fewer training documents than maximum entropy (the state-of-the-art method for named-entity recognition) to achieve comparable results. For co-reference resolution, the association rules method outperforms the decision tree (the state-of-the-art method for co-reference resolution).