ABSTRACT
Name : Aditya Rama
Program : Computer Science
Title : Cross Language Sense Transferring To Build Sense Tagged
Corpus and Word Sense Disambiguation for Bahasa Indonesia
Cross-Lingual Word Sense Disambiguation (CLWSD) is one of the methods for
solving the word sense disambiguation (WSD) problem in the NLP field. This
approach relies on the observation that a word can be translated into different
words depending on the context in which it appears. The lack of data
(sense-tagged corpora in the Indonesian language) is one of the problems holding
back research on Indonesian WSD. In this research, the CLWSD approach is used to
transfer senses from an English sense-tagged corpus into Indonesian by using
parallel corpora. The result of this research is a sense-tagged corpus in the
Indonesian language, which is then tested with our implemented WSD system. The
experimental results show that an Indonesian sense-tagged corpus can be built
with the proposed method. An evaluation of samples from the sense-tagged corpus
yields 84.8% accuracy for the sense assigned to a word in its context. In
addition to the sense-tagged corpus, the WSD system itself performs above a
baseline that uses the most-frequent-sense strategy on some of the given target
words and features. The combination of bag-of-words and POS tag features
produced the highest F-score, 0.682 (68.2%).
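The sense transfer step summarized above can be illustrated with a minimal sketch. It assumes that word alignments between the English and Indonesian sides of the parallel corpus are already available as index pairs. The function name `transfer_senses`, the alignment format, and the sense label are hypothetical choices for illustration, not the procedure actually implemented in this research.

```python
from typing import Dict, List, Optional, Tuple


def transfer_senses(
    english_senses: List[Optional[str]],
    indonesian_tokens: List[str],
    alignment: List[Tuple[int, int]],
) -> List[Tuple[str, Optional[str]]]:
    """Copy each English token's sense tag to the Indonesian token aligned with it.

    alignment holds (english_index, indonesian_index) pairs (assumed format).
    """
    projected: Dict[int, str] = {}
    for en_idx, id_idx in alignment:
        sense = english_senses[en_idx]
        # Keep only the first sense that reaches a given Indonesian token.
        if sense is not None and id_idx not in projected:
            projected[id_idx] = sense
    return [(tok, projected.get(i)) for i, tok in enumerate(indonesian_tokens)]


# Toy sentence pair: "I went to the bank" / "saya pergi ke bank".
# The sense label is a made-up placeholder, not a real sense inventory key.
english_senses = [None, None, None, None, "bank%financial_institution"]
indonesian_tokens = ["saya", "pergi", "ke", "bank"]
alignment = [(0, 0), (1, 1), (2, 2), (4, 3)]  # "the" has no Indonesian counterpart

tagged = transfer_senses(english_senses, indonesian_tokens, alignment)
print(tagged)
```

In this sketch, Indonesian tokens with no aligned sense-tagged English token are left untagged (`None`) rather than guessed.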
Keywords:
Cross Lingual, Word Sense Disambiguation