ABSTRACT
Name : Lasguido
Study Program : Master of Computer Science
Title : Part of Speech Tagger on Indonesia Microblog
Twitter is one of the most popular social networks in the world, especially in
Indonesia. Text on Twitter is usually not well formatted, which makes
part-of-speech tagging difficult. This research therefore explores techniques
for developing a robust part-of-speech tagger for Twitter microblog data,
namely the HMM, CRF, and Brill Tagger algorithms. Furthermore, this research
combines the part-of-speech tagging algorithms with phonetic algorithms
(Soundex, Metaphone, and NYSIIS) and Brown clustering. The evaluation shows
that the best result, 94.13%, is obtained by combining the HMM and Brill Tagger
algorithms. For Out-of-Vocabulary (OOV) words, the result is 90.80% using the
Brill Tagger algorithm, while for non-OOV words the result is 94.65% using the
HMM algorithm.
Keywords:
Part-of-speech tagging, HMM, CRF, Brill, phonetic algorithm, Brown clustering
algorithm