Call Number | T-1007 (Softcopy T-716) Source Code T-158 |
Collection Type | Thesis |
Title | Pelabelan part of speech pada mikroblog bahasa Indonesia |
Author | Lasguido; |
Publisher | Depok: Fasilkom UI, 2013 |
Subject | |
Location | FASILKOM-UI; |
Call Number | Collection ID | Status |
---|---|---|
T-1007 (Softcopy T-716) Source Code T-158 | AVAILABLE |
ABSTRACT

Name: Lasguido
Study Program: Master of Computer Science
Title: Part of Speech Tagger on Indonesia Microblog

Twitter is one of the best-known social networks in the world and is especially popular in Indonesia. Text on Twitter is usually not well formatted, which makes part-of-speech tagging difficult. This research therefore explores techniques for building a robust part-of-speech tagger for Twitter microblog data, such as the HMM, CRF, and Brill tagger algorithms. It also combines these tagging algorithms with phonetic algorithms (Soundex, Metaphone, and NYSIIS) and with Brown clustering. In the evaluation, the best result is 94.13%, obtained by combining the HMM and Brill tagger algorithms. For out-of-vocabulary (OOV) words, the result is 90.80% using the Brill tagger algorithm, while for non-OOV words the result reaches 94.65% using the HMM algorithm.

Key words: Part-of-speech tagging, HMM, CRF, Brill, phonetic algorithm, Brown clustering algorithm
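The abstract names Soundex as one of the phonetic algorithms but does not say how it was applied. As a rough illustration only, the sketch below implements standard American Soundex in Python, on the assumption that such codes could be used to group spelling variants of noisy tweet tokens. The function name `soundex` and the example words are illustrative choices, not taken from the thesis or its data.

```python
import string

# Standard American Soundex digit groups (consonants only).
CODES = {}
for letters, digit in [("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")]:
    for ch in letters:
        CODES[ch] = digit


def soundex(word):
    """Return the 4-character Soundex code of `word` ("" if it has no ASCII letters)."""
    letters = [c for c in word.lower() if c in string.ascii_lowercase]
    if not letters:
        return ""
    first = letters[0]
    result = first.upper()
    prev = CODES.get(first, "")  # code of the first letter also suppresses repeats
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not break a run of identical codes
        code = CODES.get(ch, "")  # vowels and y have no code
        if code and code != prev:
            result += code
        prev = code  # a vowel resets prev, so a repeated code after it is kept
    return (result + "000")[:4]  # pad with zeros, truncate to 4 characters


if __name__ == "__main__":
    # Hypothetical informal spellings; variants that sound alike share a code.
    for token in ["banget", "bangettt", "gimana", "gmana"]:
        print(token, soundex(token))
```

Under this sketch, an elongated or vowel-dropped spelling maps to the same code as its base form, which is one way a tagger could treat an unseen variant like a known word. Whether the thesis used Soundex codes in this manner is not stated in the abstract.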