Library Automation and Digital Archive
LONTAR
Fakultas Ilmu Komputer
Universitas Indonesia

Call Number SEM-211
Collection Type Article Index, Proceedings/Sem
Title SISTEM DETEKSI BAHASA PADA DOKUMEN MENGGUNAKAN N-GRAM (Language Detection System for Documents Using N-Grams)
Author Badrus Zaman, Eva Hariyanti, Endah Purwanti
Publisher Prosiding Senatkom
Subject information retrieval, language detection, n-gram, bahasa Indonesia, english.
Location
Location: Perpustakaan Fakultas Ilmu Komputer (Faculty of Computer Science Library)
Call Number Collection ID Status
SEM-211 AVAILABLE
There are no reviews for this collection: 55584
ABSTRACT

Language detection on a very large collection of documents can improve the performance of an information retrieval system. One popular method for language detection is n-grams, which are sequences of n characters taken from a string. This research developed an n-gram-based language detection system that classifies documents as Indonesian or English. The work proceeded in three phases: creating a profile of each language, system testing, and system evaluation. Fifty documents were used to create the language profiles, 25 Indonesian and 25 English. Sixty documents were used for system testing. System performance was evaluated using the F-measure. In the tests, the F-measures obtained for unigrams, bigrams, and trigrams were 0.933, 0.917, and 0.933, respectively.

Keywords: information retrieval, language detection, n-gram, bahasa Indonesia, english.
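The three phases described in the abstract (profile creation, testing, and F-measure evaluation) can be illustrated with a minimal Python sketch. The abstract does not say how profiles are ranked or compared, so this sketch assumes a rank-ordered profile with an "out-of-place" distance in the style of Cavnar and Trenkle. The function names, the `top_k` cutoff of 300, and the lowercasing step are all hypothetical choices for illustration, not the authors' implementation.

```python
from collections import Counter


def char_ngrams(text, n):
    """Return all overlapping character n-grams of the lowercased text."""
    text = text.lower()
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def build_profile(documents, n, top_k=300):
    """Map each of the top_k most frequent n-grams to its rank (0 = most frequent)."""
    counts = Counter()
    for doc in documents:
        counts.update(char_ngrams(doc, n))
    # Sort by descending count; break ties alphabetically so ranks are deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]
    return {gram: rank for rank, (gram, _) in enumerate(ranked)}


def out_of_place(doc_profile, lang_profile):
    """Sum of rank differences (assumed measure, not stated in the abstract)."""
    max_penalty = len(lang_profile)  # cost of an n-gram absent from the language profile
    distance = 0
    for gram, rank in doc_profile.items():
        if gram in lang_profile:
            distance += abs(rank - lang_profile[gram])
        else:
            distance += max_penalty
    return distance


def detect(text, lang_profiles, n, top_k=300):
    """Return the language whose profile is closest to the document's profile."""
    doc_profile = build_profile([text], n, top_k)
    # On a tie, min() returns the first language in the dict's insertion order.
    return min(lang_profiles, key=lambda lang: out_of_place(doc_profile, lang_profiles[lang]))


def f_measure(true_labels, predicted_labels, positive):
    """Balanced F-measure (harmonic mean of precision and recall) for one class."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

Under these assumptions, one would build a profile per language from the 25 training documents each, for example `{"id": build_profile(indonesian_docs, 2), "en": build_profile(english_docs, 2)}`, call `detect` on each of the 60 test documents with the same `n`, and pass the true and predicted labels to `f_measure`. Repeating this for `n` = 1, 2, and 3 gives one score per n-gram size.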