Library Automation and Digital Archive
LONTAR
Fakultas Ilmu Komputer
Universitas Indonesia


Call Number SEM - 362
Collection Type Article Index, Proceedings/Seminar
Title Using frequent max substring technique for Thai keyword extraction used in Thai text mining (pp. 309-314)
Author Todsanai Chumwatana, Kok Wai Wong, Hong Xie
Publisher Proceedings ICSIIT 2010: International Conference on Soft Computing, Intelligent System and Information Technology, 1-2 July 2010, Bali, Indonesia
Subject Frequent max substring mining, text mining, information extraction
Location
Location: Library of the Fakultas Ilmu Komputer
Call Number, Collection ID, Status
SEM - 362 AVAILABLE
No reviews for this collection: 47909
The amount of electronically stored information in the Thai language has grown rapidly in the past few years, and the number of such documents is still increasing. This makes information extraction (IE), and in particular keyword extraction from Thai texts, an essential task. Thai is an undelimited language: text is written as a string of symbols without explicit word delimiters. Because of this characteristic, word segmentation is a challenging task and has become an important research topic. Many word segmentation techniques have been proposed to segment Thai texts into sets of words in support of keyword extraction. However, most word segmentation approaches require complex language analysis or rely on a dictionary or corpus. This paper proposes an alternative method for extracting important keywords. The approach looks for long and frequent substrings in the given texts rather than individual words. As a result, it is language-independent and does not rely on a dictionary or language analysis. We refer to this technique as frequent max substring mining, or the FM technique. Applying the FM technique to Thai texts yields a set of keywords that are frequent and highly distinct within the given texts. The set of keywords extracted by the FM technique represents all frequent substrings without information loss. The technique therefore uses less space to store all frequent substrings, which helps accommodate the growth of Thai electronic information.
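The idea of keeping only long, frequent substrings can be illustrated with a small sketch. This is not the authors' implementation, and the abstract does not give a formal definition, so the code below works under a stated assumption: a substring is kept if it occurs at least `min_freq` times and is not contained in a longer substring with the same frequency. The function names, `min_freq`, and `max_len` are hypothetical, and the brute-force counting is for illustration only.

```python
from collections import Counter


def count_substrings(text, max_len):
    """Count overlapping occurrences of every substring of up to max_len characters."""
    counts = Counter()
    n = len(text)
    for i in range(n):
        for j in range(i + 1, min(n, i + max_len) + 1):
            counts[text[i:j]] += 1
    return counts


def frequent_max_substrings(text, min_freq=2, max_len=10):
    """Return {substring: count} for frequent substrings that are not subsumed.

    Assumption (not taken from the paper): a frequent substring is subsumed
    when a longer substring containing it has exactly the same count.
    Substrings at the max_len cap cannot be checked against longer ones.
    """
    counts = count_substrings(text, max_len)
    frequent = {s: c for s, c in counts.items() if c >= min_freq}

    # If a longer substring with the same count exists, then a one-character
    # extension with the same count exists too, so checking the two
    # one-character-shorter parts of every frequent substring is sufficient.
    subsumed = set()
    for t, c in frequent.items():
        for shorter in (t[:-1], t[1:]):
            if shorter and frequent.get(shorter) == c:
                subsumed.add(shorter)

    return {s: c for s, c in frequent.items() if s not in subsumed}


if __name__ == "__main__":
    # "abcab" occurs twice and "ab" three times; every other frequent
    # substring sits inside one of them with the same count.
    print(frequent_max_substrings("abcabcabd", min_freq=2))
    # {'ab': 3, 'abcab': 2}
```

Because the sketch operates on raw characters, it needs no word segmentation or dictionary and can be applied to a Thai string unchanged. Under the assumption above, the count of any dropped frequent substring can be recovered as the largest count among the kept substrings that contain it, which is one way to read the abstract's "without information loss" claim.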