Maximal frequent sequences (MFS) are the longest sequences that can be formed while still occurring frequently in a document collection; that is, they are frequent sequences not contained in any longer frequent sequence. A sequence is considered frequent if it appears in at least a given number of documents, called the frequency threshold. MFS can be used to represent the content of a document, and they can also be used for indexing, complementing the usual term-based indexing method. In this paper, we evaluated whether MFS improve a document ranking system for Indonesian documents, in both stemmed and non-stemmed versions. The test framework also includes a basic term-frequency Boolean model, a term frequency cosine (tfc) model, and a hybrid model that combines tfc and MFS. We applied these models to two Indonesian corpora: a news corpus and a scientific corpus. The results indicate that MFS can indeed improve the precision of a retrieval system, and that stemming improves precision substantially regardless of the model used to retrieve the documents.
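To make the definition concrete, here is a minimal brute-force sketch of finding maximal frequent sequences in a tiny tokenized corpus. It is not the discovery algorithm used in this work. It assumes a sequence "occurs" in a document when its words appear in order with at most `max_gap` words skipped between consecutive matches. The names `occurs_in`, `maximal_frequent_sequences`, `sigma` (the frequency threshold) and `max_gap` are hypothetical, chosen for illustration. Frequent sequences are grown level-wise by appending frequent words, and those contained in a longer frequent sequence are then discarded.

```python
def occurs_in(seq, doc, max_gap):
    """True if the words of seq appear in doc in order, skipping at most
    max_gap words between consecutive matched words (assumed gap rule)."""
    def match(k, pos):
        # seq[k] is matched at doc[pos]; try to match the remaining words.
        if k == len(seq) - 1:
            return True
        for j in range(pos + 1, min(pos + max_gap + 2, len(doc))):
            if doc[j] == seq[k + 1] and match(k + 1, j):
                return True
        return False
    return any(doc[i] == seq[0] and match(0, i) for i in range(len(doc)))


def is_subsequence(a, b):
    """True if a is an ordered (gaps allowed) subsequence of b."""
    it = iter(b)
    return all(word in it for word in a)


def maximal_frequent_sequences(docs, sigma, max_gap=1):
    """Naive level-wise search; a sequence is frequent if it occurs in
    at least sigma documents (hypothetical parameter names)."""
    def support(seq):
        return sum(occurs_in(seq, d, max_gap) for d in docs)

    vocab = sorted({w for d in docs for w in d})
    level = [(w,) for w in vocab if support((w,)) >= sigma]
    freq_words = [s[0] for s in level]
    frequent = list(level)
    while level:
        # Every prefix of a frequent sequence is frequent, so appending
        # frequent words to frequent sequences finds all of them.
        candidates = {s + (w,) for s in level for w in freq_words}
        level = [c for c in sorted(candidates) if support(c) >= sigma]
        frequent.extend(level)
    # Keep only sequences not contained in a longer frequent sequence.
    return [s for s in frequent
            if not any(len(t) > len(s) and is_subsequence(s, t)
                       for t in frequent)]


docs = [
    "the stock market fell sharply today".split(),
    "the stock market rose today".split(),
    "stock market news today".split(),
]
print(maximal_frequent_sequences(docs, sigma=2, max_gap=1))
# [('stock', 'market', 'today'), ('the', 'stock', 'market')]
```

The exhaustive candidate generation is only practical for toy inputs. Its purpose is to show the two ingredients of the definition: the frequency threshold and the maximality condition.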