Library Automation and Digital Archive
LONTAR
Fakultas Ilmu Komputer
Universitas Indonesia


Call Number 025.04 Eva
Collection Type LNCS Article Index
Title UTACLIR @ CLEF 2001 - Effects of Compound Splitting and N-Gram Techniques
Author Turid Hedlund
Publisher Springer, 2001
Subject
Location
Location: Perpustakaan Fakultas Ilmu Komputer (Faculty of Computer Science Library)
Call Number  Collection ID  Status
025.04 Eva  AVAILABLE
No reviews for this collection item: 41817
The University of Tampere CLEF research group participated in CLEF 2001 with four automated bilingual runs. Our cross-lingual software, UTACLIR, uses an automated method of query construction for Cross-Language Information Retrieval (CLIR). This method seeks to automatically extract topical information from request sentences written in one of the source languages and to create a target-language query based on translations given by a translation dictionary. The new features for the CLIR process from Finnish, Swedish and German to English focus on translating and matching proper names and other non-translatable words. Non-translatable words can also be components of compounds. The n-gram based method is clearly efficient in matching inflected proper names and spelling variants. However, using it for all non-identified and non-translatable words adds noise to the query. For German-English we tested two types of dictionaries (two runs). The first included all translations from the standard dictionary. The second contained the same data, except that all direct translations of compounds were excluded. The test with two dictionaries for the German runs indicates that the new features for compound processing work well even with a limited dictionary.
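
The n-gram matching of inflected proper names and spelling variants described in the abstract can be illustrated with a minimal sketch. It is not the UTACLIR implementation: the abstract does not state the n-gram length, the padding scheme, or the similarity measure, so character digrams, underscore padding, and the Dice coefficient are assumptions here, and the function names (char_ngrams, dice, best_matches) and the sample words are hypothetical.

```python
def char_ngrams(word, n=2):
    """Return the set of character n-grams of a lowercased word.

    The word is padded with '_' on both sides so that its first and last
    letters also contribute n-grams (an assumed convention).
    """
    padded = "_" * (n - 1) + word.lower() + "_" * (n - 1)
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def dice(a, b, n=2):
    """Dice coefficient between the n-gram sets of two words (0.0 to 1.0)."""
    grams_a, grams_b = char_ngrams(a, n), char_ngrams(b, n)
    if not grams_a or not grams_b:
        return 0.0
    return 2 * len(grams_a & grams_b) / (len(grams_a) + len(grams_b))


def best_matches(source_word, index_terms, n=2, k=3):
    """Rank target-language index terms by n-gram similarity to source_word.

    Ties are broken alphabetically so the ranking is deterministic.
    """
    scored = [(dice(source_word, term, n), term) for term in index_terms]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [term for _, term in scored[:k]]


if __name__ == "__main__":
    # Hypothetical example: an inflected Finnish form of a proper name
    # matched against a small English index vocabulary.
    vocabulary = ["yeltsin", "jetliner", "helsinki", "chechnya"]
    print(best_matches("Jeltsinin", vocabulary, k=2))
```

In this sketch the inflected form shares most of its digrams with the English spelling of the name, so that term ranks first. Lower-ranked candidates share only a few incidental digrams, which is the kind of noise the abstract reports when the method is applied to every non-translatable word.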