Library Automation and Digital Archive
LONTAR
Fakultas Ilmu Komputer
Universitas Indonesia



Collection Type: Journal Article Index
Journal Name: International Journal of Information and Management Sciences
Volume: Vol. 24 (2013), No. 3
Article Title: Hybrid focused crawling based upon VSM similarity, WordNet semantics and hub score learning
Author:
Publisher: Tamkang University Press, 2013
Location: Perpustakaan Fakultas Ilmu Komputer (Faculty of Computer Science Library)
Call Number:
Collection ID:
Status: Available
No reviews for this collection: 41643
New websites, together with new web pages, are mushrooming in every corner of the world, and gigabytes of information are uploaded, deleted or modified every unit of time. Because of the web's ever-increasing size, none of the existing search engines is able to index the complete web, and hence none can always provide complete and up-to-date information. Users still have to browse the search results sequentially to find the desired information. Search results are also sometimes biased when an unrelated page is deliberately accessed more often than a related page for some query. A focused crawler addresses the growing size of the web by browsing only the portion of the web that is related to a specific domain. It covers as much of that domain-related web space as possible and therefore provides more recent and more precise information. In this paper we present a focused crawler architecture based upon WordNet semantics, the vector space model (VSM) and hub score learning. Crawling results are shown for a breadth-first crawler, a VSM-based best-first crawler, a Naive Bayes breadth-first crawler, a Naive Bayes best-first crawler, and the proposed crawler based upon WordNet semantics, the VSM and hub score learning. The results show that the proposed crawler outperforms the others in terms of precision. In terms of the average time taken to collect 1000 domain-related pages, it also outperforms all competitors except the Naive Bayes breadth-first crawler, which produces the worst precision among them.
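The abstract compares breadth-first crawling with VSM-based best-first crawling. The Python sketch illustrates that difference under assumptions that are ours, not the paper's: a hypothetical in-memory toy web (`WEB`) stands in for HTTP fetching, pages are scored by cosine similarity between raw term-frequency vectors (no tf-idf weighting, no WordNet expansion), a page counts as relevant when its similarity reaches an arbitrary `threshold` of 0.2, and precision is taken as the fraction of fetched pages judged relevant. The function names are hypothetical.

```python
import heapq
import math
from collections import Counter, deque

# Hypothetical toy web: URL -> (page text, outgoing links). Invented for illustration.
WEB = {
    "seed": ("sports news and match results", ["a", "b", "c"]),
    "a": ("football match report and league results", ["d", "e"]),
    "b": ("cooking recipes for pasta", ["f"]),
    "c": ("tennis tournament results and scores", ["e"]),
    "d": ("football transfer news", []),
    "e": ("league table and match scores", ["f"]),
    "f": ("pasta sauce recipe", []),
}

def tf_vector(text):
    """Raw term-frequency vector (bag of words); no tf-idf weighting."""
    return Counter(text.lower().split())

def cosine(u, v):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(count * v[term] for term, count in u.items())  # missing terms count as 0
    norm = math.sqrt(sum(c * c for c in u.values())) * math.sqrt(sum(c * c for c in v.values()))
    return dot / norm if norm else 0.0

def best_first_crawl(seed, topic, max_pages, threshold=0.2):
    """Fetch the most promising URL first; links inherit their parent's similarity."""
    topic_vec = tf_vector(topic)
    frontier = [(-1.0, seed)]  # heapq is a min-heap, so priorities are negated
    visited, seen, collected = [], set(), []
    while frontier and len(visited) < max_pages:
        _, url = heapq.heappop(frontier)
        if url in seen:
            continue  # the same URL may have been queued from several parents
        seen.add(url)
        visited.append(url)
        text, links = WEB[url]
        score = cosine(tf_vector(text), topic_vec)
        if score >= threshold:
            collected.append(url)
        for link in links:
            if link not in seen:
                heapq.heappush(frontier, (-score, link))
    return visited, collected

def breadth_first_crawl(seed, topic, max_pages, threshold=0.2):
    """Fetch URLs in discovery order, ignoring relevance when choosing the next page."""
    topic_vec = tf_vector(topic)
    queue, seen = deque([seed]), {seed}
    visited, collected = [], []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        visited.append(url)
        text, links = WEB[url]
        if cosine(tf_vector(text), topic_vec) >= threshold:
            collected.append(url)
        for link in links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return visited, collected

def precision(visited, collected):
    """Fraction of fetched pages judged relevant (often called the harvest rate)."""
    return len(collected) / len(visited) if visited else 0.0

TOPIC = "football match results league scores"  # hypothetical domain description
for crawl in (breadth_first_crawl, best_first_crawl):
    visited, collected = crawl("seed", TOPIC, max_pages=4)
    print(crawl.__name__, visited, precision(visited, collected))
```

With a budget of four fetches, the breadth-first crawler visits `seed`, `a`, `b`, `c` and spends one fetch on the unrelated cooking page (precision 0.75), while the best-first crawler follows the higher-scoring links to `d` and `e` (precision 1.0). This only illustrates the ordering idea; it says nothing about the paper's measured results.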
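The abstract names hub score learning as one ingredient but does not describe it. For reference, the classic hub score comes from Kleinberg's HITS algorithm: a page is a good hub if it links to good authorities, and a good authority if good hubs link to it. The Python sketch computes HITS hub and authority scores by power iteration on a hypothetical link graph (`LINKS`). How the paper learns hub scores or combines them with VSM similarity and WordNet semantics is not stated in this record, so this is only the standard computation, not the authors' method.

```python
import math

def hits(graph, iterations=50):
    """Kleinberg's HITS on a small link graph.

    graph maps each page to the list of pages it links to.
    Returns (hub, authority) dicts, each L2-normalised."""
    nodes = set(graph) | {v for targets in graph.values() for v in targets}
    hub = {n: 1.0 for n in nodes}
    auth = {n: 0.0 for n in nodes}
    for _ in range(iterations):
        # Authority update: sum of hub scores of the pages that link in.
        auth = {n: 0.0 for n in nodes}
        for u, targets in graph.items():
            for v in targets:
                auth[v] += hub[u]
        norm = math.sqrt(sum(x * x for x in auth.values())) or 1.0
        auth = {n: x / norm for n, x in auth.items()}
        # Hub update: sum of authority scores of the pages linked to.
        hub = {n: sum(auth[v] for v in graph.get(n, [])) for n in nodes}
        norm = math.sqrt(sum(x * x for x in hub.values())) or 1.0
        hub = {n: x / norm for n, x in hub.items()}
    return hub, auth

# Hypothetical link graph: "portal" links to several pages, "blog" to only one.
LINKS = {
    "portal": ["p1", "p2", "p3"],
    "blog": ["p1"],
}
hub, auth = hits(LINKS)
print(max(hub, key=hub.get), max(auth, key=auth.get))  # portal p1
```

In a focused crawler, one plausible (assumed) use is to raise the frontier priority of links extracted from pages with high hub scores, since such pages tend to point to many on-topic pages.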