Signature files appear to be a promising method for text and document retrieval [29,5,8]. According to this method, the documents are stored sequentially in one file (the text file), while abstractions of the documents (signatures) are stored sequentially in another file (the signature file). To resolve a query, the signature file is scanned first and many non-qualifying documents are immediately rejected. In this paper we present three old and one new signature extraction methods and compare their screening capacities. We derive exact and approximate formulas for the false drop probability of each method and discuss the new method in more detail.
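
The two-file scheme described above can be illustrated with a minimal sketch. It assumes superimposed coding, one well-known signature extraction technique, and is not meant to represent the new method discussed in this paper. The signature width, the number of bits per word, the hash function, and all function names are hypothetical choices made only for illustration.

```python
import hashlib

SIG_BITS = 64       # assumed signature width in bits
BITS_PER_WORD = 3   # assumed number of bit positions hashed per word

def word_mask(word):
    """Hash a word to BITS_PER_WORD bit positions (fewer bits if positions collide)."""
    mask = 0
    for i in range(BITS_PER_WORD):
        digest = hashlib.sha256(f"{i}:{word.lower()}".encode()).digest()
        mask |= 1 << (int.from_bytes(digest[:4], "big") % SIG_BITS)
    return mask

def signature(text):
    """Superimpose (bitwise OR) the masks of all words in a document."""
    sig = 0
    for word in text.split():
        sig |= word_mask(word)
    return sig

def search(query_word, documents, signatures):
    """Scan the signatures first, then verify surviving documents against the text."""
    q = word_mask(query_word)
    hits, false_drops = [], 0
    for doc, sig in zip(documents, signatures):
        if sig & q != q:
            continue                      # rejected using the signature alone
        if query_word.lower() in doc.lower().split():
            hits.append(doc)              # signature match confirmed by the text
        else:
            false_drops += 1              # signature matched, text did not
    return hits, false_drops

# The "text file" and "signature file" are modeled as two parallel lists.
documents = [
    "signature files for text retrieval",
    "office filing systems",
    "false drop probability",
]
signatures = [signature(d) for d in documents]
hits, false_drops = search("retrieval", documents, signatures)
```

A document whose signature lacks any bit of the query mask cannot contain the query word, so it is rejected without reading the text file. A document that passes the signature test but fails the text check is a false drop, the event whose probability the formulas in this paper quantify.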