In digital libraries image retrieval queries can be based on the similarity of objects, using several feature attributes like shape, texture, color or text. Such multi-feature queries return a ranked result set instead of exact matches. Besides, the user
wants to see only the k top-ranked objects. We present a new algorithm called Quick-Combine (European patent pending, nr. EP 00102651.7) for combining multi-feature result lists, guaranteeing the correct retrieval of the k top-ranked results. For score aggregation virtually any combining function can be used, including weighted queries. Compared to Fagin’s algorithm we have developed an improved termination condition in tuned combination with a heuristic control flow adopting itself narrowly to the particular score distribution. Top-ranked results can be computed and output incrementally. We show that we can dramatically improve performance, in particular for non-uniform score distributions. Benchmarks on practical data indicate efficiency gains by a factor of 30. For very skewed data observed speed-up factors are even larger. These performance results
scale through different database sizes and numbers of result sets to combine.
|