The need to automatically extract and classify the contents of archives
of multimedia data such as images, video, and text documents has led to
significant work on similarity-based retrieval of data. To date, most work
in this area has focused on the creation of index structures for
similarity-based retrieval. There is very little work on developing formalisms
for querying multimedia databases that support similarity-based computations,
or on optimizing such queries, even though it is well known that feature
extraction and identification algorithms for media data are very expensive. We introduce
a similarity algebra that brings together relational operators and results
of multiple similarity implementations in a uniform language. The algebra
can be used to specify complex queries that combine different interpretations
of similarity values and multiple algorithms for computing these values.
We prove equivalence and containment relationships between similarity algebra
expressions and develop query rewriting methods based on these results.
We then provide a generic cost model for query plans in the similarity
algebra, along with query optimization methods based on this model. We
supplement the paper with experimental results that illustrate
the use of the algebra and the effectiveness of the query optimization
methods, using the Integrated Search Engine (I.SEE) as the testbed.
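
As a rough illustration of why rewriting matters when similarity computation is expensive, the following minimal sketch evaluates one query under two equivalent plans and counts the similarity evaluations each plan performs. It is not the algebra defined in the paper: the `Image` record, the `select` and `sim_select` operators, the histogram-intersection measure, and the toy archive are all hypothetical names and data chosen for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Image:
    # Hypothetical record: one relational attribute (year) and one feature vector.
    name: str
    year: int
    histogram: tuple


class CountingSimilarity:
    """Histogram intersection that counts how often it is evaluated.

    This stands in for an expensive similarity implementation; the measure
    itself is an assumption made for the example.
    """

    def __init__(self):
        self.calls = 0

    def __call__(self, a, b):
        self.calls += 1
        return sum(min(x, y) for x, y in zip(a.histogram, b.histogram))


def select(rows, predicate):
    """Ordinary relational selection."""
    return [r for r in rows if predicate(r)]


def sim_select(rows, query, sim, threshold):
    """Similarity selection: keep rows whose score against the query meets the threshold."""
    return [r for r in rows if sim(r, query) >= threshold]


query = Image("query", 2000, (0.5, 0.25, 0.25))
archive = [
    Image("a", 1998, (0.5, 0.25, 0.25)),
    Image("b", 1995, (0.25, 0.5, 0.25)),
    Image("c", 1999, (0.0, 0.0, 1.0)),
    Image("d", 1990, (0.5, 0.25, 0.25)),
]


def is_recent(r):
    return r.year >= 1996


# Plan A: similarity selection first, relational selection second.
sim_a = CountingSimilarity()
plan_a = select(sim_select(archive, query, sim_a, 0.7), is_recent)

# Plan B: relational selection pushed below the similarity selection.
sim_b = CountingSimilarity()
plan_b = sim_select(select(archive, is_recent), query, sim_b, 0.7)

print([r.name for r in plan_a], sim_a.calls)
print([r.name for r in plan_b], sim_b.calls)
```

Both plans return the same rows because the two selections commute here, but the second plan evaluates the similarity measure only on rows that survive the cheap relational filter. A cost model of the kind described above would be what lets an optimizer prefer such a plan; the call counter is only a stand-in for that cost.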