Review - Integrating Heterogenous Overlapping Databases through Object-Oriented Transformations.

Jason Atkins: Review - Integrating Heterogenous Overlapping Databases through Object-Oriented Transformations. ACM SIGMOD Digital Review 2: (2000)

Review

As the title indicates, this paper presents an efficient method for integrating heterogeneous, overlapping databases. The method calls for a wrapper module to be placed around each of the separate data sources, as well as a centralized mediator, all built on the same DBMS. The mediator integrates the separate data sources by creating Integration Union Types (IUTs). These allow information from the different data sources to be queried as if it were contained in a single data source. The IUTs are treated as objects, with their member functions overloaded so that they can access the different data sources. Experimental results presented in the paper show that this structure, along with some IUT-specific query processing techniques, produces execution times significantly lower than those of previous methods.

As a first-year graduate student just now entering the world of computer science research, I have not had a lot of experience reading and interpreting technical papers. This lack of experience tends to make me prefer papers that clearly explain and describe the issues involved. I feel that this paper, after a couple of careful readings, does a reasonably good job of this. In particular, I found many of the examples to be very illustrative of the concepts being discussed, and they greatly aided my understanding of the material. A few of the abbreviations, however, left me somewhat confused. Some were not defined at all, while others were defined toward the beginning of the paper and then not used again until much later. I also found the overall structure of the paper helpful, with the various components defined and explained separately before the integration technique as a whole. In general, the fact that I could probably give an accurate summary of the content of this paper to another interested party leads me to believe that it is both understandable and well-written.

In terms of content, the methods presented in this paper do sound like a feasible way of attacking the problem of integrating heterogeneous data. The paper appears to cover all of the issues involved thoroughly (at least I was not able to locate any "holes" in the arguments presented), and it provides experimental results to support its conclusions. As I have not had the opportunity to read much of the current material available on this topic, I have no way of knowing just how novel this particular approach is. If this is indeed a new way of approaching the problem, then the experimental results presented would indicate that this strategy is a significant improvement over other available approaches. In addition to supporting the claims of the authors, the experiments also seem to me relevant to real-world situations. They make use of data sources (Microsoft Access) and network infrastructures (ISDN and Ethernet) that are widely used, as well as the kinds of queries that are often performed. This kind of experimentation makes it reasonable to assume that results similar to those shown in the paper are likely to occur in most real-world settings.

One question I do have about the findings of this paper concerns scalability. The paper discusses overloading functions to account for cases where the desired data may be present in either or both of the data sources being referenced. For instance, with two data sources, A and B, a particular instance falls into one of three classes, based on its location within the constituent data sources: A only, B only, or both A and B. If a third data source, C, were introduced, the number of possible classes would increase to seven: A only, B only, C only, A and B, A and C, B and C, and all three (A, B, and C). Thus, with n data sources, there could be up to 2^n - 1 location classes for the data involved (one for each non-empty subset of the sources), requiring each integrated type to have 2^n - 1 cases for each function. It would appear that as n grows, the difficulty of defining and executing these functions would increase exponentially.
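The counting argument above can be checked with a short sketch. This is my own illustration, not code from the paper: the function name `location_classes` and the source labels are hypothetical, and the sketch only enumerates the non-empty subsets of a set of data sources to confirm that there are 2^n - 1 of them. It says nothing about how the paper's system actually represents or prunes these cases.

```python
from itertools import combinations


def location_classes(sources):
    """Return every non-empty subset of `sources`, smallest subsets first.

    Each subset stands for one "location class": the set of data sources
    in which a given instance is present. (Hypothetical helper, used only
    to illustrate the counting argument.)
    """
    return [
        combo
        for size in range(1, len(sources) + 1)
        for combo in combinations(sources, size)
    ]


if __name__ == "__main__":
    for names in (["A", "B"], ["A", "B", "C"], ["A", "B", "C", "D"]):
        classes = location_classes(names)
        # The number of non-empty subsets of n sources is 2^n - 1.
        assert len(classes) == 2 ** len(names) - 1
        print(len(names), "sources ->", len(classes), "location classes")

    # The seven classes listed above for three sources.
    print([" and ".join(c) for c in location_classes(["A", "B", "C"])])
```

With ten sources the same enumeration already yields 1023 classes, which is the growth I am worried about if each overloaded function needs a separate case per class.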

Overall, although it lies slightly outside my realm of experience, I found this paper to provide a relatively clear presentation of a new, more efficient data integration technique. The experimental results show that, if implemented carefully, this technique could indeed have a significant positive impact on real-world implementations of data integration.

References

[1]: Vanja Josifovski, Tore Risch: Integrating Heterogenous Overlapping Databases through Object-Oriented Transformations. VLDB 1999: 435-446