Review - MOCHA: A Self-Extensible Database Middleware System for Distributed Data Sources.

M. Tamer Özsu: Review - MOCHA: A Self-Extensible Database Middleware System for Distributed Data Sources. ACM SIGMOD Digital Review 2: (2000)

Review

... methodologies in client-server systems, or as an alternative method for building mediator-based distributed systems. The paper addresses issues at the intersection of these two lines of research. With respect to the first line, existing work has identified data shipping, where data is retrieved from the server(s) to the client for processing, and query shipping, where the query is sent to the server for execution. This paper proposes code shipping as an extended form of query shipping in which the code that implements application-specific functions is also sent to the data server for execution. With respect to mediator-oriented distributed system design, the paper's contribution is flexibility. Most existing mediator-based systems process queries only by subdividing them into subqueries, each of which is sent to a particular data source. A subquery can be executed at a data source only if the functions it uses are defined at that source, so new, application-specific functions must be manually installed at the sources before they can be used in queries. The system described in this paper (called MOCHA, for Middleware Based On a Code SHipping Architecture) automatically sends these functions to the sources as part of the query that is shipped to the source.

Within this framework, MOCHA addresses two issues: (1) "scalable, efficient and cost-effective mechanisms to deploy and maintain application-specific functionality used throughout the system", and (2) query processing that dynamically decides whether to ship the query and the necessary functions to the data sources and execute them there, or to ship the data and execute the query and the functions at the query processing mediator.

Architecturally, MOCHA is a typical mediator-based system where data sources are wrapped by Data Access Providers (DAPs) that are used by the Query Processing Coordinator (QPC) mediator to execute queries and access data. DAPs differ from traditional wrappers in their ability to load and execute application-specific code. QPC utilizes a code repository to manage this code, and a catalog that includes meta-data relevant to query processing.
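The division of labor between QPC and the DAPs can be illustrated with a small sketch. This is a toy model under stated assumptions, not MOCHA's actual interfaces: the class and method names (`DataAccessProvider`, `QueryProcessingCoordinator`, `load`, `run_filter`) are hypothetical, and in-process Python callables stand in for the code that the real system would ship over the network.

```python
class DataAccessProvider:
    """Hypothetical stand-in for a DAP wrapping one data source."""

    def __init__(self, rows):
        self.rows = rows
        self.loaded = {}  # function name -> callable loaded at this DAP

    def load(self, name, func):
        # A traditional wrapper has no equivalent of this step.
        self.loaded[name] = func

    def execute(self, func_name):
        # Apply a previously loaded predicate next to the data.
        func = self.loaded[func_name]
        return [row for row in self.rows if func(row)]


class QueryProcessingCoordinator:
    """Hypothetical stand-in for the QPC mediator."""

    def __init__(self, code_repository):
        self.code_repository = code_repository  # function name -> callable
        self.shipped = []  # log of functions sent to DAPs

    def run_filter(self, dap, func_name):
        # Ship the function only if the DAP does not already hold it.
        if func_name not in dap.loaded:
            dap.load(func_name, self.code_repository[func_name])
            self.shipped.append(func_name)
        return dap.execute(func_name)
```

In this sketch nobody installs `is_large` (an invented example function) at the source by hand: the first call such as `qpc.run_filter(dap, "is_large")` deploys it from the repository, and later calls reuse the copy already loaded at the DAP.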

The application-specific functions are written in Java, as are the client applications. Thus, automatic code deployment is achieved by shipping the compiled Java classes that contain the code and data types for the application-specific functions as well as query operators. The use of Java eliminates one of the difficulties associated with query shipping in object-oriented systems: there is no need to worry about shipping the state of the application program that may be referenced by the tightly-coupled query. While this works very nicely for client applications that are written in Java (as stand-alone applications, applets, or servlets), it remains to be seen whether the approach can be easily adapted for environments where other languages are used.

For query processing, application-specific functions are classified either as data-reducing or as data-inflating. Data-reducing operators are those that filter data and produce less data than their arguments, while data-inflating operators do the reverse. The simple way to describe the query processing methodology (at the risk of oversimplification) is that QPC executes data-reducing operations and queries at the data sources (actually at the DAPs) by code shipping, and it executes data-inflating operators and queries at the mediator by data shipping. The paper defines metrics to determine whether an operator or a query is data-reducing or data-inflating, but does not discuss heuristics to estimate these.
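The placement rule described above can be sketched as a simple comparison of data volumes. This is a minimal illustration, not the paper's metric: the ratio of output bytes to input bytes, the `Operator` fields, the example operators, and the site labels `"dap"` and `"qpc"` are all assumptions made here, and the byte estimates are supplied by hand because the paper does not say how to obtain them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Operator:
    name: str
    input_bytes: int   # assumed estimate of bytes consumed
    output_bytes: int  # assumed estimate of bytes produced


def volume_ratio(op):
    """Output volume divided by input volume (illustrative metric only)."""
    if op.input_bytes <= 0:
        raise ValueError("input_bytes must be positive")
    return op.output_bytes / op.input_bytes


def choose_site(op):
    """Return "dap" for data-reducing operators, "qpc" otherwise."""
    # Data-reducing: run at the source via code shipping, so that only
    # the smaller result crosses the network.
    if volume_ratio(op) < 1.0:
        return "dap"
    # Data-inflating (or neutral): ship the smaller input to the
    # mediator and evaluate the operator there.
    return "qpc"


# Hypothetical operators: an aggregate shrinks its input, an upscaling
# function enlarges it.
aggregate = Operator("average_energy", input_bytes=1_000_000, output_bytes=8)
upscale = Operator("double_resolution", input_bytes=1_000_000, output_bytes=4_000_000)
```

Here `choose_site(aggregate)` selects the DAP and `choose_site(upscale)` selects the mediator; in both cases the smaller of the two volumes is what travels over the network.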

The paper also reports a performance study comparing the flexible query processing architecture of MOCHA against purely query-shipping systems and purely data-shipping systems. The workload for this study is derived from the Sequoia benchmark. The results indicate the substantial advantages offered by the more flexible processing of queries in MOCHA.

For many of the details that are left unspecified in this paper, the reader is referred to the University of Maryland technical report CS-TR 4105.

References

[1]: Manuel Rodriguez-Martinez, Nick Roussopoulos: MOCHA: A Self-Extensible Database Middleware System for Distributed Data Sources. SIGMOD Conference 2000: 213-224
