CAREER: Broadcast Disks: Data Management for Asymmetric Communications Environments

Michael Franklin

Dept. of Computer Science and
Institute for Advanced Computer Studies
University of Maryland

Contact Information

A.V. Williams Building
College Park, MD 20742
Phone: (301) 405-6713
Fax: (301) 405-2744
Email: franklin@cs.umd.edu

WWW PAGE

http://www.cs.umd.edu/projects/bdisk

Keywords

Data dissemination, broadcasting, data push, publish/subscribe, wide-area networks, caching, scheduling, performance.

Project Award Information

Project Summary

The Broadcast Disks project is investigating the use of broadcast-based data dissemination to provide improved performance, scalability, and availability in an increasingly important class of networked applications, namely, those with the property of Asymmetric Communications. In an asymmetric system, due to bandwidth limitations and/or workload characteristics, some machines (i.e., servers) send many more bits to other machines (i.e., clients) than vice versa. The Broadcast Disks paradigm is a unique and fairly radical combination of broadcast scheduling and client storage management algorithms that allows data to be delivered to clients more quickly than with other data delivery techniques. Much of this work is being done in collaboration with Prof. Stan Zdonik at Brown University.
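As a rough illustration of the broadcast scheduling side of the paradigm, the sketch below interleaves several "disks" of pages so that pages on faster disks recur more often in the broadcast cycle. It follows the general multi-disk idea from the project's SIGMOD 1995 paper; the function name, the input format, and the handling of uneven chunk sizes are assumptions made for this example, not the project's actual code.

```python
from math import lcm  # multi-argument lcm requires Python 3.9+


def broadcast_program(disks):
    """Build one major cycle of a multi-disk broadcast (illustrative only).

    disks: list of (relative_frequency, pages) tuples. A disk with
    relative frequency 4 is broadcast four times as often as a disk
    with relative frequency 1.
    """
    # The number of minor cycles is the LCM of the relative frequencies.
    max_chunks = lcm(*(freq for freq, _ in disks))

    chunked = []
    for freq, pages in disks:
        # Slower disks are split into more chunks, so each of their
        # pages shows up less often per major cycle.
        num_chunks = max_chunks // freq
        size = -(-len(pages) // num_chunks)  # ceiling division
        # Assumption: trailing chunks may be short or empty rather
        # than padded with filler pages.
        chunked.append(
            [pages[k * size:(k + 1) * size] for k in range(num_chunks)]
        )

    program = []
    for minor_cycle in range(max_chunks):
        # Each minor cycle carries one chunk from every disk.
        for chunks in chunked:
            program.extend(chunks[minor_cycle % len(chunks)])
    return program


# Three disks with relative frequencies 4:2:1.
example = [(4, ["a"]), (2, ["b", "c"]), (1, ["d", "e", "f", "g"])]
print("".join(broadcast_program(example)))  # prints "abdaceabfacg"
```

In this example the hot page "a" appears four times per cycle, "b" and "c" twice each, and the cold pages once each, so the expected wait for a page shrinks as its relative frequency grows.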

Goals, Objectives, and Targeted Activities

The proposed research for this grant involved the phased development of the Broadcast Disks approach. Starting from an assumption of a static environment with read-only data, the plan was to focus first on dynamic client prefetching techniques, then on updates, then on integrating "pull" requests through the use of a client backchannel, and finally on the impact of communication errors (i.e., noisy channels). These studies were to be done using a detailed simulation environment. In parallel with these algorithm development and performance analysis activities, a prototype was developed (using equipment and software donated by Intel and Microsoft, respectively). The prototype has been used to validate the simulation results (and to identify differences) and for the development and analysis of algorithms that would be difficult to simulate accurately. Goals for this final year are to improve and extend the prototype, and to integrate the Broadcast Disks paradigm with other types of data dissemination approaches.

Indication of Success

The Broadcast Disks project has been quite successful, both in terms of what we have accomplished and in terms of the work done by others who have built on its ideas. We have discovered some fundamental properties of broadcast data delivery that differ significantly from more traditional approaches. We have also demonstrated that despite these differences, it is important to apply a data management perspective to data dissemination, rather than simply a communications perspective, as has been taken in most previous work. More recently, the insight gained through the Broadcast Disks project has led us to a better understanding of data dissemination in general, and we have proposed a framework that unifies various approaches to "push-based" data delivery (see our invited papers in OOPSLA 97 and SIGMOD 98). Our research accomplishments to date are documented in the publications listed below, and we plan to release a version of the prototype software for others to use. The work initiated in this project has led to additional funding from DARPA, a DOE proposal, and industrial support from and collaboration with Intel, NEC, and Draper Labs. Follow-on work from groups around the world has appeared in the database, real-time, communications, computer architecture, and, most recently, theory communities (e.g., papers based on Broadcast Disks appear in the recent STOC and SODA conferences).

In terms of the stated goals, most have already been or will be achieved by the time the grant ends this summer. One aspect of the original proposal that has not been accomplished is a detailed study of the impact of communication errors on performance. Instead, we have focused additional efforts on the integration of multiple data dissemination techniques, a topic that was not foreseen in the original proposal. This change in emphasis was motivated by the tremendous increase in interest in push-based technology (e.g., webcasting) that has arisen during the course of this grant.

Project Impact

In addition to the activities listed above, the Broadcast Disks work served as the basis of a funded DARPA contract on data dissemination and a proposal that has recently been submitted to the DOE. These proposals integrate the Broadcast Disks ideas with other key information systems technologies such as heterogeneous data management.

Project References

Constructing User Profiles Incrementally: A Multi-Modal Approach
Ugur Cetintemel, Michael J. Franklin, and C. Lee Giles
Submitted to VLDB '98, February, 1998.

"Data in Your Face": Push Technology in Perspective (Invited Paper)
Michael Franklin and Stan Zdonik
ACM SIGMOD Conference, Seattle, WA, June, 1998 (to appear).

Scheduling for Large-Scale On-Demand Data Broadcasting
Demet Aksoy and Michael J. Franklin
IEEE INFOCOM '98, San Francisco, March, 1998 (to appear).

A Framework for Scalable Dissemination-Based Systems (Invited Paper)
Michael J. Franklin and Stan Zdonik
ACM OOPSLA Conference, Atlanta, GA, October, 1997.

Balancing Push and Pull for Data Broadcast
Swarup Acharya, Michael J. Franklin, and Stan Zdonik
ACM SIGMOD Conference, Tucson, AZ, May, 1997.

Dissemination-Based Information Systems
Michael Franklin and Stan Zdonik
IEEE Data Engineering Bulletin, Vol 19, No 3, September, 1996.

Disseminating Updates on Broadcast Disks
Swarup Acharya, Michael J. Franklin, and Stan Zdonik
22nd VLDB Conference, Bombay, India, September, 1996.

Prefetching from a Broadcast Disk
Swarup Acharya, Michael J. Franklin, and Stan Zdonik
12th Int'l Conference on Data Engineering (ICDE 96), New Orleans, LA, February, 1996.

Dissemination-based Data Delivery Using Broadcast Disks
Swarup Acharya, Michael J. Franklin, and Stan Zdonik
IEEE Personal Communications, Vol 2, No 6, December, 1995.

Broadcast Disks: Data Management for Asymmetric Communications Environments
Swarup Acharya, Rafael Alonso, Michael J. Franklin, and Stan Zdonik
ACM SIGMOD Conference, San Jose, CA, June, 1995. (Note: this paper also appears in Mobile Computing, Imielinski and Korth, Eds., Kluwer Academic Publishers, 1996.)

Are 'Disks in the Air' Just 'Pie in the Sky'?
Stan Zdonik, Michael Franklin, Rafael Alonso, and Swarup Acharya
IEEE Workshop on Mobile Computing Systems and Applications, Santa Cruz, CA, December, 1994.

Area Background

In the past few years there has been an explosion in the number and variety of data-intensive applications being deployed. Ongoing advances in communications and connectivity, such as the proliferation of the Internet and intranets, the development of wireless and satellite networks, and the impending availability of asymmetric, high-bandwidth links to the home, have fueled the development of a wide range of new "dissemination-based" applications. These applications involve the timely distribution of data to a large set of consumers, and include stock and sports tickers, traffic information systems, electronic personalized newspapers, and entertainment delivery.

In order to meet the demands of such applications, a growing number of companies and research groups have been developing new approaches to data delivery in distributed information systems. On the commercial front, companies such as Pointcast, Marimba, BackWeb, and AirMedia have been developing Internet-based "push" technology that can provide information to users without them having to specifically request it. Distributed object interconnection protocols such as CORBA have been extended to support a "publish and subscribe" mode of interaction. There have also been a number of new commercial offerings in high-bandwidth satellite data delivery. On the research front, there have been a number of projects on data broadcast, selective dissemination of information, and support for mobile applications.

A Dissemination-Based Information System (DBIS) incorporates a large number of data delivery mechanisms, which vary from standard, pull-based unicast connections, as used in current web browsing and client-server database technology, to periodic data push over a broadcast channel, as used in Broadcast Disks. The nodes of a DBIS are organized as data sources and consumers, which are interconnected by information brokers. By creating hierarchies of these brokers connected by various data delivery mechanisms, the information flow can be tailored to the needs of many different applications. The goal of this work is to provide a toolkit of components that can be used to construct a DBIS.

Area References

A collection of papers that provide an introduction to the general area of data dissemination can be found in: IEEE Data Engineering Bulletin, Vol 19, No 3, September, 1996.
(This is available from the Data Engineering Bulletin web site.)

Potential Related Projects

There is interesting potential for collaboration with groups doing work in the areas of collaboration technology, heterogeneous and semi-structured data management, distributed agent architectures, and wide-area information access.