Digital Symposium Collection 2000  

 
 
 
 
 
 

 





















Comparing Hierarchical Data in External Memory

Sudarshan S. Chawathe

  View Paper (PDF)  

Return to Changes and Temporal Data

Abstract
We present an external-memory algorithm for computing a minimum-cost edit script between two rooted, ordered, labeled trees. The I/O, RAM, and CPU costs of our algorithm are, respectively, 4mn+7m+5n , 6S , and O(MN+(M+N)S 1.5 ) , where M and N are the input tree sizes, S is the block size, m=M/S , and n=N/S . This algorithm can make effective use of surplus RAM capacity to quadratically reduce I/O cost. We extend to trees the commonly used mapping from sequence comparison problems to shortest-path problems in edit graphs.


References

Note: References link to DBLP on the Web.

[AHU76]
Alfred V. Aho , Daniel S. Hirschberg , Jeffrey D. Ullman : Bounds on the Complexity of the Longest Common Subsequence Problem. JACM 23(1) : 1-12(1976)
[CAW98]
Sudarshan S. Chawathe , Serge Abiteboul , Jennifer Widom : Representing and Querying Changes in Semistructured Data. ICDE 1998 : 4-13
[CGM97]
Sudarshan S. Chawathe , Hector Garcia-Molina : Meaningful Change Detection in Structured Data. SIGMOD Conference 1997 : 26-37
[CRGMW96]
Sudarshan S. Chawathe , Anand Rajaraman , Hector Garcia-Molina , Jennifer Widom : Change Detection in Hierarchically Structured Information. SIGMOD Conf. 1996 : 493-504
[LGM96]
Wilburt Labio , Hector Garcia-Molina : Efficient Snapshot Differential Algorithms for Data Warehousing. VLDB 1996 : 63-74
[MM85]
Webb Miller , Eugene W. Myers : A File Comparison Program. SP&E 15(11) : 1025-1040(1985)
[MP80]
William J. Masek , Michael S. Paterson : A Faster Algorithm Computing String Edit Distances. JCSS 20(1) : 18-31(1980)
[Mye86]
Eugene W. Myers : An O(ND) Difference Algorithm and Its Variations. Algorithmica 1(2) : 251-266(1986)
[Sel77]
Stanley M. Selkow : The Tree-to-Tree Editing Problem. IPL 6(6) : 184-186(1977)
[SK83]
...
[Tic85]
Walter F. Tichy : RCS - A System for Version Control. SP&E 15(7) : 637-654(1985)
[Vit98]
Jeffrey Scott Vitter : External Memory Algorithms. PODS 1998 : 119-128
[WC76]
C. K. Wong , Ashok K. Chandra : Bounds for the String Editing Problem. JACM 23(1) : 13-16(1976)
[WCM+94]
Jason Tsong-Li Wang , Gung-Wei Chirn , Thomas G. Marr , Bruce A. Shapiro , Dennis Shasha , Kaizhong Zhang : Combinatorial Pattern Discovery for Scientific Data: Some Preliminary Results. SIGMOD Conference 1994 : 115-125
[WF74]
Robert A. Wagner , Michael J. Fischer : The String-to-String Correction Problem. JACM 21(1) : 168-173(1974)
[WMG90]
Sun Wu , Udi Manber , Gene Myers , Webb Miller : An O(NP) Sequence Comparison Algorithm. IPL 35(6) : 317-323(1990)
[WSC+97]
Jason Tsong-Li Wang , Dennis Shasha , George Jyh-Shian Chang , Liam Relihan , Kaizhong Zhang , Girish Patel : Structural Matching and Discovery in Document Databases. SIGMOD Conference 1997 : 560-563
[WZJS94]
Jason Tsong-Li Wang , Kaizhong Zhang , Karpjoo Jeong , Dennis Shasha : A System for Approximate Tree Matching. TKDE 6(4) : 559-571(1994)
[Yan91]
Wuu Yang : Identifying Syntactic differences Between Two Programs. SP&E 21(7) : 739-755(1991)
[ZS89]
Kaizhong Zhang , Dennis Shasha : Simple Fast Algorithms for the Editing Distance Between Trees and Related Problems. SIAM J. Comput. 18(6) : 1245-1262(1989)
[ZWS95]
...

BIBTEX

@inproceedings{DBLP:conf/vldb/Chawathe99,
  author    = {Sudarshan S. Chawathe},
   editor    = {Malcolm P. Atkinson and
                Maria E. Orlowska and
                Patrick Valduriez and
                Stanley B. Zdonik and
                Michael L. Brodie},
   title     = {Comparing Hierarchical Data in External Memory},
   booktitle = {VLDB'99, Proceedings of 25th International Conference on Very
                Large Data Bases, September 7-10, 1999, Edinburgh, Scotland,
                UK},
   publisher = {Morgan Kaufmann},
   year      = {1999},
   isbn      = {1-55860-615-5},
   pages     = {90-101},
   crossref  = {DBLP:conf/vldb/99},
   bibsource = {DBLP, http://dblp.uni-trier.de} } },


























Copyright(C) 2000 ACM