Integration of Heterogeneous Databases Without Common Domains Using Queries Based on Textual Similarity.

William W. Cohen: Integration of Heterogeneous Databases Without Common Domains Using Queries Based on Textual Similarity. SIGMOD Conference 1998: 201-212
@inproceedings{DBLP:conf/sigmod/Cohen98,
  author    = {William W. Cohen},
  editor    = {Laura M. Haas and
               Ashutosh Tiwary},
  title     = {Integration of Heterogeneous Databases Without Common Domains
               Using Queries Based on Textual Similarity},
  booktitle = {SIGMOD 1998, Proceedings ACM SIGMOD International Conference
               on Management of Data, June 2-4, 1998, Seattle, Washington, USA},
  publisher = {ACM Press},
  year      = {1998},
  isbn      = {0-89791-995-5},
  pages     = {201-212},
  ee        = {http://doi.acm.org/10.1145/276304.276323, db/conf/sigmod/Cohen98.html},
  crossref  = {DBLP:conf/sigmod/98},
  bibsource = {DBLP, http://dblp.uni-trier.de}
}

Abstract

Most databases contain "name constants" like course numbers, personal names, and place names that correspond to entities in the real world. Previous work in integration of heterogeneous databases has assumed that local name constants can be mapped into an appropriate global domain by normalization. However, in many cases, this assumption does not hold; determining if two name constants should be considered identical can require detailed knowledge of the world, the purpose of the user's query, or both. In this paper, we reject the assumption that global domains can be easily constructed, and assume instead that the names are given in natural language text. We then propose a logic called WHIRL which reasons explicitly about the similarity of local names, as measured using the vector-space model commonly adopted in statistical information retrieval. We describe an efficient implementation of WHIRL and evaluate it experimentally on data extracted from the World Wide Web. We show that WHIRL is much faster than naive inference methods, even for short queries. We also show that inferences made by WHIRL are surprisingly accurate, equaling the accuracy of hand-coded normalization routines on one benchmark problem, and outperforming exact matching with a plausible global domain on a second.
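
The abstract's central idea, scoring pairs of name constants by vector-space similarity instead of requiring exact matches, can be illustrated with a minimal Python sketch. This is not WHIRL itself: it is the naive all-pairs approach that the abstract contrasts WHIRL against, and the tokenizer, the TF-IDF weighting formula, the function names, and the sample names are all assumptions made for illustration rather than details taken from the paper.

```python
import math
from collections import Counter


def tokenize(name):
    """Lowercase a name and split it into alphanumeric word tokens."""
    cleaned = "".join(ch.lower() if ch.isalnum() else " " for ch in name)
    return cleaned.split()


def tfidf_vectors(names):
    """Return one unit-length TF-IDF vector (dict: term -> weight) per name."""
    docs = [Counter(tokenize(n)) for n in names]
    n_docs = len(docs)
    # Document frequency: in how many names does each term occur?
    doc_freq = Counter(term for doc in docs for term in doc)
    vectors = []
    for doc in docs:
        # Assumed weighting: log(tf + 1) * log(N / df); other variants exist.
        vec = {t: math.log(tf + 1) * math.log(n_docs / doc_freq[t])
               for t, tf in doc.items()}
        norm = math.sqrt(sum(w * w for w in vec.values()))
        vectors.append({t: w / norm for t, w in vec.items()} if norm > 0 else {})
    return vectors


def cosine(u, v):
    """Cosine similarity of two unit-length sparse vectors (a dot product)."""
    if len(u) > len(v):
        u, v = v, u
    return sum(w * v.get(t, 0.0) for t, w in u.items())


def similarity_join(left, right, k=3):
    """Naively score every (left, right) pair and return the k best matches."""
    vectors = tfidf_vectors(left + right)
    lv, rv = vectors[:len(left)], vectors[len(left):]
    scored = [(cosine(a, b), x, y)
              for x, a in zip(left, lv)
              for y, b in zip(right, rv)]
    scored = [s for s in scored if s[0] > 0]  # drop pairs with no shared term
    return sorted(scored, reverse=True)[:k]


if __name__ == "__main__":
    # Hypothetical name constants from two sources without a common domain.
    left = ["AT&T Labs Research", "Carnegie Mellon University"]
    right = ["ATT Labs", "Carnegie Mellon Univ.", "Stanford University"]
    for score, x, y in similarity_join(left, right):
        print(f"{score:.3f}  {x!r} ~ {y!r}")
```

On this toy input the best-scoring pair is "Carnegie Mellon University" with "Carnegie Mellon Univ.", because the two names share two relatively rare terms. The sketch scores every pair, so its cost grows with the product of the two relation sizes; the abstract's efficiency claim concerns avoiding exactly this kind of exhaustive evaluation.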

Copyright © 1998 by the ACM, Inc., used by permission. Permission to make digital or hard copies is granted provided that copies are not made or distributed for profit or direct commercial advantage, and that copies show this notice on the first page or initial screen of a display along with the full citation.


Printed Edition

Laura M. Haas, Ashutosh Tiwary (Eds.): SIGMOD 1998, Proceedings ACM SIGMOD International Conference on Management of Data, June 2-4, 1998, Seattle, Washington, USA. ACM Press 1998, ISBN 0-89791-995-5, SIGMOD Record 27(2), June 1998

References

[Abiteboul and Vianu, 1997]
Serge Abiteboul, Victor Vianu: Regular Path Queries with Constraints. PODS 1997: 122-133
[Arens et al., 1996]
...
[Atzeni et al., 1997]
...
[Barbara et al., 1992]
Daniel Barbará, Hector Garcia-Molina, Daryl Porter: The Management of Probabilistic Data. IEEE Trans. Knowl. Data Eng. 4(5): 487-502(1992)
[Bartell et al., 1994]
Brian T. Bartell, Garrison W. Cottrell, Richard K. Belew: Automatic Combination of Multiple Ranked Retrieval Systems. SIGIR 1994: 173-181
[Bayardo et al., 1997]
Roberto J. Bayardo Jr., William Bohrer, Richard S. Brice, Andrzej Cichocki, Jerry Fowler, Abdelsalam Helal, Vipul Kashyap, Tomasz Ksiezyk, Gale Martin, Marian H. Nodine, Mosfeq Rashid, Marek Rusinkiewicz, Ray Shea, C. Unnikrishnan, Amy Unruh, Darrell Woelk: InfoSleuth: Semantic Integration of Information in Open and Dynamic Environments (Experience Paper). SIGMOD Conference 1997: 195-206
[Boyan et al., 1994]
...
[Chaudhuri et al., 1995]
Surajit Chaudhuri, Umeshwar Dayal, Tak W. Yan: Join Queries with External Text Sources: Execution and Optimization Techniques. SIGMOD Conference 1995: 410-422
[Cohen and Singer, 1996]
William W. Cohen, Yoram Singer: Context-sensitive Learning Methods for Text Categorization. SIGIR 1996: 307-315
[Cohen et al., 1997]
...
[Cohen, 1997a]
...
[Cohen, 1997b]
...
[Duschka and Genesereth, 1997a]
Oliver M. Duschka, Michael R. Genesereth: Answering Recursive Queries Using Views. PODS 1997: 109-116
[Duschka and Genesereth, 1997b]
...
[Fang et al., 1994]
...
[Felligi and Sunter, 1969]
...
[Fiebig et al., 1997]
...
[Fuhr, 1995]
Norbert Fuhr: Probabilistic Datalog - A Logic For Powerful Retrieval Methods. SIGIR 1995: 282-290
[Garcia-Molina et al., 1995]
Hector Garcia-Molina, Dallan Quass, Yannis Papakonstantinou, Anand Rajaraman, Yehoshua Sagiv, Jeffrey D. Ullman, Jennifer Widom: The TSIMMIS Approach to Mediation: Data Models and Languages. NGITS 1995: 0-
[Hernandez and Stolfo, 1995]
Mauricio A. Hernández, Salvatore J. Stolfo: The Merge/Purge Problem for Large Databases. SIGMOD Conference 1995: 127-138
[Huffman and Steier, 1995]
...
[Kilss and Alvey, 1985]
...
[Knuth, 1975]
Donald E. Knuth: The Art of Computer Programming, Volume I: Fundamental Algorithms, 2nd Edition. Addison-Wesley 1973
[Konopnicki and Shmueli, 1995]
David Konopnicki, Oded Shmueli: W3QS: A Query System for the World-Wide Web. VLDB 1995: 54-65
[Korf, 1993]
Richard E. Korf: Linear-Space Best-First Search. Artif. Intell. 62(1): 41-78(1993)
[Levy et al., 1996a]
Alon Y. Levy, Anand Rajaraman, Joann J. Ordille: Querying Heterogeneous Information Sources Using Source Descriptions. VLDB 1996: 251-262
[Levy et al., 1996b]
Alon Y. Levy, Anand Rajaraman, Joann J. Ordille: Query-Answering Algorithms for Information Agents. AAAI/IAAI, Vol. 1 1996: 40-47
[Lewis, 1992]
...
[Mendelzon and Milo, 1997]
Alberto O. Mendelzon, Tova Milo: Formal Models of Web Queries. PODS 1997: 134-143
[Monge and Elkan, 1996]
Alvaro E. Monge, Charles Elkan: The Field Matching Problem: Algorithms and Applications. KDD 1996: 267-270
[Monge and Elkan, 1997]
Alvaro E. Monge, Charles Elkan: An Efficient Domain-Independent Algorithm for Detecting Approximately Duplicate Database Records. DMKD 1997: 0-
[Newcombe et al., 1959]
...
[Nilsson, 1987]
...
[Porter, 1980]
...
[Quinlan, 1990]
...
[Salton, 1989]
Gerard Salton: Automatic Text Processing: The Transformation, Analysis, and Retrieval of Information by Computer. Addison-Wesley 1989, ISBN 0-201-12227-8
[Schäuble, 1993]
Peter Schäuble: SPIDER: A Multiuser Information Retrieval System for Semistructured and Dynamic Data. SIGIR 1993: 318-327
[Suciu, 1996]
Dan Suciu: Query Decomposition and View Maintenance for Query Languages for Unstructured Data. VLDB 1996: 227-238
[Suciu, 1997]
...
[Tomasic et al., 1997]
Anthony Tomasic, Rémy Amouroux, Philippe Bonnet, Olga Kapitskaia, Hubert Naacke, Louiqa Raschid: The Distributed Information Search Component (Disco) and the World Wide Web. SIGMOD Conference 1997: 546-548
[Turtle and Flood, 1995]
Howard R. Turtle, James Flood: Query Evaluation: Strategies and Optimizations. Inf. Process. Manage. 31(6): 831-850(1995)

Referenced by

  1. Todd D. Millstein, Alon Y. Levy, Marc Friedman: Query Containment for Data Integration Systems. PODS 2000: 67-75
  2. Mengchi Liu, Tok Wang Ling: A Data Model for Semistructured Data with Partial and Inconsistent Information. EDBT 2000: 317-331
  3. Laura M. Haas, Renée J. Miller, B. Niswonger, Mary Tork Roth, Peter M. Schwarz, Edward L. Wimmers: Transforming Heterogeneous Data with Database Middleware: Beyond Integration. IEEE Data Eng. Bull. 22(1): 31-36(1999)
  4. Greg Barish, Dan DiPasquo, Craig A. Knoblock, Steven Minton: An Efficient Plan Execution System for Information Management Agents. Workshop on Web Information and Data Management 1999: 1-5
  5. Zachary G. Ives, Daniela Florescu, Marc Friedman, Alon Y. Levy, Daniel S. Weld: An Adaptive Query Execution System for Data Integration. SIGMOD Conference 1999: 299-310
  6. Daniela Florescu, Alon Y. Levy, Alberto O. Mendelzon: Database Techniques for the World-Wide Web: A Survey. SIGMOD Record 27(3): 59-74(1998)
  7. William W. Cohen: Providing Database-like Access to the Web Using Queries Based on Textual Similarity. SIGMOD Conference 1998: 558-560
ACM SIGMOD Anthology: Copyright © by ACM (info@acm.org), Corrections: anthology@acm.org
DBLP: Copyright © by Michael Ley (ley@uni-trier.de), last change: Sat May 16 23:40:43 2009