Type Classification of Semi-Structured Documents.

Markus Tresch, Neal Palmer, Allen Luniewski: Type Classification of Semi-Structured Documents. VLDB 1995: 263-274
@inproceedings{DBLP:conf/vldb/TreschPL95,
  author    = {Markus Tresch and
               Neal Palmer and
               Allen Luniewski},
  editor    = {Umeshwar Dayal and
               Peter M. D. Gray and
               Shojiro Nishio},
  title     = {Type Classification of Semi-Structured Documents},
  booktitle = {VLDB'95, Proceedings of 21st International Conference on Very
               Large Data Bases, September 11-15, 1995, Zurich, Switzerland},
  publisher = {Morgan Kaufmann},
  year      = {1995},
  isbn      = {1-55860-379-4},
  pages     = {263-274},
  ee        = {db/conf/vldb/TreschPL95.html},
  crossref  = {DBLP:conf/vldb/95},
  bibsource = {DBLP, http://dblp.uni-trier.de}
}

Abstract

Semi-structured documents (e.g. journal articles, electronic mail, television programs, mail order catalogs, ...) are often not explicitly typed; the only available type information is the implicit structure. An explicit type, however, is needed in order to apply object-oriented technology, like type-specific methods.

In this paper, we present an experimental vector space classifier for determining the type of semi-structured documents. Our goal was to design a high-performance classifier in terms of accuracy (recall and precision), speed, and extensibility.
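
The abstract does not say which features or weights the authors used, so the following is only a minimal sketch of a generic vector space classifier, not the paper's method. It assumes whitespace-separated tokens as features, raw term frequencies as weights, one centroid vector per document type, and cosine similarity for matching. All function names (vectorize, cosine, train, classify, precision_recall) are hypothetical.

```python
import math
from collections import Counter, defaultdict


def vectorize(text):
    """Map a document to a sparse term-frequency vector (token -> count).

    Assumption: lowercase whitespace tokens stand in for structural features.
    """
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(token, 0.0) for token, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def train(examples):
    """Build one centroid per type from (type_name, text) pairs."""
    sums = defaultdict(Counter)
    counts = Counter()
    for type_name, text in examples:
        sums[type_name].update(vectorize(text))
        counts[type_name] += 1
    return {
        t: {token: c / counts[t] for token, c in vec.items()}
        for t, vec in sums.items()
    }


def classify(centroids, text):
    """Return the type whose centroid is most similar to the document."""
    vec = vectorize(text)
    return max(centroids, key=lambda t: cosine(vec, centroids[t]))


def precision_recall(predicted, actual, type_name):
    """Precision and recall for one type over parallel label lists."""
    pairs = list(zip(predicted, actual))
    tp = sum(p == type_name and a == type_name for p, a in pairs)
    fp = sum(p == type_name and a != type_name for p, a in pairs)
    fn = sum(p != type_name and a == type_name for p, a in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```

In this sketch, adding a new document type only requires adding labeled examples and recomputing that type's centroid, which is one simple way to get the kind of extensibility the abstract mentions.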

Copyright © 1995 by the VLDB Endowment. Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the VLDB copyright notice and the title of the publication and its date appear, and notice is given that copying is by the permission of the Very Large Data Base Endowment. To copy otherwise, or to republish, requires a fee and/or special permission from the Endowment.


Printed Edition

Umeshwar Dayal, Peter M. D. Gray, Shojiro Nishio (Eds.): VLDB'95, Proceedings of 21st International Conference on Very Large Data Bases, September 11-15, 1995, Zurich, Switzerland. Morgan Kaufmann 1995, ISBN 1-55860-379-4

Referenced by

  1. Serge Abiteboul: Querying Semi-Structured Data. ICDT 1997: 1-18
  2. Markus Tresch, Allen Luniewski: An Extensible Classifier for Semi-Structured Documents. CIKM 1995: 226-233