ACM SIGMOD Anthology ACM SIGMOD dblp.uni-trier.de

Latent Semantic Indexing: A Probabilistic Analysis.

Christos H. Papadimitriou, Prabhakar Raghavan, Hisao Tamaki, Santosh Vempala: Latent Semantic Indexing: A Probabilistic Analysis. PODS 1998: 159-168
@inproceedings{DBLP:conf/pods/PapadimitriouRTV98,
  author    = {Christos H. Papadimitriou and
               Prabhakar Raghavan and
               Hisao Tamaki and
               Santosh Vempala},
  title     = {Latent Semantic Indexing: A Probabilistic Analysis},
  booktitle = {Proceedings of the Seventeenth ACM SIGACT-SIGMOD-SIGART Symposium
               on Principles of Database Systems, June 1-3, 1998, Seattle, Washington},
  publisher = {ACM Press},
  year      = {1998},
  isbn      = {0-89791-996-3},
  pages     = {159-168},
  ee        = {http://doi.acm.org/10.1145/275487.275505, db/conf/pods/PapadimitriouRTV98.html},
  crossref  = {DBLP:conf/pods/98},
  bibsource = {DBLP, http://dblp.uni-trier.de}
}

Abstract

Latent semantic indexing (LSI) is an information retrieval technique based on the spectral analysis of the term-document matrix, whose empirical success had heretofore been without rigorous prediction and explanation. We prove that, under certain conditions, LSI does succeed in capturing the underlying semantics of the corpus and achieves improved retrieval performance. We also propose the technique of random projection as a way of speeding up LSI. We complement our theorems with encouraging experimental results. We also argue that our results may be viewed in a more general framework, as a theoretical basis for the use of spectral methods in a wider class of applications such as collaborative filtering.
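
As a rough illustration of the "spectral analysis of the term-document matrix" described in the abstract, the following minimal sketch applies a rank-k truncated SVD to a toy matrix using NumPy. The terms, documents, counts, and the helper names lsi and cosine are all made up for this example; it does not reproduce the paper's corpus model, theorems, or experiments, and the random-projection speed-up is not shown.

```python
import numpy as np

# Toy term-document matrix (rows = terms, columns = documents).
# Terms, documents, and counts are hypothetical, chosen only for illustration.
terms = ["car", "auto", "engine", "flower", "petal"]
A = np.array([
    [1, 0, 0, 0],  # car
    [0, 1, 0, 0],  # auto
    [1, 1, 0, 0],  # engine
    [0, 0, 1, 1],  # flower
    [0, 0, 1, 1],  # petal
], dtype=float)


def cosine(x, y):
    """Cosine similarity, defined as 0.0 if either vector is all zeros."""
    nx, ny = np.linalg.norm(x), np.linalg.norm(y)
    return 0.0 if nx == 0 or ny == 0 else float(x @ y / (nx * ny))


def lsi(A, k):
    """Return (U_k, docs_k): the top-k left singular vectors and the
    k-dimensional document coordinates (one row per document)."""
    U, S, Vt = np.linalg.svd(A, full_matrices=False)
    U_k = U[:, :k]
    docs_k = (S[:k, None] * Vt[:k]).T  # scale each right singular vector by its singular value
    return U_k, docs_k


U_k, docs_k = lsi(A, k=2)

# Query containing only the term "car", folded into the latent space.
query = np.array([1, 0, 0, 0, 0], dtype=float)
query_k = U_k.T @ query

raw_sim = cosine(query, A[:, 1])          # "car" vs. the "auto engine" document, term space
lsi_sim = cosine(query_k, docs_k[1])      # same pair, rank-2 latent space
cross_sim = cosine(query_k, docs_k[2])    # "car" vs. a "flower petal" document, latent space

print(raw_sim, lsi_sim, cross_sim)
```

In this toy setup the query "car" shares no term with the second document ("auto", "engine"), so their similarity in term space is zero, while in the rank-2 latent space the two align closely because "car" and "auto" both co-occur with "engine". The query stays essentially orthogonal to the documents on the unrelated topic.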

Copyright © 1998 by the ACM, Inc., used by permission. Permission to make digital or hard copies is granted provided that copies are not made or distributed for profit or direct commercial advantage, and that copies show this notice on the first page or initial screen of a display along with the full citation.



Printed Edition

Proceedings of the Seventeenth ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, June 1-3, 1998, Seattle, Washington. ACM Press 1998, ISBN 0-89791-996-3

Online Edition: ACM Digital Library


Referenced by

  1. Jon M. Kleinberg, Andrew Tomkins: Applications of Linear Algebra in Information Retrieval and Hypertext Analysis. PODS 1999: 185-193
  2. Soumen Chakrabarti, Byron Dom, Rakesh Agrawal, Prabhakar Raghavan: Scalable Feature Selection, Classification and Signature Generation for Organizing Large Text Databases into Hierarchical Topic Taxonomies. VLDB J. 7(3): 163-178(1998)
ACM SIGMOD Anthology: Copyright © by ACM (info@acm.org), Corrections: anthology@acm.org
DBLP: Copyright © by Michael Ley (ley@uni-trier.de), last change: Sat May 16 23:34:20 2009