Random Sampling over Joins

Surajit Chaudhuri*       Rajeev Motwani       Vivek Narasayya
Microsoft Research       Stanford University       Microsoft Research
surajitc@microsoft.com       motwani@cs.stanford.edu       viveknar@microsoft.com

Abstract

We study the problem of efficient implementation of sampling over join of relations. We present probabilistic results explaining the difficulty of this problem and setting limits on the efficiency that can be achieved. Based on new insights into the interaction between join and sampling, we develop techniques that are significantly more efficient than known techniques. We present experimental evaluation of our techniques on Microsoft SQL Server 7.0.