
Review - Hash Joins and Hash Teams in Microsoft SQL Server.

Guy M. Lohman: Review - Hash Joins and Hash Teams in Microsoft SQL Server. ACM SIGMOD Digital Review 2 (2000)

Review

This paper gives a quite detailed peek into the implementation of the hash join in Microsoft's SQL Server. It addresses many of the practical issues that a product must handle, details that are often assumed away in academic papers, such as "What happens if the data is so skewed that a single partition won't fit in memory?" It should therefore be required reading for graduate students. The authors admittedly weave together many ideas from the literature; the contribution is that the whole is greater than the sum of its parts! I'm biased toward papers that describe implemented systems, and it is abundantly obvious that all of the results of this paper have not only been implemented but also tuned. The authors even recognize and deal with likely failure modes, for example by robustly handling the optimizer's misestimation of the number of rows.

The most interesting and novel aspect of this paper is the design of "hash teams", a sequence of hash joins that is apparently limited, according to the paper, to hash joins "on the same set of columns". A "team manager" controls the partitioning of rows, as well as the spilling, restoring, etc. of those partitions, across all joins of the "team", to maximize memory utilization and minimize I/O of intermediate results. In effect, it plans an N-way join rather than just a series of binary joins. It sounds simple, but it's non-trivial to engineer for all the eventualities that a product must face.
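To make the partition-sharing idea concrete, here is a minimal Python sketch, assuming every input is joined on a single shared column and fits in memory. The names `partition`, `hash_join`, and `hash_team_join` are hypothetical, and the sketch deliberately omits what the review identifies as the hard part: the team manager's spilling, restoring, and memory management, which the paper does not fully detail. It only illustrates the core property: because every input is partitioned once with the same hash function and fan-out, partition i of one join's output lines up with partition i of the next input, so intermediate results never need to be repartitioned.

```python
from collections import defaultdict
from typing import Dict, List

Row = Dict[str, object]


def partition(rows: List[Row], key: str, fanout: int) -> List[List[Row]]:
    """Split rows into `fanout` buckets using one hash function shared by all inputs."""
    parts: List[List[Row]] = [[] for _ in range(fanout)]
    for row in rows:
        parts[hash(row[key]) % fanout].append(row)
    return parts


def hash_join(build: List[Row], probe: List[Row], key: str) -> List[Row]:
    """In-memory hash join of one partition pair: build a hash table, then probe it."""
    table: Dict[object, List[Row]] = defaultdict(list)
    for row in build:
        table[row[key]].append(row)
    out: List[Row] = []
    for row in probe:
        for match in table.get(row[key], []):
            # Assumes non-key column names are distinct across inputs.
            out.append({**match, **row})
    return out


def hash_team_join(inputs: List[List[Row]], key: str, fanout: int = 4) -> List[Row]:
    """Hypothetical 'hash team': join several inputs on the same column,
    partitioning each input exactly once up front."""
    partitioned = [partition(rows, key, fanout) for rows in inputs]
    result: List[Row] = []
    for i in range(fanout):
        # Partition i can only match partition i of every other input, so the
        # intermediate result for i feeds the next join's partition i directly.
        intermediate = partitioned[0][i]
        for parts in partitioned[1:]:
            intermediate = hash_join(intermediate, parts[i], key)
        result.extend(intermediate)
    return result
```

For example, `hash_team_join([orders, items, ships], "k")` over three small lists of dicts returns the same rows as a cascade of ordinary binary hash joins on `k`, but each input is hashed only once. A real team manager would additionally decide which partitions stay in memory and which spill to disk, across all joins at once.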

The paper is beautifully written and organized, as Goetz's papers always are. My only complaint is that the performance section is limited to a couple of "success story" examples from TPC-D that seem a bit self-serving; all the implementation details left little space for a thorough and unbiased analysis of the strengths -- and weaknesses! -- of this technique over a wide range of conditions. In addition, the authors are a bit evasive about how memory management is really done, saying only that they "do not detail [their] final solution" of these problems. I suspect that these are considered trade secrets.

Copyright © 2000 by the author(s). Review published with permission.


References

[1]
Goetz Graefe, Ross Bunker, Shaun Cooper: Hash Joins and Hash Teams in Microsoft SQL Server. VLDB 1998: 86-97
Digital Review: Copyright © by ACM (info@acm.org),
DBLP: Copyright © by Michael Ley (ley@uni-trier.de), last change: Sat May 16 23:57:27 2009