Normal
0
false
false
false
EN-US
ZH-CN
X-NONE
MicrosoftInternetExplorer4
/* Style Definitions */
table.MsoNormalTable
{mso-style-name:"Table Normal";
mso-tstyle-rowband-size:0;
mso-tstyle-colband-size:0;
mso-style-noshow:yes;
mso-style-priority:99;
mso-style-qformat:yes;
mso-style-parent:"";
mso-padding-alt:0in 5.4pt 0in 5.4pt;
mso-para-margin:0in;
mso-para-margin-bottom:.0001pt;
mso-pagination:widow-orphan;
font-size:10.0pt;
font-family:"Times New Roman","serif";}
As we know, the hash join algorithm executes in two phases known as the “build” and “probe” phases. During the build phase, it reads all rows from the first input(or left input), calculates the hash value on each row of the equijoin keys...till the hash table is built up and comes to the next phase. Let’s say it’s all in memory and not spilled.
I’d like to know what the hash table looks like. How is the
hash value computed via a table row? Is it accountable for the entire row or
just the join key? If of the entire row, how is this hash value compared with
the hash value of probe input since the joined rows could be different… I guess
it should be hash function computed only on the equijoin key(s)…
So I am a little confused by what Craig Freedman’s
describing on the algorithm in Inside SQL Server 2005-query tuning and
optimization:
for each row R1 in the build table
begin
calculate hash value on R1 join
key(s)
insert R1 into the appropriate
hash bucket
end
for each row R2 in the probe table
begin
calculate hash value on R2 join
key(s)
if h(R2(key)) hits -- ==> Here it leads to
corresponding hash bucket with possible one or more rows, right?
for each row R1 in the
corresponding hash bucket
if R1 joins
with R2
output (R1, R2)
end
The interesting thing is SQL Server has a “bailout”
algorithm to switch after a hash join has several rounds of spilling around
hash buckets. Does it take over the existing hash table and other disk pages
spilled, or maybe it’s simply doing from the scratch??