THE SQL Server Blog Spot on the Web
Welcome to SQLblog.com - The SQL Server blog spot on the web Sign in | Join | Help

about hash join algorithm

  •  07-30-2008, 4:41

    about hash join algorithm

    Normal 0 false false false EN-US ZH-CN X-NONE MicrosoftInternetExplorer4 /* Style Definitions */ table.MsoNormalTable {mso-style-name:"Table Normal"; mso-tstyle-rowband-size:0; mso-tstyle-colband-size:0; mso-style-noshow:yes; mso-style-priority:99; mso-style-qformat:yes; mso-style-parent:""; mso-padding-alt:0in 5.4pt 0in 5.4pt; mso-para-margin:0in; mso-para-margin-bottom:.0001pt; mso-pagination:widow-orphan; font-size:10.0pt; font-family:"Times New Roman","serif";}

    As we know, the hash join algorithm executes in two phases known as the “build” and “probe” phases. During the build phase, it reads all rows from the first input(or left input), calculates the hash value on each row of the equijoin keys...till the hash table is built up and comes to the next phase. Let’s say it’s all in memory and not spilled.

     

    I’d like to know what the hash table looks like. How is the hash value computed via a table row? Is it accountable for the entire row or just the join key? If of the entire row, how is this hash value compared with the hash value of probe input since the joined rows could be different… I guess it should be hash function computed only on the equijoin key(s)…

     

    So I am a little confused by what Craig Freedman’s describing on the algorithm in Inside SQL Server 2005-query tuning and optimization:

     

    for each row R1 in the build table

       begin

          calculate hash value on R1 join key(s)

          insert R1 into the appropriate hash bucket

       end

    for each row R2 in the probe table

       begin

          calculate hash value on R2 join key(s)

          if h(R2(key)) hits  -- ==> Here it leads to corresponding hash bucket with possible one or more rows, right?

          for each row R1 in the corresponding hash bucket

             if R1 joins with R2

                output (R1, R2)

       end

     

     

    The interesting thing is SQL Server has a “bailout” algorithm to switch after a hash join has several rounds of spilling around hash buckets. Does it take over the existing hash table and other disk pages spilled, or maybe it’s simply doing from the scratch??

    Filed under:
View Complete Thread
Powered by Community Server (Commercial Edition), by Telligent Systems
  Privacy Statement