THE SQL Server Blog Spot on the Web
Welcome to SQLblog.com - The SQL Server blog spot on the web Sign in | Join | Help
in Search

Erin Welker

SQL Server and Business Intelligence

  • Trials in reporting on cubes

    I'll admit that I haven't spent a lot of time in Reporting Services.  And it's been quite awhile since I've written a report against an Analysis Services cube - since SQL Server 2000, in fact.  I was kind of looking forward to using the MDX designer in Reporting Services 2005 but soon found it's limitations (which are well-documented).  I needed to implement drilldown on the Account hierarchy, which required pulling the preceding parent levels in a parent-child dimension.  This quickly moved me to the OLE DB data source type and out of the nifty query designer that is only available when using the Analysis Services type :-(  As such, I was back to implementing parameters by concatenating them in the query string - it was looking like SQL Server 2000 all over again.

    But that's not the jist of this blog entry.

    I got a nice looking report that drilled down a parent-child Account hierarchy on the rows and displayed various time periods on the columns.  Yes, it looked good, but it took, on average, 1 minute 37 seconds to refresh!  Every time the user would select a new job to display they'd have to wait over a minute and 1/2 for the report to come up.  Unacceptable!  What's worse is that the MDX was only taking 7 seconds to execute.  I can tune the MDX/cube but I had no idea what Reporting Services was doing that resulted in such a long render time.

    My first resort was to google on the problem.  I found very little information on the topic.  One useful blog by Chris Webb suggested that I remove the cell formatting options.  Alright, that was nice - it removed 15 seconds and my report now refreshed in 1 minute 23 seconds (average).  Still unacceptable.

    In thinking about the problem it seemed strange that RS required such a flat query that took an aggregated source, required you to unaggregate it, so that RS could re-aggregate it itself.  Yet all examples I could find put all measures (and only measures) on the columns axis, and all remaining dimension members on the remaining axes.  This returns a resultset that is long and narrow.  The report is short (relatively) and wide.  Why can't I make the query look just like the report?

    So that's what I tried.  I put the column header dimension members on the column axis.  Bingo - the report rendered in 15 seconds - now that's a significant improvement!

    Note that this solution will not fit the need of all reports.  In placing dimension members on the column axis I lose the ability to drill down or up that dimension - the column headers become static.  In my case, that's precisely what I wanted since that happened to be the design of the report.  It also is a little more tedious to manually enter the values associated with 10 static columns compared to the 1 column in the original report.  I felt this was a small price to pay for over 5x performance improvement.

                   SSRSDesignerFast

     

    MDX Before (condensed)

    SELECT NON EMPTY {[Measures].[Value]} ON COLUMNS, 

      {([Time].[Fiscal Month].[Current Month], {[Scenario].[All Members].[Actual], [Scenario].[All Members].[Forecast]}),
        ([Time].[Fiscal Month].[ITD], {[Scenario].[All Members].[Actual], [Scenario].[All Members].[Forecast]}),
        ([Time].[Fiscal Month].[EAC], {[Scenario].[All Members].[EAC], [Scenario].[All Members].[Budget]})}
    ON ROWS,

    NON EMPTY  ([Version].[All Members].[Version].[0], [Job].[Job_Hierarchy].[Level 03].[1234],
       { [MeasureType].[All Members].[Units] * DESCENDANTS([Account].[Account Hierarchy].[Total Costs],, LEAVES),
         [MeasureType].[All Members].[Dollars] * {DESCENDANTS([Account].[Account Hierarchy].[Total Costs],, LEAVES), [Account].[Account Hierarchy].[Total Revenue]}}) ON PAGES

    FROM [Job Planning]
    CELL PROPERTIES VALUE, BACK_COLOR, FORE_COLOR, FORMATTED_VALUE, FORMAT_STRING, FONT_NAME, FONT_SIZE, FONT_FLAGS

    MDX After

    SELECT  
      {([Time].[Fiscal Month].[Current Month], {[Scenario].[All Members].[Actual], [Scenario].[All Members].[Forecast]}),
        ([Time].[Fiscal Month].[ITD], {[Scenario].[All Members].[Actual], [Scenario].[All Members].[Forecast]}),
        ([Time].[Fiscal Month].[EAC], {[Scenario].[All Members].[EAC], [Scenario].[All Members].[Budget]})}

      * {[Measures].[Value]} ON COLUMNS, 

    NON EMPTY  ([Version].[All Members].[Version].[0], [Job].[Job_Hierarchy].[Level 03].[1234],
       { [MeasureType].[All Members].[Units] * DESCENDANTS([Account].[Account Hierarchy].[Total Costs],, LEAVES),
         [MeasureType].[All Members].[Dollars] * {DESCENDANTS([Account].[Account Hierarchy].[Total Costs],, LEAVES), [Account].[Account Hierarchy].[Total Revenue]}}) ON ROWS

    FROM [Job Planning]


  • ETL World Record!

    I just heard about this from Len Wyatt, who's a Principle Program Manager on the SQL Server Performance team.  Microsoft and Unisys announced a record for loading into a relational database, using SSIS (SQL Server 2008) to load over 1 TB of data in under 30 minutes!  The previous known record-holder was Informatica, who reported loading 1 TB in 45 minutes.  Note that that this is not a TPC benchmark, since none exists for loading tools (yet).

    Len blogged about the details on the SQL Server Performance team blog, where he mentions that a whitepaper will follow with more information on how they did it.  I, for one, can't wait to read all about it!


  • Partitioning enhancements in SQL Server 2008

    Horizontal partitioning can be a great benefit to data warehouses, as it greatly facilitates the management of very large databases. Of course, querying on partitions can also have performance benefits, especially when only a small percentage of the database is queried and partition elimination can occur. SQL Server 2005, however, can have some unexpected behavior when querying few partitions.

    In SQL Server 2005, one thread is allocated per partition when multiple partitions are queried. These threads then work in parallel to retrieve the rows to satisfy the query. If only one partition is queried, however, SQL Server 2005 allocates as many threads as are available to the one partition. As you can quickly see, there is a vulnerability when it comes to queries that query very few, but more than 1 partition.

    The common scenario is when only 2 partitions are queried. For example, let’s say you have a data warehouse database that is partitioned on a sales date by month. You write a query to compare last month’s sales to this month’s sales. Now, let’s say this runs on a 32-processor server. How many threads get allocated to this query? The answer is 2, one per queried partition.

    This last scenario can become even more complicated when a window of time causes the query to fluctuate between 1 and multiple partitions. Let’s say we compare today’s sales to the same day of the week last week. If both weeks are in the same month, we get full thread allocation. If the weeks span two months (= 2 partitions), however, we only get 2 threads allocated and the query can appear to slow down considerably, for no apparent reason (at least, probably not to the issuer of the query).

    SQL2005Partitioning 

    Note: All of these scenarios assume that the queries are able to eliminate partitions due to a direct search on the partitioning key. Be sure partition elimination is occurring in order to gain the best partitioned table query performance.

    SQL Server 2008 behavior changes the way in which threads are allocated. It allocates all available threads to each partition relevant to the query in a round robin fashion. This results in fast and consistent query execution, no matter how many partitions are queried.

    SQL2008Partitioning

    Note that this makes it all the more important to allocate partitions to filegroups that are spread across many disk spindles, allowing the query on a given partition to be as efficient as possible.

    But wait, there’s another improvement to partition query behavior in SQL Server 2008. SQL Server 2005 only allows lock escalation to the table level. Let’s say your query eliminates 75% of a table’s partitions, but scans all of the remaining partitions (1/4 of the table). If SQL Server decides to escalate the lock, all other queries will be locked out even if they are querying completely different partitions. SQL Server 2008 provides a table option to override this default behavior. Note that the default is still to escalate to table locks (at least for now) so this option will need to be changed to take advantage of partition-level lock escalation. Find out more about this, like I did, on über blogger Paul Randall’s post.


  • My Top 10 list for SQL Server 2008

    Long gone are the days when you can reasonably expect to know all aspects of SQL Server at a deep level.  When a new version starts to emerge, I try to put some scope around the features that I will dive more deeply into.  I'm publishing my list here so that like-minded SQL Server enthusiasts may become aware of a feature or two that was lost in that one-page Powerpoint slide you see in most of the presentations on SQL Server 2008.

    Note:  I put together this list during the summer as an roadmap for personal testing with the CTP releases.  In December 2007, a whitepaper, "An Introduction to New Data Warehouse Scalability Features in SQL Server 2008", was released that includes a short writeup on each of these features, as well as some additional improvements to SSAS, SSIS and SSRS.  If you are interested in the types of features I mention below, I highly recommend you check out this whitepaper. 

    My focus is on Business Intelligence and performance (particularly in terms of relational data warehouse performance).  With that said and in no particular order, here is my list of top 10 features  in SQL Server 2008, and why.  I plan to post more detail about several of these in the future:

    1. MERGE command - this is a new TSQL command that will allow you to combine an Insert with an Update command, sometimes referred to as an UPSERT.  This is particularly useful in loading a data warehouse.  You will no longer have to test for a row's existence in order to take one of two paths (INSERT or UPDATE, or even DELETE).
    2. Star Join - this could have a tremendous impact on queries in a relational data warehouse.  Data warehouse queries are characteristically performance hogs. Since a large percentage of data is usually selected, the query optimizer often cannot take advantage of indexes like it can with more selective OLTP queries.
    3. Change Data Capture (CDC) - this feature can be used in SQL Server 2008 data sources to automatically track changes in data that require a row to be re-sent to a data warehouse.  This makes ETL from a SQL Server 2008 data source far more efficient and straight-forward.
    4. INSERT INTO - no structural changes have been made to the statement, but minimal logging can be implement, much like BULK INSERT or SELECT..INTO, under the right conditions.
    5. Lookups in SSIS - though SSIS lookups were incredibly enhanced from DTS (where they were virtually unusable), the performance of this task has been tuned to improve performance and minimize resource utilization.  In lieu of blogging about this later, I'll refer to a blog post by Jamie Thomson who explains this feature in detail and far better than I could.
    6. Data compression - this is potentially huge!  First of all, compression is an option, so if if the CPU hit is an issue you can choose not to implement compression.  Data compression means more data on fewer pages, which has a domino effect on performance (more pages in memory, better memory utilization, and improved page life expectancy).
    7. Partitioning enhancements - there's an issue in SQL Server 2005 partitioning parallelism that affects queries on a few number of partitions but greater than 1.  If only one partition is queried, intra-partition parallelism is implemented effectively.  If greater than one, a single thread is used to process each partition, which results in under-parallelism with queries on just a few partitions.  This has been addressed in SQL Server 2008 (future post).
    8. Resource Governor - if you haven't seen the demo on this, you should.  You can implement resource governing rules that affect inflight queries.  One scenario I see for this is in environments where ETL occurs simultaneously with data warehouse queries, such as in a real-time environment.
    9. Backup compression - this will dramatically affect backup times for large data warehouses. This capability is currently available through third party vendors but some DBAs are currently unable to leverage these solutions due to company standards or budget constraints.
    10. Partition-aligned indexed views (IVs) - this allows for the use of IVs on fact tables.  Indexed views on partitioned tables is virtually unusable in SQL Server 2005 because of the requirement to drop and recreate dependent IVs whenever a fact table partition SWITCH is made.  I'll explain why I think this is such a big deal in a future post.

  • Installing SQL Server 2005 on Vista

    I sent this to our local user group when I installed Vista Ultimate on my new laptop a few months ago.  Yesterday, I did a quick search to see if this information was readily available and found that it still was not (at least not in one place), so I decided to post it here.  I hope it helps!

    ·       You must install SQL Server 2005 SP2.  I was able to successfully install RTM, then followed by SP2, but an attendee at our last user group meeting had no such luck.  One of our local Microsofties mentioned there is an SQL install that combines the core install of SQL with SP2.

    ·       OWC 11 component install fails.  This has a domino effect on the installation of most of the other SQL components.  You know, the important ones, like database services, SSIS, SSAS and SSRS.  The newsgroup solution was to uninstall OWC 11 through Add/Remove Programs.  This solution worked for me, as well.

    ·       IIS is required for Reporting Services.  Nothing new here, but the configuration of IIS was not as simple as I recall it on XP.  Be sure the following options are selected (Start --> Control Panel --> Programs and Features --> Turn Windows features on or off):

    o   Web management tools

    §  IIS 6 Management Compatibility

    ·         IIS 6 WMI Compatibility

    ·         IIS Metabase and IIS 6 configuration compatibility

    o   World Wide Web Services

    §  Application Development Features

    ·         ASP.NET

    ·         ISAPI Extensions

    ·         ISAPI Filters

    §  Common Http Features

    ·         Default Document

    ·         Directory Browsing

    ·         HTTP Redirection

    ·         Static Content

    §  Security

    ·         Windows Authentication

    ·       You must install Visual Studio SP1, then another patch for Vista.  I initially tried installing the patch for Vista first but it wanted SP1.  I was unable to run BIDS until I installed these.  Here are the download files (available for download at Microsoft.com):

    o   VS80sp1-KB926601-X86-ENU (VS SP1)

    o   VS80sp1-KB932232-X86-ENU (patch for Vista)

     

     

    There was mention of having to turn off UAC at one of our user group meetings.  I did not have this issue.  During the install, I was asked if I wanted to be added to the sysadmin role during the SQL SP2 setup, because Builtin\Administrators is not automatically added to sysadmin like it used to be.

     

     


  • Hi SQLBlog readers!

    I'm excited to be blogging with several of my SQL Server comrades.  My name is Erin Welker and SQL Server is my passion - I've been working with the product since version 1.11!  In the last few years I have been sharing my lessons learned through presenting and writing.  I've been honored to present at SQL PASS for the last 3 years and at Tech Ed 2007.  I was part of the Project REAL team and authored a whitepaper on partitioning as a result.  I also wrote 2 whitepapers on large relational data warehouses on SQL Server, one on managing and one on performance.  I'm currently working on my second article for SQL Server Magazine, which should appear in the March 2008 issue. 

    My focus is on Business Intelligence and performance, and that's what I intend to blog about most.  On my current engagement, I'm working with PerformancePoint Planning, so you may see that slip in here and there.

    I look forward to sharing information on this site and hope to learn from other bloggers and readers, as well.

    I hope you enjoy!

    Erin


Powered by Community Server (Commercial Edition), by Telligent Systems
  Privacy Statement