THE SQL Server Blog Spot on the Web
Welcome to SQLblog.com - The SQL Server blog spot on the web Sign in | Join | Help
in Search

Kalen Delaney

  • Did You Know: My Fall Travel Plans

    If you check my schedule, you'll see my summer and early Fall are spent mostly close to home. This is a good thing, because I have a book to write!

    But starting in October, I'll be gone at least every other week, with two trips to Europe and one to the East Coast of the US.

    In early October, I'll be presenting at a multi-city Summit for the Stockholm-based training company, Cornerstone.  Although I've been to Stockholm several times, the Summit will take me to 3 other Swedish cities which I have never visited. And of course, one of the best parts of the trip is that I'll get to spend time with my good friend Tibor.

    Following that is a private class in Connecticut, where I have several very good friends, including MVP Roy Harvey.

    In early November, I am very excited to be presenting 3 seminars in the UK for SQLKnowHow. This is a collaboration of UK-based SQL Server consultants and trainers who are putting together a whole suite of events. I'll be presenting two one-day seminars and one three-day seminar, and getting to work with my friend Tony, as well as his partners Simon Sabin, Allan Mitchell  Darren Green.

    I'll get back from the UK in time to go to Colorado Springs to help my mother celebrate her 90th birthday.

    Near the end of November is the big PASS conference, at which I'll be delivering a pre-conference seminar, as well as a spotlight session.

    In early December, I finish out my year's travels with a 5-day  public class at my oldest partner, SQL Soft+. It's in Portland, Oregon, which is about a 3.5 hour drive for me, so I can take my own car, and not worry about those silly airline baggage restrictions!

    I hope to see many of you at some point in my travels!

    ~Kalen


  • Did You Know: Specifying Checkpoint Duration

    I realize probably no one reads Books Online cover to cover, so there are probably lots of little details that easily slip by you. I was introduced to one of the hidden 'features' just last week. Here it is:

    CHECKPOINT can take a parameter!

    It's right there in BOL:

     CHECKPOINT [ checkpoint_duration ]

    This parameter allows you to specify how much time (in seconds) SQL Server should take to complete the checkpoint. If you specify a smaller value, SQL Server will apply more resources to completing the checkpoint, and if you specify a larger number, SQL Server will apply fewer resources. At first glance, this seems like a very cool feature. But then I realized there was very little practical use for it, for these reasons at least: 

    1) There is no way to know how much time a checkpoint should normally take, so giving it a number to use more or fewer resources is just shooting in the dark.

    2) This only applies to the manual checkpoint, not SQL Server's automatic checkpoint. So if you're experiencing a slight decline in throughput every minute or so when the automatic checkpoint starts running, you can't just ask SQL Server to slow down and release some of the resources.

    Maybe I'm missing something, but this seems like one of those features that was added 'just because they could'.  I'd be interested in hearing your ideas regarding the potential usefulness of this capability.

    But really, this post was about the fact that I'd never looked at the page for the CHECKPOINT command, and never realized there were any options at all.

     

    ~Kalen


  • Geek City: Reading the Transaction Log

    Sorry, I'm not actually going to tell you how to read the log. I'm just going to talk about it... and whether it's a good thing to be able to do or not, or whether it's an absolutely crucial feature that MS needs to provide for us immediately, in a hotfix, if not sooner. Forget about fixing bugs, I want to read the log because I forgot to set up a trace beforehand....

    You may have noticed that my blogging frequency has fallen off. One or two of you also noticed that I am no longer writing a regular article each month for SQL Server Magazine. Those two facts are related. I have cut back on non-essential activities to try to get my next book out as soon as possible.  It looks like I might even finish in time to get the book out on the shelves by early next year. Stay tuned...

    Since I couldn't bear to not do anything for SQL Server Magazine, I started writing the commentary in the weekly email newsletter.  Actually, I do it every week but the fourth week of the month. If you like, you can sign up for this free newsletter here.

    My commentary last Thursday seemed to have rattled some cages. Before I even woke up Thursday morning, there were already two comments on the site, and someone sent me a personal email about what I wrote.  By now, there are quite a few more comments. I basically wrote about the need for a log reader tool. It wasn't deeply technical; it's just a commentary after all. You can read it here:

    http://www.sqlmag.com/Articles/Index.cfm?ArticleID=100076

    But boy, did people get upset. They called me bad names... well, if 'mediocre' can be considered a bad name...

    So I responded as follows:

    Wow... I have never gotten so many comments so quickly about one of my articles. I must really have touched a nerve here!

    There is a difference between the actual data rows referenced by the log, and the log format. It's the log format, and giving people full details about what is in the log, that is propriatary information. There is nothing specifically bad about giving people that information. However, calling me names because I don't stand up on a soapbox and DEMAND that MS add this functionality seems a little extreme. There are plenty of other things MS could do with the product and providing a log reader tool is way down on the list.

    Yes I realize it is important to some people, but there are many other ways to get this information through tracing etc. If the developer resources are limited at MS, I would much prefer they spend their time on more important stuff. MS knows it's important that people have this information, that's why they added a great deal of additional tracing capabilities in SQL Server 2008.

    Also, keep in mind that a log reader tool wouldn't help you debug problems with logic, or with bad reports due to faulty SELECTs. If your WHERE clause was written badly, a log reader tool could tell you which rows were affected, but not WHY. You'd need a tracing tool for that. Vogelm's comment that a log reader tool would help troubleshoot bad queries from 3rd party apps is not true; you need to see the statements for that, not just the affected data.

    I do appreciate kbreneman's comment that the real problem is one of perception. MS should make clear that the transaction log is not an audit tool; if you want auditing, you need to set it up on your own, because you're the only one who knows what's important for you to capture.

    (The only way to respond to comments is to write a comment of my own, and then the form insists that I rate the article I am responding to. I always feel a bit weird having to rate my own articles.)

    Since I wrote the article, I have found out that Lumigent does have a log reader tool for SQL Server 2005, but I have heard less than stellar reviews about its ability to capture some of the more interesting datatype activities that are now possible in SQL Server 2005. And their website still doesn't list any version numbers.

    I can't stop thinking about this, so I thought I would open up the issue to a wider audience.

    How important do you think it is that Microsoft provide a log reader tool for us?

    Thanks!

    ~Kalen


  • Geek City: System Objects

    As you might know, metadata is one of my favorite topics, and I've written quite a bit about metadata and system objects. A very frequent question is: What exactly IS a system object?

    There are quite a few different definitions you could use. You could say a system object is one that was installed with your SQL Server installation.

    You could say it is one that starts with a special prefix, like 'sys' for objects if you're using SQL Server 2000 or earlier, or one in the sys schema if you're using SQL Server 2005.

    You could say a system object is one that has an object id of less than 100.

    In the old SQL 2000 Enterprise Manager, you could look at a list of objects in a database, and there was a column called 'type' with a value of either 'system' or 'user'. 

    The objectproperty function has an argument called 'IsSystemTable', that will show you which tables are system tables, but won't do anything for other system objects.

    The The objectproperty function also has an argument called 'IsMSShipped', which corresponds to a column in the SQL 2005 metadata view sys.objects called 'is_ms_shipped'.

    SELECT *
    FROM sys.objects
    WHERE
    OBJECTPROPERTY(object_id, 'IsMSShipped') = 1;

    But just because something is shipped by Microsoft, is it really a system object? There is an undocumented procedure called sp_MS_marksystemobject, that would change the properties of any object you created so that its 'is_ms_shipped' property would also show a value of 1. Does that make it a system object, just because you marked it as such?

    CREATE TABLE tiny
    (col1 int);
    GO
    EXEC sp_MS_marksystemobject 'tiny';
    GO
    SELECT *
    FROM sys.objects
    WHERE
    OBJECTPROPERTY(object_id, 'IsMSShipped') = 1;

    I usually think of system objects as ones that have special behavior. Procedures in the master database, that start with sp_ are special, in that you can access them from anywhere. It turns out that you can create views and tables in master, with names starting with sp_, and they will also be accessible from any database, without fully qualifying the name.

    System tables have always had special behavior, in that you couldn't update them without setting a configuration option for the whole server. Once you set 'allow updates' to 1, you could update system tables, in versions before SQL 2005. Metadata changed completely in SQL 2005, and I thought for a long time that the 'allow updates' configuration option  didn't do anything anymore. But I just recently found out that is not true. There is something special that 'allow updates' allows.

    If you run the following, you will be given an error message:

    sp_configure 'allow updates', 1;
    reconfigure;

    Configuration option 'allow updates' changed from 0 to 1. Run the RECONFIGURE statement to install.
    Msg 5808, Level 16, State 1, Line 2
    Ad hoc update to system catalogs is not supported.

    So it tells you to RECONFIGURE, but when you do, you still get an error. However, if you run RECONFIGURE WITH OVERRIDE, you get  no error:

    sp_configure 'allow updates', 1;
    reconfigure with override;

    It turns out that you if you set 'allow updates' to 1, any procedure that you create in the master database will automatically be marked as 'is_ms_shipped'. You can observe this either with the OBJECTPROPERTY function or the is_ms_shipped column in sys.objects. This doesn't seem to work for tables or views, and it doesn't give the procedure any special behavior, but it changes a internal status bit to make the object seem like something special.


    USE master;
    GO
    CREATE PROC today
    AS SELECT getdate();
    GO
    SELECT *
    FROM sys.objects
    WHERE
    is_ms_shipped =  1 AND name LIKE 't%';

    Have fun!

    ~Kalen


  • Did You Know? My Son is Entertaining Bill Gates

    I can't help it; I'm a mother. I am so excited about what my son is doing that I just have to share it. My #2 son  is a budding actor and vocalist, and to earn money this summer he is working as a clown for a large organization that provides clowns, rides, games, balloon artists and magicians to local events. I found out several months ago that he was going to be working at the Microsoft Picnic this summer, and I started wondering how I might go about getting in to see him in action. But today, as he was walking out the door in full clown makeup and regalia, I found out he was working at a private picnic for the Gates Foundation! At this very moment, he could be making balloon butterflies, or unicorns, or puppies, or bears, or swords, for Bill and Melinda's kids! So far, he hasn't allowed us to take a picture of him in his makeup, but maybe he'll relax about it and I can get one posted.

    And what is his poor mother doing while he is out entertaining the hoi polloi? I'm working on my next book. I spent much of yesterday researching the internal storage format of the new date and time data types, and today I started writing about them. It's actually pretty exciting and I'll blog a bit about it in the next day or two. Even though I'm not at the party, I still love what I do.

    Have fun!

    ~Kalen


  • Did You Know the History of SQL Server?

    I was just rereading a blog post by Euan Garden about SQL Server's history and he refers to the fact that you can read a chapter about the history in the Inside SQL Server books. In the rewrite of the book for SQL Server 2005, when we split the book into four parts, we had to remove the history chapter, so I took the chapter from the SQL Server 2000 book and made it available online. When reading Euan's post, I realized that most people, including Euan himself, probably didn't know it was available so I decided to make that information a little more public.

    You can find the chapter here.

    http://insidesqlserver.com/companion/History%20of%20SQL%20Server.pdf

    Have fun!

    ~Kalen


  • Did You Know? My First Meme

    I was just tagged by Andy Leonard, and truth to tell, I don't read Andy's blog all that often because he talks about really developer oriented stuff, which is a little outside my focus area now.  But I was intrigued by the concept of a techblog meme, which I'd never seen before. I'd only seen memes discussed on my daughter's food blog, but at least I knew what they were because of that. You can read hers here, here and here. And then I realized I shouldn't have been quite so curious, because he tagged me! I saw today that Denis already responded, and tagged a whole bunch of people. I thought there was supposed to some sort of limit on the number of people you could tag; otherwise all the good tagees would get tagged too quickly. :-)

    Although the meme is about software development, which I don't really do, I figured I could answer most of the questions, but for some of the answers I substitute 'working with computers' for 'programming'.

    How old were you when you first started programming?

    I was 16 years old and a Junior in High School. 

    How did you get started in programming?

    I took a night class at a local Community College.

    What was your first language?

    Fortran... and it was mostly boring programs, but the 'idea' of programming fascinated me!

    What was the first real program you wrote?

    Hmm, what counts as 'real'? Commercially viable? Or just one that other people wanted to use? My first was a game I wrote for the Atari 800 that helped preschoolers to memorize their phone number. (My daughter was a pre-schooler at the time.)  The teacher wanted a copy, then all the other teachers, then friends of the teachers, etc, etc....

    What languages have you used since you started programming?

    Fortran, Basic, PDP Assembly Language, Logo, C, Pascal, Lisp, Sequel, SQL, SQL, SQL, SQL.....
    (...and probably a few I've forgotten)

    What was your first professional programming gig?

    I've actually never had a 'programming gig'. My first paid job with computers was being a teaching assistant for Computer Science classes at UC Berkeley.

    If you knew then what you know now, would you have started programming?

    Too hard of a question. I absolutely love what I do, but I can see lots of other paths my life could have taken, if back when I was 16 I knew everything that I know today. :-)

    If there is one thing you learned along the way that you would tell new developers, what would it be?

    Have fun!

    Who are you calling out?

    Lara Rubbelke
    Tibor Karaszi
    Kimberley Tripp
    Linchi Shea


  • Did You Know? What's the Capital of Idaho?

    Can you say it out loud? You might have known the capital was Boise, but if you pronounced it Boy-zee, you were wrong. I was there last Wednesday for the kickoff meeting of the Boise SQL Server User Group, and was told that if I didn't pronounce the name correctly, everyone would know immediately that I wasn't from around there.  The correct pronunciation is Boy-sea, or as I was spelling it when writing to my friend there: Boycee.

    We had over 30 people show up for my presentation, in which I compared the two different concurrency models that SQL Server 2005 has available. I didn't dive into a really deep level discussion about the internals of locking and snapshot isolation, because there were people there with all different experience levels. I tried to stick to the behavioral differences and the costs involved with both choices.

    It was a very enthusiastic group, with lots of good questions. There was also lots of good food, much than we could eat! We had about 20 pizzas, 3 coolers full of cold drinks, several bags of chips,  and also several bags of cookies.

    I get a lot of requests to come talk at User Groups, and usually I am open to it when I am in town teaching a class. This was the first time that I actually traveled by plane to go somewhere just to give a free User Group talk, and I did it because my good friend Cindy Gross, who works for Microsoft in Boise, asked me to.  And Cindy made it well worth my while, giving me a wonderful guided tour of the Idaho capital, and a fabulous breakfast on Thursday!

    Hopefully, I'll make it back again and Cindy has promised to take me rafting down the river. That sounds great, if I get back before the Winter!

    ~Kalen


  • Geek City: Why I still need Sysprocesses

    I've said many times that my favorite new feature in SQL Server 2005 is the new metadata, in particular the new Dynamic Management Objects. When I have to do troubleshooting on a SQL Server 2000 system, it is worse than painful, not to have my favorite tools like sys.dm_tran_locks, sys.dm_exec_cached_plans and
    sys.dm_exec_query_plan. By now, on the eve of the release of the version AFTER SQL Server 2005, my transition to the new metadata is almost complete.

    With one notable exception...

    I have always used sysprocesses constantly for troubleshooting information. Although most of the useful information that allows me to see what sessions are using lots of memory, or performing lots of I/O, can found in sys.dm_exec_sessions, there is one piece of information that isn't there. Sysprocesses contains a columns called open_tran which reflects the transaction  nesting of each session. If a session issues four BEGIN TRAN commands, with no COMMITs or ROLLBACKs, their session will have an open_tran value in sysprocesses of 4. Any open_tran value greater than 0 might mean that a transaction is holding locks and blocking other processes, or it might be keeping the transaction log from being cleared. If you ever notice open_tran values in higher than 2 or 3, it's a pretty good indication that a developer doesn't know much about SQL Server transaction management. I use this value all the time, and once had to troubleshoot a system where dozens of connections had open_tran values in the 20s and 30s! (For more details about nesting transactions, see my earlier post.

    So imagine my surprise when I discovered that the sys.dm_exec_sessions view, which is supposed to 'replace' sysprocesses in SQL Server 2005, has no column to provide this information!  Another view, sys.dm_exec_requests, has a column called open_transaction_count, which you might think would be the same thing. And it actually is the same information, but the sys.dm_exec_requests view only returns rows for sessions that are currently active, i.e. currently running a query. The need for examining the open_tran value is greatest when looking at those sessions that aren't doing anything, but are just sitting there with an open transaction. Those are the processes that you need to troubleshoot. I had felt that the omission of open_tran (or open_transaction_count) from sys.dm_exec_sessions was just an oversight, and it would be 'fixed' in the next release. But it appears I am to be disappointed. I just checked sys.dm_exec_sessions in RC0, and there is still no open_transaction_count column.

    So long live sysprocesses!

    ~Kalen


  • Did You Know? What Happened to My DVDs?

     

    Last December, I announced that my SQL Server Internals course was being made into a DVD series, and I then had several follow up posts over the next few months. I flew to NY 3 times, and have filmed 3 lessons. I was waiting to film the 4th, for which I already have the slides and the scripts, until the 2nd one was released, or at least until the editing of the 2nd was done.

    It is with deep regret that I have to announce that the rest of the series will not be available any time soon. Due to problems with the editing and production, I am going to have to find another producer. 

    I am currently spending all my spare time working on my next book, so I have no bandwidth to organize another production situation. If I could have just kept recording under the same conditions as the first 3, I probably could have squeezed out the time to keep making new lessons. As it is, I will not be able to start negotiating with a new producer until the bulk of the work on the new book is done. At this time, I anticipate that will be sometime in October.

    (In the meantime, don't forget the I offer this training live in the classroom. I have several public training partners, and can offer private deliveries as well.  My schedule is available on my website. )

    Thanks for all your support!

    ~Kalen


  • Did You Know? Things Keep Changing

     

    My Thursday commentary for the SQL Server Magazine Update e-newsletter discussed quiet changes in SQL Server 2005 and I just realized there's another one I wanted to mention.

    If you use Indexed Views at all, you're probably aware that there is a set of SET options that must on set appropriately in order for your indexed views to work as planned. One of the requirements is that ARITHABORT must be ON. In SQL Server 2000, this value had to be set explicitly. (But frequently it was set by your connection, so you didn't have to worry about it.) 

    In SQL 2005, if you (or your connection) enabled the setting ANSI_WARNINGS, ARITHABORT is automatically enabled.

    I dug around and found the reference to this change in the BOL at
    http://msdn.microsoft.com/en-us/library/ms190306.aspx

    What the article doesn't make completely clear is that even if you explicitly set ARITHABORT to OFF, as long as ANSI_WARNINGS is ON, SQL Server will behave as if ARITHABORT is ON. 

    The SET options are particularly important when updating a table on which an indexed view is built. If the SET options have the incorrect settings, the update will actually fail.

    Let's look at an example on SQL Server 2005, using the old pubs database.

     

    -- First, create the view and the index

    USE pubs
    GO
    CREATE VIEW sum_sales WITH SCHEMABINDING
    AS
    SELECT type, sum(isnull(ytd_sales,0)) AS total_sales,
                 count_big(*) AS number_sales
    FROM dbo.titles
    GROUP BY type;
    GO
    CREATE UNIQUE CLUSTERED INDEX idxv_sales_by_type
       ON sum_sales(type);
    GO

    -- Next, verify the settings

    SELECT ansi_warnings, arithabort
    FROM sys.dm_exec_sessions
    WHERE session_id = @@SPID;

    -- You should see that both options are ON (1). 
    -- Update the titles table, and it should succeed.

    UPDATE dbo.titles
    SET ytd_sales = ytd_sales + 1
    WHERE title_id = 'BU1032';

    --Now change ARITHABORT TO OFF, verify the settings, and update:

    SET ARITHABORT OFF;
    GO
    SELECT ansi_warnings, arithabort
    FROM sys.dm_exec_sessions
    WHERE session_id = @@SPID;
    GO
    UPDATE dbo.titles
    SET ytd_sales = ytd_sales - 1
    WHERE title_id = 'BU1032';

    -- The above should succeed.
    -- Now change ANSI_WARNINGS to OFF:

    SET ANSI_WARNINGS OFF;
    GO
    SELECT ansi_warnings, arithabort
    FROM sys.dm_exec_sessions
    WHERE session_id = @@SPID;
    GO
    UPDATE dbo.titles
    SET ytd_sales = ytd_sales - 1
    WHERE title_id = 'BU1032';

    -- The above should fail.

    -- If we change to SQL 2000 compatibility level,
    -- just setting ARITHABORT OFF will cause the update to fail:

    EXEC sp_dbcmptlevel pubs, 80;
    GO
    SET ANSI_WARNINGS ON;
    GO
    SET ARITHABORT OFF;
    GO
    SELECT ansi_warnings, arithabort
    FROM sys.dm_exec_sessions
    WHERE session_id = @@SPID;
    GO
    UPDATE dbo.titles
    SET ytd_sales = ytd_sales + 1
    WHERE title_id = 'BU1032';

    I'm sure there are lots more quiet changes. They're just so quiet I haven't found them yet!

    Have fun

    ~Kalen


  • Did You Know? Microsoft is/are Still Fixing Bugs On SQL Server 2005

    I taught a private class last week near Portland (Oregon) and although the company has a lot of SQL Servers, there was not much excitement at all about SQL Server 2008. They still have many SQL Server 2000 installations, and are debating whether to upgrade those to 2005. Most of the interest was in the real value of the new 2005 features, like partitioning, and the new large object types, as well as interest in the new metadata, such as the DMVs.

    They're trying to run a business, and keep their systems running well. They don't have a lot of time to get excited about features that are way in the future.

    I can't believe that my client is the only organization that is very interested in the continued health of SQL Server 2005, but I was surprised to see no blog posts here on SQLBLOG about the latest Cumulative Update for SQL 2005.

    Yesterday, Microsoft released CU#8 and you can read about it here:

    http://support.microsoft.com/kb/951217

    So what are you waiting for?

    ~Kalen


  • Did You Know: Jim Gray Tribute at UC Berkeley

    Almost as soon as the event was over, this blog post appeared on the NY Times site:

    http://bits.blogs.nytimes.com/2008/05/31/a-tribute-to-jim-gray-sometimes-nice-guys-do-finish-first/index.html

    The author neglected to mention that in addition to the fact that “The audience was a cross-section of the computer industry’s best and brightest”, it also included regular people, like me. I flew down to the Bay Area on Friday with my husband, and spent the day on the UC Berkeley campus, where I had been a student for 8 years, and lecturer for another 4. Although I was not there concurrently with Jim, we had many professors and colleagues in common. Attending the tribute and the technical sessions afterwards was an awesome experience.

    At end of morning, we had about 15 minutes before lunch, right after a couple of speakers were sharing about Jim's early days at IBM and Tandem. The moderator, Mike Stonebraker, asked audience members to share their stories and recollections of Jim from the 70's and 80's. I assumed there would be another chance in the afternoon for people to share stories from the 90's and beyond, but that didn't happen. So I will share my story here.

    As I was finishing my first book (upgrading Ron Soukup's Inside SQL Server, from version 6.5 to version 7) I needed to find someone to write the foreword. Jim Gray, who I only knew about from his book, had written the foreword to the 6.5 edition, and I asked a few colleagues at Microsoft if they thought he might agree to write a foreword for me. I was encouraged to contact him, and right after I sent the email to him, I discovered that he had just been awarded the Turing Award! Well of course, I couldn't expect that a Turing Award winner would respond to an email from a nobody like me, so I started looking around for someone else. I didn't spend long in the search, because Jim responded in a day, saying he'd be delighted to write the foreword, and also asking if I was going to be at the PASS conference.

    This was in 1998, and the very first PASS conference was being held in Chicago. I told him I was going to be there, and he suggested getting together for coffee. What could I say but "OK"?

    I arrived in Chicago late in the evening and stumbled into a strange hotel and went to bed. Chicago was 2-hours earlier than my normal time zone, so I planned on sleeping until 8:30. At 7 AM, the phone rang.  To the day, I clearly remember my impulse to shout into the phone "What in &#%* do you want at this ungodly hour in the morning?" But I stifled that impulse, and to this day I am grateful to my guardian angel for that decision. Because, as I'm sure you guessed, it was Jim. In a bright and chipper voice he asked "Would you like to go get a cup of coffee now?". Of course, I REALLY needed some. So we had coffee. And a wonderful meeting. I still think about that phone call, and try to remember to always answer the phone cheerfully, because you never know who is calling. I'm not always successful, but I try. And knowing Jim, and based on what I heard about on Saturday, I would guess that even if I hadn't stifled my first impulse, Jim would have forgiven me. He would have just apologized for the early call, and asked me out for coffee anyway.

    I met with Jim several more times after that over the years, including once by accident in the Microsoft cafeteria. He asked me to join him for lunch. I imagined he would be lunching with a group of people and was asking me to join the group, but it turned out he was by himself and just wanted someone to sit with. I was glad I was there. We talked about teaching and training, and how amazing he found it that some people could actually type while they talked!

    I just searched and found over 20 emails from Jim still in my Outlook folders, and in fact he's under 500 in my Xobni rankings. I doubt I'll ever be able to delete those mails.

    So Jim did write the foreword for my first book, as you can see:

    image

    He also wrote the foreword for my next 2 books. But by the one after that, he was gone. So I dedicated that book to him.

    You can read about Jim's accomplishments on his site at Microsoft Research:
    http://research.microsoft.com/~gray/

    ~Kalen


  • Did You Know? or rather, What Did You Wish You Knew?

    Thank you so much for all the responses to the DBA Blunders post! As I mentioned, however, that question was asked on behalf of a friend of mine. Now I have a question of my own.

    I am currently training a group of new junior DBAs. One of them has already started assisting with some client operations, but is still closely supervised.

    For those of you that are SQL Server DBAs:

    What did you wish you knew BEFORE your first day on the job?

    (I'm particularly looking for what you consider to be gaps in your education, but anything that answers the question is fine!)

    THANKS!!

    ~Kalen


  • Geek City: What's Worse Than a Table Scan?

    I have frequently heard SQL Server developers and DBAs gasp when a query plan is indicating that SQL Server is performing a table scan, thinking that is the worst thing that could ever happen to a query. The truth is, it's far from the worst thing and in addition, not all table scans are created equal.

    One thing that is far worse that a table scan is to execute a query plan that uses a nonclustered index, and having that plan look up every single row in a table! Although that is a horrible thing to behold, it is not the topic of this post.

    Today, I'm going to show you that two different table scans on the same data in a heap can give very different performance.

    The behavior has to do with a technique that SQL Server uses when a row in a heap is increased in size so it no longer fits in the original page. This usually occurs when a variable length column is updated to take more space.  If SQL Server just moved the row to another page, any nonclustered indexes would have to be updated to indicate the new page address.  (Remember, if the underlying table is a heap, nonclustered indexes point to the data row using a actual address.) Since there can be up to 249 nonclustered indexes on a single table, that could potentially be a LOT of work. So instead, when a row in a heap has to move, SQL Server leaves behind a forwarding pointer in place of the row that has moved. The nonclustered indexes continue to point to the old location, and then SQL Server just needs one more page lookup to find the new location. For just a few lookups, this expense is minimal and more than made up for my the savings of not having to update all the nonclustered indexes every time a row moves.

    However, what happens when there are LOTS of forwarding pointers?

    The metadata function sys.dm_db_index_physical_stats has a column that indicates how many forwarded records are in any table. For tables with clustered indexes, this will always be 0.

    Let's look at an example. I'll make a copy of the Person.Address table in the AdventureWorks database, and add a new varchar column to it. Initially, the column takes no space.

    USE AdventureWorks;
    GO
    IF EXISTS (SELECT 1 FROM sys.tables
                WHERE name = 'Address2' AND schema_id =1)
            DROP TABLE dbo.Address2;
    GO
    SELECT *, convert (varchar(500), 'comments') AS comments
       INTO Address2
    FROM Person.Address;
    GO
    -- note that the pages are almost full and there are no forwarded records
    SELECT index_type_desc, page_count, avg_page_space_used_in_percent,
         avg_record_size_in_bytes,forwarded_record_count
    FROM sys.dm_db_index_physical_stats(db_id('AdventureWorks'),
          
    object_id('Address2'),null, null, 'detailed');
    GO

    Now I'll increase the length of all the new columns and check the physical stats again:

    UPDATE Address2
    SET comments = replicate('a', 500);
    GO
    SELECT index_type_desc, page_count, avg_page_space_used_in_percent,
         avg_record_size_in_bytes,forwarded_record_count
    FROM sys.dm_db_index_physical_stats(db_id('AdventureWorks'),
          
    object_id('Address2'),null, null, 'detailed');
    GO

    The output shows me I have 1763 pages in the table and 15961 forwarded records.

    Let's see what happens when we read every row in the table:

    SET STATISTICS IO ON;
    SELECT * FROM Address2;
    SET STATISTICS IO OFF;

    The logical I/O value tells us that instead of just reading through every page, for a total of 1763 reads, SQL Server jumps out of sequence and follows the forwarding pointer for every forwarded record. So the number of logical reads is the sum of the number of pages plus the number of forwarded records:

    1763  +  15961= 17724

    I was discussing this behavior with my friend and colleague Tibor Karaszi and he proposed an explanation for this behavior. He related it to the same behavior that Itzik Ben-Gan has described for why SQL Server will always follow page pointers when scanning a clustered index if consistent reads are desired. The alternative would be to just read the pages in disk order, or page number order, which can be determined by examining the IAM structures for the object. For clustered tables, we need to follow the page pointers instead of the IAMs  to make sure that if a row is moved due to an update while the scan is occurring, that we don't read the same row twice (if the row is moved to a higher page number) or skip the row altogether (if the row is moved to a lower page number.)

    But what about a heap? Are there potential problems scanning a heap while updates are occurring? Could we potentially read the same row twice or skip a row, since there is no 'ordered list' to read? Tibor suggested the following:

    I believe that forwarding pointers take care of just that. Because of forwarding pointers, the "root" location for a row is stable. So, even if the row moves during a scan, the "root location"(forwarding stub) is at the same position. We have concluded that the scan uses the forwarding pointers when reading the rows. This means that a scan is not sensitive to row movements during the scan. It cannot "skip" rows that are there, or read the same row twice.

    So a few forwarding pointers are not a bad thing, but having lots of them can increase the work done during scans or partial scans by a considerable amount.

    So how do you get rid of forwarding pointers? There are 3 ways:

    1. If the row is updated, so that its size decreases, AND if there is still room on the page where the row came from, it will be moved back. This is not dependable, so it isn't really recommended as a solution.  When I updated my Address2 table, many of the forwarded records were moved, but not all:

    UPDATE Address2
    SET comments = '';
    GO
    SELECT index_type_desc, page_count, avg_page_space_used_in_percent,
         avg_record_size_in_bytes,forwarded_record_count
    FROM sys.dm_db_index_physical_stats(db_id('AdventureWorks'),
          
    object_id('Address2'),null, null, 'detailed');
    GO

    My results showed that I am still left with 1080 forwarded records. This is a great improvement over 15961, but it's still more forwarded records than there are pages in the table.

    2. Forwarded records will be cleaned up when you shrink the data file. This is definitely NOT recommended as a solution; I am only mentioning it for completeness. SQL Server does so much moving of data and updating nonclustered index pointers when shrinking a file, that updating the forwarded records is not very much extra work at all.

    3. Since forwarded records only exist in heaps, the best solution is to make the table not a heap. Build a clustered index, and all the forwarded records will go away. If you really don't want the clustered index, you can then drop it.

    Hopefully, this information will be useful to you.

    ~Kalen


More Posts Next page »

This Blog

Syndication