A while back, honorary SQL geek Sarah Sjolander (blog | twitter) posted her bucket list after watching the movie of the same name. I have yet to see that movie, but thought I’d share my own personal bucket list that (until now) resided only in my head. These items are listed in descending order of the possibility of them ever occurring. Enjoy!
Attend a Weird Al Yankovic concert – been wanting to get to one of Al’s shows since I discovered his music in 4th grade
Go Skydiving – my mother will kill me
Visit Hoover Dam – and go on the most technical tour they offer!
A long time ago I can remember wondering what happens when you try to update a value by setting it equal to itself – whether the database engine will actually carry out the update or just ignore it. At the time I didn’t think much of it, but it popped into my head again last week and I decided to find out. I’m sure I’m not the first to try this, but after a few searches yielded nothing I decided to blog about it too! Here’s what I came up with.
In my mind, there are two types of update queries that should end up doing nothing:
1) Queries where the optimizer might not know the value matches, but the storage engine should figure it out:
UPDATE tbl -- 'tbl' stands in for the table being updated
SET col = 5 -- when col already equals 5
WHERE [key] = 'foo';
I can’t expect the optimizer to know that ‘col’ is already 5, but once the row gets looked up I would imagine the storage engine would check to see if the value will be changing before actually performing the update.
2) Queries where the optimizer should know that nothing will happen:
UPDATE tbl
SET col = 5
WHERE col = 5;
I don’t by any means have inside knowledge of how the optimizer works, but I know it’s a smart cookie and I’m pretty sure it can figure out the above query will do nothing.
What I’m Looking For
So how will I know whether the database engine is really doing nothing on these queries? I really won’t, but I think I’m taking a reasonable approach by looking at both the transaction log and the buffer pool. Data being updated should result in entries being written to the transaction log as well as the appropriate data pages being marked “dirty” in the buffer pool so they can be flushed to disk during the next checkpoint. If either of those occur, I believe it’s safe to assume that a change has taken place.
Here’s the exact code that I ran along with screenshots of my results. Feel free to follow along on your own if you like, but please be sure to run this on a development box. As always, it is your responsibility to understand any code you are running on your system.
The test server I used is running SQL Server 2008 SP1, and the actual database used the full recovery model and contained no other tables.
First off, we want to make sure that our checks for dirty pages in the buffer pool are as accurate as possible. If pages in the pool are dirty and automatic checkpointing runs before we can query for them, we’ll see inaccurate results. Fortunately Microsoft has given us Trace Flag 3505 to disable automatic checkpointing. Be forewarned that this is an instance-level option affecting all databases, so this should not be used on a production machine! After that, we’ll create a table named ‘t’ and populate it with 5 rows, and then issue a manual checkpoint so all the operations associated with creating & filling the table are flushed from the log.
-- kill automatic checkpoints
DBCC TRACEON (3505);
CREATE TABLE t (
a INT NOT NULL,
b CHAR(1) NOT NULL,
CONSTRAINT PK_t PRIMARY KEY CLUSTERED (a)
);
INSERT INTO t VALUES (1,'A');
INSERT INTO t VALUES (2,'B');
INSERT INTO t VALUES (3,'C');
INSERT INTO t VALUES (4,'D');
INSERT INTO t VALUES (5,'E');
-- flush the create & insert operations from the active log
CHECKPOINT;
Next we’ll take a look at the active portion of the transaction log to make sure it has in fact been cleared. To view the log, run this:
SELECT [Current LSN], Operation, Context, [Transaction ID], AllocUnitName
FROM fn_dblog(null, null);
You should see two rows returned, representing the beginning and end of the most recent checkpoint. Records previously written to the log are still there (until the log is backed up, since this DB is using the full recovery model) but are no longer in the active portion of the log since their changes were flushed to disk by the checkpoint. This is as clear as the log can get for us, so we’ll consider this to be what an “empty” log looks like.
Another point of interest is what data pages make up the table we just created. To do this you can use the DBCC IND command. You’ll need to specify the name of your database as the first parameter:
DBCC IND ('DB_Name','t',1);
You should see two rows returned, as I did. The PageIDs for pages assigned to the ‘t’ table are 153 and 154 in my case (your values may differ). The “PageType” column shows that page 154 is an IAM page (PageType = 10) and page 153 is the data page (PageType = 1), storing the data contained in the table. This information will come in handy shortly. The next set of queries will update a value to itself, check the transaction log, and check the buffer pool for dirty pages.
-- perform an update
UPDATE t
SET b = 'C'
WHERE a = 3;
-- check log
SELECT [Current LSN], Operation, Context, [Transaction ID], AllocUnitName
FROM fn_dblog(null, null);
-- check buffer pool for dirty pages
select * from sys.dm_os_buffer_descriptors
where database_id = db_id() AND is_modified = 1
order by page_id;
You’ll notice right away that the messages say 1 row was affected, the active part of the transaction log has 4 entries (in addition to the 2 checkpoint ones discussed earlier) and there’s a dirty page in the buffer pool, so my theory is clearly debunked and I can end this post here. Not quite. If you look at the dirty page in the buffer pool, you’ll see it’s page #1129493 (in my case, your page number may vary), which is a far cry from the #153 and 154 we saw earlier as belonging to the ‘t’ table. In fact if you look at the transaction log entries, you’ll see that the modified page belongs to the sys.sysobjvalues system table. According to BOL, this table “contains a row for each general value property of an entity”. You’re welcome to query it if you like, but you’ll have to use the Dedicated Administrator Connection to do so, and nothing in it is user-readable anyway. Bottom line – the table in question, ‘t’, was not updated by the update statement.
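Rather than eyeballing page numbers, a dirty page can also be traced back to the object that owns it by joining the buffer descriptors to the allocation metadata. This is just one possible sketch; the two-branch join condition reflects how container_id maps to sys.partitions for in-row and row-overflow allocation units versus LOB ones:

```sql
-- map dirty pages in the current database back to their owning objects
SELECT  bd.page_id,
        bd.page_type,
        OBJECT_NAME(p.object_id) AS object_name
FROM    sys.dm_os_buffer_descriptors AS bd
JOIN    sys.allocation_units AS au
        ON au.allocation_unit_id = bd.allocation_unit_id
JOIN    sys.partitions AS p
        ON (au.type IN (1, 3) AND au.container_id = p.hobt_id)
        OR (au.type = 2      AND au.container_id = p.partition_id)
WHERE   bd.database_id = DB_ID()
        AND bd.is_modified = 1;
```

Run after the update above, this should point at the system table rather than ‘t’, matching what the log entries show.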
For grins we’ll try it again, doing the same thing with a different row, but we’ll issue a CHECKPOINT beforehand to ensure the active portion of the transaction log is cleared.
-- clear log
CHECKPOINT;
-- perform a different update
UPDATE t
SET b = 'B'
WHERE a = 2;
-- check log
SELECT [Current LSN], Operation, Context, [Transaction ID], AllocUnitName
FROM fn_dblog(null, null);
-- check for dirty pages
select * from sys.dm_os_buffer_descriptors
where database_id = db_id() AND is_modified = 1
order by page_id;
This time, we find that the log is “empty”, and the buffer pool contains no dirty pages at all. From my experiments, it appears that the sys.sysobjvalues table is only updated the first time a query is run on a table. If you repeat these tests with the second type of query I mentioned at the beginning, you should see similar results.
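For completeness, the second type of query can be sketched against the same table like this; the checks are the same fn_dblog and sys.dm_os_buffer_descriptors queries shown earlier, and the value 'D' is just an arbitrary pick from the test rows:

```sql
-- clear the active log, then run the "optimizer should know" form
CHECKPOINT;
UPDATE t
SET b = 'D'
WHERE b = 'D'; -- the predicate guarantees the new value equals the old one
```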
Once you’re all done experimenting, you’ll want to be sure to re-enable automatic checkpoints:
DBCC TRACEOFF (3505);
From what I’ve been able to observe, it appears the query optimizer and storage engine are indeed smart enough to realize when a value is being updated to be itself and instead opt to do nothing. This may have been common knowledge for some, but not for me. I welcome comments about my methods and/or results!
SQL SERVER LIES!! Maybe not, but I got a kick out of how the “Messages” screen returns “1 row(s) affected” when it’s evident that no rows are updated. I guess it depends on your definition of “affected”. The first thing I think of is that something was changed, but I guess it could also mean that a row was “touched” or “checked” by the storage engine, but not actually updated.
This was the first non-database related book that I’ve read in a long time, and its full title is actually “Dr. Deming: The American Who Taught the Japanese About Quality”. I was expecting to read about the life of W. Edwards Deming, but instead the book was more of a summary of his teachings and management methods. I was still glad I read it, as it takes an insightful look at the differences between the American and Japanese auto industries as well as management methods in general and how Deming’s methods improved both in Japan.
While I can imagine his name might be covered in business school classes, techies like myself may be in need of an introduction. William Edwards Deming was a professor, statistician, and consultant with a focus on improving quality through both statistical and managerial methods. His earlier years were spent at Western Electric’s Hawthorne Works in Cicero, Illinois, the remnants of which I pass by daily on the train ride to work. While at Western Electric, he worked with Walter Shewhart, known by many as the father of statistical quality control. Deming developed the statistical sampling methods used for the 1940 U.S. Census, and his experiences led to him being selected by the U.S. Army to help plan for the 1951 Japanese Census. In Japan, Deming gave lectures on process control through statistics and the overall idea of quality to hundreds of scientists and engineers. He also lectured top executives on how improving quality will reduce expenses while increasing both productivity and market share.
Deming later developed his practices into a series of lectures which he was hired to give to companies all over the world. The details of his lectures that were mentioned in the book are too numerous to cover here, but many of them centered around his list of 14 principles of management. Despite being geared towards industrial and manufacturing jobs, they can be applied to any position. I drew a lot of parallels to IT jobs, and a few of them piqued my interest:
You can’t inspect quality into a product. Deming argued that no level of inspection or testing could ensure a quality product. Quality must instead be built into a product from the very beginning. Drawing parallels to software design, I’d argue he’s spot on. Unit testing and other methods can’t ensure software will be error-free, as there are too many possible situations to cover in a test environment. I’ve seen lots of buggy software that has passed unit testing because either the tests weren’t sufficient or unforeseen changes happened elsewhere that caused the product to fail.
Eliminate management by numbers and numerical goals. Substitute leadership. I once worked in a shop where upper management was extremely numbers-oriented. If a decision could in any way be made based on values calculated in a spreadsheet, they’d find a way to do that. Deming argued that true leaders will know what needs to be done in order to further the business, and wouldn’t have to depend on numeric values for justification.
Cooperation over competition. Cooperation is a fundamental ingredient that leads to improvement, even when occurring between rival companies. Cooperation between all players helps to ensure that the industry as a whole moves forward, and everyone will benefit as a result. A lot of standards (e.g. Compact Discs, the USB interface, network protocols) have come from cooperation between competitors. Similarly a lot of promising technologies have gotten off to a very rocky start when competitors didn’t cooperate – one such example is the great VCR war between Betamax and VHS. In the beginning, consumers weren’t sure which one to buy, movie studios had to produce two versions of each movie on tape, and in the end a lot of people were left with Betamax VCRs that nobody made tapes for. I think the SQL Server community is an excellent example of cooperation over competition. The community has so many professionals with a lot of great tricks up their sleeve, but rather than keep them a secret they share their knowledge freely with anyone who’s interested. The more DBAs who can effectively solve problems as a result of this knowledge, the better off the SQL Server platform will be.
Dr. Deming was born on October 14, 1900; yesterday would have been his 110th birthday.
As this book was written in 1991, it does an excellent job of showing how the quality of Japanese products (particularly automobiles) is leading to Japan straining the US auto industry, and correctly predicts that Japan’s auto sales will overtake the US in the future. The emergence of China wasn’t really on the radar 19 years ago, and I’d be interested to see what the author’s take on more current events is. I’d imagine one could successfully argue that China’s manufacturing dominance is more linked to dirt-cheap prices than to quality, which pretty much throws the book’s main argument out the window. That being said, I won’t hold a book liable for not being able to predict all aspects of the future.
This Month’s T-SQL Tuesday topic comes from Sankar Reddy (blog | twitter) and he’s asking for SQL Server myths and misconceptions to be de-bunked. I’ve got 2 myths I’d like to set the record straight about. They’re rather closely related, and concern the “Lock Pages in Memory” setting.
What’s it do?
Without getting into an entire lesson on how operating systems manage memory, I’ll try to shrink this down into about a 15 second blurb for the benefit of those who might not already know. Every computer has a finite amount of memory (RAM). Regardless of how much memory you have, all of the processes running on your computer combined will probably allocate more memory than is physically available. Since not every process needs all its memory at any given second, operating systems employ a memory management scheme known as “paging”, which means that inactive data stored in RAM will be written to disk in a file called the “page file”. This typically happens unbeknownst to the application whose data has just been paged. When the application needs this data again, it is read back from the page file into memory and the process continues. The downside of all this, of course, is that reading and writing to/from disk takes time, and is slower than if the data had just stayed in memory to begin with.
“Lock Pages in Memory” (LPM) allows some processes to skirt around this by declaring their allocated portions of memory as “locked”, therefore preventing the operating system from paging it to disk. This can result in significant performance gains for high-demand applications such as SQL Server. To grant a process the rights to utilize LPM, check out Microsoft’s instructions.
Now that I’m done with the setup, let’s get to the myths. Myth the first is that LPM is only available in SQL Server Enterprise Edition. This once was true, but is no longer the case. Around April of 2009 it was announced that users of SQL 2005 and 2008 Standard Edition are also able to join the party and utilize LPM. To take advantage of this, systems must be upgraded to at least:
SQL Server 2005 Cumulative Update 4 for Service Pack 3
SQL Server 2008 Cumulative Update 2 for Service Pack 1
So now let’s say you’re running Standard Edition of a version listed above (or later) and you have the Lock Pages In Memory security setting enabled for the SQL Server service account, just like you’d do for Enterprise Edition. You’re now good to go, right? Not exactly, but it’s commonly thought that it should work at this point, hence the second myth. Because you’re using Standard Edition, you’ll also need to set Trace Flag 845 to take advantage of LPM.
Please be forewarned that “Lock Pages In Memory” can actually decrease performance in certain cases. It is not a silver bullet, and should be tested thoroughly in a development environment before deploying to production. Side effects may include nausea, vomiting, and constipation. Do not use Lock Pages In Memory if you are pregnant or thinking about becoming pregnant. If pages in memory remain locked for more than four hours, you should seek immediate medical attention.
To set a Trace Flag on startup, open SQL Server Configuration Manager and select “SQL Server Services” from the list on the left. Then select the desired SQL Server service from the list on the right. Right-click on it, select “properties”, and choose the “Advanced” tab. In this case, you will want to make a change to the “Startup Parameters” value, by adding ";-T845" to the end of it. It should look something like the screenshot below. Once it does, click OK. The trace flag won’t take effect until the SQL Server service is restarted.
Once the service is restarted, look in the SQL Error Log. Shortly after startup you should see an entry that reads “Using locked pages for buffer pool.” The presence of this entry means that Lock Pages In Memory is active.
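If you’d rather check from a query window than dig through the log viewer, something along these lines should work; using sp_readerrorlog’s search-string parameter is a convenience I’m adding here, not part of the steps above:

```sql
-- search the current error log (0), of type SQL Server (1), for the message
EXEC sp_readerrorlog 0, 1, N'locked pages';
```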
What about AWE?
Typically when Lock Pages in Memory is mentioned in conversation, Address Windowing Extensions (AWE) isn’t too far behind. AWE isn’t unique to SQL Server, it’s actually part of Windows – an API that allows 32-bit processes to map memory outside of their virtual address space. Did AWE in Standard Edition get the same treatment as LPM? It depends on your version.
For 32-bit, yes – AWE and Lock Pages in Memory (LPM) go hand-in-hand. Processes can’t utilize AWE without the ability to lock pages – it would be a lot more complicated otherwise. If you try enabling AWE without LPM, you’ll see error 5845, which tells you that AWE requires the ability to lock pages, which the current process doesn’t have. Conversely, pages will only be locked if they’re above a 32-bit process’s 4GB threshold, so granting the ability to use LPM without having AWE enabled will do nothing at all. In order to take advantage of your newly-minted right to lock pages in Standard Edition, Microsoft *had* to give you AWE, or it would be completely pointless.
If you’re using 64-bit SQL Server, you don’t need to care. AWE only applies to 32-bit processes, so the AWE setting is totally ignored in 64-bit versions. It’s not necessary anyway, as 64-bit processes have enough address bits to go around. In fact, the feature is deprecated as of SQL Server 2008 R2. This makes sense, as it was announced over three years ago that Windows Server 2008 R2 would be the last Windows Server OS available for 32-bit architectures.
My “quick and dirty” lesson on paging barely scratches the surface – there’s really a lot more to it. If you’re curious and want to learn more, the Wikipedia article on paging is a great place to start.
Lock Pages In Memory only applies to the SQL Server Buffer Pool, which is only one area of memory used by the database. Because LPM (and therefore AWE) can only benefit the buffer pool, it’s perfectly possible to have a server with a large quantity of memory returning “out of memory” errors, not because there’s not enough available memory, but because there’s not enough available memory in the right place.
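To get a feel for how much memory the buffer pool is actually holding per database, you can count pages in the same DMV used throughout this post; a rough sketch (each page is 8 KB):

```sql
-- approximate buffer pool usage by database, in MB (8 KB per page)
SELECT  DB_NAME(database_id) AS database_name,
        COUNT(*) * 8 / 1024  AS buffer_mb
FROM    sys.dm_os_buffer_descriptors
GROUP BY database_id
ORDER BY buffer_mb DESC;
```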
When I started this blog I had zero intention of reviewing movies. I wanted to stick to technical topics with perhaps a few book reviews in-between. That goal hasn’t changed, but after seeing “The Social Network” this past weekend I decided an exception would be appropriate. In my opinion, the SQL Server community gets a lot of its “community” feel from social networking. This is especially true for Twitter, but I’m Facebook friends with a lot of my database tweeps as well.
What I liked
First thing I enjoyed was that I felt the movie had an acceptable level of nerd content. There was obviously a story to tell and it had to appeal to a wide variety of people, so I didn’t expect to see any hardcore programming syntax or the like. There were, however, quite a few references to Linux, Apache, and commands such as wget (especially right at the beginning), and MySQL was mentioned at least once. There was also a sequence that took place in a lecture hall where memory-mapped I/O was being discussed, which brought back some memories of undergrad!
I got a kick out of the portrayal of Mark Zuckerberg. They totally made him out to be an uber-geek who was nothing but cold, snarky, and pretty much emotionless throughout the entire film. I can understand that he is a nerd, and he always had a dozen things flying around in his head (and still does I’m sure) but I don’t really believe that he never had fun at any time throughout the process of creating Facebook.
Finally, I enjoyed the trip down memory lane this movie brought me by showing some of the Facebook pages in their original design. While Facebook currently boasts over 500 million members, the vast majority of them really don’t know what Facebook was like when it debuted. I feel privileged to say that I’ve been a user since the very beginning, or really close to it: I signed up on May 3, 2004, which I know because I was able to find the original confirmation email. Your profile used to include the date you joined – I wish they still had that! Back in the day, (the)Facebook was pretty much just a profile: you could friend people and poke them and that was about it. Besides the general information fields about yourself, they had a tool for adding what courses you were in so you could find other people in your classes. You had one picture for your profile – photo albums didn’t come until much later. I can remember it being a huge deal when they added notifications that other people changed their profiles – before that you had to check all your friends individually because there was no other way.
What I didn’t like
I can’t pick out anything specific in the movie that I didn’t like. I thought it did an excellent job of telling a story, but throughout the movie I kept wondering if everything was really true or not. I’m sure all the big points are rooted in fact, but I have a hard time believing that everything really happened the way the movie says it did. I assume the actual truth differs in some ways. I’ve read several accounts that Mark Zuckerberg refuses to see the movie, though I’m unsure if that’s because he feels it’s untrue or if it is true and the truth hurts. I tend to believe the former.
All in all I enjoyed this movie – it tells a good story regardless of whether or not it’s true. It also didn’t change my opinion of Mark Zuckerberg or Facebook at all – I still think they’re both awesome. I’ve yet to find a better way to keep in contact with people I otherwise would have lost touch with long ago.