UPDATEs That Really Aren’t

A long time ago I can remember wondering what happens when you try to update a value by setting it equal to itself – whether the database engine will actually carry out the update or just ignore it. At the time I didn't think much of it, but it popped into my head again last week and I decided to find out. I'm sure I'm not the first to try this, but after a few searches yielded nothing I decided to blog about it too! Here's what I came up with.

Query Types

In my mind, there's 2 types of update queries that should end up doing nothing:

  1. Queries where the optimizer might not know the value matches, but the storage engine should figure it out:
1UPDATE tbl<
2SET col = 5 -- when col already equals 5
3WHERE key = 'foo';

I can't expect the optimizer to know that 'col' is already 5, but once the row gets looked up I would imagine the storage engine would check to see if the value will be changing before actually performing the update.

  1. Queries where the optimizer should know that nothing will happen:
1UPDATE tbl
2SET col = 5
3WHERE col = 5;

I don't by any means have inside knowledge of how the optimizer works, but I know it's a smart cookie and I'm pretty sure it can figure out the above query will do nothing.

What I'm Looking For

So how will I know whether the database engine is really doing nothing on these queries? I really won't, but I think I'm taking a reasonable approach by looking at both the transaction log and the buffer pool. Data being updated should result in entries being written to the transaction log as well as the appropriate data pages being marked "dirty" in the buffer pool so they can be flushed to disk during the next checkpoint. If either of those occur, I believe it's safe to assume that a change has taken place.

The Experiment

Here's the exact code that I ran along with screenshots of my results. Feel free to follow along on your own if you like, but please be sure to run this on a development box. As always, it is your responsibility to understand any code you are running on your system.

The test server I used is running SQL Server 2008 SP1, and the actual database used the full recovery model and contained no other tables.

First off, we want to make sure that our checks for dirty pages in the buffer pool are as accurate as possible. If pages in the pool are dirty and automatic checkpointing runs before we can query for them, we'll see inaccurate results. Fortunately Microsoft has given us Trace Flag 3505 to disable automatic checkpointing. Be forewarned that this is an instance-level option affecting all databases, so this should not be used on a production machine! After that, we'll create a table named 't' and populate it with 5 rows, and then issue a manual checkpoint so all the operations associated with creating & filling the table are flushed from the log.

 1-- kill automatic checkpoints
 2DBCC TRACEON (3505);
 3
 4CREATE TABLE t (
 5	a INT,
 6	b CHAR(1),
 7	CONSTRAINT PK_t PRIMARY KEY CLUSTERED (a)
 8);
 9
10INSERT INTO t VALUES (1,'A');
11INSERT INTO t VALUES (2,'B');
12INSERT INTO t VALUES (3,'C');
13INSERT INTO t VALUES (4,'D');
14INSERT INTO t VALUES (5,'E');
15
16CHECKPOINT;

Next we'll take a look at the active portion of the transaction log to make sure it has in fact been cleared. To view the log, run this:

1SELECT [Current LSN], Operation, Context, [Transaction ID], AllocUnitName
2FROM fn_dblog(null, null);

You should see two rows returned, representing the beginning and ending of the most recent checkpoint. Records previously written to the log are still there (until the log is backed up since this DB using the full recovery model) but are no longer in the active portion of the log since their changes were flushed to disk by the checkpoint. This is as clear as the log can get for us, so we'll consider this to be what an "empty" log looks like.

Another point of interest is what data pages make up the table we just created. To do this you can use the DBCC IND command. You'll need to specify the name of your database as the first parameter:

1DBCC IND ('DB_Name','t',1);

The Result:

You should see two rows returned like I just did. The PageIDs for pages assigned to the 't' table are 153 and 154 in my case (your values may differ). The "PageType" column shows that page 154 is a IAM page (PageType = 10), and page 153 is the data page (PageType = 1), storing the data contained in the table. This information will come in handy shortly. The next set of queries will update a value to itself, check the transaction log, and check the buffer pool for dirty pages.

 1-- perform an update
 2UPDATE t
 3SET b = 'C'
 4WHERE a = 3;
 5
 6-- check log
 7SELECT [Current LSN], Operation, Context, [Transaction ID], AllocUnitName
 8FROM fn_dblog(null, null);
 9
10-- check buffer pool for dirty pages
11select * from sys.dm_os_buffer_descriptors
12where database_id = db_id() AND is_modified = 1
13order by page_id;

Result:

Result

You'll notice right away that the messages say 1 row was affected, the active part of the transaction log has 4 entries (in addition to the 2 checkpoint ones discussed earlier) and there's a dirty page in the buffer pool, so my theory is clearly debunked and I can end this post here. Not quite. If you look at the dirty page in the buffer pool, you'll see it's page #1129493 (in my case, your page number may vary), which is a far cry from the #153 and 154 we saw earlier as belonging to the 't' table. In fact if you look at the transaction log entries, you'll see that the modified page belongs to the sys.sysobjvalues system table. According to BOL, this table "contains a row for each general value property of an entity". You're welcome to query it if you like, but you'll have to use the Dedicated Administrator Connection to do so, and nothing in it is user-readable anyway. Bottom line – the table in question, 't', was not updated by the update statement.

For grins we'll try it again, doing the same thing with a different row, but we'll issue a CHECKPOINT beforehand to ensure the active portion of the transaction log is cleared.

 1-- clear log
 2CHECKPOINT;
 3
 4-- perform a different update
 5UPDATE t
 6SET b = 'B'
 7WHERE a = 2;
 8
 9-- check log
10SELECT [Current LSN], Operation, Context, [Transaction ID], AllocUnitName
11FROM fn_dblog(null, null);
12
13-- check for dirty pages
14select * from sys.dm_os_buffer_descriptors
15where database_id = db_id() AND is_modified = 1
16order by page_id;

This time, we find that the log is "empty", and the buffer pool contains no dirty pages at all. From my experiments, it appears that the sys.sysobjvalues table is only updated the first time a query is run on a table. If you repeat these tests with the second type of query I mentioned at the beginning, you should see similar results.

Once you're all done experimenting, you'll want to be sure to re-enable automatic checkpoints:

1DBCC TRACEOFF (3505);

Conclusions

From what I've been able to observe, it appears the query optimizer and storage engine are indeed smart enough to realize when a value is being updated to be itself and instead opt to do nothing. This may have been common knowledge for some, but not for me. I welcome comments about my methods and/or results!

Other Thoughts

SQL SERVER LIES!! Maybe not, but I got a kick out of how the "Messages" screen returns "1 row(s) affected" when it's evident that no rows are updated. I guess it depends on your definition of "affected". The first thing I think of is that something was changed, but I guess it could also mean that a row was "touched" or "checked" by the storage engine, but not actually updated.