When I was a kid, my mom used to read me UPDATE queries every
night before I went to sleep. I heared many stories about updating
objects to database where most of them were the same old:
] SET Field1 = Value1,
Field2 = Value2, . . . WHERE PRIMARYKEY = TheKey
So, I grew up with the same ideas on how to update objects in
tables as other kids do. All the UPDATE queries involved taking all
the fields and the update stored procedures used to have all the
properties of the objects. If you are using some Code Generators
(e.g. Code Smith) and generating data access layer codes and stored
procedures for objects, you will see almost all the code generators
and ORM tools generate UPDATE statments with all the fields in the
SET block. Let me show you with an example how evil this idea
is.
Imagine a table like this:
dbo ] .
[ ChannelSubscribedByUser ] (
[ ID ]
[ int ]
IDENTITY ( 1 , 1 ) NOT
NULL , [
UserId ] [
int ] NOT
NULL , [
ChannelId ] [
int ] NOT
NULL , [
ReadRSSItemIDs ] [
image ] NOT
NULL , CONSTRAINT [ PK_ChannelSubscribedByUser ] PRIMARY KEY CLUSTERED ( [
UserId ] ASC
, [ ChannelId ] ASC
)
This is a table from Pageflakes database. In this table,
we store all the RSS feeds user has read from a particular RSS
channel. UserId is a foreign key to User table and ChannelID is a
foreign key to Channel table. Pretty straight forward. We had a
harmless update stored procedure generated using Code Smith using
the famous .NET Tiers template.
] . [
prcChannelSubscribedByUserUpdate
] @ID int , @ChannelId int , @ReadRSSItemIDs image , @UserId int AS UPDATE dbo. [
ChannelSubscribedByUser
] SET [
ChannelId ] =
@ChannelId , [
ReadRSSItemIDs ] =
@ReadRSSItemIDs , [
UserId ] =
@UserId WHERE [
ID ] =
@ID SELECT [ ID
] , [
UserId ] ,
[ ChannelId ] ,
[ ReadRSSItemIDs ] FROM
dbo. [ ChannelSubscribedByUser ] WHERE
[ ID ]
= @ID
Look at the query plan and see how horrible it really is:
There are 2 Clustered Index Seeks, one Table Spool (very
expensive), 2 Nested Loops, 1 Assert, 1 Clustered Index Seek. If
you look at the IO Statistics, you can see how truly evil this
query is:
Table ‘RSSChannel’. Scan count 0, logical reads 3, physical
reads 0, read-ahead reads 0, lob logical reads 0, lob physical
reads 0, lob read-ahead reads 0.
Table ‘PageFlakesUser’. Scan count 0, logical reads 3, physical
reads 0, read-ahead reads 0, lob logical reads 0, lob physical
reads 0, lob read-ahead reads 0.
Table ‘ChannelSubscribedByUser’. Scan count 1, logical reads 15,
physical reads 0, read-ahead reads 0, lob logical reads 3, lob
physical reads 0, lob read-ahead reads 0.
Table ‘Worktable’. Scan count 1, logical reads 5, physical reads 0,
read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob
read-ahead reads 0.
It is making SQL Server go to the tables which are refered as
foreign key during a single row update!
Generally when we update a row in a table which represents an
object, we rarely change the value of the fields which are foreign
keys to other tables and has index on them. Most of the time, the
updates are on the fields which contain properties, not relations.
For example, 99% of the cases, you will update properties of an
Employee object like FirstName, LastName,
Age etc. 1% case you will modify the CompanyID
(because s/he was fired) which is a foreign key to Company
Table. But if you go to your database and see the stored procedure
which updates the Employee object, you will see this:
LastName = @LastName ,
CompanyID = @CompanyID WHERE EmployeeID = @EmployeeID
Don’t be ashamed. I know we all have queries like this every
where.
If you remove those unwanted fields which generally have Foreign
Key and Index on them, you can gain significant performance
improvement. When I just change the UPDATE statement to this:
ChannelSubscribedByUser
] SET [
ReadRSSItemIDs ] =
@ReadRSSItemIDs , WHERE
[ ID ]
= @ID
See the query plan:
There’s just one Clustered Index Update. No Table Spool, no
Clustered Index seeks, no nested loops, no asserts. The IO
statistics shows how significant the improvement really is:
Table ‘ChannelSubscribedByUser’. Scan count 1, logical reads 6,
physical reads 0, read-ahead reads 0, lob logical reads 3, lob
physical reads 0, lob read-ahead reads 0.
There’s just one update and nothing else. Also the number of
Logical Reads is 6 compared to 15.
So, you thought you knew how to write UPDATE statements? Think
again.