Web-based real-time SQL Server Performance Dashboard

SQL Server Performance Dashboard (SSPD) is a small open source web app that shows the performance and problems of one or more SQL Server instances and their databases in near real time. It uses the Dynamic Management Views (DMVs) to gather useful data from their verbose output and combines them with utility stored procs to produce meaningful, easy to understand information. You can use it to quickly spot blocking queries, see who is blocking whom, find expensive queries that consume high CPU or disk, and notice unusual locks, very high disk activity and so on.
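
To give a feel for the kind of DMV data the dashboard works with, here is a minimal sketch (my own illustration, not the dashboard's actual query) of how blocking sessions can be listed straight from the DMVs:

SELECT
    r.session_id,
    r.blocking_session_id,
    r.wait_type,
    r.wait_time AS wait_time_ms,
    t.text      AS running_sql
FROM sys.dm_exec_requests AS r
CROSS APPLY sys.dm_exec_sql_text(r.sql_handle) AS t
WHERE r.blocking_session_id <> 0;   -- only requests that are currently blocked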

See a live demo: http://dashboard.omaralzabir.com/

You can read details about the tool from this CodeProject article: http://www.codeproject.com/Articles/799053/Web-based-real-time-SQL-Server-Performance-Dashboa

The binaries are here; you can just extract them into an IIS folder, put the connection strings in the web.config file, and you are ready to roll:

SqlServerPerformanceDashboard GitHub Project Binaries

Or you can get the source code from the GitHub project site: https://github.com/oazabir/SQLServerDashboard

Building a High Performance Queue in the Database for Storing Orders, Notifications, Tasks

We have queues everywhere. Most websites have queues for asynchronously sending notifications like email and SMS. E-commerce sites have queues for storing, processing and dispatching orders. Factory assembly line automation systems have queues for running tasks in parallel, in a certain order. A queue is a widely used data structure that sometimes has to be created in a database instead of using specialized queue technologies like MSMQ. Running a high performance and highly scalable queue using database technologies is a big challenge, and it's hard to maintain once the queue starts to handle millions of rows queued and dequeued per day. Let me show you some common design mistakes made in designing queue-like tables and how to get maximum performance and scalability from a queue implemented using simple database features.


Let’s first identify the challenges you have in such queue tables:

  • The table is both read from and written to. Thus queuing and dequeuing impact each other and cause lock contention, transaction deadlocks, IO timeouts etc. under heavy load.
  • When multiple receivers try to read from the same queue, they randomly get duplicate items picked out of the queue, resulting in duplicate processing. You need to implement some kind of high performance row lock on the queue so that the same item never gets picked up by concurrent receivers.
  • The Queue table needs to store rows in a certain order and read them in a certain order, which is an index design challenge. It's not always first in, first out. Sometimes orders have higher priority and need to be processed regardless of when they were queued.
  • The Queue table needs to store serialized objects in XML or binary form, which becomes a storage and index rebuild challenge. You can't rebuild indexes on the Queue table online because it contains text and/or binary fields. Thus the table keeps getting slower every day and eventually queries start timing out, until you take a downtime and rebuild the indexes.
  • During dequeue, a batch of rows is selected, updated and then returned for processing. You have a "State" column that defines the state of the items, and during dequeue you select items of a certain state. Now State only has a small set of values, e.g. PENDING, PROCESSING, PROCESSED, ARCHIVED. As a result, an index on the "State" column alone does not give you enough selectivity; there can be thousands of rows having the same state. So any dequeue operation results in a clustered index scan that is both CPU and IO intensive and produces lock contention.
  • During dequeue, you cannot just remove the rows from the table because that causes fragmentation. Moreover, you need to retry orders/jobs/notifications N times in case they fail on the first attempt. This means rows are stored for longer periods, indexes keep growing, and dequeue gets slower day by day.
  • You have to archive processed items from the Queue table to a different table or database in order to keep the main Queue table small. That means moving a large number of rows of some particular status to another database. Such large data removal leaves the table highly fragmented, causing poor queue/dequeue performance.
  • You have a 24×7 business. There is no maintenance window where you can take downtime and archive a large number of rows. This means you have to continuously archive rows without affecting production queue/dequeue traffic.

If you have implemented such queue tables, you might have suffered from one or more of the above challenges. Let me give you some tips on how to overcome these challenges and how to design and maintain a high performance queue table.
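
As a concrete starting point, here is a minimal sketch of one common SQL Server dequeue pattern that addresses several of the challenges above (row-level locks, skipping rows already claimed by other receivers, and claiming plus returning a batch in a single statement). This is my own illustration, not necessarily the approach in the linked article; the table and column names are hypothetical:

-- Hypothetical queue table: a small state column plus a serialized payload.
CREATE TABLE dbo.QueueItem
(
    QueueItemID bigint IDENTITY(1,1) NOT NULL PRIMARY KEY CLUSTERED,
    ItemState   tinyint NOT NULL DEFAULT (0),      -- 0 = PENDING, 1 = PROCESSING, 2 = PROCESSED
    QueuedOn    datetime NOT NULL DEFAULT (GETUTCDATE()),
    Payload     varbinary(max) NULL                -- serialized order/notification/task
);
GO

-- Dequeue a batch atomically: the CTE picks the oldest pending rows,
-- UPDLOCK + READPAST lets concurrent receivers skip rows another receiver
-- has already locked, and OUTPUT returns the claimed rows in the same statement.
WITH NextBatch AS
(
    SELECT TOP (10) QueueItemID, ItemState, Payload
    FROM dbo.QueueItem WITH (ROWLOCK, UPDLOCK, READPAST)
    WHERE ItemState = 0
    ORDER BY QueueItemID
)
UPDATE NextBatch
SET ItemState = 1
OUTPUT inserted.QueueItemID, inserted.Payload;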

Read the article for details:

http://www.codeproject.com/KB/database/fastqueue.aspx

Please vote if you find this useful.

Exporting normalized relational data from database to flat file format

Sometimes you need to export relational, normalized data into flat files where a single row combines data from several tables. For example, say you want to export all customer records along with their work and home addresses and primary phone number in a single row. But the address and contact information come from different tables, and there can be multiple rows in those tables for a single customer, or sometimes no row at all. In such a case, a plain INNER JOIN or LEFT OUTER JOIN won't do: an INNER JOIN drops customers that have no address/phone rows, and a LEFT JOIN produces multiple output rows when a customer has several contacts. How do you do it?

The solution is to use OUTER APPLY.

Consider some tables like this:

Customer table

CustomerID  FirstName  LastName  DOB
1           Scott      Guthrie   1/1/1950
2           Omar       AL Zabir  1/1/1982

Contact table

CustomerID  ContactType  ContactValue  IsPrimary
1           WorkAddress  Microsoft     TRUE
1           HomeAddress  Seattle       FALSE
1           Phone        345345345     FALSE
1           Phone        123123123     TRUE
2           WorkAddress  London        TRUE
2           Phone        1312123123    FALSE

We need to create a flat file export from this where the output needs to look like:

CustomerID  FirstName  LastName  DOB       HomeAddress      WorkAddress  PrimaryPhone  IsPhonePrimary
1           Scott      Guthrie   1/1/1950  Seattle          Microsoft    123123123     Yes
2           Omar       AL Zabir  1/1/1982  No Home Address  London       1312123123    No

There are some complex requirements in the output:

  • If a customer has multiple phone numbers, it needs to select the one flagged as primary.
  • If a customer has no home address, it needs to show "No Home Address" instead of NULL.
  • It needs to tell whether the phone number it picked is the primary phone or not.

The query to generate this will be:

SELECT
    c.CustomerID,
    c.FirstName,
    c.LastName,
    c.DOB,
    'HomeAddress' =
        CASE
            WHEN home.ContactValue IS NULL THEN 'No Home Address'
            ELSE home.ContactValue
        END,
    work.ContactValue AS WorkAddress,
    phone.ContactValue AS PrimaryPhone,
    'IsPhonePrimary' =
        CASE
            WHEN phone.IsPrimary = 1 THEN 'Yes'
            ELSE 'No'
        END
FROM Customer c

OUTER APPLY (
    SELECT TOP 1 ContactValue FROM Contact
    WHERE CustomerID = c.CustomerID AND ContactType = 'HomeAddress'
    ORDER BY IsPrimary DESC
) AS home

OUTER APPLY (
    SELECT TOP 1 ContactValue FROM Contact
    WHERE CustomerID = c.CustomerID AND ContactType = 'WorkAddress'
    ORDER BY IsPrimary DESC
) AS work

OUTER APPLY (
    SELECT TOP 1 ContactValue, IsPrimary FROM Contact
    WHERE CustomerID = c.CustomerID AND ContactType = 'Phone'
    ORDER BY IsPrimary DESC
) AS phone

All the tricks are in the OUTER APPLY blocks. Each OUTER APPLY picks, for the current customer row, the single Contact row whose value should appear in that output column, and it still returns the customer (with NULLs) when no matching row exists, which is where it beats INNER and LEFT JOIN here.

The primary phone (and address, when there are several) is selected by ordering the matching Contact rows by the IsPrimary field in descending order, so the rows flagged as primary come first and TOP 1 picks one of them.

Optimize ASP.NET Membership Stored Procedures for greater speed and scalability

Last year at Pageflakes, when we were getting millions of hits per day, we were having query timeouts due to lock timeout and transaction deadlock errors. These locks were produced from the aspnet_Users and aspnet_Membership tables. Since both of these tables are very high read (almost every request causes a read on these tables) and high write (every anonymous visit creates a row in aspnet_Users), there were just way too many locks created on these tables per second. SQL counters showed thousands of locks per second being created. Moreover, we had queries that would frequently select thousands of rows from these tables and thus held more locks for longer periods, forcing other queries to time out and throw errors on the website.

If you have read my last blog post, you know why such locks happen. Basically, every table that grows to hold millions of records and becomes popular goes through this trouble. It's just part of a scalability problem that is common to databases, yet we rarely take precautions against it in our early designs.

The solution is simple: you should have either WITH (NOLOCK) on the table or SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED before your SELECT queries. Either of these will do. They tell SQL Server not to hold any lock on the table while reading it. If some row is locked while the read is happening, the read will just ignore that row. When you are reading a table a thousand times per second without these options, you are issuing locks all over the table a thousand times per second. That not only makes reads slower, but so many locks also prevent inserts, updates and deletes from happening in a timely manner, and queries time out. If you have queries like "show the currently online users from the last one hour based on the LastActivityDate field", they are going to issue such a wide lock that even other harmless SELECT queries will time out. And did I tell you that there's no index on LastActivityDate on the aspnet_Users table?
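
For a "currently online users in the last hour" query, you would at least want an index along these lines. This is only a sketch: the index is not part of the stock ASP.NET Membership schema, and the name and included columns are my assumption:

-- Hypothetical index to support "who was active in the last hour" queries
-- on aspnet_Users; not shipped with ASP.NET Membership.
CREATE NONCLUSTERED INDEX IX_aspnet_Users_LastActivityDate
ON dbo.aspnet_Users (LastActivityDate)
INCLUDE (UserName, IsAnonymous);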

Now don't blame yourself for not putting either of these options in every stored proc and every dynamically generated SQL statement from the very first day. The ASP.NET developers made the same mistake: you won't see either of these used in any of the stored procs used by ASP.NET Membership. For example, the following stored proc gets called whenever you access the Profile object:

ALTER PROCEDURE [dbo].[aspnet_Profile_GetProperties]
    @ApplicationName nvarchar(256),
    @UserName        nvarchar(256),
    @CurrentTimeUtc  datetime
AS
BEGIN
    DECLARE @ApplicationId uniqueidentifier
    SELECT @ApplicationId = NULL
    SELECT @ApplicationId = ApplicationId FROM dbo.aspnet_Applications
        WHERE LOWER(@ApplicationName) = LoweredApplicationName
    IF (@ApplicationId IS NULL)
        RETURN

    DECLARE @UserId uniqueidentifier
    DECLARE @LastActivityDate datetime
    SELECT @UserId = NULL

    SELECT @UserId = UserId, @LastActivityDate = LastActivityDate
    FROM dbo.aspnet_Users
    WHERE ApplicationId = @ApplicationId AND LoweredUserName = LOWER(@UserName)

    IF (@UserId IS NULL)
        RETURN

    SELECT TOP 1 PropertyNames, PropertyValuesString, PropertyValuesBinary
    FROM dbo.aspnet_Profile
    WHERE UserId = @UserId

    IF (@@ROWCOUNT > 0)
    BEGIN
        UPDATE dbo.aspnet_Users
        SET LastActivityDate = @CurrentTimeUtc
        WHERE UserId = @UserId
    END
END

There are two SELECT operations that hold locks on two very high-read tables: aspnet_Users and aspnet_Profile. Moreover, there's a nasty UPDATE statement: it tries to update the LastActivityDate of a user whenever you access the Profile object for the first time within an HTTP request.

This stored proc alone is enough to bring your site down. It did for us, because we use the Profile provider everywhere. This stored proc was called around 300 times/sec. We were having nightmarishly slow performance on the website and many lock timeouts and transaction deadlocks. So we added the transaction isolation level, and we also modified the UPDATE statement to only perform an update when the LastActivityDate is over an hour old. This means the same user's LastActivityDate won't be updated if the user hits the site again within the same hour.

So, after the modifications, the stored proc looked like
this:

ALTER PROCEDURE [dbo].[aspnet_Profile_GetProperties]
    @ApplicationName nvarchar(256),
    @UserName        nvarchar(256),
    @CurrentTimeUtc  datetime
AS
BEGIN
    -- 1. Please no more locks during reads
    SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;

    DECLARE @ApplicationId uniqueidentifier
    --SELECT @ApplicationId = NULL
    --SELECT @ApplicationId = ApplicationId FROM dbo.aspnet_Applications
    --    WHERE LOWER(@ApplicationName) = LoweredApplicationName
    --IF (@ApplicationId IS NULL)
    --    RETURN

    -- 2. No more call to Application table. We have only one app dude!
    SET @ApplicationId = dbo.udfGetAppId()

    DECLARE @UserId uniqueidentifier
    DECLARE @LastActivityDate datetime
    SELECT @UserId = NULL

    SELECT @UserId = UserId, @LastActivityDate = LastActivityDate
    FROM dbo.aspnet_Users
    WHERE ApplicationId = @ApplicationId AND LoweredUserName = LOWER(@UserName)

    IF (@UserId IS NULL)
        RETURN

    SELECT TOP 1 PropertyNames, PropertyValuesString, PropertyValuesBinary
    FROM dbo.aspnet_Profile
    WHERE UserId = @UserId

    IF (@@ROWCOUNT > 0)
    BEGIN
        -- 3. Do not update the same user within an hour
        IF DATEDIFF(n, @LastActivityDate, @CurrentTimeUtc) > 60
        BEGIN
            -- 4. Use ROWLOCK to lock only a row since we know this query
            --    is highly selective
            UPDATE dbo.aspnet_Users WITH (ROWLOCK)
            SET LastActivityDate = @CurrentTimeUtc
            WHERE UserId = @UserId
        END
    END
END

The changes I made are numbered and commented; they need little further explanation. The only tricky part is that I have eliminated the call to the Application table that merely resolves the ApplicationId from the ApplicationName. Since there's only one application in a database (ever heard of multiple applications storing their users separately in the same database and the same tables?), we don't need to look up the ApplicationId on every call to every Membership stored proc. We can just get the ID once and hard code it in a function:

CREATE FUNCTION dbo.udfGetAppId()
RETURNS uniqueidentifier
WITH EXECUTE AS CALLER
AS
BEGIN
    RETURN CONVERT(uniqueidentifier, 'fd639154-299a-4a9d-b273-69dc28eb6388')
END;

This UDF returns the ApplicationId that I hardcoded by copying it from the Application table. Thus it eliminates the need for querying the Application table at all.

Similarly, you should make these changes in all the other stored procedures that belong to the Membership provider. All of them are missing proper locking hints, issue aggressive locks during updates, and update far more frequently than practically needed. Most of them also try to resolve the ApplicationId from the ApplicationName, which is unnecessary when you have only one web application per database. Make these changes and enjoy lock-contention-free super performance from the Membership provider!



Linq to SQL: solving transaction deadlock and query timeout problems using uncommitted reads

When your database tables start accumulating thousands of rows and many users start working on the same table concurrently, SELECT queries on those tables start producing lock contention and transaction deadlocks. This is a common problem in any high volume website. As soon as you get several concurrent users hitting your website, resulting in SELECT queries on some large table like aspnet_Users that is also being updated very frequently, you end up with one of these errors:

Transaction (Process ID ##) was deadlocked on lock resources
with another process and has been chosen as the deadlock victim.
Rerun the transaction.

Or,

Timeout Expired. The Timeout Period Elapsed Prior To Completion
Of The Operation Or The Server Is Not Responding.

The solution to these problems is to use proper indexes on the table and to use the transaction isolation level READ UNCOMMITTED or WITH (NOLOCK) in your SELECT queries. So, if you had a query like this:

SELECT * FROM aspnet_Users
WHERE ApplicationID = 'xxx' AND LoweredUserName = 'someuser'

you could end up getting either of the above errors under high load. There are two ways to solve this. One is to set the isolation level before the query:

SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
SELECT * FROM aspnet_Users
WHERE ApplicationID = 'xxx' AND LoweredUserName = 'someuser'

Or use the WITH (NOLOCK) hint:

SELECT * FROM aspnet_Users WITH (NOLOCK)
WHERE ApplicationID = 'xxx' AND LoweredUserName = 'someuser'

The reason for the errors is that aspnet_Users is a high-read and high-write table: the table is partially locked during reads, and also locked during writes. So when the locks from several queries overlap, and especially when there's a query trying to read, and thus lock, a large number of rows, some of the queries either time out or produce deadlocks.

Linq to SQL does not produce queries with the WITH (NOLOCK) option, nor does it use READ UNCOMMITTED. So, if you are using Linq to SQL queries, you are going to end up with one of these problems on production pretty soon, once your site becomes highly popular.

For example, here’s a very simple query:

using (var db = new DropthingsDataContext())
{
    var user = db.aspnet_Users.First();
    var pages = user.Pages.ToList();
}

DropthingsDataContext is a DataContext built from the Dropthings database.

When you attach SQL Profiler, you see that none of the generated queries use READ UNCOMMITTED or WITH (NOLOCK).

The fix is to do this:

using (var db = new DropthingsDataContext2())
{
    db.Connection.Open();
    db.ExecuteCommand("SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;");

    var user = db.aspnet_Users.First();
    var pages = user.Pages.ToList();
}

This time, the profiler output shows that both queries execute within the same connection and the isolation level is set before the queries execute. So both queries enjoy the isolation level.

Now there's a catch: the connection does not close. This seems to be a bug in the DataContext: when it is disposed, it does not dispose the connection it is holding onto.

In order to solve this, I have made a child class of the
DropthingsDataContext named DropthingsDataContext2
which overrides the Dispose method and closes the
connection.

class DropthingsDataContext2 : DropthingsDataContext, IDisposable
{
    public new void Dispose()
    {
        if (base.Connection != null)
            if (base.Connection.State != System.Data.ConnectionState.Closed)
            {
                base.Connection.Close();
                base.Connection.Dispose();
            }

        base.Dispose();
    }
}

This solved the connection problem.

There you have it: no more transaction deadlocks or lock contention from Linq to SQL queries. But remember, this only eliminates such problems when your database already has the right indexes. If you do not have proper indexes, you will end up with lock contention and query timeouts anyway.

There's one more catch: READ UNCOMMITTED will return rows from transactions that have not completed yet. So you might be reading rows from transactions that will roll back. Since that's generally an exceptional scenario, you are more or less safe with uncommitted reads, but not in financial applications where transaction rollback is a common scenario. In such cases, go for committed read or repeatable read.
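
To make the dirty-read risk concrete, here is a small sketch you can try in two SSMS sessions (the user name is purely illustrative):

-- Session 1: start a transaction but do not commit it yet.
BEGIN TRANSACTION;
UPDATE dbo.aspnet_Users
SET LastActivityDate = GETUTCDATE()
WHERE LoweredUserName = 'someuser';

-- Session 2: READ UNCOMMITTED sees the uncommitted change anyway (a dirty read).
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
SELECT UserName, LastActivityDate
FROM dbo.aspnet_Users
WHERE LoweredUserName = 'someuser';

-- Session 1: roll back; the value Session 2 just read never really existed.
ROLLBACK TRANSACTION;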

There's another way to achieve the same thing, which also seems to work: using .NET transactions (TransactionScope). Here's the code snippet:

using (var transaction = new TransactionScope(
    TransactionScopeOption.RequiresNew,
    new TransactionOptions()
    {
        IsolationLevel = IsolationLevel.ReadUncommitted,
        Timeout = TimeSpan.FromSeconds(30)
    }))
{
    using (var db = new DropthingsDataContext())
    {
        var user = db.aspnet_Users.First();
        var pages = user.Pages.ToList();

        transaction.Complete();
    }
}

Profiler shows that a transaction begins and ends around the queries.

The downside is that it wraps your calls in a transaction, so you are unnecessarily creating transactions even for SELECT operations. When you do this a hundred times per second on a web application, it's a significant overhead.

Some really good examples of deadlocks are given in this
article:

http://www.code-magazine.com/article.aspx?quickid=0309101&page=2

I highly recommend it.

 

99.99% available ASP.NET and SQL Server SaaS Production Architecture

You have a hot ASP.NET + SQL Server product, growing at a thousand users per day, and you have hit the limit of your own garage hosting capability. Now that you have enough VC money in your pocket, you are planning to go out and host at some real hosting facility, maybe colocation or managed hosting. So you are wondering: how do you design a physical architecture that will ensure the performance, scalability, security and availability of your product? How can you achieve four-nines (99.99%) availability, which leaves room for only about 52 minutes of downtime per year? How do you securely let your development team connect to production servers? How do you choose the right hardware for web and database servers? Should you use a Storage Area Network (SAN) or just local disks on RAID? How do you securely connect your office computers to the production environment?

Here I will answer all these questions. Let me first show you a diagram that I made for Pageflakes, where we ensured we get four-nines availability. Since Pageflakes is a Level 3 SaaS, it's absolutely important that we build a high performance, highly available product that can be used from anywhere in the world 24/7, where end users get quick access to their content with complete personalization and customization and can share it with others and with the world. So you can take this production architecture as a very good candidate for a Level 3 SaaS:


[Diagram: Pageflakes production hosting environment]

Here’s a CodeProject article that explains all the
ideas:

99.99% available ASP.NET and SQL Server SaaS Production
Architecture

Hope you like it. Appreciate your vote.



My first book – Building a Web 2.0 Portal with ASP.NET 3.5

My first book, "Building a Web 2.0 Portal with ASP.NET 3.5" from O'Reilly, is published and available in the stores. This book explains in detail the architecture design, development, testing, deployment, performance and scalability challenges of my open source web portal Dropthings.com. Dropthings is a prototype of a web portal similar to iGoogle or Pageflakes, but developed using recently released technologies like ASP.NET 3.5, C# 3.0, Linq to Sql, Linq to XML, and Windows Workflow Foundation. It makes heavy use of ASP.NET AJAX 1.0. Throughout my career I have built several state-of-the-art personal, educational, enterprise and mass-consumer web portals. This book collects my experience in building all of those portals.

O’Reilly Website:
http://www.oreilly.com/catalog/9780596510503/

Amazon:

http://www.amazon.com/Building-Web-2-0-Portal-ASP-NET/dp/0596510500

Disclaimer: This book does not show you how to build Pageflakes.
Dropthings is entirely different in terms of architecture,
implementation and the technologies involved.

You learn how to:

  • Implement a highly decoupled architecture following the popular
    n-tier, widget-based application model
  • Provide drag-and-drop functionality, and use ASP.NET 3.5 to
    build the server-side part of the web layer
  • Use LINQ to build the data access layer, and Windows Workflow
    Foundation to build the business layer as a collection of
    workflows
  • Build client-side widgets using JavaScript for faster
    performance and better caching
  • Get maximum performance out of the ASP.NET AJAX Framework for
    faster, more dynamic, and scalable sites
  • Build a custom web service call handler to overcome
    shortcomings in ASP.NET AJAX 1.0 for asynchronous, transactional,
    cache-friendly web services
  • Overcome JavaScript performance problems, and help the user
    interface load faster and be more responsive
  • Solve various scalability and security problems as your site
    grows from hundreds to millions of users
  • Deploy and run a high-volume production site while solving
    software, hardware, hosting, and Internet infrastructure
    problems

If you're ready to build state-of-the-art, high-volume web applications that can withstand millions of hits per day, this book has exactly what you need.

A significant part of sql server process memory has been paged out. This may result in performance degradation

If you are using SQL Server Standard Edition 64-bit on Windows Server 2003 64-bit, you will frequently encounter this problem where SQL Server says:

A significant part of sql server process memory has been paged
out. This may result in performance degradation. Duration 0
seconds. Working set (KB) 25432, committed (KB) 11296912, memory
utilization 0%

The numbers in working set and duration will vary. What happens here is that SQL Server is forced to release memory to the operating system because some other application or the OS itself needs to allocate RAM.

We went through many support articles like:

  • 918483:
    How to reduce paging of buffer pool memory in the 64-bit version of
    SQL Server 2005
  • 905865:
    The sizes of the working sets of all the processes in a console
    session may be trimmed when you use Terminal Services to log on to
    or log off from a computer that is running Windows Server 2003
  • 920739:
    You may experience a decrease in overall system performance when
    you are copying files that are larger than approximately 500 MB in
    Windows Server 2003 Service Pack 1

But nothing solved the problem. We still have the page out
problem happening every day.

The server has 16 GB RAM, of which 12 GB is the maximum limit allocated to SQL Server. 4 GB is left for the OS and other applications. We have also turned off antivirus and any large backup jobs. 12 GB should be plenty because there's no other app running on the dedicated SQL Server box. But the page out still happens, and when it does, SQL Server becomes very slow: queries time out, the website throws errors, transactions abort. Sometimes this problem goes on for 30 to 40 minutes and the website becomes slow or unresponsive during that time.
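
For reference, the 12 GB cap is just the standard "max server memory" setting; here is a quick sketch of how such a cap is typically set (12288 MB matches the limit mentioned above):

-- Cap SQL Server's buffer pool at 12 GB, leaving the rest for the OS.
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'max server memory (MB)', 12288;
RECONFIGURE;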

I have found what causes SQL Server to page out: the file system cache somehow gets really high and forces SQL Server to trim down.


You see the System Cache Resident Bytes counter is very high. During this time SQL Server gets much less RAM than it needs. Queries time out at a very high rate, around 15 per second. Moreover, there's a high SQL Lock Timeouts/sec (around 15/sec, not captured in the screenshot).


SQL Server's max memory is configured at 12 GB, but here it shows it is getting less than 8 GB.

While the file system cache is really high, there's no process that's taking a significant amount of RAM.


After I used Sysinternals' CacheSet to reset the file system cache and set around 500 MB as the max limit, memory started to free up.


SQL Server started to see more free RAM.


Then I hit the "Clear" button to clear the file system cache, and it came down dramatically.


Paging stopped. The system cache was around 175 MB only. SQL Server lock timeouts came back to zero. Everything went back to normal.

So, I believe either some faulty driver or the OS itself is leaking file system cache in a 64-bit environment.

What we have done is assign a dedicated person who goes to the production database servers every hour, runs the CacheSet program and clicks the "Clear" button. This clears the file system cache and prevents it from growing too high.

There are lots of articles written about this problem. However,
the most informative one I have found is from the SQL Server PSS
team:


http://blogs.msdn.com/psssql/archive/2007/05/31/the-sql-server-working-set-message.aspx

UPDATE – THE FINAL SOLUTION!

The final solution is to run this program on Windows
Startup:

SetSystemFileCacheSize 128 256

This sets the lower and upper limits for the system cache. You need to run this on every Windows startup because a restart resets the cache setting to unlimited.

You can run the program without any parameters to see the current setting.

Download the program from this page:

http://www.uwe-sieber.de/ntcacheset_e.html

Go to the end and you will get the link to the
SetSystemFileCacheSize.zip

Cleanup inactive anonymous users from ASP.NET Membership Tables

ASP.NET 2.0 websites that allow anonymous visits and anonymous user profiles have a unique challenge: cleaning up the unused data generated by anonymous users who never come back. Every first visit creates one anonymous user, page setup, and other user-specific content. If the user never comes back, that data remains in the database permanently. It is possible the user might come back within a day, or a week, or a month, but there's no guarantee the user will ever come back. Generally, sticky users are at most 30% of the total users who visit most websites. So you end up with 70% unused data that is never needed. All of this requires cleanup, otherwise the database keeps growing uncontrollably and gets slower and slower. The cleanup operation is humongous for busy websites: think about deleting millions of rows from several tables, one after another, while maintaining foreign key constraints. The cleanup also needs to run while the site is running, without hampering the site's overall performance. The whole operation leaves heavily fragmented indexes and free space in the MDF file, and the log file becomes enormous in order to keep track of the transactions. Hard drives get really hot and start sweating furiously, and while the CPU keeps smiling, having nothing to do with it, it's really painful to watch SQL Server go through this every day. Unless you clean up the database and keep its size under control, you can't keep up with SQL Server's RAM and disk IO requirements.

When a user visits the site, the ASP.NET Membership provider updates the LastActivityDate in the aspnet_Users table. From this field, I can find out how long the user has been idle. The IsAnonymous bit field tells me whether the user account is anonymous or registered. If it is registered, there is no need to worry. But if it is anonymous and more than 30 days old, I can be sure that the user will never come back, because the cookie has already expired. (Also, if you repeatedly log out from your start page, all cookies related to the site get cleared, which means you produce one new anonymous user record per log out. That anonymous record is never used, because you will soon log in to get your customized pages back and then log out again, leaving yet another anonymous account that becomes useless as soon as you log in.)

Here’s how the whole cleanup process works:

  • Find out the users who are old enough to be discarded
  • Find out the pages those users have
  • Delete all the widget instances on those pages
  • Then delete those pages
  • Remove the rows for those users from child tables related to aspnet_Users, like aspnet_Profile, aspnet_UsersInRoles and aspnet_PersonalizationPerUser
  • Remove the users from aspnet_Users
  • Pray that you did not accidentally remove any good user

Here’s the giant DB script which does it all. I have put in enough inline comments so that you can understand what the script is doing:

-- Number of days after which we give users 'bye bye'
DECLARE @Days int
SET @Days = 14

-- No of users to delete per run. If it's too high, the database will get stuck
-- for a long time. If it's too low, you will end up having more trash than
-- you can clean up
DECLARE @NoOfUsersToDelete int
SET @NoOfUsersToDelete = 1000

-- Create temporary tables which hold the users and pages to delete.
-- As the users and pages are used to find rows in other tables, instead
-- of running SELECT ID FROM ... repeatedly, it's better to have them
-- in a temp table
IF EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[PagesToDelete]') AND type IN (N'U'))
    DROP TABLE [dbo].[PagesToDelete]
IF EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[AspnetUsersToDelete]') AND type IN (N'U'))
    DROP TABLE [dbo].[AspnetUsersToDelete]

CREATE TABLE PagesToDelete (PageID int NOT NULL PRIMARY KEY)
CREATE TABLE AspnetUsersToDelete (UserID uniqueidentifier NOT NULL PRIMARY KEY)

-- Find out inactive anonymous users and store the UserID in the temporary
-- table
INSERT INTO AspnetUsersToDelete
SELECT TOP(@NoOfUsersToDelete) UserID FROM aspnet_Users WHERE
    (IsAnonymous = 1) AND (LastActivityDate < (GETDATE() - @Days))
ORDER BY UserID -- Saves SQL Server from sorting in clustered index again

PRINT 'Users to delete: ' + CONVERT(varchar(255), @@ROWCOUNT)
GO

-- Get the pages of the users which will be deleted
INSERT INTO PagesToDelete
SELECT ID FROM Page WHERE UserID IN
(
    SELECT UserID FROM AspnetUsersToDelete
)

PRINT 'Pages to delete: ' + CONVERT(varchar(255), @@ROWCOUNT)
GO

-- Delete all widget instances on the pages to be deleted
DELETE FROM WidgetInstance WHERE PageID IN
( SELECT PageID FROM PagesToDelete )

PRINT 'Widget Instances deleted: ' + CONVERT(varchar(255), @@ROWCOUNT)
GO

-- Delete the pages
DELETE FROM Page WHERE ID IN
( SELECT PageID FROM PagesToDelete )
GO

-- Delete user settings
DELETE FROM UserSetting WHERE UserID IN
( SELECT UserID FROM AspnetUsersToDelete )
GO

-- Delete profiles of users
DELETE FROM aspnet_Profile WHERE UserID IN
( SELECT UserID FROM AspnetUsersToDelete )
GO

-- Delete from aspnet_UsersInRoles
DELETE FROM aspnet_UsersInRoles WHERE UserID IN
( SELECT UserID FROM AspnetUsersToDelete )
GO

-- Delete from aspnet_PersonalizationPerUser
DELETE FROM aspnet_PersonalizationPerUser WHERE UserID IN
( SELECT UserID FROM AspnetUsersToDelete )
GO

-- Delete the users
DELETE FROM aspnet_Users WHERE UserID IN
( SELECT UserID FROM AspnetUsersToDelete )

PRINT 'Users deleted: ' + CONVERT(varchar(255), @@ROWCOUNT)
GO

DROP TABLE PagesToDelete
DROP TABLE AspnetUsersToDelete
GO

Now the question comes: when should I run this script? It depends on several factors:

  • The lowest traffic period. For example, if the majority of your users are from the USA, that's around USA midnight, when everyone in the USA is sleeping.
  • A period when no other maintenance task is running, like index defrag or database backup. If by any chance another maintenance task conflicts with this enormous delete operation, SQL Server is dead.
  • The operation will take from 10 minutes to hours depending on the volume of trash to clean up. So consider the duration of running this script and plan other maintenance jobs accordingly.
  • It's best to run it about 30 minutes before the INDEX DEFRAG jobs run. After the script completes, the tables will be heavily fragmented, so you need to defrag the indexes (a sketch follows this list).
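
A minimal sketch of what that defrag step might look like for the tables touched by the script (SQL Server 2005+ syntax; whether to REORGANIZE or REBUILD depends on how fragmented the indexes actually are):

-- Reorganize the indexes most affected by the mass delete.
ALTER INDEX ALL ON dbo.aspnet_Users REORGANIZE;
ALTER INDEX ALL ON dbo.Page REORGANIZE;
ALTER INDEX ALL ON dbo.WidgetInstance REORGANIZE;

-- For heavy fragmentation, a rebuild is more effective (but more IO intensive):
-- ALTER INDEX ALL ON dbo.aspnet_Users REBUILD;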

Before running this script, there are some preparations to take (sketched in the snippet after this list):

  • Make sure you have turned off AutoShrink in the database properties. The database size will reduce after the cleanup, and if SQL Server tries to shrink the database, there will be big IO activity. Turn off auto shrink, because the database will grow again anyway.
  • Make sure the log file's initial size is big enough to hold such enormous transactions. You can specify 1/3rd of the MDF size as the LDF's initial size. Also make sure the log file is not shrunk; let it occupy HD space. This saves SQL Server from repeatedly expanding and shrinking the file, both of which require high disk IO.
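
Here is a rough sketch of those two preparations. The database name and log file logical name are assumptions (check yours with SELECT name FROM sys.database_files), and the SIZE value is illustrative; pick roughly 1/3rd of your MDF size:

-- Assumed database name 'Dropthings' and log file logical name 'Dropthings_log'.
ALTER DATABASE Dropthings SET AUTO_SHRINK OFF;

-- Pre-grow the log so the cleanup does not force repeated log growth.
ALTER DATABASE Dropthings
MODIFY FILE (NAME = Dropthings_log, SIZE = 4GB);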

Once the cleanup job runs and the INDEX DEFRAG runs, database performance improves significantly. The tables are now smaller, which means the indexes are smaller, and SQL Server no longer has to run through large indexes. Future INDEX DEFRAGs take less time because there's not much data left to optimize. SQL Server also takes less RAM because it has to work with a much smaller amount of data. The database backup size also shrinks because the MDF size does not keep increasing indefinitely. As a result, the significant overhead of this cleanup operation is quite acceptable compared to all the benefits.

Note: I will be reposting some posts from my old blog to the new blog. Please ignore them if you have read them before.