Fast page loading by moving ASP.NET AJAX scripts after visible content

The ASP.NET ScriptManager control has a property,
LoadScriptsBeforeUI, which, when set to false, is supposed to
load all AJAX framework scripts after the content of the page. But
it does not effectively push all scripts below the content. Some
framework scripts, extender scripts, and other scripts registered
by the Ajax Control Toolkit still load before the page content. The
following screenshot, taken from www.dropthings.com, shows several
script tags still being added at the beginning of the <form> tag,
which forces them to download before the page content is loaded and
displayed. Script tags pause rendering in several browsers,
especially IE, until the scripts download and execute. As a result,
the user gets a slow-loading impression, staring at a white screen
until the scripts above the content have downloaded and executed
completely. If the browser could render the HTML before downloading
any script, the user would see the page content immediately after
visiting the site instead of a white screen. This gives the user the
impression that the website is blazingly fast (just like the Google
homepage), because the user will ideally see the page content, if
it's not too large, immediately after hitting the URL.

Figure: Script blocks being delivered before the content

From the above screenshot you can see that some scripts from the
ASP.NET AJAX framework and some from the Ajax Control Toolkit are
added before the content of the page. Until these scripts download,
the browser shows nothing on the UI, and that pause in rendering
gives the user a slow-load feeling. Each script served from an
external URL adds on average about 200ms of network roundtrip delay
outside the USA while the browser fetches it. So, the user
basically stares at a white screen for at least 1.5 seconds, no
matter how fast his or her Internet connection is.

These scripts are rendered at the beginning of the form tag
because they are registered using
Page.ClientScript.RegisterClientScriptBlock. Inside the
Page class of System.Web, there's a method
BeginFormRender which renders the client script blocks
immediately after the form tag.

internal void BeginFormRender(HtmlTextWriter writer, string formUniqueID)
{
    ...
        this.ClientScript.RenderHiddenFields(writer);
        this.RenderViewStateFields(writer);
        ...
        if (this.ClientSupportsJavaScript)
        {
            ...
            if (this._fRequirePostBackScript)
            {
                this.RenderPostBackScript(writer, formUniqueID);
            }
            if (this._fRequireWebFormsScript)
            {
                this.RenderWebFormsScript(writer);
            }
        }
        this.ClientScript.RenderClientScriptBlocks(writer);
}

Figure: Decompiled code from System.Web.Page class

Here you see that several script blocks, including scripts
registered by calling ClientScript.RegisterClientScriptBlock,
are rendered right after the form tag starts.

There's no easy way to override the BeginFormRender method
and defer rendering of these scripts. These rendering functions are
buried inside System.Web and none of them are overridable.
So, the only solution seems to be using a Response Filter to
capture the HTML being written and suppress the script blocks until
the end of the body tag. When the </body> tag is about to be
rendered, we can safely assume the page content has been
successfully delivered, and all the suppressed script blocks can be
rendered at once.

In ASP.NET 2.0, you can create a Response Filter, which is an
implementation of a Stream. You can replace the default
Response.Filter with your own stream, and ASP.NET will then
use your filter to write the final rendered HTML. When
Response.Write is called or the Page's Render method
fires, the response is written to the output stream via the filter.
So, you can intercept every byte that's going to be sent to the
client (browser) and modify it the way you like. Response Filters
can be used in a variety of ways to optimize page output, like
stripping off all white space, doing some formatting on the
generated content, or otherwise manipulating the characters being
sent to the browser.
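
To make the excerpt below easier to follow, here is a minimal
pass-through filter sketch showing the Stream plumbing every
Response Filter needs (the ScriptDeferFilter excerpt later omits
these boilerplate overrides); the class name is illustrative:

using System;
using System.IO;

// A minimal pass-through Response Filter. It forwards every byte unchanged;
// a real filter does its processing inside Write before forwarding.
public class PassThroughFilter : Stream
{
    private readonly Stream inner;

    public PassThroughFilter(Stream inner) { this.inner = inner; }

    public override bool CanRead { get { return false; } }
    public override bool CanSeek { get { return false; } }
    public override bool CanWrite { get { return true; } }
    public override long Length { get { throw new NotSupportedException(); } }
    public override long Position
    {
        get { throw new NotSupportedException(); }
        set { throw new NotSupportedException(); }
    }

    public override void Flush() { inner.Flush(); }
    public override int Read(byte[] buffer, int offset, int count) { throw new NotSupportedException(); }
    public override long Seek(long offset, SeekOrigin origin) { throw new NotSupportedException(); }
    public override void SetLength(long value) { throw new NotSupportedException(); }

    // Every chunk of rendered HTML passes through here before reaching the browser
    public override void Write(byte[] buffer, int offset, int count)
    {
        inner.Write(buffer, offset, count);
    }
}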

I have created a Response Filter that captures all characters
being sent to the browser. If it finds that script blocks are being
rendered, instead of rendering them to the
Response.OutputStream, it extracts the script blocks out
of the buffer being written and renders the rest of the content. It
stores all script blocks, both internal and external, in a string
buffer. When it detects that the </body> tag is about to be
written to the response, it flushes all the captured script blocks
from the string buffer.

public class ScriptDeferFilter : Stream
{
    Stream responseStream;
    long position;

    /// <summary>
    /// When this is true, script blocks are suppressed and captured for
    /// later rendering
    /// </summary>
    bool captureScripts;

    /// <summary>
    /// Holds all script blocks that are injected by the controls.
    /// The script blocks will be moved after the form tag renders.
    /// </summary>
    StringBuilder scriptBlocks;

    Encoding encoding;

    public ScriptDeferFilter(HttpResponse response)
    {
        this.encoding = response.Output.Encoding;
        this.responseStream = response.Filter;

        this.scriptBlocks = new StringBuilder(5000);
        // When this is on, script blocks are captured and not written to output
        this.captureScripts = true;
    }

Here's the beginning of the filter class. When it initializes,
it takes the original Response Filter. It then overrides the
Write method of the Stream so that it can capture the
buffers being written and do its own processing.

public override void Write(byte[] buffer, int offset, int count)
{
    // If we are not capturing script blocks anymore, just redirect to response stream
    if (!this.captureScripts)
    {
        this.responseStream.Write(buffer, offset, count);
        return;
    }

    /*
     * Script and HTML can appear in the specified buffer in combinations like:
     * .....<script ...>.....</script>.....
     * <script ...>.....</script>.....
     * .....<script ...>.....</script>
     * .....<script ...>..... (the end tag arrives in a later buffer)
     * ..... (no script tag at all)
     * Here, "....." means HTML content between and outside script tags
    */

    char[] content = this.encoding.GetChars(buffer, offset, count);

    int scriptTagStart = 0;
    int lastScriptTagEnd = 0;
    bool scriptTagStarted = false;

    for (int pos = 0; pos < content.Length; pos++)
    {
        // See if a tag starts here
        char c = content[pos];
        if (c == '<')
        {
            int tagStart = pos;
            // Check if it's a closing tag
            if (content[pos + 1] == '/')
            {
                pos += 2; // go past the "</"

                // See if a script tag is ending
                if (isScriptTag(content, pos))
                {
                    // Script tag just ended. Get the whole script
                    // and store it in the buffer
                    pos = pos + "script>".Length;
                    scriptBlocks.Append(content, scriptTagStart, pos - scriptTagStart);
                    scriptBlocks.Append(Environment.NewLine);
                    lastScriptTagEnd = pos;

                    scriptTagStarted = false;
                    continue;
                }
                else if (isBodyTag(content, pos))
                {
                    // The body tag has just ended. Time to render all the script
                    // blocks we have suppressed so far and stop capturing

                    if (this.scriptBlocks.Length > 0)
                    {
                        // Render all pending html output till now
                        this.WriteOutput(content, lastScriptTagEnd, tagStart - lastScriptTagEnd);

                        // Render the script blocks
                        byte[] scriptBytes = this.encoding.GetBytes(this.scriptBlocks.ToString());
                        this.responseStream.Write(scriptBytes, 0, scriptBytes.Length);

                        // Stop capturing script blocks
                        this.captureScripts = false;

                        // Write from the body tag start to the end of the input buffer
                        // and return from the function. We are done.
                        this.WriteOutput(content, tagStart, content.Length - tagStart);
                        return;
                    }
                }
                else
                {
                    // Some other tag's closing. Safely skip one character since the
                    // smallest html tag name is one character, e.g. </b>.
                    // Just an optimization to save one loop iteration.
                    pos++;
                }
            }
            else
            {
                if (isScriptTag(content, pos + 1))
                {
                    // Script tag started. Record the position, as we will
                    // capture the whole script tag including its content
                    // and store it in an internal buffer.
                    scriptTagStart = pos;

                    // Write html content since last script tag closing up to this script tag
                    this.WriteOutput(content, lastScriptTagEnd, scriptTagStart - lastScriptTagEnd);

                    // Skip past the tag start to save some loop iterations
                    pos += "<script".Length;

                    scriptTagStarted = true;
                }
                else
                {
                    // Some other tag started. Safely skip one more character since
                    // the smallest tag name is one character, e.g. <b>.
                    // Just an optimization to eliminate one loop iteration.
                    pos++;
                }
            }
        }
    }

    // If a script tag is partially sent to buffer, then the remaining content
    // is part of the last script block
    if (scriptTagStarted)
    {
        this.scriptBlocks.Append(content, scriptTagStart, content.Length - scriptTagStart);
    }
    else
    {
        // Render the characters since the last script tag ending
        this.WriteOutput(content, lastScriptTagEnd, content.Length - lastScriptTagEnd);
    }
}

There are several situations to consider here. The Write
method is called several times during the page render process
because the generated HTML can be quite big, so each call contains
partial HTML. It's possible that the first Write call contains the
start of a script block but no closing script tag; the following
Write call may or may not have the closing tag. So, we need to
preserve state to make sure we don't overlook any script block.
Each Write call can also have several script blocks in the buffer,
or no script block at all and only page content.

The idea is to go through each character and look for a starting
script tag. If there is one, remember the start position of the
tag. If the closing script tag is found within the buffer, extract
the whole script block out of the buffer and render the remaining
HTML. If no closing tag is found but a script tag did start within
the buffer, suppress the output and capture the remaining content
in the script buffer so that the next call to Write can grab
the remaining script and extract it out of the output.

There are two other private functions that are basically helpers
and do not do anything interesting:

private void WriteOutput(char[] content, int pos, int length)
{
    if (length == 0) return;

    byte[] buffer = this.encoding.GetBytes(content, pos, length);
    this.responseStream.Write(buffer, 0, buffer.Length);
}

private bool isScriptTag(char[] content, int pos)
{
    if (pos + 5 < content.Length)
        return ((content[pos] == 's' || content[pos] == 'S')
            && (content[pos + 1] == 'c' || content[pos + 1] == 'C')
            && (content[pos + 2] == 'r' || content[pos + 2] == 'R')
            && (content[pos + 3] == 'i' || content[pos + 3] == 'I')
            && (content[pos + 4] == 'p' || content[pos + 4] == 'P')
            && (content[pos + 5] == 't' || content[pos + 5] == 'T'));
    else
        return false;
}

private bool isBodyTag(char[] content, int pos)
{
    if (pos + 3 < content.Length)
        return ((content[pos] == 'b' || content[pos] == 'B')
            && (content[pos + 1] == 'o' || content[pos + 1] == 'O')
            && (content[pos + 2] == 'd' || content[pos + 2] == 'D')
            && (content[pos + 3] == 'y' || content[pos + 3] == 'Y'));
    else
        return false;
}

The isScriptTag and isBodyTag functions may look
weird. The reason for such weird code is pure performance. Instead
of doing fancy checks like copying part of the array out and doing
a string comparison, this is the fastest way to do the check. The
&& operator short-circuits: as soon as one condition in
the chain fails, the rest are not evaluated. So, this is about as
cheap as checking for specific characters can get.

There are some corner cases that are not handled here. For
example, what if the buffer contains a partial script tag
declaration, ending with something like "....<scr", and the
remaining characters arrive in the next buffer, like
"ipt src="..." >....."? In that case, the script tag won't be
taken out. One way to handle this would be to make sure you always
have enough characters left in the buffer to do a complete tag name
check. If not, store the half-finished buffer somewhere and, on the
next call to Write, combine it with the new buffer before doing the
processing.
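
A hedged sketch of that carry-over approach (pendingTagFragment
and both helpers are hypothetical additions, not part of the filter
above):

// Hypothetical field on the filter class: holds an unfinished tag from the
// previous Write call, e.g. "<scr"
private string pendingTagFragment = "";

// Called at the start of Write: prepend the left-over fragment to the new
// buffer before scanning for tags
private char[] PrependPending(char[] content)
{
    if (pendingTagFragment.Length == 0) return content;

    char[] combined = new char[pendingTagFragment.Length + content.Length];
    pendingTagFragment.CopyTo(0, combined, 0, pendingTagFragment.Length);
    Array.Copy(content, 0, combined, pendingTagFragment.Length, content.Length);
    pendingTagFragment = "";
    return combined;
}

// Called at the end of Write when the buffer ends inside a '<...' that is
// too short to classify: hold it back instead of writing it out
private void HoldBackPartialTag(char[] content, int lastOpenBracket)
{
    pendingTagFragment = new string(content, lastOpenBracket,
        content.Length - lastOpenBracket);
}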

In order to install the filter, you need to hook it in the
Global.asax BeginRequest or some other event that fires before the
response is generated.

protected void Application_BeginRequest(object sender, EventArgs e)
{
    if (Request.HttpMethod == "GET")
    {
        if (Request.AppRelativeCurrentExecutionFilePath.EndsWith(".aspx"))
        {
            Response.Filter = new ScriptDeferFilter(Response);
        }
    }
}

Here I am hooking the filter only for GET calls to
.aspx pages. You can hook it for POST calls as well, but
asynchronous postbacks are regular POSTs and I do not want to
change the generated JSON or HTML fragments. Another way is to hook
the filter only when the ContentType is text/html.
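
A sketch of that ContentType-based variant follows. By the
PostReleaseRequestState event the handler has already set
Response.ContentType, so the check is reliable there, and since
output is buffered by default the filter still sees the whole
response:

protected void Application_PostReleaseRequestState(object sender, EventArgs e)
{
    // Only defer scripts for full HTML pages, never for JSON or partial responses
    if (Response.ContentType == "text/html")
    {
        Response.Filter = new ScriptDeferFilter(Response);
    }
}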

When this filter is installed, www.dropthings.com defers all
script loading until the end of the body.

Figure: Script tags are moved to the end of the body when the
filter is used

You can grab the filter class from
App_Code\ScriptDeferFilter.cs in the Dropthings project. Go
to the CodePlex site
and download the latest code for the latest filter.

10 ASP.NET Performance and Scalability Secrets

ASP.NET 2.0 has many secrets which, when revealed, can give you a
big performance and scalability boost. For instance, there are
hidden bottlenecks in the Membership and Profile providers that can
be solved easily to make authentication and authorization faster.
Furthermore, the ASP.NET HTTP pipeline can be tweaked to avoid
executing unnecessary code on each and every request. Not only
that, the ASP.NET worker process can be pushed to its limit to
squeeze out every drop of performance. Page fragment output caching
on the browser (not on the server) can save a significant amount of
download time on repeated visits. On-demand UI loading can give
your site a fast and smooth feeling. Finally, Content Delivery
Networks (CDN) and proper use of HTTP cache headers can make your
website screaming fast when implemented properly. In this article,
you will learn these techniques, which can give your ASP.NET
application a big performance and scalability boost and prepare it
to perform well under 10 to 100 times more traffic.

http://www.codeproject.com/KB/aspnet/10ASPNetPerformance.aspx

In this article I have shown the following techniques:

  • ASP.NET Pipeline optimization
  • ASP.NET Process configuration optimization
  • Things you must do for ASP.NET before going live
  • Content Delivery Network
  • Caching AJAX calls on browser
  • Making best use of Browser Cache
  • On demand progressive UI loading for fast smooth
    experience
  • Optimize ASP.NET 2.0 Profile provider
  • How to query ASP.NET 2.0 Membership tables without bringing
    down the site
  • Prevent Denial of Service (DOS) attack

The above techniques can be implemented on any ASP.NET website,
especially those that use ASP.NET 2.0's Membership and Profile
providers.

You can learn a lot more about performance and scalability
improvement of ASP.NET and ASP.NET AJAX websites from my book,
Building a Web 2.0 portal using ASP.NET 3.5.

Making best use of cache for a high performance website

Use URLs consistently

Browsers cache content based on the URL. When the URL changes,
the browser fetches a new version from the origin server. The URL
can be changed by changing the query string parameters. For
example, after “/default.aspx” is cached on the browser,
if you request “/default.aspx?123” the browser will
fetch new content from the server. The response from the new URL
can also be cached in the browser if you return proper caching
headers. In that case, changing the query parameter to something
else, like “/default.aspx?456”, will again fetch new
content from the server. So, you need to make sure you use URLs
consistently everywhere when you want cached responses. If the
homepage requests a file with the URL “/welcome.gif”,
make sure you request the same file from other pages using that
exact URL. One common mistake is to sometimes omit the
“www” subdomain from the URL:
www.pageflakes.com/default.aspx is not the same as
pageflakes.com/default.aspx. Both will be cached separately.

Cache static content for a longer period

Static files can be cached for a long period, like one month. If
you are thinking that you should cache for only a couple of days so
that when you change a file users pick it up sooner, you're
mistaken. If you update a file that was cached by an Expires
header, new users will immediately get the new file while old users
will see the old content until it expires in their browser. So, as
long as you are using the Expires header to cache static files, you
should use as high a value as possible.

For example, if you have set the Expires header to cache a file
for three days, one user will get the file today and keep it in
cache for the next three days. Another user will get the file
tomorrow and cache it for three days starting tomorrow. If you
change the file the day after tomorrow, the first user will see the
change on the fourth day and the second user on the fifth day. So,
different users will see different versions of the file, and
setting a lower value doesn't ensure all users will pick up the
latest file soon. You have to change the URL of the file in order
to ensure everyone gets the exact same file immediately.

You can set up the Expires header for static files from IIS
Manager. You'll learn how to do that in a later section.

Use cache friendly folder structure

Store cacheable content under a common folder. For example, store
all images of your site under a “/static” folder
instead of scattering them across different subfolders. This will
help you use consistent URLs throughout the site, because from
anywhere you can use
“/static/images/somefile.gif”. Later on, we will see
that it's also easier to move to a Content Delivery Network when
you have static cacheable files under a common root folder.

Reuse common graphics files

Sometimes we put copies of common graphics files under several
virtual directories so that we can write shorter paths. For
example, say you have indicator.gif in the root folder, in some
subfolders, and in the CSS folder. You did it so that you need not
worry about paths from different places and can just use the file
name as a relative URL. This does not help caching: each copy of
the file is cached in the browser separately. Instead, you should
collect all the graphics files in the whole solution, eliminate
duplicates, put them under the same root “static”
folder, and use the same URL from all pages and CSS files.

Change file names when you want to expire the cache

When you want a static file to change, don't just update the
file, because it's already cached in users' browsers. You need to
change the file name and update all references everywhere so that
browsers download the new file. You can also store the file names
in a database or configuration file and use data binding to
generate the URLs dynamically. This way you can change a URL in one
place and have the whole site pick up the change immediately.

Use a version number while accessing static files

If you do not want to clutter your static folder with multiple
copies of the same file, you can use a query string to
differentiate versions of the same file. For example, a GIF can be
accessed with a dummy query string like
“/static/images/indicator.gif?v=1”. When you change
indicator.gif, you can overwrite the same file and then update all
references to
“/static/images/indicator.gif?v=2”. This way you can
keep changing the same file again and again and just update the
references to use the new version number.
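
A small hedged helper for generating such versioned URLs from a
single place (the class, constant, and usage below are
illustrative; the version number could equally come from
Web.config):

// Illustrative helper: appends a site-wide version number to static file URLs.
// Bumping StaticContentVersion in one place expires every static file at once.
public static class StaticUrl
{
    public const int StaticContentVersion = 2; // hypothetical setting

    public static string Versioned(string path)
    {
        // "/static/images/indicator.gif" -> "/static/images/indicator.gif?v=2"
        return path + "?v=" + StaticContentVersion;
    }
}

// Usage inside a page:
// <img src='<%= StaticUrl.Versioned("/static/images/indicator.gif") %>' />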

Store cacheable files in a different domain

It's always a good idea to put static content in a different
domain. First of all, the browser can open two additional
concurrent connections to download the static files. Another
benefit is that you don't need to send cookies to the static files.
When you put static files on the same domain as your web
application, the browser sends all the ASP.NET cookies and all the
other cookies your web application is producing. This makes request
headers unnecessarily large and wastes bandwidth. You don't need to
send these cookies to access static files, so if you put the static
files in a different domain, those cookies will not be sent. For
example, put your static files in the www.staticcontent.com domain
while your website runs on www.dropthings.com. The other domain
does not need to be a completely different web site; it can just be
an alias that shares the same web application path.

SSL is not cached, so minimize SSL use

Content that is served over SSL is not cached. So, you need to
keep static content outside SSL. Moreover, you should try to limit
SSL to only secure pages like the Login or Payment page. The rest
of the site should be served over regular HTTP. SSL encrypts the
request and response and thus puts extra load on the server.
Encrypted content is also larger than the original content and
takes more bandwidth.

HTTP POST requests are never cached

Caching only happens for HTTP GET requests; HTTP POST requests
are never cached. So, any AJAX call that you want to be cacheable
needs to be made over HTTP GET.
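
For ASP.NET AJAX script services, a web method can be made
GET-enabled with ScriptMethodAttribute; a minimal sketch (the
service and method names are illustrative):

using System.Web.Services;
using System.Web.Script.Services;

[ScriptService]
public class StockService : WebService
{
    // UseHttpGet = true makes the generated proxy call this method over
    // HTTP GET, the precondition for the browser to cache the response
    [WebMethod]
    [ScriptMethod(UseHttpGet = true)]
    public string GetQuote(string symbol)
    {
        return "42.17"; // illustrative payload
    }
}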

Generate Content-Length response header

When you dynamically serve content via web service calls or HTTP
handlers, make sure you emit a Content-Length header.
Browsers have several optimizations for downloading content faster
when they know how many bytes to expect in the response by looking
at the Content-Length header. In particular, browsers can
use persistent (keep-alive) connections more effectively when this
header is present, which saves them from opening a new connection
for each request. When there's no Content-Length header,
the browser doesn't know how many bytes it's going to receive from
the server and keeps the connection open, consuming bytes until the
server closes the connection. You thus miss the benefit of
persistent connections, which can greatly reduce the download time
of several small files like CSS, JavaScript, and images.
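
A hedged sketch of an HTTP handler that emits Content-Length,
assuming the full payload is available as a byte array (the handler
and helper names are illustrative):

using System.Web;

public class StaticImageHandler : IHttpHandler
{
    public bool IsReusable { get { return true; } }

    public void ProcessRequest(HttpContext context)
    {
        byte[] payload = LoadImageBytes(); // hypothetical content loader

        context.Response.ContentType = "image/gif";
        // Tell the browser exactly how many bytes follow so it can reuse
        // the connection for the next request
        context.Response.AddHeader("Content-Length", payload.Length.ToString());
        context.Response.OutputStream.Write(payload, 0, payload.Length);
    }

    private byte[] LoadImageBytes()
    {
        return new byte[] { 0x47, 0x49, 0x46 }; // stand-in for real content
    }
}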

How to configure static content caching in IIS

In IIS Manager, the web site properties dialog box has an
“HTTP Headers” tab where you can define the Expires
header for all requests that IIS handles. There you can choose
whether to expire content immediately, after a certain number of
days, or on a specific date. The second option (Expire after) uses
sliding expiration, not absolute expiration. This is very useful
because it works per request: whenever someone requests a static
file, IIS calculates the expiration date by adding the configured
number of days/months to the current date.


Figure: The HTTP Headers tab in the IIS web site properties dialog

For dynamic pages that are served by ASP.NET, a handler can
modify the Expires header and override the IIS default setting.
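
For example, a page or handler can override the IIS setting
through the standard HttpCachePolicy API; a sketch:

protected void Page_Load(object sender, EventArgs e)
{
    // Cache this response publicly for 30 days, overriding the IIS default
    Response.Cache.SetExpires(DateTime.Now.AddDays(30));
    Response.Cache.SetCacheability(HttpCacheability.Public);
    Response.Cache.SetMaxAge(TimeSpan.FromDays(30));
}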

A significant part of sql server process memory has been paged out. This may result in performance degradation

If you are running SQL Server Standard Edition 64-bit on Windows
Server 2003 64-bit, you will frequently encounter this problem
where SQL Server says:

A significant part of sql server process memory has been paged
out. This may result in performance degradation. Duration 0
seconds. Working set (KB) 25432, committed (KB) 11296912, memory
utilization 0%

The numbers in the working set and duration will vary. What
happens here is that SQL Server is forced to release memory to the
operating system because some other application, or the OS itself,
needs to allocate RAM.

We went through many support articles like:

  • 918483:
    How to reduce paging of buffer pool memory in the 64-bit version of
    SQL Server 2005
  • 905865:
    The sizes of the working sets of all the processes in a console
    session may be trimmed when you use Terminal Services to log on to
    or log off from a computer that is running Windows Server 2003
  • 920739:
    You may experience a decrease in overall system performance when
    you are copying files that are larger than approximately 500 MB in
    Windows Server 2003 Service Pack 1

But nothing solved the problem. We still had the page-out problem
happening every day.

The server has 16 GB RAM, of which 12 GB is the maximum allocated
to SQL Server and 4 GB is left to the OS and other applications. We
have also turned off antivirus and any large backup jobs. 12 GB
should be plenty because there's no other app running on the
dedicated SQL Server box. But the page-out still happens, and when
it does, SQL Server becomes very slow: queries time out, the
website throws errors, transactions abort. Sometimes this goes on
for 30 to 40 minutes and the website is slow or unresponsive during
that time.

I have found what causes SQL Server to page out: the file system
cache somehow grows really large and forces SQL Server to trim down
its working set.

Figure: Perfmon showing very high system cache resident bytes

You can see the system cache resident bytes are very high. During
this time SQL Server gets much less RAM than it needs. Queries time
out at a very high rate, around 15 per second. Moreover, there's a
high SQL lock timeout rate (around 15/sec, not captured in the
screenshot).

Figure: SQL Server getting less than 8 GB of its configured memory

SQL Server max memory is configured at 12 GB, but here it shows
it's getting less than 8 GB.

While the file system cache is really high, there’s no
process that’s taking significant RAM.

Figure: No single process is taking significant RAM

After I used SysInternals' CacheSet to reset the file system
cache and set around 500 MB as the max limit, memory started to
free up.

Figure: Memory freeing up after limiting the file system cache with CacheSet

SQL Server started to see more RAM free:

Figure: SQL Server seeing more free RAM

Then I hit the “Clear” button to clear the file system
cache, and it came down dramatically.

Figure: System cache dropping to around 175 MB after clicking Clear

Paging stopped. The system cache was only around 175 MB. SQL
Server lock timeouts came back to zero. Everything went back to normal.

So, I believe either some faulty driver or the OS itself is
leaking file system cache in a 64-bit environment.

What we have done is have a dedicated person go to the production
database servers every hour, run the CacheSet program, and click
the “Clear” button. This clears the file system cache
and prevents it from growing too high.

There are lots of articles written about this problem. However,
the most informative one I have found is from the SQL Server PSS
team:


http://blogs.msdn.com/psssql/archive/2007/05/31/the-sql-server-working-set-message.aspx

UPDATE – THE FINAL SOLUTION!

The final solution is to run this program on Windows
Startup:

SetSystemFileCacheSize 128 256

This sets the lower and upper limits for the system cache. You
need to run this on every Windows startup because a restart resets
the cache limit to unlimited.
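
Presumably the tool wraps the Win32 SetSystemFileCacheSize API,
so you could also call it directly; a hedged C# sketch (the process
needs SeIncreaseQuotaPrivilege, and I'm assuming the tool's two
parameters are megabytes):

using System;
using System.Runtime.InteropServices;

class CacheLimiter
{
    // Win32 API that sets the working set limits of the system file cache
    [DllImport("kernel32.dll", SetLastError = true)]
    static extern bool SetSystemFileCacheSize(
        IntPtr minimumFileCacheSize, IntPtr maximumFileCacheSize, int flags);

    static void Main()
    {
        // Mirrors "SetSystemFileCacheSize 128 256": 128 MB lower, 256 MB upper limit
        bool ok = SetSystemFileCacheSize(
            (IntPtr)(128L * 1024 * 1024), (IntPtr)(256L * 1024 * 1024), 0);
        Console.WriteLine(ok
            ? "File system cache limit set"
            : "Failed, error " + Marshal.GetLastWin32Error());
    }
}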

You can run the program without any parameters to see the current
setting.

Download the program from this page:

http://www.uwe-sieber.de/ntcacheset_e.html

Go to the end of the page and you will find the link to
SetSystemFileCacheSize.zip.

Cleanup inactive anonymous users from ASP.NET Membership Tables

ASP.NET 2.0 websites that allow anonymous visits and anonymous
user profiles have a unique challenge: cleaning up the unused data
generated by anonymous users who never come back. Every first visit
creates one anonymous user, page setup, and other user-specific
content. If the user never comes back, that data still remains in
the database permanently. The user might come back within a day, a
week, or a month, but there's no guarantee a user will ever come
back. Generally, sticky users are at most 30% of the total users
who visit most websites. So, you end up with 70% unused data that
is never needed. All of this requires cleanup, otherwise the
database keeps growing uncontrollably and gets slower and slower.
This cleanup operation is humongous for busy websites: think about
deleting millions of rows from several tables, one after another,
while maintaining foreign key constraints. The cleanup operation
also needs to run while the site is running, without hampering the
site's overall performance. The whole operation results in heavily
fragmented indexes and free space in the MDF file. The log file
also becomes enormous in order to keep track of the transactions.
Hard drives get really hot and start sweating furiously while the
CPU keeps smiling, having nothing to do with it; it's really
painful to watch SQL Server go through this every day. Unless you
clean up the database and keep its size under control, you can't
keep up with SQL Server's RAM and disk IO requirements.

When a user visits the site, the ASP.NET Membership provider
updates the LastActivityDate in the aspnet_Users table. From this
field, I can find out how long a user has been idle. The
IsAnonymous bit field tells me whether the user account is
anonymous or registered. If it is registered, there's no need to
worry. But if it is anonymous and more than 30 days old, I can be
sure that the user will never come back, because the cookie has
already expired. Incidentally, if you repeatedly log out from your
start page, all cookies related to the site get cleared, which
means each logout produces one new anonymous user record. That
anonymous record is never used, because you will soon log in to get
your customized pages back and then log out again, producing yet
another anonymous user account that becomes useless as soon as you
log in.

Here's how the whole cleanup process works:

  • Find the users that are old enough to be discarded
  • Find the pages those users own
  • Delete all the widget instances on those pages
  • Then delete those pages
  • Remove rows for those users from the child tables related to
    aspnet_Users, like aspnet_Profile, aspnet_UsersInRoles, and
    aspnet_PersonalizationPerUser
  • Remove the users from aspnet_Users
  • Pray that you did not accidentally remove any good user

Here's the giant DB script that does it all. I have put in enough
inline comments so that you can understand what the script is
doing:

-- Number of days after which we give users 'bye bye'
DECLARE @Days int
SET @Days = 14

-- No of users to delete per run. If it's too high, database will get stuck
-- for a long time. If it's too low, you will end up having more trash than
-- you can clean up
DECLARE @NoOfUsersToDelete int
SET @NoOfUsersToDelete = 1000

-- Create tables which hold the users and pages to delete.
-- As the user and the page are used to find rows in other tables, instead
-- of running SELECT ID FROM ... repeatedly, it's better to have them
-- in a temp table
IF EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[PagesToDelete]') AND type in (N'U'))
DROP TABLE [dbo].[PagesToDelete]
IF EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[AspnetUsersToDelete]') AND type in (N'U'))
DROP TABLE [dbo].[AspnetUsersToDelete]

create table PagesToDelete (PageID int NOT NULL PRIMARY KEY)
create table AspnetUsersToDelete (UserID uniqueidentifier NOT NULL PRIMARY KEY)

-- Find inactive anonymous users and store the UserID in the temporary
-- table
insert into AspnetUsersToDelete
select top(@NoOfUsersToDelete) UserID from aspnet_Users where
(isAnonymous = 1) and (LastActivityDate < (getDate()-@Days))
order by UserID -- Saves SQL Server from sorting in clustered index again

print 'Users to delete: ' + convert(varchar(255),@@ROWCOUNT)
GO

-- Get the pages of the users which will be deleted
insert into PagesToDelete
select ID from Page where UserID in
(
select UserID from AspnetUsersToDelete
)

print 'Pages to delete: ' + convert(varchar(255),@@ROWCOUNT)
GO

-- Delete all widget instances on the pages to be deleted
delete from WidgetInstance where PageID IN
( SELECT PageID FROM PagesToDelete )

print 'Widget Instances deleted: ' + convert(varchar(255), @@ROWCOUNT)
GO

-- Delete the pages
delete from Page where ID IN
( SELECT PageID FROM PagesToDelete )
GO

-- Delete user settings
delete from UserSetting WHERE UserID IN
( SELECT UserID FROM AspnetUsersToDelete )
GO

-- Delete profiles of users
delete from aspnet_Profile WHERE UserID IN
( SELECT UserID FROM AspnetUsersToDelete )
GO

-- Delete from aspnet_UsersInRoles
delete from aspnet_UsersInRoles WHERE UserID IN
( SELECT UserID FROM AspnetUsersToDelete )
GO

-- Delete from aspnet_PersonalizationPerUser
delete from aspnet_PersonalizationPerUser WHERE UserID IN
( SELECT UserID FROM AspnetUsersToDelete )
GO

-- Delete the users
delete from aspnet_Users where UserID IN
( SELECT UserID FROM AspnetUsersToDelete )

PRINT 'Users deleted: ' + convert(varchar(255), @@ROWCOUNT)
GO

drop table PagesToDelete
drop table AspnetUsersToDelete
GO

Now the question is, when should I run this script? It depends on
several factors:

  • The lowest traffic period. For example, if the majority of
    your users are from the USA, that's USA midnight, when everyone
    in the USA is sleeping.
  • A period when no other maintenance task, like index defrag or
    database backup, is running. If by any chance another maintenance
    job conflicts with this enormous delete operation, SQL Server is
    dead.
  • The operation will take from 10 minutes to hours depending on
    the volume of trash to clean up. So, consider the duration of
    this script and plan other maintenance jobs accordingly.
  • It's best to run it 30 minutes before the INDEX DEFRAG jobs.
    After the script completes, the tables will be heavily
    fragmented, so you need to defrag the indexes.

Before running this script, there are some preparations to
make:

  • Make sure you have turned off AutoShrink in the database
    properties. The database size will reduce after the cleanup, and
    if SQL Server tries to shrink the database, there will be big IO
    activity. Turn off auto shrink, because the database will grow
    again.
  • Make sure the log file's initial size is big enough to hold
    such enormous transactions. You can specify one third of the MDF
    size as the LDF's initial size. Also make sure the log file is
    not shrunk; let it occupy hard disk space. This saves SQL Server
    from repeatedly expanding and shrinking the file, both of which
    require high disk IO.

Once the cleanup job and the INDEX DEFRAG have run, database
performance improves significantly. The tables, and therefore the
indexes, are now smaller, so SQL Server no longer has to run
through large indexes. Future INDEX DEFRAGs take less time because
there's not much data left to optimize. SQL Server also takes less
RAM because it has to work with much less data. Backup size is
reduced as well, because the MDF size no longer keeps increasing
indefinitely. As a result, the significant overhead of this cleanup
operation is quite acceptable compared to all the benefits.

Note: I will be posting some content from my old blog to the new
blog. Please ignore these posts if you have read them before.