Pageflakes – the AJAX king

Here’s what a Digg article says:


Pageflakes and YouOS – not Google – are the Ajax
Kings

InformationWeek took a close look at Google and about 20
startups to find out who’s the Ajax King. In the “webtop” or
“personalized startpage” category – arguably the most important of
all categories – the winners are Pageflakes and YouOs, ahead of
Google and Microsoft.

I am very happy to see we again got recognized as the best
Webtop ahead of Microsoft and Google. The first one was on Web 2.0 Awards. Pleeeeeeeeeease give
us a Digg and don’t miss the article on InformationWeek.

Forms authentication failed for the request. Reason: The ticket supplied was invalid. (Solution)

When you turn on web garden (multiple process per application
pool) or go into multi server load balance deployment where servers
are serving the same website, you will have Forms Authentication
problem. Users will get automatically logged out or see the Yellow
screen of death (ASP.NET error page) frequently. This happens
because ASP.NET encrypts the login information in cookie. But the
encryption key is unique for each machine and for each process. So,
if one user hits Server #1 and gets an ecnrypted key, and then the
next hit goes to Server #2, it will fail to decrypt the cookie and
log user out or throw user the asp.net general error message.

In order to prevent this on your production server, you need to
remember this before you go live:

The reasons for a forms auth ticket failing are normally that
either the validation key or the decryption key are not in sync
across all servers in a web farm. Another potential reason can be
if both ASP.NET 1.1 and ASP.NET 2.0 applications are issuing forms
auth tickets with the same domain and path.

For the first case, setting the validationKey and decryptionKey
attributes explicitly on on each web server
will solve the problem.

For the second case, setting the validationKey and decryptionKey
attributes explicitly in for *both* the
ASP.NET 1.1 and ASP.NET 2.0 applications is necessary. Additionally
on the ASP.NET 2.0 apps, the “decryption” attribute in
should be set to “3DES”.

This is what I learned from Stephan Schackow (Microsoft Atlas
team). Here’s how the machine.config should look like:

< system .web > < processModel autoConfig =”true” /> < machineKey validationKey =”…” decryptionKey =”…” validation =”SHA1″ />

is the node that you need to introduce inside
the node if it does not exist.

Catch No 1:

How do you generate the machine key? You need to use a utility
to produce the key for your PC. I have made a .exe which can
generate such keys for you. Here’s how you run it:

SecurityKey.exe 24 64

Download
here

It takes two parameters and these are the exact values you have
to provide.

For super cautious IT guys like me, don’t worry, it’s a .NET 2.0
binary, no virus. Test the strength of your anti-virus software on
this .exe if you like. If it can’t find any virus then either I’m
smarter than you and have hidden a virus successfully or there’s no
virus at all and I am not that smart as I sound like.

Catch No 2:

You have put the machine keys in both machine. Restarted IIS.
Even restarted your server. But you still see lots of Event Log
error entries which shows users are still getting the dreaded
“Forms authentication failed for the request. Reason: The ticket
supplied was invalid”. So, what did you do wrong? You call
Microsoft support. Go to Forums and make post. Everyone says what
you did is correct.

Here’s what you need to do: wait. Wait for 2 or 3 days until all
those users come back to your website at least once. Those users
will have cookie encrypted with previously assigned encryption key
pair. Naturally, it will fail to decrypt with the new key pair you
have just specified in machine.config. So, until all those users
get a new key, you will keep on having the error message. Don’t be
alarmed if you see this even after one week or month. This just
means some user visited you after a long time and you are not doing
much good in attracting users to your site. So, if you see such
event log entries after a week or two, call you marketing team and
ask what kind of marketing they are doing.

It would be really good if there was something like “Checklist
for Going Live with ASP.NET” which stated all these issues.

How to change user name in ASP.NET 2.0 Membership Provider

Profile.UserName is a readonly field. So, how do you change a
user’s name? Many use email address as user name (like us,
Pageflakes) and we want to allow users to change their email
address which in turn changes their user name. So, how do you do
it?

It looks like there’s no easy way to do it. You have to do the
following:

  1. Create a new user using the new email address
  2. Get the password of the old account and set it to the new
    account. If you can’t get the old password via Membership provider,
    then ask user.
  3. Create a new profile for the new user account
  4. Copy all the properties from the old profile to the new profile
    object.
  5. Log out user from old account
  6. Auto sign in to the new account so that user does not notice
    what an incredible thing just happened.

Here’s how I do it:

if (Profile.UserName !=
newUserName)
{

// Changing email address of user.
Delete current user account and create

// a new one using the new email address
but the same password

if (
null !=
Membership.GetUser(newUserName))

throw
new
ApplicationException(
"There's another user with the same
email. Please enter a different email.");


MembershipUser newUser =
Membership.CreateUser(newUserName,
currentPassword);


// Create profile for the new user and
copy all values from current profile

// to new profile

ProfileCommon newProfile =
ProfileCommon.Create(newUserName,
true)
as
ProfileCommon;

    newProfile.IsInvited = Profile.IsInvited;
    newProfile.IsRealUser = Profile.IsRealUser;
    newProfile.Name = newUserName;

    newProfile.Save();


if (
Membership.ValidateUser(newUserName,
currentPassword))
    {

FormsAuthentication.SignOut();
        Session.Abandon();


// Delete the old profile and
user

ProfileManager.DeleteProfile(Profile.UserName);

Membership.DeleteUser(user.UserName);


//FormsAuthentication.SetAuthCookie(newUserName,
true);

FormsAuthentication.RedirectFromLoginPage(newUserName,

true);
    }
}

How to use ASP.NET 2.0 Profile object from web service code

You have started to use ASP.NET 2.0 Profile Provider and you are
very happy with it. All your .aspx pages are full of
“Profile.Something”. You also introduced a lot of new properties on
the profile object via web.config. Then you added a new web
service. There you want to access the Profile object. You realize,
you are doomed.

You cannot access the Profile object from Web Service.

At runtime, ASP.NET generates a class looking at the
specification provided in web.config, which becomes the “Profile”
object in .aspx pages. But this object is not available in Web
service (.asmx.cs) and you cannot see the custom properties you
have added in the profile object. Although
HttpContext.Current.Profile will give you reference to Profile
object, but it’s type is ProfileBase which does not show your
custom properties. Makes sense, because the class is generated at
runtime. But if it can be made available in .aspx.cs, then it
should also be available in .asmx.cs.

In order to overcome this problem, you have to hand code that
profile class in your App_Code folder and then configure web.config
so that it does not auto generate the class instead use your one.
Here’s what you do in web.config:

I have added a new attribute UserProfile. Now go to App_Code and
make a UserProfile class like this:

public class UserProfile :
System.Web.Profile.ProfileBase
{
[SettingsAllowAnonymousAttribute(true)]
public virtual int Timezone
{
get
{
return ((int)(this.GetPropertyValue(“Timezone”)));
}
set
{
this.SetPropertyValue(“Timezone”, value);
}

}

Declare all the properties like this. Don’t forget to add the
[SettingsAllowAnonymousAttribute(true)] on the properties which you
want to be made available to anonymous users.

At the end of the class, add this method:

public virtual ProfileCommon
GetProfile(string username)
{
return ((ProfileCommon)(ProfileBase.Create(username)));

}

Here’s an easy way to avoid hand coding this class and
generating it automatically. Before you make the changes in
web.config and create the UserProfile class, run your web project
as it was before. But before running it, turn off SQL Server. This
will make Asp.net execution to break on first call to some Profile
object’s property. For ex, if you have a custom property TimeZone
in the Profile object, execution will break on this line:

public virtual int
Timezone
{
get
{

return
((int)(this.GetPropertyValue(“Timezone”)));

It will fail to load the profile object values from database
because database is down. If you scroll up, you will see this is
the class that ASP.NET generates at run time. You will see all the
properties are declared on this class already. So, you can just
copy & paste it in your own class easily!

But after copying, you will realize there’s no
[SettingsAllowAnonymousAttribute(true)] attribute. So, you will
have to put them manually. Also after making your own custom class,
you will have to remove all the custom properties declared inside node in the web.config.

Now that you have your own Profile class, inside web service,
you can cast (HttpContext.Current.Profile as UserProfile) and then
you can use all the custom properties.

If you don’t want to enjoy strongly typed coding on web service,
then you can always use the old way of accessing Profile properties
via: Profile.GetPropertyValue(“TimeZone”). But it’s no fun.

Optimize ASP.NET 2.0 Profile Provider before you go live

Do you know there are two important stored procedures in ASP.NET
2.0 Profile Provider that you can optimize significantly? If you
use them without doing the necessary optimization, your servers
will sink taking your business down with you during heavy load.
Here’re a story:

During March, Pageflakes
was shown on MIX 2006. We were having a glamorous time back then.
We were right on Showcase
of Atlas web site
. The first company, not the second or third.
Number of visits per day were rising sky high. One day we noticed,
the server was no more. We restarted the server, brought it back,
again it died within an hour. After doing a lot of postmortem
analysis on the remaining of server’s body parts, we found that it
was having 100% CPU and super high IO usage. The hard drives were
over heated and turned themselves off in order to save themselves.
This was quite surprising to us because we were very intelligent
back then and we profiled every single web service function. So, we
went through hundreds of megabytes of logs hoping to find which
webservice function was taking the time. We suspected one. It was
the first function that loads a user’s page setup. We broke it up
into smaller parts in order to see which part is taking most of the
time.


    
private


GetPageflake(
string

source, 
string

pageID, 
string

userUniqueName
)

    {

if( Profile.IsAnonymous ) {
using ( new TimedLog (Profile.UserName, "GetPageflake" ))

      {

You see, the entire function body is timed. If you want to learn
how this timing works, wait until I make a post on it. We also
timed smaller parts which we suspected were taking the most
resource. But we could find not a single place in our code which
was taking any significant time. Our codebase is always super
optimized (after all, you know who is reviewing it, me).

Meanwhile, users were shouting, management was screaming,
support staff was complaining on the phone. Developers were
fainting one by one. Remaining ones were furiously sweating and
blood vessels on their forehead were coming out etc. Nothing
special, just a typical situation we have couple of times every
month.

Now you must be shouting, “You could have used SQL Profiler, you
idiot!” We were using SQL Server workgroup edition. It does
not have SQL Profiler. So, we had to hack our way through to get it
running on a server somehow. Don’t ask how. After running
the SQL Profiler, boy, were we surprised! The name of the
honorable SP which were giving us so much pleasure was none
but the famous and the most pretigious one, which we all know by
name due to its incredible contribution to the development
of human civilization. Please welcome, the
great stored procedure “dbo.aspnet_Profile_GetProfiles”!

We used (and still use) Profile provider extensively.

Here’s the SP:


CREATE PROCEDURE
        
[dbo].[aspnet_Profile_GetProfiles]

    @ApplicationName
                
nvarchar
(256),

    @ProfileAuthOptions
        
int
,

    @PageIndex
                      

                int
,

    @PageSize
                       

                int
,

    @UserNameToMatch
                
nvarchar
(256)
= 
NULL
,

    @InactiveSinceDate
              
datetime



                    = 
NULL

AS

BEGIN

    
DECLARE

@ApplicationId


                        uniqueidentifier

    
SELECT

@ApplicationId
= 
NULL

    
SELECT

@ApplicationId
= ApplicationId
                
FROM

aspnet_Applications


WHERE LOWER (@ApplicationName) = LoweredApplicationName

    
IF

(@ApplicationId

IS
NULL
)


        
RETURN




        
--
Set the page bounds

    
DECLARE

@PageLowerBound

int

    
DECLARE

@PageUpperBound

int

    
DECLARE

@TotalRecords
  
int

    
SET

@PageLowerBound
= @PageSize
                    * @PageIndex

    
SET

@PageUpperBound
= @PageSize
                    - 1 + @PageLowerBound


    
--
Create a temp table TO store the select results

    
CREATE
TABLE 
#PageIndexForUsers

    (


            IndexId 
int
IDENTITY 
(0,
1) 
NOT
NULL
,


            UserId 
uniqueidentifier

    )


    
--
Insert into our temp table

    
INSERT
INTO 
#PageIndexForUsers
                    (UserId)


        
SELECT

u.UserId


        
FROM


  dbo.aspnet_Users
                u, dbo.aspnet_Profile p


        
WHERE


 ApplicationId = @ApplicationId


                
AND

u.UserId
                    = p.UserId


                
AND

(@InactiveSinceDate
                
IS
NULL OR 
LastActivityDate
                        <= @InactiveSinceDate)


                
AND

(

                    (@ProfileAuthOptions = 2)


                       

                OR 
(@ProfileAuthOptions
= 0 

                        AND 
IsAnonymous
= 1)


                       

                OR 
(@ProfileAuthOptions
= 1 

                        AND 
IsAnonymous
= 0)


                    )


                
AND

(@UserNameToMatch
                
IS
NULL OR 
LoweredUserName
                    
LIKE
LOWER
(@UserNameToMatch))


        
ORDER
BY 
UserName


    
SELECT

u.UserName,
u.IsAnonymous, u.LastActivityDate,
                    p.LastUpdatedDate,


                
DATALENGTH
(p.PropertyNames)
                    + 
DATALENGTH
(p.PropertyValuesString)

+
DATALENGTH (p.PropertyValuesBinary)

    
FROM


  dbo.aspnet_Users
                    u, dbo.aspnet_Profile p, #PageIndexForUsers
i

    
WHERE


 u.UserId = p.UserId 
AND

p.UserId
= i.UserId 
AND


i.IndexId >= @PageLowerBound AND i.IndexId <= @PageUpperBound



    
DROP
TABLE 
#PageIndexForUsers


    
END


END 

First it looks up for ApplicationID.


    
DECLARE

@ApplicationId


                    uniqueidentifier

    
SELECT

@ApplicationId
= 
NULL

    
SELECT

@ApplicationId
= ApplicationId
            
FROM

aspnet_Applications


WHERE LOWER (@ApplicationName) = LoweredApplicationName

    
IF

(@ApplicationId

IS
NULL
)


    
RETURN

Then it creates a temporary table (why not table data type?) in
order to store profiles of users.


— Create a temp table TO store the select results


    
CREATE
TABLE 
#PageIndexForUsers

    (

        IndexId 

        int IDENTITY 
(0,
1) 
NOT
NULL
,

        UserId 

        uniqueidentifier

    )


    
--
Insert into
        our temp table

    
INSERT
INTO 
#PageIndexForUsers
(UserId)

If it gets called very frequently, there will be too high IO due
to the temporary table creation. It also runs through 2 very big
tables – aspnet_Users and aspnet_Profile. The SP is written in such
a way that if one user has multiple profiles, it will return all
profiles of the user. But normally we store one profile per user.
So, there’s no need for creating a temporary table. Moreover,
there’s no need for doing
LIKE LOWER

(@UserNameToMatch). It is always called with a full user name which
we can match directly using the equal operator.


So, we opened up the stored proc and did a open heart bypass
surgery like this:


IF 
@UserNameToMatch

IS
NOT NULL 


        
BEGIN



        
SELECT

u.UserName,
u.IsAnonymous, u.LastActivityDate, p.LastUpdatedDate,


                    
DATALENGTH
(p.PropertyNames)
                    + 
DATALENGTH
(p.PropertyValuesString)
+ 
DATALENGTH
(p.PropertyValuesBinary)


        
FROM


  dbo.aspnet_Users u


        
INNER
JOIN 
dbo.aspnet_Profile
p 
ON

u.UserId
= p.UserId
                


        
WHERE


            u.LoweredUserName = 
LOWER
(@UserNameToMatch)



        
SELECT

@@ROWCOUNT



        
END


        
ELSE

    
BEGIN

-- Do the original bad
things

It ran fine locally. Now it was time to run it on the server.
This is an important SP which is used by the ASP.NET 2.0 Profile
Provider, heart of ASP.NET framework. If we do something wrong
here, we might not be able to see the problem immediately, but may
be after one month we will realize users profile is mixed up and
there’s no way to get it back. So, it was a pretty hard decision to
run this on a live production server directly without doing enough
test. We did not have time to do enough test anyway. We are already
down. So, we all gathered, said our prayer and hit the “Execute”
button on SQL Server Management Studio.

The SP ran fine. On the server we noticed from 100% CPU usage it
came down to 30% CPU usage. IO usage also came down to 40%.

We went live again.

Here’s another SP which gets called on every page load and
webservice call on our site because we use Profile provider
extensively.


CREATE PROCEDURE 
[dbo].[aspnet_Profile_GetProperties]

    @ApplicationName      
nvarchar
(256),

    @UserName

nvarchar
(256),

    @CurrentTimeUtc       
datetime

AS

BEGIN

    
DECLARE

@ApplicationId

uniqueidentifier

    
SELECT

@ApplicationId
= 
NULL

    
SELECT

@ApplicationId
= ApplicationId 

                FROM 
dbo.aspnet_Applications


                        WHERE LOWER
(@ApplicationName)
= LoweredApplicationName

    
IF

(@ApplicationId

IS
NULL
)

        

        RETURN


    
DECLARE

@UserId

uniqueidentifier

    
SELECT

@UserId
= 
NULL


    
SELECT

@UserId
= UserId

    
FROM


 dbo.aspnet_Users

    
WHERE

ApplicationId
= @ApplicationId 

                AND 
LoweredUserName
= 

                        LOWER
(@UserName)


    
IF

(@UserId

IS
NULL
)

        

        RETURN

    
SELECT
TOP 
1
PropertyNames, PropertyValuesString, PropertyValuesBinary

    
FROM


       dbo.aspnet_Profile

    
WHERE


      UserId = @UserId


    
IF

(@@ROWCOUNT
> 0)

    
BEGIN

        

        UPDATE 
dbo.aspnet_Users

        

        SET 

  LastActivityDate=@CurrentTimeUtc

        

        WHERE 
UserId
= @UserId

    
END

END

When you run the SP, see the statistics:


Table 
'aspnet_Applications'
.
Scan 
count

1,
logical reads 2, physical reads 0, 

                        read
-ahead
reads 0, lob logical reads 0, lob physical
                            reads 0, lob 
read
-ahead
reads
                                    0.


(1 row(s) affected)

Table 
'aspnet_Users'
.
Scan 
count

1,
logical reads 4, physical reads 0, 

                        read
-ahead
reads 0, lob logical reads 0, lob physical
                            reads 0, lob 
read
-ahead
reads
                                    0.


(1 row(s) affected)


(1 row(s) affected)

Table 
'aspnet_Profile'
.
Scan 
count

0,
logical reads 3, physical reads 0, 

                        read
-ahead
reads 0, lob logical reads 0, lob physical
                            reads 0, lob 
read
-ahead
reads
                                    0.


(1 row(s) affected)

Table 
'aspnet_Users'
.
Scan 
count

0,
logical reads 27, physical reads 0, 

                        read
-ahead
reads 0, lob logical reads 0, lob physical
                            reads 0, lob 
read
-ahead
reads
                                    0.


(1 row(s) affected)


(1 row(s) affected)

This stored proc is called whenever you access some
property on Profile object.

First it does a SELECT on aspnet_application to find out the
application ID from application name. You can easily replace this
with a hard coded application ID inside the SP and save some
effort. Normally we run only one application on our production
server. So, there’s no need to lookup application ID on every
single call.

From client statistics, you can see how expensive it is:

Then look at the last block where aspnet_users table is updated
with LastActivityDate. This is the most expensive one.

This is done in order to ensure Profile provider remembers when
was the last time a user’s profile was accessed. We do not need to
do this on every single page load and web service call. May be we
can do it when user first logs in and logs out. In our case, a lot
of web service is called while user is on the page. There’s only
one page anyway. So, we can easily remove this in order to save a
costly update on the giant aspnet_users table on every single web
service call.

100% CPU, 100% IO, a near death experience for SQL Server 2005 and us

For last two weeks we were having pretty hard time at
Pageflakes. The
database server was having 100% CPU, and 100% IO usage most
of time. It was running hot, almost about to go to a coma.
It’s a 64 bit Dual Core Dual Xeon Dell server with 2 GB RAM running
around 30 GB database on 4 SCSI drives. So, it’s more or less the
best hardware money can buy (except for RAM ofcourse). But
still the performance counter looked like this:


Pretty horrible situation isn’t it? Users are having connection
timeout on their browser. Event Log is full of “SQLConnection:
Timeout”. User’s can’t see their page. Those who can see their
pages, have serous poor performance and slow response from the
server. You can imagine the rest. Email flood, phone calls,
management screaming on your ears etc etc.

After a lot of diagnostics, we came to conclusion that SQL
Server 2005 is the culprit. So, we ran SQL Profiler on a different
server (don’t ever run SQL Profiler on the same server where you
database is running). We saw there’s an SP which is taking
thousands of read and cpu cycles.


So, we were pretty much sure that SP was the culprit. So, we
took one of the long executing ones and ran it through Execution
Plan to see what goes wrong.

So, we first did this in order to see IO usage during the SP
execution:


set statistics IO on

GO

Here’s the output after we run the SP:

Table ‘RSSItem’. Scan count 1, logical reads 40, physical reads
0, read-ahead reads 0, lob logical reads 0, lob physical reads 0,
lob read-ahead reads 0.

Nothing special. Runs pretty quick, very low read count. Looks
like no read count at all, completely cached and returned from
memory. So, SQL Server 2005 is doing pretty good caching on
it. After this, we verified whether the index was being
used properly or not:


‘Clustered Index Seek’ the best possible thing. If it was
Clustered Index Scan, then we would have thought there’s something
wrong. So, we were making best use of Index also. We were pulling
our hairs out.

So, I decided to look at the SP in order to see if I can find
out something. I stared at this for half an hour:


ALTER PROCEDURE 
[dbo].[prcRSSItemGetByChannelIDPageSize]

    @ChannelID 
int
,

    @StartIndex 
int
,

    @PageSize 
int

AS


WITH 
SelectedRSSItems(
[ID],[ChannelID],[Hash],...)

AS

(

    
SELECT


        [ID],

        [ChannelID],

        [Hash],

        [Title],

        [Guid],

        [Description],

        [EncodedContent],

        [Link],

        [PublishDate],

        [XML],

        [SavedCount],

        ROW_NUMBER() 
OVER

(
ORDER
BY 
[PublishDate]

DESC
)

AS

[RowNumber]

    
FROM

        dbo.[RSSItem]

    
WHERE

        [ChannelID] = @ChannelID 

)


SELECT 

        *

FROM

        SelectedRSSItems

WHERE

        RowNumber 
BETWEEN

(@StartIndex
+ 1) 
AND

(@StartIndex
+ @PageSize)

ORDER BY 
PublishDate

desc

Those who are wondering what’s with the “WITH” block. This is
called Common Table Expression (CTE). This is a new feature in SQL
Server 2005 which is the best so far for paging. Previously we had
to page rows by doing any of the following:

  1. Copy primary key in a temporary table after sorting and
    use IDENTITY column to generate row number. Then select rows by
    joining primary key from temporary table to the actual table. The
    paging is done based on row number on the temporary table. Very
    expensive. You end up creating temporary table on every call to the
    SP. Too many IO on tempdb.
  2. Use subquery. Very complicated SQL. Not so good result.
  3. Use cursor to skip rows and select only those which fall within
    the page. It was the best solution so far.

Now you have the mighty ROW_NUMBER() function which we all have
been waiting for 10 years!

Back to our disaster, you see the SQL is nice and clean, nothing
suspicious. But SQL Server 2005 is drowning taking us and a
thousands of users all over the world with it.

Finally, it struck my mind! SQL Server is doing something on the
PublishDate field because it needs to sort the rows based on
it.


ROW_NUMBER()


OVER

(

ORDER BY

[PublishDate]

DESC

)

AS

[RowNumber]

What if I make a new Non-Clustered index on ChannelID and
PublishDate? The query filters using ChannelID and sorts using
Publishdate. So, it should get everything it needs from the
index.

I created a brand new index on ChannelID and PublishDate. It
took 12 mins to create the index. Then the CPU usage looked like
this:


We are saved! We are back in business!

So, what we learn from this experience? We learn nothing.

This blog post is dedicated to my friend Shahed who wrote the
above SP two weeks ago and gave us a lot of fun all these
days.

Blogging Tools

Today I am trying to use Windows Live Writer Beta. Previously I
have tried using One Note 2007, MS Word 2007, InfoPath template.
None of these were good enough. Although MS Word 2007 is really
good for writing and quick publishing, but it generates bloated
html, makes you page heavy and also codes look horrible. None of
these have the feature to upload pictures with post. So, you have
to go to the blog website, upload pictures and then edit the post
using the web editor and insert pictures in it. If you do all these
from the web, then the desktop blogging experience does not improve
at all.

Now Windows Live Writer seems to have solved all these problems.
Let’s try publishing a picture:

It’s a pretty interesting picture. For last one week, our
servers running Pageflakes
were almost dying with super high CPU and IO usage. Here you see a
64bit Dual Core Dual Xeon DELL server is at its knees. We finally
resolved the problem last night. It was a near death experience for
us. I will soon write about it. I tried a long time to put some
picture. Did not find any. So, just testing with this.

Let’s try the Google Map embedding. This is my country and Dhaka
is the city where I live.


Let’s try some codes from Visual Studio:

    [WebMethod]
    [WebOperationAttribute(true, ResponseFormatMode.Json, false)]
    public int GetPageVersionNo(int pageId)
    {
        using (TheFacade facade = new TheFacade())
        {
            return facade.ThePageflakes.GetPageVersionNo(pageId, Profile.UserName);
        }
    }

OK, here’s a catch. You cannot paste html to this editor.
You need to switch to Html View in order to paste the html.
But this is much better than going to web editor and doing
it.

So, far it really felt the best editing experiencing. Going to
publish it now. Hope it works…

Web Application performance optimization tips

Yesterday I took a class on website performance optimization.
I would like to share what I did in the class.

I took the class in a company which has a local blog site. It has
serious performance issue because of both hardware and software
limitations. The homepage takes around 3 sec to prepare on the
server side which is absolutely unacceptable. So, it was an ideal
candidate to digg into the architecture and code of the site and
try to improve part by part.

The blog site has the following tables: Blog, Post, Comment,
Category. The database is mySQL.

Blog table is like this which represents one blog of a user:


ID


BIGINT


Title


text


UserID


BIGINT FK to user table

Post table contains posts in a blog and looks like this


ID


BIGINT


BlogID


BIGINT FK to Blog table


Title


Text


Section1


Char(1)


Section2


Char(1)


Section3


Char(1)


.

.

.


Here Section1…9 are global sections for blog posts like
Politics, Jokes, Entertainment etc. Each blog belongs to some
section.

If Section1 contains ‘1’, then the post belongs to Politics. (I
know what you are thinking, hold on)

Comment table contains comments submitted against posts:


ID


BIGINT


PostID


BIGINT FK to Blog table


Title


Text

Category table contains category of a blog where posts belong. Each
blog can have their own category list. Each post belongs to one or
more category.


ID


BIGINT


BlogID


BIGINT FK to Blog table


Title


Text

Optimization step 1 – Table design and data
type

First step is to change the table design. You see all those
BIGINTs? Why do you need BIGINT instead of INT? INT is 32 bit which
gives you a range from 0 to 4294967295. So, if you make 13 posts
per second, in 10 years this number will run out. It’s a good plan
to think ahead of 10 years, unfortunately we don’t have 10 years
advanced hardware. When you have 32bit processor, every time you
make you processor work with a 64bit number like comparing numbers
(e.g. WHERE BlogID=1121233388765543246) it has to compare two 32
bit chunks and combine their results in order to reach to a
decision. So, your 32bit processor is doing double work. If your
CPU is 60% busy with such computation, converting to INT will
reduce CPU usage to nearly 40% easily because comparing 32bit with
another 32bit for a 32bit processor is just one operation for the
processor. 32bit data structure is the fastest data structure for
32bit processors. But if you have a 64bit processor hardware, OS is
64bit, Database engine is 64bit, then you can use 64bit data
structure and then 64bit data structure will be the fastest one.
However, if you have 64bit hardware, but your OS is 32bit and
Database Engine is also 32bit, you do not gain much speed
improvement.

So, we converted all the BIGINTs to INT. 64bit to 32bit.

We also need to think about storage optimization. 64bit things will
take double space than 32bit things. But that’s a negligible
addition. We need to think about bigger stuffs like Text data
types. Sometimes we use the common “Text” data type of variable
length every where in order to avoid future database changes. So,
trying to save efforts in changing database design in next 10 years
ultimately results in poor database performance and storage
problems for next 10 years. Here are some tips on choosing the
right field type and size:

  • If you don’t need unicode support (multiple
    language) in fields, use the non-unicode types. For ex, use
    varchar instead of nvarchar. All the text data types that start
    with “n” support unicode and thus take 2 bytes per character.
    That’s double the size of the non-“n” counter parts. Normally we
    need unicode only for fields which somehow gets into the UI e.g.
    First Name, Title, Description etc. But fields like “FileName”,
    “Path”, “Url” etc do not need unicode support at all (when your
    server is in some English speaking country, not in Japan, China
    etc). So, you can use varchar in these cases and save 50%
    space.
  • Try to avoid using text data types as primary
    key. Searching over a text data type is lot more expensive than
    searching on integer data types. I have seen tables with primary
    key on “Url”, “FileName”, “Full Name” type fields. Think about what
    happens when you make a text field primary key. Database Engine has
    to match characters in the field data whenever you do queries like
    “WHERE FileName=’something'”. It has to match “all” the characters
    in order to find a complete match. That’s way more expensive than
    just doing “WHERE ID=10” which only takes one 32bit comparison
    which the processor can do billions of times per second. You can
    easily get around using text fields as primary key by adding an
    integer “ID” column which is an auto number and using that ID
    column on hyperlinks and page navigations. For example, instead of
    using hyperlinks like

    www.pageflakes.com/downloadfile.aspx?filename=TestFile.html

    you can do it as www.pageflakes.com/downloadfile.aspx?id=10
  • Fixed length text data types are much faster to
    compare and store than variable length data types. Whenever you use
    fixed length data types like char(255), database engine allocates
    fixed 255 bytes in the row and it knows there’s no way it’s going
    to increase. So, it need not maintain a ‘length’ indicator, nor
    does it need to increase/decrease row size whenever the content
    changes. This results in less fragmentation. Less fragmentation
    means less hard drive usage and less moving around in different
    parts of the hard drive. So, use fixed length data types whenever
    possible and especially to those fields which appear in WHERE
    clause. But beware that when field length fields are compared, you
    need to account for the trailing spaces. If you are comparing WHERE
    FirstName = ‘Omar’ and first name field is defined as char(15) then
    you actually need to do WHERE FirstName = ‘Omar’ + (11
    spaces).
  • Do not use expression on the left side of the
    WHERE clause. For example, you could do the above like this: WHERE
    Rtrim(FirstName) = ‘Omar’. But that makes the database engine to
    trim every single first name it runs through. Similarly, if you do
    WHERE Age+10 > 20 it has to do the sum for every row it runs
    through. The idea is to use the expression on the right side so
    that database engine can run that expression once and then compare
    the resultant with the field data on the left. So, do this: WHERE
    Age > 10

Optimization Step 2 – Using proper index

Normally rows are added sequentially and the last row is added at
the end of the table. So, blog posts are stored in this way in
database:

Post table sample data


ID


BlogID


Title


1


1



2


1



3


2



4


1



5


2



6


4



7


2



8


4



Now if you ask your database engine to find all the posts that
belong to BlogID = 2, it has to run through the entire table to
find which rows have BlogID = 2 and select those rows. So, if you
have 1 million rows, it runs through 1 million rows to select 10 or
20 rows from it. If you see your server’s CPU usage is very high
and hard drive activity is also very high, then database is running
throughout your hard drive in order to find rows. Here’s an
interesting way to find how busy your hard drive is:
High disk usage

Go to Control Panel->Administrative Tools->Performance

Add the above performance counters. You will see hard drive usage
pretty well. See how busy both hard drive is from the above
picture? Read time is more than it can handle.

% Disk Read Time is way high. Which means it’s trying its best to
find rows from DB and going out of its limit. Also the alarming
counter is “Current Disk Queue Length”. This indicates that there
are 22 requests waiting for reading something from your hard drive
and they are currently waiting for the last request to complete.
This means your hard drive is so busy running here and there that
request for fetching data from hard drive is getting piled up
continuously.

By using Index, you can resolve this. Index are like Binary Trees.
Binary Tree stores data in such a way that, they can be searched
very quickly. If you have 1M rows in a table, it can find an entry
in say 100 steps. In order to learn details about Binary Tree,
please search
in Google
.

However, you need to do proper indexing in order to get the correct
result. Improper indexing will make this situation worse. Here’s
what I discovered in an index in the “Post” table:


Index


BlogID, ID

The index is on BlogID field and ID field. We definitely need
BlogID field to be indexed because we need to find rows using
BlogID. But what happens when you make one index with both BlogID
and ID? It creates each entry in the index using a combination of
BlogID and ID. So, an entry in the index only matches when you
query rows with both BlogID “AND” ID. Something like “WHERE
BlogID=1 AND ID=12312”. But no one will do it for sure because if
you know the ID, you know a particular row already. You don’t need
BlogID at all. Usually the query is “WHERE BlogID=something”. So,
you need one index which has BlogID only. However, sometimes you
need to find an individual post using ID. So, you also need
“another” index which has ID field only. So, there should be 2
separate indexes, one on BlogID and one on ID.

After doing this, table scan stopped, indexes were used properly
and performance counter looked like this:
Low Disk Usage

Peace at last.

Some handy tips for making proper index:

  • You MUST NOT run a query which has fields in
    WHERE clause but those fields are not part of any index. As soon as
    you run a query which has some field in the WHERE cause not part of
    any index, database engine has to run through ENTIRE table to find
    a match. That’s when CPU and Hard drive usage goes sky
    high.
  • Look at your WHERE clause fields. If you have
    AND, then make an index which contains all the fields in ANDs. For
    ex, WHERE A=1 AND B=2 AND C=3, then make an index based on A, B, C
    and it must be in this exact order.
  • If you have multiple queries on the same table
    and each has different fields in WHERE clause, create different
    indexes. For ex, if you have a query which does WHERE A=1 and then
    there’s another query which does WHERE B=1, create two separate
    index. One with field A and one with field B.
  • The above applied for clauses which has OR. For
    example, WHERE A=1 OR B=2

Optimization step 3 – Caching rows in memory or text
file

Let’s look at how the page is rendered, we need to decide on
performance optimization strategy based on how things really
look:


Logo


Page title


Date


Top Blogger list


Blog posts


Recent comment list


Blogger1

Blogger2

Blogger3

Blogger4


Post 1 Title

Post 1 description…….

# of comments

Post 2 Title

Post 2 description…….

# of comments

Post 3 Title

Post 3 description…….

# of comments

Post 4 Title

Post 4 description…….

# of comments


Comment 1
Comment 2
Comment 3
Comment 4

There are 3 major parts, list of Top bloggers who has the highest
number of posts, show the last 10 blogs per page at the middle and
at the right side, show last 10 comments

Left part blogger list is generated based on this query:

  • Find number of posts made by each
    user
  • Order them in descending order. Highest number of
    posts become Rank 1.
  • Get the top 10 rows.

You can guess, it’s a complicated query and requires the database
engine to do a lot of running here and there on comments table,
post table, blog table and user table. So, when we run this query
on every visit to homepage, it makes the entire server pretty slow.
There are 10 to 15 requests per second on the homepage which makes
the database engine run this query 10 to 15 times per second!
That’s around 864,000 per day! It’s a miracle that the server is
still alive don’t you think? Now think about the scenario, when
does bloggers’ rank change? When a blogger makes a post. So, how
often does a blogger make a post? Say twice or thrice per day. So,
why do we run this query 10 times per second instead of twice or
thrice per day? Now that is the right question, Neo.

Remember this, the frequency of changing data is much less than the
frequency of reading data.

So, what does this mean? INSERT, UPDATE and DELETE happens much
less than SELECT. Normally SELECT happens 100s or 1000s times more
than INSERT/UPDATE/DELETE. So, we do not need to run the same
SELECT again and again when the underlying rows are not changing at
all. We can SELECT once, and store the result somewhere so that
when we SELECT the same thing again and the data in the table has
not changed, we can serve the same result from memory instead of
going to the database.

So, here’s the plan, we run the complicated query which takes some
time to find out the top 10 bloggers and then get the result. Then
we store the result either in memory or in some text file in the
web server. The homepage will ALWAYS read from that text file and
render the list. It will never go to the database and fetch the
list. So, this already saves the database from running a
complicated query and reduces the execution time of the
homepage.

The same thing can be done for the “Recent Comments” list. This
list gets updated only when a new comment is posted. So, it can
also be cached and stored either in memory or in a text file.

Making these 2 most expensive queries cached, we can save 40% of
the execution time of the home page. Now that’s some
improvement!

Now comes the most interesting part, the center part which shows 10
posts at a time. Post table is a pretty big table. It can have
millions of rows within couple of months. So, doing queries on this
table will become pretty expensive unless you do the right
indexing. Even indexing won’t help you much if there’s too much
traffic on your site. So, we need to do intelligent caching
here.

We apply the principle of updating cache only when something change
in this place too. The post list shows recent posts in paging mode
which shows 10 posts at a time. Normally people see the top 10
posts every time they visit the homepage. So, that’s something we
need to cache seriously. Also, this cache gets refreshed frequently
too. May be once every 20 seconds when someone adds or edits a
post. So, the lifetime of this cache is pretty short also.

But if you cache top 100 posts in the cache, you can provide cached
content to a user for up to 10 clicks on the ‘Next 10 posts’ link.
Normally 80% users will click ‘Next 10 posts’ once, 60% will click
twice, 40% will click thrice. Some persistent ones will click it 10
times but that’s pretty rare. So, you can get around by caching top
100 or 200 posts easily. This saves you from doing a SELECT on the
large “Post” table and then ordering it by “PostDate” and then
selecting the top 10. You do it only once when you build the
“Top100Posts” cache table or text file and store the data in the
table in sorted order. This way, when paging occurs, you only
select Top 10 rows from Nth position from the “Top100Posts” table
or the text file. This happens blazingly fast compared to doing
this on the actual “Post” table.

Optimization Step 4 – Page output rendering

Here’s a nice formula you can use for caching:

Output = code(a) + code(b)

Where, code(a) = db(a) + db(b)

code(b) = db(c) + db(d)

This means, some code uses some part of the database and some other
part of the code uses some other part of the database. The final
output that user sees is from both codes’ combined hard
effort.

So, if db(a) and db(b) gets cached, the output of code(a) can also
be cached. Their is no way result of code(a) is going to change
until either db(a) or db(b) changes. Think about the blog example.
If top 10 bloggers are calculated by code(a) and it uses two
database operations to prepare the list db(a) and db(b), then until
db(a) or db(b) returns a different value, the result of code(a) is
never going to change. So, when we cache it, the equation
becomes:

x = code(a) = db(a) + db(b)

So, the formular is: output = x + code(b)

Let’s say code(b) generates the ‘Recent comments’ list. If that is
also cached then the formula becomes:

y = code(b) = db(c) + db(d)

So, the formula is: output, z = x + y

Now you see, the right side is entirely cached. So, until either x
or y changes, z is always the same. That means, if you and I hit
the home page of the blog site, we will see the exact output until
someone posts a blog or comment.

Now think, why do we even need to execute code(a) and code(b) and
generate the output at all? If we do it once in a while and store
the entire output in a static html file, then we can just deliver
that static html file to the browser. Is there any need to execute
.php code, or .aspx code at all for the homepage? The entire
homepage can become just a static html file which web server can
cache it in memory. Thus the entire page can be served from memory
without requiring a single IO operation or database call!

Conclusion

We started with super high CPU and IO usage, thousands of database
calls, super slow web site and came down to zero database call,
zero IO usage and a super fast homepage.

Reply to all emails from unique sender in an Outlook folder


You have a folder full of mails from your
friends. Now you want to reply to each of them sending some fixed
message. One person will get only one reply no matter how many
mails are there from that user. Here’s how you can do
this:

  1. First write an email in Text
    Format.
  2. Save the email. It will go
    to “Drafts” folder. Now drag that email to the folder which
    contains all the emails you want to reply to.
  3. Now press ALT+F11 and paste
    the macro at the end of the post.
  4. Hit F5 and wait for a long
    time. You will see, Outbox has all the reply
    emails.

The macro runs through each and every email
and then checks if the sender is already replied to. If not replied
to, then it gets the body of the message and then makes a reply by
combining your message with the body of the message.

The benefit of this approach is you are
replying to mails sent from someone which has low probability of
getting filtered out by spam filters.

Sub ReplytoALLEmail()
    Dim objFolder As Folder


Dim objTemplate As MailItem
Set objTemplate = Application.ActiveExplorer.Selection.Item(1)

Set objFolder = objTemplate.Parent

Dim objItem As MailItem
Dim objReply As MailItem

Dim dic As New Dictionary

Dim strEmail As String

Dim strEmails As String

Dim strBody As String
strBody = objTemplate.Body

For Each objItem In objFolder.Items
If Not (objTemplate = objItem) Then

strEmail = objItem.SenderEmailAddress

If Not dic.Exists(strEmail) Then
strEmails = strEmails + strEmail + ";"

dic.Add strEmail, ""

Set objReply = objItem.Reply()

objReply.Body = strBody + vbCrLf + vbCrLf + objItem.Body

On Local Error Resume Next
objReply.Send
End If
End If

Next

Debug.Print strEmails
End Sub