Omar AL Zabir Blog – Page 20 – Engineering Manager, Meta

100% CPU, 100% IO, a near death experience for SQL Server 2005 and us

For the last two weeks we have had a pretty hard time at
Pageflakes. The
database server was at 100% CPU and 100% IO usage most
of the time. It was running hot, almost about to go into a coma.
It’s a 64-bit Dual Core Dual Xeon Dell server with 2 GB RAM running
a database of around 30 GB on 4 SCSI drives. So, it’s more or less the
best hardware money can buy (except for the RAM, of course). But
still the performance counters looked like this:

Pretty horrible situation, isn’t it? Users were getting connection
timeouts in their browsers. The Event Log was full of “SQLConnection:
Timeout”. Users couldn’t see their pages. Those who could see their
pages had seriously poor performance and slow responses from the
server. You can imagine the rest: email floods, phone calls,
management screaming in your ears, and so on.

After a lot of diagnostics, we came to the conclusion that SQL
Server 2005 was the culprit. So, we ran SQL Profiler from a different
server (don’t ever run SQL Profiler on the same server where your
database is running). We saw there was an SP which was taking
thousands of reads and CPU cycles.

So, we were pretty much sure that the SP was the culprit. We
took one of the long-running calls and looked at its execution
plan to see what was going wrong.

First we did this in order to see the IO usage during the SP
execution:


    SET STATISTICS IO ON
    GO

Here’s the output after we run the SP:

Table ‘RSSItem’. Scan count 1, logical reads 40, physical reads
0, read-ahead reads 0, lob logical reads 0, lob physical reads 0,
lob read-ahead reads 0.

Nothing special. It runs pretty quickly with a very low read count.
There are no physical reads at all, so everything is cached and
returned from memory. So, SQL Server 2005 is doing pretty good caching
on it. After this, we verified whether the index was being
used properly or not:

‘Clustered Index Seek’, the best possible thing. If it were a
Clustered Index Scan, then we would have thought there’s something
wrong. So, we were making the best use of the index as well. We were
pulling our hair out.

So, I decided to look at the SP itself to see if I could find
something. I stared at this for half an hour:


    ALTER PROCEDURE [dbo].[prcRSSItemGetByChannelIDPageSize]
        @ChannelID int,
        @StartIndex int,
        @PageSize int
    AS

    WITH SelectedRSSItems([ID],[ChannelID],[Hash],...)
    AS
    (
        SELECT
            [ID],
            [ChannelID],
            [Hash],
            [Title],
            [Guid],
            [Description],
            [EncodedContent],
            [Link],
            [PublishDate],
            [XML],
            [SavedCount],
            ROW_NUMBER() OVER (ORDER BY [PublishDate] DESC) AS [RowNumber]
        FROM
            dbo.[RSSItem]
        WHERE
            [ChannelID] = @ChannelID
    )
    SELECT *    -- the column list is missing in the source; * is assumed here
    FROM
        SelectedRSSItems
    WHERE
        RowNumber BETWEEN (@StartIndex + 1) AND (@StartIndex + @PageSize)
    ORDER BY PublishDate DESC

For those who are wondering what’s with the “WITH” block: this is
called a Common Table Expression (CTE). It is a new feature in SQL
Server 2005, and together with ROW_NUMBER() it is the best option so
far for paging. Previously we had to page rows by doing one of the
following:

- Copy the primary keys into a temporary table after sorting and
  use an IDENTITY column to generate row numbers. Then select rows by
  joining the primary key in the temporary table to the actual table.
  The paging is done based on the row number in the temporary table.
  Very expensive. You end up creating a temporary table on every call
  to the SP. Too much IO on tempdb.
- Use a subquery. Very complicated SQL. Not so good results.
- Use a cursor to skip rows and select only those which fall within
  the page. It was the best solution so far.

Now you have the mighty ROW_NUMBER() function, which we have all
been waiting 10 years for!

Back to our disaster. You see, the SQL is nice and clean, nothing
suspicious. But SQL Server 2005 was drowning, taking us and
thousands of users all over the world down with it.

Finally, it struck me! SQL Server is doing something with the
PublishDate field because it needs to sort the rows based on
it.

    ROW_NUMBER() OVER (ORDER BY [PublishDate] DESC) AS [RowNumber]

What if I make a new non-clustered index on ChannelID and
PublishDate? The query filters using ChannelID and sorts using
PublishDate. So, it should get everything it needs for filtering and
sorting from the index.
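The effect of such an index can be sketched on a small scale. This is only an illustration under stated assumptions: it uses SQLite through Python's sqlite3 module as a stand-in for SQL Server 2005, LIMIT/OFFSET instead of ROW_NUMBER(), a cut-down RSSItem table with made-up rows, and a hypothetical index name. It compares the query plan before and after the index on (ChannelID, PublishDate) is created.

```python
import sqlite3

def query_plan(conn, sql, params=()):
    """Return the detail column of EXPLAIN QUERY PLAN as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[3] for row in rows)

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE RSSItem ("
    " ID INTEGER PRIMARY KEY, ChannelID INTEGER, Title TEXT, PublishDate TEXT)"
)
# Made-up sample rows: 5000 items spread over 50 channels.
conn.executemany(
    "INSERT INTO RSSItem (ChannelID, Title, PublishDate) VALUES (?, ?, ?)",
    [(i % 50, "item %d" % i, "2006-%02d-%02d" % (i % 12 + 1, i % 28 + 1))
     for i in range(5000)],
)

# Filter by channel, sort by publish date, return one page.
PAGE_SQL = (
    "SELECT ID, Title, PublishDate FROM RSSItem "
    "WHERE ChannelID = ? ORDER BY PublishDate DESC LIMIT ? OFFSET ?"
)

plan_before = query_plan(conn, PAGE_SQL, (7, 10, 0))

# Hypothetical index name; the columns are the ones discussed above.
conn.execute(
    "CREATE INDEX IX_RSSItem_ChannelID_PublishDate "
    "ON RSSItem (ChannelID, PublishDate)"
)
plan_after = query_plan(conn, PAGE_SQL, (7, 10, 0))

print("before:", plan_before)
print("after: ", plan_after)
```

Before the index, the plan has to scan the table and build a temporary structure to sort by PublishDate. After it, the rows for one channel are read from the index already in date order, so the separate sort step disappears.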

I created a brand new index on ChannelID and PublishDate. It
took 12 mins to create the index. Then the CPU usage looked like
this:

We are saved! We are back in business!

So, what did we learn from this experience? We learned nothing.

This blog post is dedicated to my friend Shahed who wrote the
above SP two weeks ago and gave us a lot of fun all these
days.

Blogging Tools

Today I am trying out Windows Live Writer Beta. Previously I
tried One Note 2007, MS Word 2007 and an InfoPath template.
None of these were good enough. Although MS Word 2007 is really
good for writing and quick publishing, it generates bloated
html, makes your page heavy, and code looks horrible. None of
these can upload pictures with a post. So, you have
to go to the blog website, upload the pictures, then edit the post
in the web editor and insert the pictures. If you have to do all this
from the web, the desktop blogging experience does not improve
at all.

Now Windows Live Writer seems to have solved all these problems.
Let’s try publishing a picture:

It’s a pretty interesting picture. For the last week, our
servers running Pageflakes
were almost dying from super high CPU and IO usage. Here you see a
64-bit Dual Core Dual Xeon Dell server on its knees. We finally
resolved the problem last night. It was a near death experience for
us. I will write about it soon. I spent a long time looking for a
picture to post and did not find any, so I am just testing with this one.

Let’s try the Google Map embedding. This is my country and Dhaka
is the city where I live.

Let’s try some code from Visual Studio:

    [WebMethod]
    [WebOperationAttribute(true, ResponseFormatMode.Json, false)]
    public int GetPageVersionNo(int pageId)
    {
        using (TheFacade facade = new TheFacade())
        {
            return facade.ThePageflakes.GetPageVersionNo(pageId, Profile.UserName);
        }
    }

OK, here’s a catch. You cannot paste html into this editor.
You need to switch to Html View in order to paste html.
But this is still much better than going to the web editor and
doing it there.

So far, this really feels like the best editing experience. Going to
publish it now. Hope it works…

Web Application performance optimization tips

Yesterday I taught a class on website performance optimization.
I would like to share what I did in the class.

I taught the class at a company which has a local blog site. It has
serious performance issues because of both hardware and software
limitations. The homepage takes around 3 seconds to prepare on the
server side, which is absolutely unacceptable. So, it was an ideal
candidate for digging into the architecture and code of the site and
trying to improve it part by part.

The blog site has the following tables: Blog, Post, Comment,
Category. The database is MySQL.

The Blog table looks like this. Each row represents one blog of a user:

ID	BIGINT
Title	text
UserID	BIGINT FK to user table

The Post table contains the posts in a blog and looks like this:

ID	BIGINT
BlogID	BIGINT FK to Blog table
Title	Text
Section1	Char(1)
Section2	Char(1)
Section3	Char(1)
. . .

Here Section1…9 are global sections for blog posts like
Politics, Jokes, Entertainment etc. Each post belongs to some
section.

If Section1 contains ‘1’, then the post belongs to Politics. (I
know what you are thinking, hold on)

Comment table contains comments submitted against posts:

ID	BIGINT
PostID	BIGINT FK to Post table
Title	Text

The Category table contains the categories of a blog to which posts
belong. Each blog can have its own category list. Each post belongs to
one or more categories.

ID	BIGINT
BlogID	BIGINT FK to Blog table
Title	Text

Optimization step 1 – Table design and data type

The first step is to change the table design. You see all those
BIGINTs? Why do you need BIGINT instead of INT? INT is 32 bit, which
as INT UNSIGNED gives you a range from 0 to 4294967295 (a signed INT
stops at 2147483647). So, even if you make 13 posts per second, it
takes about 10 years for the unsigned range to run out. It’s a good
plan to think 10 years ahead; unfortunately we don’t have hardware
from 10 years ahead. When you have a 32-bit processor, every time you
make your processor work with a 64-bit number, like comparing numbers
(e.g. WHERE BlogID=1121233388765543246), it has to compare two 32-bit
chunks and combine their results in order to reach a
decision. So, your 32-bit processor is doing double work. If your
CPU is, say, 60% busy with such computation, converting to INT could
bring that down towards 40%, because comparing one 32-bit value with
another is just one operation for a 32-bit
processor. A 32-bit data structure is the fastest data structure for
32-bit processors. But if your processor is 64 bit, the OS is
64 bit and the database engine is 64 bit, then a 64-bit data
structure will be the fastest one.
However, if you have 64-bit hardware but your OS and
database engine are 32 bit, you do not gain much speed
improvement.

So, we converted all the BIGINTs to INT. 64 bit to 32 bit.
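The "13 posts per second for 10 years" figure can be checked with a few lines of arithmetic. This is a rough sketch that assumes a constant insert rate and 365-day years; the function name is made up for the illustration.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60

def years_until_exhausted(max_id, inserts_per_second):
    """Years of continuous inserts before an auto-increment ID reaches max_id."""
    return max_id / (inserts_per_second * SECONDS_PER_YEAR)

INT_UNSIGNED_MAX = 2**32 - 1   # 4294967295
INT_SIGNED_MAX = 2**31 - 1     # 2147483647

unsigned_years = years_until_exhausted(INT_UNSIGNED_MAX, 13)
signed_years = years_until_exhausted(INT_SIGNED_MAX, 13)

print("INT UNSIGNED lasts about %.1f years" % unsigned_years)
print("signed INT lasts about %.1f years" % signed_years)
```

At that rate the unsigned range lasts a little over ten years and the signed range about half of that, so the choice between the two matters only for a very busy table.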

We also need to think about storage optimization. 64-bit values
take double the space of 32-bit values. But that’s a negligible
addition. We need to think about bigger things like text data
types. Sometimes we use the common “Text” data type of variable
length everywhere in order to avoid future database changes. So,
trying to save the effort of changing the database design in the next
10 years ultimately results in poor database performance and storage
problems for the next 10 years. Here are some tips on choosing the
right field type and size:

- If you don’t need unicode support (multiple languages) in a
  field, use the non-unicode types. For example, use
  varchar instead of nvarchar. All the text data types that start
  with “n” support unicode and thus take 2 bytes per character.
  That’s double the size of the non-“n” counterparts. Normally we
  need unicode only for fields which somehow get into the UI, e.g.
  First Name, Title, Description etc. But fields like “FileName”,
  “Path”, “Url” etc usually do not need unicode support at all (when
  your server is in some English speaking country, not in Japan, China
  etc). So, you can use varchar in these cases and save 50%
  space.
- Try to avoid using text data types as primary
  key. Searching over a text data type is a lot more expensive than
  searching on integer data types. I have seen tables with the primary
  key on “Url”, “FileName”, “Full Name” type fields. Think about what
  happens when you make a text field the primary key. The database
  engine has to match characters in the field data whenever you do
  queries like “WHERE FileName=’something'”. It has to match “all” the
  characters in order to find a complete match. That’s way more
  expensive than just doing “WHERE ID=10”, which only takes one 32-bit
  comparison, which the processor can do billions of times per second.
  You can easily get around using text fields as primary key by adding
  an integer “ID” column which is an auto number and using that ID
  column in hyperlinks and page navigation. For example, instead of
  using hyperlinks like
  www.pageflakes.com/downloadfile.aspx?filename=TestFile.html
  you can do it as www.pageflakes.com/downloadfile.aspx?id=10
- Fixed length text data types are much faster to
  compare and store than variable length data types. Whenever you use
  fixed length data types like char(255), the database engine allocates
  a fixed 255 bytes in the row and it knows there’s no way it’s going
  to increase. So, it need not maintain a ‘length’ indicator, nor
  does it need to increase/decrease the row size whenever the content
  changes. This results in less fragmentation. Less fragmentation
  means less hard drive usage and less moving around in different
  parts of the hard drive. So, use fixed length data types whenever
  possible, and especially for those fields which appear in the WHERE
  clause. But beware that when fixed length fields are compared, you
  may need to account for the trailing spaces, depending on the
  database engine and its settings. If you are comparing WHERE
  FirstName = ‘Omar’ and the first name field is defined as char(15),
  the stored value is ‘Omar’ followed by 11 spaces, and an engine that
  does not ignore the padding needs WHERE FirstName = ‘Omar’ + (11
  spaces).
- Do not use an expression on the left side of the
  WHERE clause. For example, you could do the above like this: WHERE
  Rtrim(FirstName) = ‘Omar’. But that makes the database engine
  trim every single first name it runs through. Similarly, if you do
  WHERE Age+10 > 20 it has to do the sum for every row it runs
  through. The idea is to use the expression on the right side so
  that the database engine can evaluate that expression once and then
  compare the result with the field data on the left. So, do this: WHERE
  Age > 10
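The last tip can be seen in a query plan. The sketch below is an illustration only: it uses SQLite through Python's sqlite3 module rather than the MySQL server from the class, and the Person table, its rows, and the index name are made up. It asks the engine how it would run both forms of the Age filter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Person (ID INTEGER PRIMARY KEY, FirstName TEXT, Age INTEGER)"
)
conn.executemany(
    "INSERT INTO Person (FirstName, Age) VALUES (?, ?)",
    [("user%d" % i, i % 90) for i in range(2000)],
)
conn.execute("CREATE INDEX IX_Person_Age ON Person (Age)")  # hypothetical name

def plan(sql):
    """Return the detail column of EXPLAIN QUERY PLAN as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[3] for row in rows)

# Expression wrapped around the column: the index on Age cannot be used.
expression_on_column = plan("SELECT FirstName FROM Person WHERE Age + 10 > 20")

# Bare column on the left, constant on the right: the index can be used.
bare_column = plan("SELECT FirstName FROM Person WHERE Age > 10")

print("Age + 10 > 20 :", expression_on_column)
print("Age > 10      :", bare_column)
```

The first plan is a scan of every row, the second is a search through the index. Other engines print their plans differently, but the same reasoning applies to them.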

Optimization Step 2 – Using proper index

Normally rows are added sequentially and the last row is added at
the end of the table. So, blog posts are stored in this way in
database:

Post table sample data

ID	BlogID	Title
1	1	…
2	1	…
3	2	…
4	1	…
5	2	…
6	4	…
7	2	…
8	4	…

Now if you ask your database engine to find all the posts that
belong to BlogID = 2, it has to run through the entire table to
find which rows have BlogID = 2 and select those rows. So, if you
have 1 million rows, it runs through 1 million rows to select 10 or
20 rows from it. If you see your server’s CPU usage is very high
and hard drive activity is also very high, then database is running
throughout your hard drive in order to find rows. Here’s an
interesting way to find how busy your hard drive is:

Go to Control Panel->Administrative Tools->Performance

Add the above performance counters. You will see the hard drive usage
pretty well. See how busy both hard drives are in the above
picture? Read time is more than they can handle.

% Disk Read Time is way too high, which means the drive is trying its
best to find rows from the DB and going beyond its limit. The other
alarming counter is “Current Disk Queue Length”. This indicates that
there are 22 requests to read something from your hard drive,
all waiting for the previous request to complete.
This means your hard drive is so busy running here and there that
requests for fetching data from it are piling up
continuously.

By using an index, you can resolve this. Indexes are like binary trees
(most databases use a close relative, the B-tree). A binary tree
stores data in such a way that it can be searched
very quickly. If you have 1M rows in a table, a balanced tree can find
an entry in about 20 steps. To learn the details about binary trees,
please search
in Google.
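To get a feel for why an index lookup takes so few steps, here is a small sketch of a binary search over one million sorted keys. It is a simplification: a real database index is a B-tree with many keys per node, so it usually needs even fewer page reads, and the function names here are made up for the illustration.

```python
def max_comparisons(row_count):
    """Worst-case number of probes for a binary search over row_count keys."""
    return row_count.bit_length()   # floor(log2(n)) + 1 for n >= 1

def binary_search(sorted_keys, target):
    """Return (index or None, number of probes made)."""
    low, high, steps = 0, len(sorted_keys) - 1, 0
    while low <= high:
        steps += 1
        mid = (low + high) // 2
        if sorted_keys[mid] == target:
            return mid, steps
        if sorted_keys[mid] < target:
            low = mid + 1
        else:
            high = mid - 1
    return None, steps

keys = list(range(1_000_000))            # stand-in for one million indexed IDs
index, steps = binary_search(keys, 999_999)

print("found at", index, "in", steps, "steps")
print("worst case for 1M rows:", max_comparisons(1_000_000))
```

Compare that with the full table scan described earlier, which has to look at all one million rows to answer the same question.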

However, you need to do proper indexing in order to get the correct
result. Improper indexing will make this situation worse. Here’s
what I discovered in an index in the “Post” table:

Index

BlogID, ID

The index is on the BlogID field and the ID field together. We
definitely need the BlogID field to be indexed because we need to find
rows using BlogID. But what happens when you make one index with both
BlogID and ID? It creates each entry in the index using a combination
of BlogID and ID. So, an entry in the index is matched in full only
when you query rows with both BlogID “AND” ID, something like “WHERE
BlogID=1 AND ID=12312”. (Many engines can still use the leading field
of such an index on its own, but never the second field on its own.)
But no one will run that query, because if
you know the ID, you know a particular row already. You don’t need
BlogID at all. Usually the query is “WHERE BlogID=something”. So,
you need one index which has BlogID only. However, sometimes you
need to find an individual post using ID. So, you also need
“another” index which has the ID field only. So, there should be 2
separate indexes, one on BlogID and one on ID.

After doing this, table scan stopped, indexes were used properly
and performance counter looked like this:

Peace at last.

Some handy tips for making proper indexes:

- You MUST NOT run a query which has fields in the
  WHERE clause that are not part of any index. As soon as
  you run a query which has some field in the WHERE clause that is not
  part of any index, the database engine has to run through the ENTIRE
  table to find a match. That’s when CPU and hard drive usage go sky
  high.
- Look at your WHERE clause fields. If you have
  AND, then make an index which contains all the fields in the ANDs. For
  example, for WHERE A=1 AND B=2 AND C=3, make an index based on A, B, C,
  in this exact order.
- If you have multiple queries on the same table
  and each has different fields in the WHERE clause, create different
  indexes. For example, if you have a query which does WHERE A=1 and
  there’s another query which does WHERE B=1, create two separate
  indexes, one with field A and one with field B.
- The above also applies to clauses which have OR. For
  example, for WHERE A=1 OR B=2, create one index on A and one on B.
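These tips can be tried out on a small table. The sketch below assumes SQLite through Python's sqlite3 module as a stand-in for the real database, with a made-up table T and a hypothetical index name. It shows which WHERE clauses can use one index on (A, B, C).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE T (ID INTEGER PRIMARY KEY,"
    " A INTEGER, B INTEGER, C INTEGER, Payload TEXT)"
)
conn.executemany(
    "INSERT INTO T (A, B, C, Payload) VALUES (?, ?, ?, ?)",
    [(i % 10, i % 7, i % 5, "row %d" % i) for i in range(3000)],
)
conn.execute("CREATE INDEX IX_T_A_B_C ON T (A, B, C)")  # hypothetical name

def plan(sql):
    """Return the detail column of EXPLAIN QUERY PLAN as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[3] for row in rows)

all_three = plan("SELECT Payload FROM T WHERE A = 1 AND B = 2 AND C = 3")
leading_only = plan("SELECT Payload FROM T WHERE A = 1")
trailing_only = plan("SELECT Payload FROM T WHERE C = 3")

print("A, B and C :", all_three)
print("A only     :", leading_only)
print("C only     :", trailing_only)
```

In SQLite, the queries on A, B and C together and on A alone are answered with an index search, while the query on C alone falls back to scanning the table. That is why a query which filters only on a field that is not at the front of any index needs an index of its own.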

Optimization step 3 – Caching rows in memory or text file

Let’s look at how the page is rendered. We need to decide on the
performance optimization strategy based on how things really
look:

Logo	Page title	Date
Top Blogger list	Blog posts	Recent comment list
Blogger1	Post 1 Title, Post 1 description……., # of comments	Comment 1
Blogger2	Post 2 Title, Post 2 description……., # of comments	Comment 2
Blogger3	Post 3 Title, Post 3 description……., # of comments	Comment 3
Blogger4	Post 4 Title, Post 4 description……., # of comments	Comment 4

There are 3 major parts: on the left, the list of top bloggers who
have the highest number of posts; in the middle, the last 10 posts per
page; and on the right, the last 10 comments.

The blogger list on the left is generated by this query:

1. Find the number of posts made by each user.
2. Order them in descending order. The highest number of
   posts becomes Rank 1.
3. Get the top 10 rows.

You can guess, it’s a complicated query and requires the database
engine to do a lot of running here and there on the comments table,
post table, blog table and user table. So, when we run this query
on every visit to the homepage, it makes the entire server pretty slow.
There are 10 to 15 requests per second on the homepage, which makes
the database engine run this query 10 to 15 times per second!
That’s 864,000 to about 1.3 million times per day! It’s a miracle that
the server is still alive, don’t you think? Now think about the
scenario: when does a blogger’s rank change? When a blogger makes a
post. So, how often does a blogger make a post? Say twice or thrice
per day. So, why do we run this query 10 times per second instead of
twice or thrice per day? Now that is the right question, Neo.

Remember this, the frequency of changing data is much less than the
frequency of reading data.

So, what does this mean? INSERT, UPDATE and DELETE happen much
less often than SELECT. Normally SELECT happens 100s or 1000s of times
more often than INSERT/UPDATE/DELETE. So, we do not need to run the same
SELECT again and again when the underlying rows are not changing at
all. We can SELECT once and store the result somewhere, so that
when we need the same thing again and the data in the table has
not changed, we can serve the same result from memory instead of
going to the database.

So, here’s the plan, we run the complicated query which takes some
time to find out the top 10 bloggers and then get the result. Then
we store the result either in memory or in some text file in the
web server. The homepage will ALWAYS read from that text file and
render the list. It will never go to the database and fetch the
list. So, this already saves the database from running a
complicated query and reduces the execution time of the
homepage.
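The plan can be sketched in a few lines. This is a minimal in-memory version under stated assumptions: the class and function names are made up, a dictionary stands in for the database, and the "expensive query" is just a sort. The point it illustrates is that the query runs only when a post is added, not on every homepage hit.

```python
class CachedQuery:
    """Keeps the result of an expensive query until invalidate() is called."""

    def __init__(self, run_query):
        self._run_query = run_query
        self._result = None
        self._is_fresh = False

    def get(self):
        if not self._is_fresh:
            self._result = self._run_query()
            self._is_fresh = True
        return self._result

    def invalidate(self):
        self._is_fresh = False


posts_by_blogger = {"alice": 12, "bob": 30, "carol": 7}   # stand-in for the database
stats = {"query_runs": 0}

def top_bloggers_query():
    """Stand-in for the complicated ranking query."""
    stats["query_runs"] += 1
    ranked = sorted(posts_by_blogger.items(), key=lambda kv: kv[1], reverse=True)
    return [name for name, _ in ranked[:10]]

top_bloggers = CachedQuery(top_bloggers_query)

def add_post(blogger):
    """Writing a post is the only event that can change the ranking."""
    posts_by_blogger[blogger] = posts_by_blogger.get(blogger, 0) + 1
    top_bloggers.invalidate()

# 1000 homepage hits reuse one query result.
for _ in range(1000):
    homepage_list = top_bloggers.get()
runs_after_hits = stats["query_runs"]

# A new post forces exactly one more run on the next hit.
add_post("carol")
refreshed_list = top_bloggers.get()
runs_after_post = stats["query_runs"]
```

A text file on the web server can play the same role as the in-memory object here: rewrite the file when a post is made and read it on every homepage request.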

The same thing can be done for the “Recent Comments” list. This
list gets updated only when a new comment is posted. So, it can
also be cached and stored either in memory or in a text file.

By caching these 2 most expensive queries, we can save 40% of
the execution time of the home page. Now that’s some
improvement!

Now comes the most interesting part, the center part which shows 10
posts at a time. The Post table is a pretty big table. It can have
millions of rows within a couple of months. So, doing queries on this
table will become pretty expensive unless you do the right
indexing. Even indexing won’t help you much if there’s too much
traffic on your site. So, we need to do intelligent caching
here.

We apply the principle of updating the cache only when something
changes here too. The post list shows recent posts in paging mode,
10 posts at a time. Normally people see the top 10
posts every time they visit the homepage. So, that’s something we
need to cache seriously. This cache also gets refreshed frequently,
maybe once every 20 seconds when someone adds or edits a
post. So, the lifetime of this cache is pretty short as well.

But if you keep the top 100 posts in the cache, you can provide cached
content to a user for up to 10 clicks on the ‘Next 10 posts’ link.
Normally 80% of users will click ‘Next 10 posts’ once, 60% will click
twice, 40% will click thrice. Some persistent ones will click it 10
times but that’s pretty rare. So, you can get by with caching the top
100 or 200 posts easily. This saves you from doing a SELECT on the
large “Post” table, ordering it by “PostDate” and then
selecting the top 10. You do it only once, when you build the
“Top100Posts” cache table or text file, and store the data
in sorted order. This way, when paging occurs, you only
select 10 rows from the Nth position in the “Top100Posts” table
or the text file. This happens blazingly fast compared to doing
it on the actual “Post” table.
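Paging out of such a cache comes down to slicing a list that is already sorted. The sketch below keeps the cache as an in-memory Python list instead of a “Top100Posts” table or text file; the function names and the sample posts (with integers standing in for dates) are made up for the illustration.

```python
PAGE_SIZE = 10
CACHE_SIZE = 100

def build_top_posts_cache(all_posts):
    """Sort once by post date, newest first, and keep the newest CACHE_SIZE."""
    newest_first = sorted(all_posts, key=lambda post: post["PostDate"], reverse=True)
    return newest_first[:CACHE_SIZE]

def get_page(top_posts_cache, page_number):
    """Return one page from the cache, or None when the page lies beyond it."""
    start = page_number * PAGE_SIZE
    if start >= len(top_posts_cache):
        return None   # the caller falls back to the real Post table
    return top_posts_cache[start:start + PAGE_SIZE]

# Stand-in for the Post table: 500 posts, a larger PostDate means a newer post.
posts = [{"ID": i, "PostDate": i} for i in range(1, 501)]

cache = build_top_posts_cache(posts)          # rebuilt only when a post changes
first_page = get_page(cache, 0)               # homepage
tenth_page = get_page(cache, 9)               # after nine clicks on 'Next 10 posts'
beyond_cache = get_page(cache, 10)            # the rare persistent reader

print([post["ID"] for post in first_page])
```

Only the rare request that runs past the cached 100 posts has to touch the large table.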

Optimization Step 4 – Page output rendering

Here’s a nice formula you can use for caching:

Output = code(a) + code(b)

Where, code(a) = db(a) + db(b)

code(b) = db(c) + db(d)

This means some code uses some part of the database and some other
part of the code uses some other part of the database. The final
output that the user sees is the combined hard effort of both pieces
of code.

So, if db(a) and db(b) get cached, the output of code(a) can also
be cached. There is no way the result of code(a) is going to change
until either db(a) or db(b) changes. Think about the blog example.
If the top 10 bloggers are calculated by code(a) and it uses two
database operations, db(a) and db(b), to prepare the list, then until
db(a) or db(b) returns a different value, the result of code(a) is
never going to change. So, when we cache it, the equation
becomes:

x = code(a) = db(a) + db(b)

So, the formula is: output = x + code(b)

Let’s say code(b) generates the ‘Recent comments’ list. If that is
also cached then the formula becomes:

y = code(b) = db(c) + db(d)

So, the formula is: output, z = x + y

Now you see, the right side is entirely cached. So, until either x
or y changes, z is always the same. That means, if you and I hit
the home page of the blog site, we will see exactly the same output
until someone posts a blog entry or a comment.

Now think, why do we even need to execute code(a) and code(b) and
generate the output at all? If we do it once in a while and store
the entire output in a static html file, then we can just deliver
that static html file to the browser. Is there any need to execute
.php or .aspx code at all for the homepage? The entire
homepage can become just a static html file which the web server can
cache in memory. Thus the entire page can be served from memory
without requiring a single IO operation or database call!

Conclusion

We started with super high CPU and IO usage, thousands of database
calls, super slow web site and came down to zero database call,
zero IO usage and a super fast homepage.

Reply to all emails from unique sender in an Outlook folder

You have a folder full of mails from your
friends. Now you want to reply to each of them with some fixed
message. Each person should get only one reply, no matter how many
mails there are from that person. Here’s how you can do
this:

1. First write an email in Text Format.
2. Save the email. It will go to the “Drafts” folder. Now drag that
   email to the folder which contains all the emails you want to reply
   to, and keep it selected, because the macro uses the selected email
   as the template.
3. Now press ALT+F11 and paste the macro at the end of the post.
4. Hit F5 and wait for a long time. You will see that the Outbox has
   all the reply emails.

The macro runs through each and every email
and checks whether the sender has already been replied to. If not,
it takes the body of the message and makes a reply by
combining your message with the body of the message.

The benefit of this approach is that you are
replying to mails sent by someone, so your reply has a low
probability of getting filtered out by spam filters.
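The "one reply per unique sender" logic is easy to see outside Outlook. The following sketch is not the macro itself; it is the same idea in Python, with plain dictionaries standing in for mail items and made-up addresses.

```python
def build_replies(messages, template_body):
    """Return one reply per unique sender, in order of first appearance."""
    replied_to = set()
    replies = []
    for message in messages:
        sender = message["sender"]
        if sender in replied_to:
            continue                     # this sender already has a reply
        replied_to.add(sender)
        replies.append({
            "to": sender,
            # fixed message first, then the body of the mail being answered
            "body": template_body + "\r\n\r\n" + message["body"],
        })
    return replies

messages = [
    {"sender": "a@example.com", "body": "first mail from A"},
    {"sender": "b@example.com", "body": "mail from B"},
    {"sender": "a@example.com", "body": "second mail from A"},
]

replies = build_replies(messages, "Thanks for writing!")
print([reply["to"] for reply in replies])
```

The macro below does the same thing with a Dictionary keyed on the sender's email address.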


    ' Dictionary needs a reference to "Microsoft Scripting Runtime"
    Sub ReplytoALLEmail()
        Dim objFolder As Folder

        ' The selected email is the template
        Dim objTemplate As MailItem
        Set objTemplate = Application.ActiveExplorer.Selection.Item(1)

        Set objFolder = objTemplate.Parent

        Dim objItem As MailItem
        Dim objReply As MailItem
        Dim dic As New Dictionary
        Dim strEmail As String
        Dim strEmails As String

        Dim strBody As String
        strBody = objTemplate.Body

        For Each objItem In objFolder.Items
            If Not (objTemplate = objItem) Then
                strEmail = objItem.SenderEmailAddress

                ' Reply only once per unique sender
                If Not dic.Exists(strEmail) Then
                    strEmails = strEmails + strEmail + ";"
                    dic.Add strEmail, ""

                    Set objReply = objItem.Reply()
                    objReply.Body = strBody + vbCrLf + vbCrLf + objItem.Body

                    On Local Error Resume Next
                    objReply.Send
                End If
            End If
        Next

        Debug.Print strEmails
    End Sub

Client side Page Fragment Output cache, reduce page download time significantly

When you have a page full of HTML, it is best if
the whole page can be cached by the browser. You can do this using
HTTP response caching headers, either by injecting them manually or
by using the @OutputCache directive on ASPX pages.

    <%@ OutputCache Location="Client" Duration="86400" VaryByParam="*" VaryByHeader="*" %>

But suppose part of the page is dynamic and part of the page is
static (like the header, left side menu, footer, right side menu,
bottom part etc), and the static parts occupy a
significant amount of the html. If you could cache those parts in
the browser, you could save a lot of bytes that get downloaded
every time the page downloads. On most websites, you will
find the header, navigation menu, footer, bottom part and
advertisements are mostly static and thus easily cacheable. If the
whole page size is, say, 50KB, at least 20KB might be static and 30KB
dynamic. If you can use client side caching of page fragments
(not ASP.NET’s server side page output cache), you can save
40% of the download time easily.

ASP.NET offers you page fragment caching using @OutputCache,
which is good, but that caching is on the server side. It caches the
output of user controls and serves them from the server side cache. You
cannot eliminate the download of those costly bytes. It just saves
some CPU processing. There is not much in it for users.

The only way to cache part of a page is to let the browser
download those parts separately and make those parts cacheable,
just like images/CSS/javascript files get cached. So, we need to
download page fragments separately and get them stored in the
browser’s cache. IFRAME is an easy way to do this, but IFRAME
makes the page heavy and the content inside it does not follow the CSS
of the page. There are many reasons why IFRAME can’t work. So, we have
a better way: we will use Javascript to render the content of the page,
and the javascript will get cached in the browser’s cache.

So, here’s the idea:

- We will split the whole page into multiple parts.
- We will generate page content using Javascript. Each cacheable
  part comes from javascript, and the javascript renders the HTML of
  it.
- The parts which are cacheable get cached by the browser and
  are thus never downloaded again (until you want them to be). But
  those parts which are non-cacheable and highly dynamic do not get
  cached by the browser.

So, let’s think of a page setup like this:

Logo

Header

Left navigation Menu

Dynamic part of the page

Footer

Here only one part is dynamic and the rest is fully cacheable.
So, the Default.aspx which renders this whole page looks like
this:

    <%@ Page Language="VB" AutoEventWireup="false" %>
    <%@ OutputCache NoStore="true" Location="None" %>
    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
    <html xmlns="http://www.w3.org/1999/xhtml">
    <head runat="server">
        <title>My Big Fat Page</title>
    </head>
    <body>
    <form id="form1" runat="server">
    <table width="100%" border="1">
    <tr>
        <td>Some logo here</td>
        <td><script id="Script1" src="Header.aspx" type="text/javascript"></script></td>
    </tr>
    <tr>
        <td><script id="LeftMenu" src="LeftMenu.aspx" type="text/javascript"></script></td>
        <td bgcolor="lightgrey"><div>
            This is the dynamic part which gets changed on every load. Check out the time when
            it was generated: <%=DateTime.Now %></div></td>
    </tr>
    <tr>
        <td colspan="2"><script id="Footer" src="Footer.aspx" type="text/javascript"></script></td>
    </tr>
    </table>
    </form>
    </body>
    </html>

The page looks like this:

You see, the cached parts are 30 mins older. The browser has not
downloaded those parts at all and has thus saved a significant amount
of data transfer. The only part that was downloaded was the dynamic
part.

When you load the page for the first time, all 4 files are downloaded.
But the last 3 files get cached and are not downloaded again until
the browser’s cache expires. So, on the second visit, only one file
is downloaded, which saves a significant amount of data
transfer.

Let’s look at one of the files which gets cached, Header.aspx.
Nothing fancy here, it’s a regular ASPX page:

The interesting thing here is the “ContentType”,
which I have set to “text/html/javascript”. This is not
something built in; I have introduced this type.

When you put an ASPX page inside a script tag, it surely does not
work, because <script id="Script1" src="Header.aspx" type="text/javascript">
expects javascript output, not html output. If html output is
provided, browsers simply ignore it. So, we need to convert the
output of Header.aspx into Javascript which, when downloaded and
executed by the browser, emits the original html that was generated
when ASP.NET executed the page.

We use an HTTP Module to intercept all .aspx calls, and when the
page is about to be written to the output, we check if the content
type is “text/html/javascript”. If it is, this is our
cue to convert the page output to its javascript representation.

If you want to know details about HTTP Module and how to use
Response Filter to modify page output, please read this wonderful
article:

http://www.aspnetresources.com/articles/HttpFilters.aspx

It really explains all the things. I would recommend you read
this article first and then continue with the rest.

We have made a response filter named Html2JSPageFilter.js
(available in the code download), which overrides the Write method
of Stream and converts the entire HTML of the page to javascript
representation:

    public override void Write(byte[] buffer, int offset, int count)
    {
        string strBuffer = System.Text.UTF8Encoding.UTF8.GetString(buffer, offset, count);

        //---------------------------------
        // Wait for the closing </html> tag
        //---------------------------------
        Regex eof = new Regex("</html>", RegexOptions.IgnoreCase);

        if (!eof.IsMatch(strBuffer))
        {
            responseHtml.Append(strBuffer);
        }
        else
        {
            responseHtml.Append(strBuffer);
            string finalHtml = responseHtml.ToString();

            // Extract only the content inside the form tag that ASP.NET generates in all .aspx pages
            int formTagStart = finalHtml.IndexOf("<form");
            int formTagStartEnd = finalHtml.IndexOf('>', formTagStart);
            int formTagEnd = finalHtml.LastIndexOf("</form>");

            string pageContentInsideFormTag = finalHtml.Substring(formTagStartEnd + 1, formTagEnd - formTagStartEnd - 1);

First we get the entire page output, then we take only what is
inside the <form> tag that ASP.NET generates for all .aspx
pages.

The next step is to remove the viewstate hidden field, because it
would conflict with the view state of Default.aspx.

            // Remove the __VIEWSTATE tag because page fragments don't need viewstate.
            // Note: this will make all ASP.NET controls in the page fragments go mad
            // if they need viewstate to do their work.
            Regex re = new Regex("(<input.*?__VIEWSTATE.*?/>)", RegexOptions.IgnoreCase);

            pageContentInsideFormTag = re.Replace(pageContentInsideFormTag, string.Empty);

Now we convert the entire html output to javascript string
format:

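            // Convert the HTML to its javascript string representation:
            // strip line breaks, collapse runs of spaces, then escape
            // backslashes and single quotes (backslashes first)
            string javascript2Html =
                pageContentInsideFormTag.Replace("\r", "")
                .Replace("\n", "")
                .Replace("   ", " ")
                .Replace("  ", " ")
                .Replace("  ", " ")
                .Replace("\\", "\\\\")
                .Replace("'", "\\'");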

The final touch is to put that javascript string inside a
“document.write(‘…’);” call. When you
call document.write to emit html, it becomes part of the page
html:

            // Generate the document.write('...') which adds the content to the document
            string pageOutput = "document.write('" + javascript2Html + "');";

This is basically the trick. Use a Response filter to get the
.aspx output, and then convert it to Javascript representation.
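
The conversion steps above can be sketched outside ASP.NET as well. The following is a minimal Python sketch, not the code from the download: the function name `html_to_document_write` and the sample page are made up for illustration, and it assumes the page contains a single server-side form. It collapses all whitespace with one regular expression instead of the chained Replace calls.

```python
import re


def html_to_document_write(page_html: str) -> str:
    """Turn rendered page HTML into a document.write('...') script (sketch)."""
    lowered = page_html.lower()
    form_start = lowered.find("<form")
    form_end = lowered.rfind("</form>")

    if form_start == -1 or form_end == -1:
        # Assumption: without a form tag we simply convert the whole page.
        content = page_html
    else:
        # Keep only what sits between the end of <form ...> and the last </form>.
        form_start_end = page_html.find(">", form_start)
        content = page_html[form_start_end + 1:form_end]

    # Drop the __VIEWSTATE hidden field; fragments do not need it.
    content = re.sub(r"<input[^>]*__VIEWSTATE[^>]*>", "", content, flags=re.IGNORECASE)

    # Remove line breaks and collapse runs of whitespace into one space.
    content = re.sub(r"\s+", " ", content).strip()

    # Escape backslashes first, then single quotes, so the result is a
    # valid single-quoted javascript string literal.
    content = content.replace("\\", "\\\\").replace("'", "\\'")

    return "document.write('" + content + "');"


sample_page = (
    "<html><body><form id=\"f\">\r\n"
    "<input type=\"hidden\" name=\"__VIEWSTATE\" value=\"abc\" />\r\n"
    "<div class='x'>It's   here</div>\r\n"
    "</form></body></html>"
)
print(html_to_document_write(sample_page))
```

The order of the two escapes matters: escaping quotes first would let the backslash pass double the backslashes that were just added.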

For convenience, I have used an HttpModule to hook into the ASP.NET
pipeline and wait for .aspx files which try to emit the content type
“text/html/javascript”. Again, this content type is
nothing special, you could use “text/Omar Al
Zabir”.

    void IHttpModule.Init(HttpApplication context)
    {
        context.ReleaseRequestState += new EventHandler(InstallResponseFilter);
    }

    private void InstallResponseFilter(object sender, EventArgs e)
    {
        HttpResponse response = HttpContext.Current.Response;

        // Only pages that ask for the special content type get converted
        if (response.ContentType == "text/html/javascript")
        {
            response.ContentType = "text/javascript";
            response.Filter = new Html2JSPageFilter(response.Filter);
        }
    }

And finally in web.config, we have to register the HttpModule so
that it gets called:

        <httpModules>
            <add name="Html2JSModule" type="Html2JavascriptModule"/>
        </httpModules>

The entire source code is available in this URL:

Download Source
code of: Client side Page Fragment Output cache, reduce page
download time significantly

Enjoy. Use this approach in your aspx and html files and save a
significant amount of download time on the user’s end. It
slightly increases the download time of the first visit (200+ ms for
each script tag), but it makes the second visit a breeze. See the
performance difference yourself. First visit www.pageflakes.com. Then close your
browser, open it again and enter www.pageflakes.com. See how fast it
loads. If you use an HTTP debugger to monitor how much data is
transferred, you will see it’s only 200 bytes!

IIS 6 Compression – quickest and effective way to do it for ASP.NET compression

IIS 6 has a built-in gzip compression ability which can compress the
output of dynamic web pages (.aspx) and web services (.asmx). The
compression is really good and can easily reduce download time by 60%.
You should always turn this feature on in your production server.
The CPU usage is not that high compared to the reduction in
download time, and your users will love the difference when you
turn it on.
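
To get a feel for why gzip helps so much on markup, here is a small Python sketch using the standard `gzip` module. It is not what IIS runs internally, and the sample markup and the helper name `gzip_savings` are made up for illustration. Real pages are less repetitive than this sample, so the saving you see will differ.

```python
import gzip


def gzip_savings(payload: bytes) -> float:
    """Return the fraction of bytes saved by gzip-compressing the payload."""
    if not payload:
        return 0.0
    compressed = gzip.compress(payload)
    return 1.0 - len(compressed) / len(payload)


# Hypothetical sample: repetitive markup, loosely resembling rendered page output.
sample_page = (
    "<div class='widget'><span class='title'>Feed item</span></div>\n" * 200
).encode("utf-8")

print(f"gzip saves {gzip_savings(sample_page):.0%} of {len(sample_page)} bytes")
```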

Now, on the internet you will find a lot of solutions. I tried all
of those which appear in the first 30 Google search results, but
failed to make any of them work properly. Finally I was able to make
it work, but I realized you have to do it in a very specific way and
in a specific order. Here it goes:

Go to IIS Manager from Administrative Tools
Right click on your computer name (not on websites or Default
Web Site)
Choose All Tasks-> Restart IIS
From the drop down, choose “Stop IIS” and click OK.
IIS is now stopped. Make sure it’s not still running
Now go to C:\WINDOWS\SYSTEM32\INETSRV
Make a copy of the file Metabase.xml. This is a dangerous
file, don’t play around with it. Make sure you have a backup before
you do what I am going to tell you now.
Open the metabase.xml in Notepad. Don’t use any other editor
besides Notepad, Notepad2 or Visual Studio.
Search for “IIsCompressionScheme”
You will find a match which looks like this:
<IIsCompressionScheme Location="/LM/W3SVC/Filters/Compression/deflate"

There are two nodes named IIsCompressionScheme and one node with the plural name IIsCompressionSchemes.
Delete these nodes.
Once you have deleted the 3 nodes, paste the text from this
link in their position:

http://tinypaste.com/630fc

Now start IIS and hit your site once. When it runs for the
first time, it will send uncompressed output, but it will compress
it behind the scenes. So, the next hit will give you the compressed
output.

Go to www.pipeboost.com
and enter the URL to ensure you are getting compressed content.
Before you do so, make sure you have visited your site for a while
in your local browser so that the pages got the chance to get
themselves compressed.
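
If you prefer to check from a script, the same verification can be sketched in Python with the standard `urllib` module. The helper names `is_compressed` and `fetch_headers` and the URL in the comment are placeholders of mine. The idea is to send an `Accept-Encoding` request header and then look at the `Content-Encoding` response header.

```python
import urllib.request


def is_compressed(headers: dict) -> bool:
    """True when the response headers declare gzip or deflate encoding."""
    for name, value in headers.items():
        if name.lower() == "content-encoding":
            return value.strip().lower() in ("gzip", "deflate")
    return False


def fetch_headers(url: str) -> dict:
    """Request the URL while advertising gzip/deflate support; return the response headers."""
    request = urllib.request.Request(url, headers={"Accept-Encoding": "gzip, deflate"})
    with urllib.request.urlopen(request) as response:
        return dict(response.headers.items())


# Example (needs network access; the URL is a placeholder):
# print(is_compressed(fetch_headers("http://www.example.com/default.aspx")))
```

As noted above, request the page a couple of times first, since the first hit may still come back uncompressed.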

Blogging from InfoPath

I’m trying to Blog from InfoPath. There’s a fantastic InfoPath
template available here:

http://gqwu.members.winisp.net/blogeditor/

I have tried Word 2007’s new blogging feature. It works
fine for posts without code. But if you insert code, the code looks
super horrible when the blog gets published. Even the “Copy as
HTML” can’t give you good output. Also no image upload
support.

I keep thinking: what if there was a nice add-in for MS Word which
could publish a document as a blog post, automatically uploading the
pictures in it and nicely formatting the code blocks? It would make
us so productive. Mankind has invented so many great tools, but a
really good blogging tool that works for everyone is still
missing.

Send a message individually to all recipients using Outlook

Say you have written a message which has lots of users in the
recipient list, and you don’t want every recipient to see everyone
else’s email address. You want to send the message to each of them
one by one. Here’s what you do:

After composing the message save it to make it go to Drafts
folder.
Select the message
Press ALT+F11 to bring the Visual Basic Editor.
Now paste the following code:

Sub SendIndividually()

    Dim objItem As MailItem
    Set objItem = Application.ActiveExplorer.Selection.Item(1)

    Dim objTo As Recipient
    Dim objClonedMail As MailItem

    ' Skip over a recipient that fails instead of stopping the macro
    On Error Resume Next

    For Each objTo In objItem.Recipients

        ' Copy the draft, clear its "To" list and address it to one person only
        Set objClonedMail = objItem.Copy()
        objClonedMail.To = ""
        objClonedMail.Recipients.Add objTo.Address
        objClonedMail.Send

    Next

End Sub

Hit F5 and wait for a while. You will see the Outbox fill up with
copies of the message, each with an individual recipient in the “To” field.
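
The idea behind the macro, stripped of the Outlook object model, is simply "one copy per recipient, each with a single address". Here is a minimal Python sketch of that idea. The `Draft` type, the function name and the addresses are hypothetical, and nothing here talks to Outlook.

```python
from dataclasses import dataclass, replace
from typing import List, Tuple


@dataclass(frozen=True)
class Draft:
    subject: str
    body: str
    to: Tuple[str, ...]  # recipient addresses


def split_per_recipient(draft: Draft) -> List[Draft]:
    """Return one copy of the draft per recipient, each with a single address in `to`."""
    return [replace(draft, to=(address,)) for address in draft.to]


draft = Draft("Notice", "Hello!", ("a@example.com", "b@example.com"))
for single in split_per_recipient(draft):
    print(single.to)
```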

Read my previous post on how to collect the senders’ email
addresses from all the messages in a folder. You can combine both
approaches to send everyone an important notice.

Get email address of all users from all mails in Outlook Folder

Sometimes you want to send some important notice to everyone who
has ever mailed you. Let’s say you have a folder named “Friends” in
Outlook where you store all the emails from your friends. Now you
want to get all of their email addresses. Pretty difficult work if
you have thousands of such mails. Here’s an easy way.

Select the folder in Outlook and press ALT+F11. It will open
Visual Basic Editor.
Double click on ThisOutlookSession from the Project tree.
Paste the following function:

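Sub GetALLEmailAddresses()

    ' Dictionary needs a reference to "Microsoft Scripting Runtime"
    ' (Tools > References in the Visual Basic Editor)
    Dim objFolder As Folder
    Set objFolder = Application.ActiveExplorer.CurrentFolder

    Dim dic As New Dictionary
    Dim strEmail As String
    Dim strEmails As String

    Dim objItem As Object
    For Each objItem In objFolder.Items

        ' Only mail items are read; skip meeting requests, reports etc.
        If TypeOf objItem Is MailItem Then
            strEmail = objItem.SenderEmailAddress
            If Not dic.Exists(strEmail) Then
                strEmails = strEmails & strEmail & ";"
                dic.Add strEmail, ""
            End If
        End If

    Next

    Debug.Print strEmails
End Sub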

Hit F5 and it will run for a while. Then press Ctrl+G. You will
see the email addresses in the “Immediate
Window”.

Copy the whole string and you have all the email addresses from
all the emails in the selected Outlook folder. There will be no
duplicate addresses in the list.
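
The de-duplication the macro does with a Dictionary can be sketched in a few lines of Python. The function name and the sample addresses are made up for illustration. Like the macro, it compares addresses exactly as they are, so two addresses that differ only in letter case count as different.

```python
def unique_senders(addresses):
    """Join the addresses with ';', keeping only the first occurrence of each."""
    seen = set()
    ordered = []
    for address in addresses:
        if address not in seen:
            seen.add(address)
            ordered.append(address)
    # Same shape as the macro's output: every address is followed by ';'
    return "".join(address + ";" for address in ordered)


print(unique_senders(["a@example.com", "b@example.com", "a@example.com"]))
```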