My first book – Building a Web 2.0 Portal with ASP.NET 3.5

My first book, “Building a Web 2.0 Portal with ASP.NET 3.5” from
O’Reilly, has been published and is available in stores. The book
explains in detail the architecture design, development, testing,
deployment, performance, and scalability challenges of my open
source web portal Dropthings.com. Dropthings is a prototype of a web
portal similar to iGoogle or Pageflakes, but it is built using
recently released technologies: ASP.NET 3.5, C# 3.0, LINQ to SQL,
LINQ to XML, and Windows Workflow Foundation. It also makes heavy
use of ASP.NET AJAX 1.0. Throughout my career I have built several
state-of-the-art personal, educational, enterprise, and mass-consumer
web portals. This book collects my experience in building all of
those portals.

O’Reilly Website:
http://www.oreilly.com/catalog/9780596510503/

Amazon:
http://www.amazon.com/Building-Web-2-0-Portal-ASP-NET/dp/0596510500

Disclaimer: This book does not show you how to build Pageflakes.
Dropthings is entirely different in terms of architecture,
implementation and the technologies involved.

You learn how to:

  • Implement a highly decoupled architecture following the popular
    n-tier, widget-based application model
  • Provide drag-and-drop functionality, and use ASP.NET 3.5 to
    build the server-side part of the web layer
  • Use LINQ to build the data access layer, and Windows Workflow
    Foundation to build the business layer as a collection of
    workflows
  • Build client-side widgets using JavaScript for faster
    performance and better caching
  • Get maximum performance out of the ASP.NET AJAX Framework for
    faster, more dynamic, and scalable sites
  • Build a custom web service call handler to overcome
    shortcomings in ASP.NET AJAX 1.0 for asynchronous, transactional,
    cache-friendly web services
  • Overcome JavaScript performance problems, and help the user
    interface load faster and be more responsive
  • Solve various scalability and security problems as your site
    grows from hundreds to millions of users
  • Deploy and run a high-volume production site while solving
    software, hardware, hosting, and Internet infrastructure
    problems

If you’re ready to build state-of-the-art, high-volume web
applications that can withstand millions of hits per day, this book
has exactly what you need.

Making the best use of caching for a high-performance website

Use URLs consistently

Browsers cache content based on the URL. When the URL changes, the
browser fetches a new version from the origin server. The URL can be
changed by changing the query string parameters. For example, once
“/default.aspx” is cached by the browser, requesting
“/default.aspx?123” will fetch new content from the server. The
response from the new URL can also be cached by the browser if you
return proper caching headers. In that case, changing the query
parameter to something else, like “/default.aspx?456”, will again
return new content from the server. So, you need to use URLs
consistently everywhere if you want cached responses. If the home
page requests a file with the URL “/welcome.gif”, make sure every
other page requests the same file using exactly the same URL. One
common mistake is to sometimes omit the “www” subdomain from the
URL: www.pageflakes.com/default.aspx is not the same as
pageflakes.com/default.aspx. The two are cached separately.
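
One way to enforce this is to build every static URL through a single
helper, so the same file is always requested with exactly the same
URL. Below is a minimal sketch of that idea; the StaticUrl class and
the “/static” convention are my own illustration, not something taken
from Dropthings:

    using System.Web;

    // Hypothetical helper: every page and control asks this class for static
    // file URLs, so the same file is always requested with the same URL.
    public static class StaticUrl
    {
        // Resolves "~/static/..." to one consistent absolute path such as
        // "/static/images/welcome.gif", no matter which page calls it.
        public static string For(string fileName)
        {
            return VirtualPathUtility.ToAbsolute("~/static/" + fileName);
        }
    }

    // Usage from any page or control:
    // <img src='<%= StaticUrl.For("images/welcome.gif") %>' />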

Cache static content for a longer period

Static files can be cached for a long period, such as one month. If
you are thinking you should cache for only a couple of days so that
users pick up changed files sooner, you’re mistaken. If you update a
file that was cached by the Expires header, new users will
immediately get the new file, while old users will keep seeing the
old content until it expires in their browser. So, as long as you
are using the Expires header to cache static files, use as high a
value as possible.

For example, if you set the Expires header to cache a file for
three days, one user will get the file today and cache it for the
next three days. Another user will get the file tomorrow and cache
it for three days from tomorrow. If you change the file the day
after tomorrow, the first user will see the change on the fourth day
and the second user on the fifth day. So, different users will see
different versions of the file. As a result, setting a lower value
does not help; you cannot assume all users will pick up the latest
file soon. You have to change the URL of the file in order to
ensure everyone gets the exact same file immediately.

You can set up the Expires header for static files from IIS Manager.
You’ll learn how to do that in a later section.

Use a cache-friendly folder structure

Store cacheable content under a common folder. For example, store
all images of your site under a “/static” folder instead of
scattering them across different subfolders. This helps you use a
consistent URL throughout the site, because from anywhere you can
reference “/static/images/somefile.gif”. Later on, we will see that
it’s also easier to move to a Content Delivery Network when your
static cacheable files live under a common root folder.

Reuse common graphics files

Sometimes we put copies of common graphics files under several
virtual directories so that we can write shorter paths. For example,
say you have indicator.gif in the root folder, in some subfolders,
and under the CSS folder. You did it so that you need not worry
about paths from different places and can just use the file name as
a relative URL. This does not help caching: each copy of the file is
cached in the browser separately. So, you should collect all the
graphics files in the whole solution, eliminate the duplicates, put
them under the same root “static” folder, and use the same URL from
all pages and CSS files.

Change the file name when you want to expire the cache

When you want a static file to change, don’t just update the file,
because it’s already cached in users’ browsers. You need to change
the file name and update all references everywhere so that browsers
download the new file. You can also store the file names in a
database or configuration file and use data binding to generate the
URLs dynamically. This way you can change the URL in one place and
have the whole site pick up the change immediately.

Use a version number while accessing static files

If you do not want to clutter your static folder with multiple
copies of the same file, you can use a query string to differentiate
versions of the same file. For example, a GIF can be accessed with a
dummy query string like “/static/images/indicator.gif?v=1”. When you
change indicator.gif, you can overwrite the same file and then
update all references to “/static/images/indicator.gif?v=2”. This
way you can keep changing the same file and just update the
references to use the new version number.
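
A small helper can keep that version number in one place. This is
only a sketch; the StaticContentVersion appSettings key and the
VersionedUrl class are hypothetical names used for illustration:

    using System.Configuration;
    using System.Web;

    // Hypothetical helper: appends a site-wide version number stored in
    // web.config to static file URLs. Bumping the number changes every URL,
    // which forces browsers to download the files again.
    public static class VersionedUrl
    {
        public static string For(string appRelativePath)
        {
            string version =
                ConfigurationManager.AppSettings["StaticContentVersion"] ?? "1";
            return VirtualPathUtility.ToAbsolute(appRelativePath) + "?v=" + version;
        }
    }

    // Usage:
    // <img src='<%= VersionedUrl.For("~/static/images/indicator.gif") %>' />
    // renders "/static/images/indicator.gif?v=1" until you change the setting.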

Store cacheable files in a different domain

It’s always a good idea to put static content in a different
domain. First of all, the browser can open two additional concurrent
connections to download the static files. Another benefit is that
you don’t need to send cookies to the static files. When you put the
static files on the same domain as your web application, the browser
sends the ASP.NET cookies and all the other cookies that your web
application produces. This makes the request headers unnecessarily
large and wastes bandwidth. You don’t need to send these cookies to
access static files, so if you put the static files in a different
domain, those cookies are not sent. For example, put your static
files on the www.staticcontent.com domain while your website runs on
www.dropthings.com. The other domain does not need to be a
completely different web site; it can just be an alias that shares
the same web application path.

SSL is not cached, so minimize SSL use

Content that is served over SSL is not cached. So, you need to put
static content outside SSL. Moreover, you should try limiting SSL to
only the pages that need it, like the login page or the payment
page. The rest of the site should be served outside SSL over regular
HTTP. SSL encrypts the request and response and thus puts extra load
on the server. Encrypted content is also larger than the original
content and therefore takes more bandwidth.

HTTP POST requests are never cached

Caching only happens for HTTP GET requests; HTTP POST requests are
never cached. So, any AJAX call that you want to be cacheable needs
to be made over HTTP GET.
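
With ASP.NET AJAX web services, that means marking the web method so
the generated proxy calls it with GET. A minimal sketch (the service
and method names are made up for illustration):

    using System.Web.Services;
    using System.Web.Script.Services;

    [ScriptService]
    public class WidgetService : WebService
    {
        // UseHttpGet makes the AJAX proxy call this method over HTTP GET,
        // so the response can be cached by the browser if you also return
        // proper caching headers.
        [WebMethod]
        [ScriptMethod(UseHttpGet = true)]
        public string GetWidgetData(int widgetId)
        {
            return "cacheable content for widget " + widgetId;
        }
    }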

Generate Content-Length response header

When you serve content dynamically via web service calls or HTTP
handlers, make sure you emit a Content-Length header. Browsers have
several optimizations for downloading content faster when they know
how many bytes to expect in the response by looking at the
Content-Length header. In particular, browsers can use persisted
connections more effectively when this header is present, which
saves them from opening a new connection for each request. When
there’s no Content-Length header, the browser doesn’t know how many
bytes it is going to receive from the server and keeps the
connection open, consuming bytes as they are delivered, until the
connection closes. So, you miss the benefit of persisted
connections, which can greatly reduce the download time of many
small files like CSS, JavaScript, and images.
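
For example, an HTTP handler can buffer its output into a byte array
and emit the header explicitly. A minimal sketch, assuming the
handler already has the full response in memory:

    using System.Text;
    using System.Web;

    // Sketch of an IHttpHandler that emits Content-Length by producing the
    // response bytes first, so the browser knows exactly how much to read
    // and can reuse the connection for the next request.
    public class ContentLengthHandler : IHttpHandler
    {
        public bool IsReusable { get { return true; } }

        public void ProcessRequest(HttpContext context)
        {
            byte[] body = Encoding.UTF8.GetBytes("{\"result\":\"some dynamic content\"}");

            context.Response.ContentType = "application/json";
            context.Response.AddHeader("Content-Length", body.Length.ToString());
            context.Response.OutputStream.Write(body, 0, body.Length);
        }
    }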

How to configure static content caching in IIS

In IIS Manager, the web site Properties dialog box has an “HTTP
Headers” tab where you can define an Expires header for all requests
that IIS handles. There you can define whether to expire content
immediately, after a certain number of days, or on a specific date.
The second option (Expire after) uses sliding expiration, not
absolute expiration. This is very useful because it works per
request: whenever someone requests a static file, IIS calculates the
expiration date based on the number of days/months configured in
“Expire after”.


[Screenshot: IIS web site Properties dialog, HTTP Headers tab]

For dynamic pages that are served by ASP.NET, a handler or page can
emit its own Expires header and override the IIS default setting.
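
For example, from a page, HTTP handler, or web service call that
serves safely cacheable output, you might emit the headers like this
(a sketch; the helper name and the 30-day duration are just
examples):

    using System;
    using System.Web;

    public static class CacheHeaders
    {
        // Emits Expires and Cache-Control headers so browsers and proxies
        // cache the response, overriding the IIS default for this request.
        public static void CacheFor(HttpResponse response, TimeSpan duration)
        {
            response.Cache.SetCacheability(HttpCacheability.Public);
            response.Cache.SetExpires(DateTime.Now.Add(duration));
            response.Cache.SetMaxAge(duration);
        }
    }

    // Usage from ProcessRequest or Page_Load:
    // CacheHeaders.CacheFor(context.Response, TimeSpan.FromDays(30));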

10 cool web development related articles in 2007

Here’s a list of 10 cool ASP.NET, AJAX, and web development related
articles and blog posts that I wrote this year that you might want
to take a look at:

13 disasters for production websites and their solutions

Talks about 13 production disasters that can happen to any website
at any time and bring down your business.

Build Google IG like Ajax Start Page in 7 days using ASP.NET Ajax and
.NET 3.0

This blockbuster article shows how ASP.NET AJAX, LINQ to XML, LINQ to
SQL, and Workflow Foundation can be used to create a Google IG-like
start page in just 7 nights. Learn how to put these hot technologies
together in one project and make a production site out of it.


Serve extensionless URLs from ASP.NET without using an ISAPI module
or IIS 6 wildcard mapping

Currently there are only two ways to serve an extensionless URL like
www.pageflakes.com/omar that hits something besides the default
document: use a custom ISAPI module or use IIS 6 wildcard mapping.
Both have performance and scalability problems because both
intercept each and every hit. Learn how you can solve this by using
a custom 404 handler.


Request format is unrecognized for URL unexpectedly ending in
/SomeWebServiceMethod

Since the ASP.NET AJAX 1.0 release, Microsoft has prevented JSON
hijacking by requiring a special content-type header. But this
caused us some trouble.


Cleanup inactive anonymous users from ASP.NET Membership tables

When you store anonymous user profiles using the ASP.NET Membership
provider and the Anonymous Identification provider, you soon end up
with lots of idle anonymous user data from users who never come
back. We (Pageflakes) went through a lot of difficulty keeping our
database size down, as we allow anonymous users to do almost
everything a registered user can do. This introduces a scalability
challenge. See how we solved this problem.


Prevent Denial of Service (DOS) attacks in your web application

Web applications can be brought to their knees by hitting the site
repeatedly or by calling expensive web services randomly. Anyone can
write a simple loop that hits a web server very frequently from a
high-bandwidth connection and bring your production server down. See
how to prevent such application-level DOS attacks.


ASP.NET Ajax Extender for multi-column widget drag & drop

It’s an ASP.NET AJAX extender that allows Pageflakes-style drag &
drop functionality between columns and rows.


ASP.NET Ajax in-depth performance analysis

While building an open source start page using ASP.NET AJAX, I did a
lot of performance analysis on the AJAX framework in order to
improve first-time load and the perceived speed of JavaScript-rich
pages. Check out my analysis.


Think you know how to write UPDATE statement? Think again.

Learn how to optimize common UPDATE statements.


Make a surveillance application which captures the desktop and
emails it to you as an attachment

Some time back I needed to capture a certain computer’s desktop in
order to find out what that user was doing every day. So, I made a
.NET 2.0 WinForms application which sits in the system tray
(optionally), captures the desktop at a given interval (say every 60
seconds), and emails the captured images to me as message
attachments (say every 30 minutes).


Today I received the MVP award for the 3rd time, in Visual C#. Thanks
to Microsoft for the award and for setting up my new blog. I will
continue both my MVPS blog and this blog from now on.

A significant part of sql server process memory has been paged out. This may result in performance degradation

If you are using SQL Server Standard Edition 64-bit on Windows 2003
64-bit, you will frequently encounter this problem, where SQL Server
says:

A significant part of sql server process memory has been paged
out. This may result in performance degradation. Duration 0
seconds. Working set (KB) 25432, committed (KB) 11296912, memory
utilization 0%

The numbers for working set and duration will vary. What happens
here is that SQL Server is forced to release memory to the operating
system because some other application, or the OS itself, needs to
allocate RAM.

We went through many support articles like:

  • 918483:
    How to reduce paging of buffer pool memory in the 64-bit version of
    SQL Server 2005
  • 905865:
    The sizes of the working sets of all the processes in a console
    session may be trimmed when you use Terminal Services to log on to
    or log off from a computer that is running Windows Server 2003
  • 920739:
    You may experience a decrease in overall system performance when
    you are copying files that are larger than approximately 500 MB in
    Windows Server 2003 Service Pack 1

But nothing solved the problem. We still had the page-out problem
happening every day.

The server has 16 GB of RAM, of which 12 GB is the maximum limit
allocated to SQL Server. 4 GB is left for the OS and other
applications. We have also turned off antivirus and any large backup
jobs. 12 GB of RAM should be plenty, because there’s no other app
running on the dedicated SQL Server box. But the page out still
happens. When it happens, SQL Server becomes very slow. Queries time
out, the website throws errors, and transactions abort. Sometimes
this problem goes on for 30 to 40 minutes, and the website becomes
slow or unresponsive during that time.

I have found what causes SQL Server to page out: the file system
cache somehow gets really high and forces SQL Server to trim
down.

[Screenshot: performance counters showing very high System Cache Resident Bytes]

You can see the System Cache Resident Bytes are very high. During
this time SQL Server gets much less RAM than it needs. Queries time
out at a very high rate, around 15 per second. Moreover, there’s a
high SQL Lock Timeouts/sec (around 15/sec, not captured in the
screenshot).

[Screenshot: SQL Server memory counters]

SQL Server max memory is configured at 12 GB, but here it shows
it’s getting less than 8 GB.

While the file system cache is really high, there’s no process
that’s taking a significant amount of RAM.

[Screenshot: process list showing no process using significant RAM]

After I used Sysinternals’ CacheSet to reset the file system cache
and set around 500 MB as the maximum limit, memory started to free
up.

[Screenshot: CacheSet with the working set maximum limited to about 500 MB]

SQL Server started to see more RAM free:

[Screenshot: SQL Server memory counters showing more free RAM]

Then I hit the “Clear” button to clear the file system cache, and
it came down dramatically.

[Screenshot: system cache size after clicking Clear]

Paging stopped. The system cache was only around 175 MB. SQL Server
lock timeouts came back to zero. Everything went back to normal.

So, I believe either some faulty driver or the OS itself is leaking
file system cache in a 64-bit environment.

What we have done is assign a dedicated person who goes to the
production database servers every hour, runs the CacheSet program,
and clicks the “Clear” button. This clears the file system cache and
prevents it from growing too high.

There are lots of articles written about this problem. However, the
most informative one I have found is from the SQL Server PSS team:


http://blogs.msdn.com/psssql/archive/2007/05/31/the-sql-server-working-set-message.aspx

UPDATE – THE FINAL SOLUTION!

The final solution is to run this program on Windows startup:

SetSystemFileCacheSize 128 256

This sets the lower and upper limits for the system cache. You need
to run this on every Windows startup because a restart resets the
cache limit back to unlimited.

You can run the program without any parameters to see the current
setting.

Download the program from this page:

http://www.uwe-sieber.de/ntcacheset_e.html

Go to the end of that page and you will find the link to
SetSystemFileCacheSize.zip.
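
If you would rather not depend on the external tool, the same limit
can be applied through the underlying Win32 API,
SetSystemFileCacheSize in kernel32.dll, from a small program run at
startup. The sketch below is only an illustration of that idea (the
tool above is what we actually run), and the call only succeeds when
the process holds the SeIncreaseQuotaPrivilege:

    using System;
    using System.Runtime.InteropServices;

    // Sketch: cap the Windows system file cache, similar in spirit to running
    // "SetSystemFileCacheSize 128 256" at startup. The Win32 API takes bytes.
    class CapFileSystemCache
    {
        // Enforce the maximum as a hard limit.
        const int FILE_CACHE_MAX_HARD_ENABLE = 0x1;

        [DllImport("kernel32.dll", SetLastError = true)]
        static extern bool SetSystemFileCacheSize(IntPtr minimumFileCacheSize,
                                                  IntPtr maximumFileCacheSize,
                                                  int flags);

        static void Main()
        {
            long min = 128L * 1024 * 1024; // 128 MB
            long max = 256L * 1024 * 1024; // 256 MB

            // Requires SeIncreaseQuotaPrivilege (run from an elevated account).
            if (!SetSystemFileCacheSize((IntPtr)min, (IntPtr)max,
                                        FILE_CACHE_MAX_HARD_ENABLE))
                Console.WriteLine("Failed: error " + Marshal.GetLastWin32Error());
            else
                Console.WriteLine("System file cache limited to 128-256 MB.");
        }
    }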

How to set up SQL Server 2005 Transaction Log Shipping on a large database so that it really works

I have tried a lot of combinations in order to find an effective
method for implementing Transaction Log Shipping between servers
that are in a workgroup, not in a domain. I realized that the things
you learn from articles and books are for small and medium-sized
databases. When your database becomes 10 GB or bigger, things become
a lot harder than they look. Additionally, many things changed in
SQL Server 2005, so it’s even more difficult to configure log
shipping properly nowadays.

Here are the steps that I finally found to work. Let’s assume there
are 2 servers with SQL Server 2005. Make sure both servers have the
latest SP; Service Pack 1 has already been released.

1. Create a new user account named “SyncAccount” on both computers.
Use the exact same user name and password.

2. Make sure File Sharing is enabled on the local area connection
between the servers. Also enable file sharing in the firewall.

3. Make sure the local network connection is not a regular LAN. It
must be a gigabit card with near-zero data corruption. Both cable
and switch need to be perfect. If possible, connect both servers
directly NIC-to-NIC using fibre optic cable in order to avoid a
separate switch.

4. Now create a folder named “TranLogs” on both servers. Let’s
assume the folder is E:\Tranlogs.

5. On the primary database server, share the folder “TranLogs” and
allow SyncAccount “Full Access” to the share. Then also allow
SyncAccount Full Access on the TranLogs folder itself, so that you
are setting the same permission from both the “Sharing” tab and the
“Security” tab.

6. On the secondary database server, allow SyncAccount “Full Access”
on the TranLogs folder. There is no need to share it.

7. Test whether SyncAccount can really connect between the servers.
On the secondary server, go to a command prompt and do this:

8. 

9. Now you have a command prompt which is running with
SyncAccount’s privileges. Let’s confirm the account can read and
write on the “TranLogs” shares on both servers.

10. 

11. This is exactly what SQL Agent will be doing during log
shipping. It will copy log files from the primary server’s network
share to its own log file folder. So, SyncAccount needs to be able
to both read files from the primary server’s network share and write
to its own TranLogs folder. The above test verifies exactly that.

12. This is something new in SQL Server 2005: add SyncAccount to the
SQL Server Agent group “SqlServer2005SqlAgentUser….”. You will find
this Windows user group after installing SQL Server 2005.

13. Now go to Control Panel->Administrative Tools->Services and
find the SQL Server Agent service. Go to its properties and set
SyncAccount as the account on the Log On tab. Restart the service.
Do this on both servers.

14. 

15. I use the sa account to configure log shipping. So, do this on
both servers:

a. Enable the “sa” account. By default, sa is disabled in SQL Server
2005.

b. On the “sa” account, turn off the password expiration policy.
This prevents the sa password from expiring automatically.

16. On the secondary server, you need to allow remote connections.
By default, SQL Server 2005 disables TCP/IP connections. As a
result, you cannot log in to the server from another server. Launch
the Surface Area Configuration tool from Start->Programs->MS SQL
Server 2005 and go to the “Remote Connections” section. Choose the
3rd option, which allows both TCP/IP-based remote connections and
local named pipe connections.

17. On the secondary server’s firewall, open port 1433 so that the
primary server can connect to it.

18. Restart SQL Server. Yes, you need to restart SQL Server.

18. On the primary server, go to Database Properties->Options and
set the Recovery Model to “Full”. If it was already set to Full, it
is wise to first set it to Simple, shrink the transaction log file,
and then make it “Full” again. This will truncate the transaction
log file for sure.

19. Now take a full backup of the database. During backup, make
sure you put the backup file on a physically separate hard drive
from the drive where the MDF is located. Remember, not different
logical drives, but different physical drives. So you should have at
least 2 hard drives on the server. During backup, SQL Server reads
from the MDF and writes to the backup file. So, if both the MDF and
the backup are on the same hard drive, it’s going to take more than
double the time to back up the database. It will also keep the disk
fully occupied and the server will become very slow.

20. After the backup is done, RAR the backup file. This ensures
that when you copy it to the other server there’s no data corruption
while the file is being transferred. If you fail to unRAR the file
on the secondary server, you know there’s some problem on the
network and you must replace the network infrastructure. The RAR
should also be written to a separate hard drive from the one where
the backup file is located, for the same reason: read is on one
drive and write is on another drive. Better still, RAR directly to
the destination server over a network share. That has two benefits:

a. Your server’s IO is saved. There’s no local write, only read.

b. Both the RAR and the network copy are done in one step.

21. 

22. By the time you are done with the backup, RAR, copy over the
network, and restore on the other server, the transaction log file
(LDF) on the primary database server might have become very big. For
us, it becomes around 2 to 3 GB. So, we have to manually take a
transaction log backup and ship it to the secondary server before we
configure Transaction Log Shipping.

23. 

24. When you are done copying the transaction log backup to the
second server, first restore the full backup on the secondary
server:

25. 

26. But before restoring, go to the Options tab and choose RESTORE
WITH STANDBY:

27. 

28. When the full backup is restored, restore the transaction
log backup.

29. REMEMBER: go to the Options tab and set the Recovery State to
“RESTORE WITH STANDBY” before you hit the OK button.

30. This generally takes a long time. Too long, in fact. Every time
I do the manual full backup, RAR, copy, unRAR, and restore, the
transaction log (LDF) file grows to 2 to 3 GB. As a result, it takes
a long time to do the transaction log backup, copy, and restore; the
restore alone takes more than an hour. So, within this time, the log
file on the primary server again becomes large. As a result, when
log shipping starts, the first log ship is huge. So, you need to
plan this carefully and do it only when you have the least amount of
traffic.

31. I usually have to do this manual transaction log backup twice.
The first one is around 3 GB; the second one is around 500 MB.

32. Now you have a database on the secondary server ready to be
configured for log shipping.

33. Go to the primary server, select the database, right-click
“Tasks” -> “Shrink”, and shrink the log file.

34. Go to the primary server, bring up the database properties, go
to the Transaction Log Shipping options, and enable log shipping.

35. 

36. Now configure the backup settings like this:

37. 

38. Remember, the first path is the network path that we tested
from the command prompt on the secondary server. The second path is
the local hard drive folder on the primary server which is shared
and accessible at that network path.

39. Add a secondary server. This is the server where you have
restored the database backup.

40. 

41. Choose “No, the secondary database is initialized” because we
have already restored the database.

42. Go to the second tab, “Copy Files”, and enter the path on the
secondary server where the log files will be copied to. Note: the
secondary server will fetch the log files from the primary server’s
network share into its own local folder, so the path you specify is
on the secondary server. Do not be confused by the picture below
showing the same path as the primary server; I just have the same
folder configuration on all servers. It can be D:\tranlogs if you
have the tranlogs folder on the D: drive of the secondary server.

43. 

44. On the third tab, “Restore Transaction Log”, configure it as
follows:

45. 

46. It is very important to choose “Disconnect users in
database…”. If you don’t do this and by any chance Management
Studio is open on the database on the secondary server, log shipping
will keep failing. So, force disconnection of all users when the
database backup is being restored.

47. Set up a Monitor Server, which will automatically take care of
making the secondary server the primary server when your primary
server crashes.

48. 

49. In the end, the transaction log shipping configuration window
should look like this:

50. 

51. When you press OK, you will see this:

52. Do not be happy at all if everything shows “Success”. Even if
you got all the paths and settings wrong, you will still see it as
successful. Log in to the secondary server, go to SQL Server
Agent->Jobs, and find the log ship restore job. If the job is not
there, your configuration was wrong. If it’s there, right-click it
and select “View History”. Wait 15 minutes for one log ship to
complete, then refresh and check the list. If you see all OK, then
it is really OK. If not, there are two possibilities:

a. See whether the log ship copy job failed. If it failed, then you
entered an incorrect path. The problem can be one of the following:

  1. The network location on the primary server is wrong
  2. The local folder was specified incorrectly
  3. You did not set SyncAccount as the account which runs SQL Agent,
    or you did but forgot to restart the service.

b. If the restore fails, then the problem can be one of the
following:

i. SyncAccount is not a valid login in SQL Server. From SQL Server
Management Studio, add SyncAccount as a login.

ii. You forgot to restore the database on the secondary server as
Standby.

iii. You probably took a manual transaction log backup on the
primary server in the meantime. As a result, the backups that log
shipping took were no longer in the right sequence.

53. If everything’s ok, you will see this: