Atlas 7: Caching web service responses in the browser and saving bandwidth significantly

Browsers can cache images, JavaScript, and CSS files on the user's hard
drive, and they can also cache XML HTTP calls if the call is an HTTP
GET. The cache is based on the URL: if the same URL is requested again
and the response is already cached on the computer, the response is
loaded from the cache, not from the server. Basically, the browser can
cache any HTTP GET call and return the cached data based on the URL. If
you make an XML HTTP call as HTTP GET and the server returns some
special headers which instruct the browser to cache the response, then
on future calls the response is returned immediately from the cache,
saving the delay of the network roundtrip and the download time.

At Pageflakes, we cache the user's state so that when the user visits
again the following day, he gets a cached page which loads instantly
from the browser cache, not from the server. Thus the second-time load
becomes very fast. We also cache several small parts of the page which
appear on user actions. When the user performs the same action again,
the cached result is loaded immediately from the local cache, saving
the network roundtrip time. The user gets a fast-loading and very
responsive site. The perceived speed increases dramatically.

The idea is to make the Atlas web service calls as HTTP GET calls and
return some specific HTTP response headers which tell the browser to
cache the response for a specific duration. If you return an "Expires"
header with the response, the browser will cache the XML HTTP response.
There are two headers that you need to return with the response to
instruct the browser to cache it:

HTTP/1.1 200 OK
Expires: Fri, 1 Jan 2030
Cache-Control: public

This instructs the browser to cache the response until January 2030.
As long as you make the same XML HTTP call with the same parameters,
you will get the cached response from the computer and no call will go
to the server. There are more advanced ways to get further control over
response caching. For example, here is a header which instructs the
browser to cache for 60 seconds, but to contact the server and get a
fresh response once the 60 seconds have passed. It also prevents
proxies from returning a cached response when the browser's local cache
expires after 60 seconds:

HTTP/1.1 200 OK
Cache-Control: private, must-revalidate, proxy-revalidate, max-age=60

Let's try to produce such response headers from an ASP.NET web
service call.
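
A straightforward attempt looks something like this. This is a minimal
sketch: the class name, the CachedGet method name and the one-minute
duration are only for illustration (the method name matches the
client-side call shown later):

using System;
using System.Web;
using System.Web.Services;

[WebService(Namespace = "http://tempuri.org/")]
public class CacheTestService : WebService
{
    [WebMethod]
    public string CachedGet()
    {
        TimeSpan cacheDuration = TimeSpan.FromMinutes(1);

        // Ask ASP.NET to emit the caching headers shown above
        Context.Response.Cache.SetCacheability(HttpCacheability.Public);
        Context.Response.Cache.SetExpires(DateTime.Now.Add(cacheDuration));
        Context.Response.Cache.SetMaxAge(cacheDuration);
        Context.Response.Cache.AppendCacheExtension("must-revalidate, proxy-revalidate");

        // Return the server time so we can tell a cached response from a fresh one
        return DateTime.Now.ToString();
    }
}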

This will result in the following response headers:
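
The interesting part looks roughly like this (the Expires value is the
current time plus the cache duration):

HTTP/1.1 200 OK
Cache-Control: private, max-age=0
Expires: <current time + 60 seconds>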

The Expires header is set properly. But the problem is with
Cache-Control: it shows "max-age" set to zero, which prevents the
browser from doing any kind of caching. If you seriously wanted to
prevent caching, you would emit exactly such a Cache-Control header.
So, exactly the opposite of what we wanted has happened.

There's a bug in ASP.NET 2.0: you cannot change the "max-age" header.
As max-age stays at zero, ASP.NET 2.0 sets Cache-Control to private,
because max-age = 0 means no caching is needed. So, there's no way to
make ASP.NET 2.0 return the proper headers which cache the response.

Time for a hack. After decompiling the code of the HttpCachePolicy
class (the class of the Context.Response.Cache object), I found the
following code:
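
The culprit is the SetMaxAge method. Decompiled, it looks roughly like
this (approximate, but the field names are the ones that matter):

// Approximate decompilation of System.Web.HttpCachePolicy.SetMaxAge
public void SetMaxAge(TimeSpan delta)
{
    if (delta < TimeSpan.Zero)
    {
        throw new ArgumentOutOfRangeException("delta");
    }
    if (s_maxAgeUnlimited < delta)
    {
        delta = s_maxAgeUnlimited;
    }
    // Once _maxAge has been set (even to zero), a larger value can never get through
    if (!this._isMaxAgeSet || (delta < this._maxAge))
    {
        this._maxAge = delta;
        this._isMaxAgeSet = true;
    }
}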

Somehow, this._maxAge is getting set to zero, and the check
"if (!this._isMaxAgeSet || (delta < this._maxAge))" is preventing it
from being set to a larger value. Because of this, we need to bypass
the SetMaxAge function and set the value of the _maxAge field directly
using Reflection.
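
Here's a sketch of that workaround, reusing the one-minute example from
above; the _maxAge field name comes from the decompiled code, and you
need a using for System.Reflection:

[WebMethod]
public string CachedGet()
{
    TimeSpan cacheDuration = TimeSpan.FromMinutes(1);

    // Bypass SetMaxAge and write the private _maxAge field directly
    FieldInfo maxAge = Context.Response.Cache.GetType().GetField(
        "_maxAge", BindingFlags.Instance | BindingFlags.NonPublic);
    maxAge.SetValue(Context.Response.Cache, cacheDuration);

    Context.Response.Cache.SetCacheability(HttpCacheability.Public);
    Context.Response.Cache.SetExpires(DateTime.Now.Add(cacheDuration));
    Context.Response.Cache.AppendCacheExtension("must-revalidate, proxy-revalidate");

    return DateTime.Now.ToString();
}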

This will return the following headers:
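
They look along these lines, matching the second header example shown
earlier:

HTTP/1.1 200 OK
Cache-Control: private, must-revalidate, proxy-revalidate, max-age=60
Expires: <current time + 60 seconds>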

Now max-age is set to 60 and the browser will cache the response for
60 seconds. If you make the same call again within 60 seconds, you get
the same cached response. Here's a test output which shows the
date/time returned from the server:

The client side code is like this:

function cachedHttpGet()
{
    WebService.CachedGet(
    {
        useGetMethod: true,
        onMethodComplete: function(result) { debug.dump(result); }
    } );
}

Here you see the response is cached for 60 seconds. After that time
elapses, a server call is made and a new date is returned, and that
response is again cached for 60 seconds.

Atlas 6: When ‘this’ is not really ‘this’

Atlas callbacks are not executed in the same context where they are
called. For example, if you are making a Page Method call from a
javascript class like this:

function SampleClass()
{
    this.id = 1;
    this.call = function()
    {
        PageMethods.DoSomething( "Hi",
            function(result) { debug.dump( this.id ); } );
    }
}

What happens when you call the "call" method? Do you get "1" on the
debug console? No, you get "null" on the debug console, because "this"
is no longer the instance of the class. This is a common mistake
everyone makes. As it is not yet covered in the Atlas documentation,
I have seen many developers spend a lot of time finding out what's
wrong.

Here's the reason. We know that whenever a Javascript event is raised,
"this" refers to the HTML element which produced the event. So, if you
do this:

function SampleClass()
{
    this.id = 1;
    this.call = function()
    {
        debug.dump( this.id );
    }
}

var o = new SampleClass();
// hook the method up as a button's click handler
document.getElementById("ButtonID").onclick = o.call;

If you click the button, you see "ButtonID" instead of "1". The reason
is that the button is making the call, so the call runs in the button
object's context and "this" maps to the button element.

Similarly, when XML HTTP raises its onreadystatechange event, which
Atlas traps in order to fire your callback, the code is still executing
in the XML HTTP object's context: it is the XML HTTP object which
raises the event. As a result, "this" refers to the XML HTTP object,
not to your own class where the callback is declared.

In order to make the callback fire in the context of the class
instance, so that "this" refers to the instance of the class, you need
to make the following change:

function SampleClass()
{
    this.id = 1;
    this.call = function()
    {
        PageMethods.DoSomething( "Hi",
            Function.createDelegate( this,
                function(result) { debug.dump( this.id ); } ) );
    }
}

Here, Function.createDelegate is used to create a delegate which calls
the given function with the given "this" context. Function.createDelegate
is defined in the Atlas Runtime:

Function.createDelegate = function(instance, method)
{
    return function()
    {
        return method.apply(instance, arguments);
    }
}

Atlas 4: Only 2 calls at a time and don’t expect any order

Browsers make only 2 concurrent AJAX calls at a time to a domain. If
you make 5 AJAX calls, the browser makes 2 calls first and then waits
for one of them to complete before making the next one, until all 5
calls are complete. Moreover, you cannot expect calls to execute in the
same order as you make them. Here's why:

Here you see that Call 3's response download is quite big and thus
takes longer than Call 5. So, Call 5 actually finishes before Call 3.

So, the world of HTTP is unpredictable.

Atlas 5: Bad calls make good calls time out

If 2 HTTP calls somehow get stuck for too long, those two bad calls
will also make the good calls that got queued in the meantime time out.
Here's a nice example:

function timeoutTest()
{
    PageMethods.Timeout( { timeoutInterval: 3000,
        onMethodTimeout: function() { debug.dump("Call 1 timed out"); } } );

    PageMethods.Timeout( { timeoutInterval: 3000,
        onMethodTimeout: function() { debug.dump("Call 2 timed out"); } } );

    PageMethods.DoSomething( 'Call 1', { timeoutInterval: 3000,
        onMethodTimeout: function() { debug.dump("DoSomething 1 timed out"); } } );

    PageMethods.DoSomething( 'Call 2', { timeoutInterval: 3000,
        onMethodTimeout: function() { debug.dump("DoSomething 2 timed out"); } } );

    PageMethods.DoSomething( 'Call 3', { timeoutInterval: 3000,
        onMethodTimeout: function() { debug.dump("DoSomething 3 timed out"); } } );
}

I am calling a method named "Timeout" on the server which does nothing
but wait for a long time so that the call times out. After that, I am
calling a method which does not time out. But guess what the output is:

Only one call succeeded, "DoSomething 1". Try again and you might see
this:

Now two calls succeeded. So, if at any moment the browser's two
connections get jammed, you can expect the other waiting calls to time
out as well.

At Pageflakes, we used to get nearly 400 to 600 timeout error reports
from users' browsers. We could never figure out how this could happen.
First we suspected slow Internet connections, but that could not be the
case for so many users. Then we suspected something was wrong with the
hosting provider's network. We did a lot of network analysis to find
out whether there was any problem on the network, but we could not
detect any. We used SQL Profiler to see whether there were any
long-running queries which exceeded the ASP.NET request execution
timeout, but no luck. We finally discovered that it mostly happened
because some bad calls got stuck and made the good calls time out too.
So, we modified the Atlas Runtime and introduced automatic retry in it,
and the problem disappeared completely.
However, this auto retry requires sophisticated open-heart surgery on
the Atlas Runtime javascript code which you have to perform again and
again whenever Microsoft releases a newer version of the Atlas Runtime.
You also can no longer use the ScriptManager tag which emits the Atlas
runtime references; instead, you have to manually put links to the
Atlas runtime and compatibility javascript files. So, you had better do
the auto retry yourself in your own code from day one: in the
onMethodTimeout callback, just retry once every time to be on the safe
side.

Atlas 3: Atlas batch calls are not always faster

Atlas provides a batch call feature which combines multiple web service
calls into one call. It works transparently: you won't notice anything,
nor do you need to write any special code. Once you turn on the batch
feature, all web service calls made within a certain duration get
batched into one call, saving roundtrip time and total response time.

The actual response time might be reduced, but the perceived delay is
higher. If 3 web service calls are batched, the 1st call does not
finish first; all 3 calls finish at the same time. If you are doing
some UI update upon completion of each call, the updates do not happen
one by one: all the calls complete in one shot and then the UI gets
updated in one shot. As a result, you do not see incremental updates on
the UI but instead a long delay before the UI updates. If any of the
calls, say the 3rd one, downloads a lot of data, the user sees nothing
happening until all 3 calls complete. So, the duration of the 1st call
becomes nearly the sum of the durations of all 3 calls. Although the
actual total duration is reduced, the perceived duration is higher.
Batch calls are handy when each call transmits a small amount of data;
then 3 small calls get executed in one roundtrip.

Let’s work on a scenario where 3 calls are made one by one.
Here’s how the calls actually get executed.

The second call takes a bit of time to reach the server because the
first call is eating up the bandwidth, and for the same reason it takes
longer to download. Browsers open 2 simultaneous connections to the
server, so at any time only 2 calls are in flight. Once the first or
second call completes, the third call is made.

When these 3 calls are batched into one:

Here the total download time is reduced (if IIS compression is enabled)
and there's only one network latency overhead. All 3 calls get executed
on the server in one shot and the combined response is downloaded in
one call. But to the user, the perceived speed is slower because all
the UI updates happen only after the entire batch call completes. The
total duration of the batch call will always be higher than that of 2
individual calls. Moreover, if you do a lot of UI updates one after
another, Internet Explorer freezes for a while, giving the user a bad
impression. Sometimes an expensive UI update makes the browser screen
go blank and white. Firefox and Opera do not have this problem.

Batch calls have some advantages too. The total download time is less
than downloading the individual call responses, because if you use gzip
compression in IIS, the combined result is compressed as a whole
instead of compressing each result individually. So, generally, batch
calls are better for small calls. But if a call is going to send a
large amount of data or is going to return, say, 20 KB of response,
then it's better not to use batching. Another problem with batch calls:
say 2 calls are very small but the 3rd call is quite big. If these 3
calls get batched, the smaller calls are going to suffer a long delay
due to the 3rd, larger call.

Beginning Atlas series: Why Atlas?

This is the first question everyone asks me when they see Pageflakes.
Why not the Prototype or Dojo library? Microsoft Atlas is a very
promising AJAX library. Microsoft is putting a lot of effort into
Atlas, making lots of reusable components that can really save you a
lot of time and give your web application a complete face lift with
reasonably little effort. It integrates with ASP.NET very well and is
compatible with the ASP.NET Membership and Profile providers.

When we first started developing Pageflakes, Atlas was in its infancy.
We were only able to use the Page Method and Web Service Method call
features of Atlas. We had to build our own drag & drop, component
architecture, popups, collapse/expand features, etc. But now you can
get all of these from Atlas and save a lot of development time. The web
service proxy feature of Atlas is a marvel. You can point a <script>
tag to an .asmx file and you get a javascript class generated right out
of the web service definition. The javascript class contains the exact
methods that you have on the web service class. This makes it really
easy to add or remove web services, and to add or remove methods in web
services, without requiring any changes on the client side. It also
offers a lot of control over the AJAX calls and provides a rich
exception trapping feature in javascript: server-side exceptions are
nicely thrown to the client-side javascript code, and you can trap them
and show nicely formatted error messages to the user. Atlas works
really well with ASP.NET 2.0, eliminating the integration problem
completely. You need not worry about authentication and authorization
on page methods and web service methods. So, you save a lot of code on
the client side (of course, the Atlas Runtime is huge for this reason)
and you can concentrate more on your own code than on building up all
this framework-related code.

Recent versions of Atlas work nicely with the ASP.NET Membership and
Profile services, giving you login/logout features from javascript
without page postbacks, and you can read and write the Profile object
directly from javascript. This comes in very handy when you make heavy
use of the ASP.NET membership and profile providers in your web
application, which we do at Pageflakes.

In earlier versions of Atlas, there was no way to make HTTP GET calls.
All calls were HTTP POST and thus quite expensive. Now you can specify
which calls should be made as HTTP GET. Once you have HTTP GET, you can
utilize the HTTP response caching features which I explained in the
Atlas 7 tip above.

I will be writing about lots of Atlas tips and tricks. I am assuming
you are familiar with Atlas, have already tried some quick-start
tutorials, and know the concepts of Page Methods, Web Service Proxies,
the Script Manager, etc.

Do you have problems with users who cannot use the Forgot Password option?

Here's a scenario. We use the email address as the user name in the
ASP.NET 2.0 Membership provider. There were several places where we
used to create user accounts using this:

Membership.CreateUser( email, password );

We did not notice what it was doing. After some days, users started
complaining. This is what a user whose account was automatically
created by the above code said:

“Hi,

I got the email invitation. I went to your site. I tried login,
it said user name or password is wrong. So, I tried Signup. Signup
said user name already taken. Then I went to forgot password to
retrieve the password. It shows something is wrong and password
email cannot be sent.

I am stuck. Please help!”

Here's the problem. When we use the above code, it creates a row in the
aspnet_users table using the email address as the user name. Fine, no
problem. But in the aspnet_membership table, the row it creates has a
NULL Email. So, the user cannot use the "Forgot Password" option to
request the password, because the email address is null. Our database
contained 908 such unfortunate users, so we had to run the following
SQL to fix it:

update aspnet_membership
set email = ( select username from aspnet_users
              where applicationID = @applicationID
                and userID = aspnet_membership.userID ),
    loweredemail = ( select loweredusername from aspnet_users
                     where applicationID = @applicationID
                       and userID = aspnet_membership.userID )
where loweredemail is null and applicationID = @applicationID

The applicationID (shown here as @applicationID) is something you need
to specify for your own application. You can find the ID in the
aspnet_applications table.

Then we changed the code to create user accounts to this:

Membership.CreateUser( email, password, email );

The 3rd parameter is the email address. We had simply overlooked this
overload.
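
If you want to be a bit more defensive, there is a longer CreateUser
overload which also reports why a creation attempt failed. A minimal
sketch, assuming your membership provider does not require a password
question and answer:

MembershipCreateStatus status;
MembershipUser newUser = Membership.CreateUser(
    email,          // user name - we use the email address as the user name
    password,
    email,          // 3rd parameter: fills the Email column in aspnet_membership
    null, null,     // password question/answer (assumes RequiresQuestionAndAnswer = false)
    true,           // approved immediately
    out status);

if (status != MembershipCreateStatus.Success)
{
    // e.g. DuplicateUserName, DuplicateEmail, InvalidPassword, ...
}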

A large log file can bring SQL Server down when transaction log shipping runs

We were getting very poor performance after we turned on transaction
log shipping on our SQL Server. We are using SQL Server 2005. The
transaction log file was around 30 GB because the database was in the
Full Recovery model. The server became very slow: every 15 minutes,
when the log shipping ran, it would become very slow and sometimes
nonresponsive. The event log was getting full of SqlTimeout exceptions
generated by the web site, and the web site started to show the ASP.NET
error page very frequently. We could not even log in with SQL Server
Management Studio in order to do something about it.

Here's how the connection time was reported by an external monitoring
site:

The peaks are 30 seconds, which means those requests timed out.

So, here’s what we did:

  1. Turned off log shipping.
  2. Restarted SQL Server.
  3. Switched the database to the Simple recovery model and shrunk the
    log file. This brought the log file down to a couple of megabytes.
  4. Ran like that for some days. All looked OK.
  5. Then switched the DB back to the Full recovery model and configured
    log shipping again.

It has been running fine so far. But we go down for an hour every
Saturday when we run INDEX DEFRAG on the indexes. The log shipping
shows around 5 or 6 log backups, each 1 or 2 GB in size, when the index
defrag happens.

How to set up SQL Server 2005 transaction log shipping on a large database so it really works

I have tried a lot of combinations in order to find an effective method
for implementing transaction log shipping between servers which are in
a workgroup, not in a domain. I realized that the things you learn from
articles and books work for small and medium sized databases; when your
database becomes 10 GB or bigger, things get a lot harder than they
look. Additionally, many things changed in SQL Server 2005, so it's
even more difficult to configure log shipping properly nowadays.

Here are the steps that I finally found to work. Let's assume there are
2 servers with SQL Server 2005. Make sure both servers have the latest
SP; Service Pack 1 has already been released.

1. Create a new user Account named “SyncAccount” on both
computers. Use the exact same user name and password.

2. Make sure File Sharing is enabled on the local area connection
between the servers. Also enable file sharing in the Firewall.

3. Make sure the local network connection is not a regular LAN. It must
be a gigabit card with near-zero data corruption. Both the cable and
the switch need to be perfect. If possible, connect both servers
directly with a fibre optic cable on the NICs in order to avoid a
separate switch.

4. Now create a folder named "TranLogs" on both servers. Let's assume
the folder is at E:\TranLogs.

5. On the primary database server, share the "TranLogs" folder and
allow SyncAccount "Full Access" to the share. Then also allow
SyncAccount Full Access on the TranLogs folder itself. So, you are
setting the same permission from both the "Sharing" tab and the
"Security" tab.

6. On the secondary database server, allow SyncAccount "Full Access"
on the TranLogs folder. No need to share it.

7. Test whether SyncAccount can really connect between the servers. On
the secondary server, go to a command prompt and do this:

8. 
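
Something along these lines does it (SyncAccount is the account created
in step 1; the command prompt that opens runs as that account):

runas /user:SyncAccount cmd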

9. Now you have a command prompt running with SyncAccount's privileges.
Let's confirm that the account can read and write on the "TranLogs"
shares on both servers.

10. 
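
From that prompt, a quick test along these lines confirms both
directions. PrimaryServer stands for your primary server's name, and
the drive letter is from this example setup:

dir \\PrimaryServer\TranLogs
echo test > E:\TranLogs\test.txt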

11. This is exactly what SQL Agent will be doing during a log ship: it
will copy log files from the primary server's network share to its own
log file folder. So, SyncAccount needs to be able to both read files
from the primary server's network share and write to its own TranLogs
folder. The above test verifies this.

12. This is something new in SQL Server 2005: add SyncAccount to the
SQL Server Agent Windows group "SQLServer2005SQLAgentUser….". You will
find this Windows user group after installing SQL Server 2005.

13. Now go to Control Panel->Administrative
Tools->Services and find the SQL Server Agent service. Go to its
properties and set SyncAccount as the account on the Logon tab.
Restart the service. Do this on both servers.

14. 

15. I use the sa account to configure the log shipping. So, do this on
both servers:

a. Enable the "sa" account. By default, sa is disabled in SQL Server
2005.

b. On the "sa" account, turn off the Password Expiration Policy. This
prevents the sa password from expiring automatically.

16. On the secondary server, you need to allow remote connections. By
default, SQL Server 2005 disables TCP/IP connections; as a result, you
cannot log in to the server from another server. Launch the Surface
Area Configuration tool from Start->Programs->Microsoft SQL Server 2005
and go to the "Remote Connections" section. Choose the 3rd option,
which allows both TCP/IP based remote connections and local named pipe
based connections.

17. On the secondary server's firewall, open port 1433 so that the
primary server can connect to it.

18. Restart SQL Server. Yes, you need to restart SQL Server.

18. On the primary server, go to Database Properties->Options and set
the Recovery Model to "Full". If it was already set to Full, it is wise
to first set it to Simple, shrink the transaction log file, and then
make it "Full" again. This truncates the transaction log file for sure.

19. Now take a full backup of the database. During the backup, make
sure you put the backup file on a physically separate hard drive from
the drive where the MDF is located. Remember: not different logical
drives, different physical drives. So, you should have at least 2 hard
drives on the server. During the backup, SQL Server reads from the MDF
and writes to the backup file. So, if both the MDF and the backup are
on the same hard drive, it's going to take more than double the time to
back up the database. It will also keep the disk fully occupied and the
server will become very slow.

20. After the backup is done, RAR the backup file. This ensures that,
when you copy it to the other server, there's no data corruption while
the file is being transferred: if you fail to unRAR the file on the
secondary server, you can be sure there's some problem on the network
and you must replace the network infrastructure. The RAR should also be
written to a separate hard drive from the one where the backup is
located, for the same reason: the read is on one drive and the write is
on another drive. Better still, RAR directly to the destination server
over the network share. This has two benefits:

a. Your server's IO is saved. There's no local write, only read.

b. Both the RAR and the network copy are done in one step.

21. 

22. By the time you are done with the backup, RAR, copy over the
network, and restore on the other server, the transaction log file
(LDF) on the primary database server might have become very big. For
us, it becomes around 2 to 3 GB. So, we have to manually take a
transaction log backup and ship it to the secondary server before we
configure transaction log shipping.

23. 

24. When you are done copying the transaction log backup to the second
server, first restore the full backup on the secondary server:

25. 

26. But before restoring, go to Options tab and choose RESTORE
WITH STANDBY:

27. 

28. When the full backup is restored, restore the transaction
log backup.

29. REMEMBER: go to options tab and set the Recovery State to
“RESTORE WITH STANDBY” before you hit the OK button.

30. This generally takes a long time. Too long, in fact. Every time I
do the manual full backup, RAR, copy, unRAR, and restore, the
transaction log (LDF) file grows to 2 to 3 GB. As a result, it takes a
long time to do a transaction log backup, copy, and restore: more than
an hour. So, within this time, the log file on the primary server again
becomes large and, when log shipping starts, the first log ship is
huge. So, you need to plan this carefully and do it only when you have
the least amount of traffic.

31. I usually have to do this manual Transaction Log backup
twice. First one is around 3 GB. Second one is around 500 MB.

32. Now you have a database on the secondary server ready to be
configured for Log shipping.

33. Go to the primary server, select the database, right click ->
"Tasks" -> "Shrink". Shrink the log file.

34. Go to the primary server, bring up the database options, go to the
Transaction Log Shipping option and enable log shipping.

35. 

36. Now configure the backup settings like this:

37. 

38. Remember, the first path is the network path that we tested from
the command prompt on the secondary server. The second path is the
local hard drive folder on the primary server, which is shared and
accessible via that network path.

39. Add a secondary server. This is the server where you have restored
the database backup.

40. 

41. Choose “No, the secondary database is initialized” because
we have already restored the database.

42. Go to the second tab, "Copy Files", and enter the path on the
secondary server where the log files will be copied to. Note: the
secondary server will fetch the log files from the primary server's
network share into its own local folder. So, the path you specify here
is on the secondary server. Do not be confused by the picture below
showing the same path as on the primary server; I just have the same
folder configuration on all servers. It can be D:\TranLogs if you have
the TranLogs folder on the D: drive of the secondary server.

43. 

44. On the third tab, "Restore Transaction Log", configure it as
follows:

45. 

46. It is very important to choose "Disconnect users in database…".
If you don't do this and by any chance Management Studio is open on the
database on the secondary server, log shipping will keep failing. So,
force disconnection of all users while the database backup is being
restored.

47. Set up a Monitor Server, which will automatically take care of
making the secondary server the primary server when your primary server
crashes.

48. 

49. In the end, the transaction log shipping configuration
window should look like this:

50. 

51. When you press OK, you will see this:

52. Do not be happy just because everything shows "Success". Even if
you got all the paths and settings wrong, you will still see it as
successful. Log in to the secondary server, go to SQL Server
Agent->Jobs and find the log ship restore job. If the job is not there,
your configuration was wrong. If it's there, right click it and select
"View History". Wait 15 minutes for one log ship to complete, then
refresh and check the list. If you see all OK, then it is really OK. If
not, then there are two possibilities:

a. See whether the Log Ship Copy job failed or not. If it failed, then
you entered an incorrect path. It can be one of the following problems:

  1. The network location on the primary server is wrong.
  2. The local folder was specified incorrectly.
  3. You did not set SyncAccount as the account which runs SQL Agent,
    or you did but forgot to restart the service.

b. If the restore fails, then the problem can be one of the following:

i. SyncAccount is not a valid login in SQL Server. From SQL Server
Management Studio, add SyncAccount as a login.

ii. You forgot to restore the database on the secondary server as
Standby.

iii. You probably took a manual transaction log backup on the primary
server in the meantime. As a result, the backups that log shipping took
were not in the right sequence.

53. If everything’s ok, you will see this: