Omar AL Zabir – Page 10 – Omar AL Zabir Blog

ASP.NET AJAX testing made easy using Visual Studio 2008 Web Test

Visual Studio 2008 comes with rich Web Testing support, but
it’s not rich enough to test highly dynamic AJAX websites
where the page content is generated dynamically from a database and
the same page’s output changes frequently based on some external
data source, e.g. an RSS feed. You can use the Web Test Record
feature to record browser actions in a real browser
and then play them back. But if the page you are testing changes
every time you visit it, your recorded tests no longer
work as expected. The problem with a recorded Web Test is that it
stores the generated ASP.NET Control IDs and form field names inside
the test. If the page no longer produces the same Control IDs
or form fields, the recorded test breaks.
A simple example: in a VS Web Test you can say
“click the button with ID
ctrl00_UpdatePanel003_SubmitButton002”, but you cannot
say “click the 2nd Submit button inside the third
UpdatePanel”. Another key limitation is that Web Tests
cannot address controls by their server-side Control ID, like
“SubmitButton”. You always have to use the generated
Client ID, which is something weird like
“ctrl_00_SomeControl001_SubmitButton”.

Moreover, if you make AJAX calls where a call returns some JSON or
updates an UpdatePanel and then, based on the server’s
response, you want to make further AJAX calls or post the
refreshed UpdatePanel, recorded tests don’t work
properly. You *do* have the option to hand-code the tests
to handle such scenarios, but that is pretty
difficult when you are using
UpdatePanels, because you have to keep track of the page
viewstate, hidden form variables etc. across async postbacks.

So, I have built a library that makes it significantly easier to test
dynamic AJAX websites and UpdatePanel-rich web pages. The library
includes several ExtractionRule and ValidationRule classes
that make it easy to test cookies, response
headers and JSON output, discover all UpdatePanels in a page,
find controls in the response body, and find controls inside a
particular UpdatePanel.
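To make the idea of addressing a control by its server-side ID concrete, here is a minimal sketch. It is not the library’s actual API: ControlFinder and FindClientIds are hypothetical names, and the sketch assumes the generated Client ID simply ends with an underscore followed by the server-side ID. It scans the response body for every matching id attribute, so a test can pick “the second SubmitButton” without knowing the generated prefix.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class ControlFinder
{
    // Hypothetical helper (not part of the library described in the post):
    // returns, in document order, every client ID in the HTML that equals
    // the server-side ID or ends with "_" + the server-side ID.
    public static List<string> FindClientIds(string html, string serverId)
    {
        var ids = new List<string>();
        string pattern = "id=\"((?:[A-Za-z0-9_]+_)?" + Regex.Escape(serverId) + ")\"";
        foreach (Match m in Regex.Matches(html, pattern))
        {
            ids.Add(m.Groups[1].Value);
        }
        return ids;
    }
}

public static class Program
{
    public static void Main()
    {
        // Made-up response body with two widgets, each with its own button.
        string html =
            "<div id=\"ctl00_UpdatePanel001\"><input id=\"ctl00_Widget12_SubmitButton\" /></div>" +
            "<div id=\"ctl00_UpdatePanel002\"><input id=\"ctl00_Widget57_SubmitButton\" /></div>";

        List<string> ids = ControlFinder.FindClientIds(html, "SubmitButton");

        // Address "the 2nd Submit button" by position instead of by generated ID.
        Console.WriteLine(ids[1]);
    }
}
```

A regular expression is enough for a sketch like this; a real test would more likely parse the HTML properly and scope the search to one UpdatePanel.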

First, let me give you an example of what can be tested using
this library. My open source project Dropthings produces a Web
2.0 Start Page where the page is composed of widgets.

Each widget is composed of two UpdatePanel. There’s
a header area in each widget which is one UpdatePanel and
the body area is another UpdatePanel. Each widget is
rendered from database using the unique ID of the widget row, which
is an INT IDENTITY. Every page has unique widgets, with unique
ASP.NET Control ID. As a result, there’s no way you can
record a test and play it back because none of the ASP.NET Control
IDs are ever same for the same page on different visits. This is
where my library comes to the rescue.

See the web test I did:

This test simulates an anonymous user visit. When an anonymous user
visits Dropthings for the first time, two pages are created with
some default widgets. You can add new widgets to the page,
drag & drop widgets, and delete a widget that you
don’t like.

This Web Test simulates these behaviors automatically:

Visits the homepage.
Shows the widget list, which is an UpdatePanel, and checks that the UpdatePanel contains the BBC World widget.
Clicks on the “Edit” link of the “How to of the day” widget, which brings up some options dynamically inside an UpdatePanel, then tries to change the Dropdown value inside the UpdatePanel to 10.
Adds a new widget from the Widget List and ensures that the UpdatePanel postback successfully renders the new widget.
Deletes the newly added widget and ensures the widget is gone.
Logs the user out.

If you want to learn the details of the project, read my
CodeProject article:

http://www.codeproject.com/KB/aspnet/aspnetajaxtesting.aspx

Please vote if you find this useful.

Web 2.0 AJAX Portal using jQuery, ASP.NET 3.5, Silverlight, Linq to SQL, WF and Unity

Dropthings, my open
source Web 2.0 Ajax Portal, has gone through a technology
overhaul. Previously it was built using ASP.NET AJAX, a little
bit of Workflow Foundation and Linq to SQL. Now Dropthings boasts
a full jQuery front-end combined with ASP.NET AJAX
UpdatePanel, Silverlight widgets, a full
Workflow Foundation implementation on the business
layer, 100% Linq to SQL Compiled Queries on the
data access layer, and Dependency Injection and Inversion of Control
(IoC) using Microsoft Enterprise Library 4.1 and
Unity. It also has an ASP.NET AJAX Web Test
framework that makes it really easy to write Web Tests that simulate
real user actions on AJAX web pages. This article walks you
through the challenges in getting these new technologies to work in
an ASP.NET website and shows how performance, scalability, extensibility
and maintainability have significantly improved with the new
technologies. Dropthings has been licensed for commercial use by
prominent companies including BT Business, Intel, Microsoft IS,
Denmark Government portal for Citizens; Startups like Limead and
many more. So, this is serious stuff! There’s a very cool
open source implementation of the Dropthings framework available at
the National
University of Singapore portal.

Visit: http://dropthings.omaralzabir.com

I have published a new article on this on CodeProject:

http://www.codeproject.com/KB/ajax/Web20Portal.aspx

Get the source code

Latest source code is hosted at Google code:

http://code.google.com/p/dropthings

There’s a CodePlex site for documentation and issue
tracking:

http://www.codeplex.com/dropthings

You will need Visual Studio 2008 Team Suite with Service Pack 1
and Silverlight 2 SDK in order to run all the projects. If you have
only Visual Studio 2008 Professional, then you will have to remove
the Dropthings.Test project.

New features introduced

The new Dropthings release has the following features:

Template users: you can define a user whose pages and widgets are used as a template for new users. Whatever you put in that template user’s pages is copied for every new user. This is an easier way to define the default pages and widgets for new users. You can do the same for registered users. The template users are defined in web.config.
Widget-to-Widget communication: widgets can send messages to each other. Widgets subscribe to an Event Broker and exchange messages using a Pub-Sub pattern.
WidgetZone: you can create any number of zones in any shape on the page. You can lay widgets out horizontally, place zones in different areas of the page and so on. With this zone model, you are no longer limited to the Page-Column model where you could only have N vertical columns.
Role based widgets: widgets are now mapped to roles, so you can show different widget lists to different users using ManageWidgetPersmission.aspx.
Role based page setup: you can define page setup for different roles. For example, Managers see different pages and widgets than Employees.
Widget maximize: you can maximize a widget to take the full screen. Handy for widgets with lots of content.
Free form resize: you can freely resize widgets vertically.
Silverlight Widgets: you can now make widgets in Silverlight!

Why the technology overhaul

Performance, Scalability, Maintainability and Extensibility:
four key reasons for the overhaul. Each new technology
solved one or more of these problems.

First, jQuery replaced the large amount of hand-coded
Javascript that provided the client side drag &
drop and other UI effects. jQuery already has a rich set of libraries
for Drag & Drop, animations, event handling, cross browser
support and so on. So, using jQuery opens the
door to thousands of jQuery plugins that can be offered on Dropthings.
This made Dropthings highly extensible on the client side.
Moreover, jQuery is very light. Unlike the jumbo sized AJAX Control
Toolkit framework with its heavy control extenders, jQuery is very lean.
So, the total javascript size decreased significantly, resulting in
improved page load time. In total, the jQuery framework, the basic AJAX
framework and all my own scripts add up to 395KB, sweet! Performance is
key; it makes or breaks a product.

Secondly, Linq to SQL queries are replaced with Compiled
Queries. Dropthings did not survive a load test when regular lambda
expressions were used to query the database. I could only reach
12 Req/Sec with 20 concurrent users before burning up the web server
CPU on a Quad Core DELL server.

Thirdly, Workflow Foundation is used to build operations that
require multiple Data Access Classes to work together in a
single transaction. Instead of writing large functions with many
if…else conditions and for…loops, it’s better to
write them as a Workflow, because you can visually see the flow of
execution and you can reuse Activities among different Workflows.
Best of all, architects can design workflows and developers can
fill in the code inside Activities. So, I could design a complex
operation as a workflow without writing the real code inside
the Activities and then ask someone else to implement each Activity. It
is like handing over a design document to developers to implement
each unit module, only that here everything is strongly typed and
verified by the compiler. If you strictly follow the Single Responsibility
Principle for your Activities, which is a smart way of saying one
Activity does only one very simple task, you end up with a
highly reusable and maintainable business layer and very clean
code that’s easily extensible.

Fourthly, the Unity
Dependency Injection (DI) framework is used to pave the path for
unit testing and dependency injection. It offers Inversion of
Control (IoC), which enables testing individual classes in
isolation. Moreover, it has a handy feature to control the lifetime of
objects. Instead of creating instances of commonly used classes
several times within the same request, you can make instances
thread level, which means only one instance is created per thread
and subsequent calls reuse the same instance. Is this going over
your head? No worries, continue reading, I will explain later
on.

Fifthly, an API for Silverlight widgets allows more
interactive widgets to be built using Silverlight. HTML and
Javascript still have limitations on smooth graphics and
continuous transmission of data from the web server. Silverlight
addresses these problems.

Read the article for details on how all these improvements were
done and how all these hot techs play together in a very useful
open source project for enterprises.

http://www.codeproject.com/KB/ajax/Web20Portal.aspx

Don’t forget to vote for me if you like it.

Memory Leak with delegates and workflow foundation

Recently, after load testing my open source project Dropthings, I
encountered a lot of memory leaks. I found that lots of Workflow
instances and Linq entities were left in memory and never
collected. Profiling the web application with .NET Memory Profiler showed the real picture:

It shows that instances of several types are being
created but never removed. You can see that the “New” column
has a positive value, but the “Remove” column has 0. That
means new instances are being created, but not removed. Basically,
the way you do memory profiling is to take two snapshots. Say you
take one snapshot when you first visit your website. Then you perform
some action on the website that results in allocation of objects.
Then you take another snapshot. When you compare both snapshots,
you can see how many instances of each class were created between
the two snapshots and how many were removed. If the numbers are not
equal, you may have a leak. Generally in a web application, many
objects are created on every page hit, and at the end of the request
all those objects are supposed to be released. If they are not
released, then we have a problem. That is not necessarily the case for
desktop applications, because in a desktop application objects can
remain in memory until the app is closed. Either way, you should know best from
the code which objects were supposed to go out of scope and get
released.

For beginners, a leak means objects are being allocated but not
freed because someone is holding a reference to them.
When objects leak, they remain in memory forever, until the process
(or app domain) is closed. So, if you have a leaky website, your
website keeps taking up memory until the web server runs out of
memory and crashes. So, a memory leak is bad:
it prevents you from running your product for long durations
and requires frequent restarts of the app pool.

So, the above screenshot shows that Workflow and Linq related classes
are not being removed, and are thus leaking. This means that somewhere
workflow instances are not being released, and thus all workflow
related objects remain. You can see the number is the same, 48,
for all workflow related objects. This is a good indication that
almost every workflow instance leaked, because a
total of 48 workflows were created and run. Moreover, it indicates that the
leak is at the top Workflow instance level, not in some specific
Activity or somewhere deep in the code.

As the workflows use Linq entities, they held references to them,
and thus the Linq entities leaked as well. Sometimes you might
be looking for why A is leaking, but you end up finding
that B was holding a reference to A and B was leaking, and thus
A was leaking as well. This is sometimes tricky to figure out, and
you can spend a lot of time looking in the wrong direction.

Now let me show you the buggy code:

ManualWorkflowSchedulerService manualScheduler = 
  workflowRuntime.GetService<ManualWorkflowSchedulerService>();

WorkflowInstance instance = workflowRuntime.CreateWorkflow(workflowType, properties);
instance.Start();

EventHandler<WorkflowCompletedEventArgs> completedHandler = null;
completedHandler = delegate(object o, WorkflowCompletedEventArgs e)
{
    if (e.WorkflowInstance.InstanceId == instance.InstanceId) // 1. instance
    {
        workflowRuntime.WorkflowCompleted -= completedHandler; // 2. terminatedhandler

        // copy the output parameters in the specified properties dictionary
        Dictionary<string,object>.Enumerator enumerator = 
            e.OutputParameters.GetEnumerator();
        while( enumerator.MoveNext() )
        {
            KeyValuePair<string,object> pair = enumerator.Current;
            if( properties.ContainsKey(pair.Key) )
            {
                properties[pair.Key] = pair.Value;
            }
        }
    }
};

Exception x  = null;
EventHandler<WorkflowTerminatedEventArgs> terminatedHandler = null;
terminatedHandler = delegate(object o, WorkflowTerminatedEventArgs e)
{
    if (e.WorkflowInstance.InstanceId == instance.InstanceId) // 3. instance
    {
        workflowRuntime.WorkflowTerminated -= terminatedHandler; // 4. completeHandler
        Debug.WriteLine( e.Exception );

        x = e.Exception;
    }
};

workflowRuntime.WorkflowCompleted += completedHandler;
workflowRuntime.WorkflowTerminated += terminatedHandler;

manualScheduler.RunWorkflow(instance.InstanceId);

Can you spot where it leaks?

I have numbered the lines in comments where the leak is
happening. Here the delegate acts like a closure,
and those who come from a Javascript background know closures are evil:
they leak memory unless written very carefully. Here the
delegate keeps a reference to the
instance object. So, if the delegate
is somehow not released, the instance remains in memory
forever and thus leaks. Now, can you find a situation where the
delegate will not be released?

Say the workflow completed. The runtime fires completedHandler, which
unsubscribes itself but does not unsubscribe
terminatedHandler. Thus
terminatedHandler remains attached to the runtime and still holds a
reference to the instance. So, we have a leaky
delegate leaking whatever it holds onto from outside
its scope. Here that is the
instance variable, which it accesses from the parent
function.

Since the workflow instance is not released, everything that
the workflow and all the activities inside it hold onto
remains in memory. Most of the workflows and activities expose
public properties which are Linq Entities. Thus the Linq Entities
remain in memory. Linq Entities in turn keep a reference to the
DataContext that produced them. Thus we have the
DataContext remaining in memory. Moreover, the
DataContext keeps references to many internal objects
and a metadata cache, so they remain in memory as well.

So, the correct code is:

ManualWorkflowSchedulerService manualScheduler = 
    workflowRuntime.GetService<ManualWorkflowSchedulerService>();

WorkflowInstance instance = workflowRuntime.CreateWorkflow(workflowType, properties);
instance.Start();
var instanceId = instance.InstanceId;

EventHandler<WorkflowCompletedEventArgs> completedHandler = null;
completedHandler = delegate(object o, WorkflowCompletedEventArgs e)
{
    if (e.WorkflowInstance.InstanceId == instanceId) // 1. instanceId is a Guid
    {
        // copy the output parameters in the specified properties dictionary
        Dictionary<string,object>.Enumerator enumerator = 
                e.OutputParameters.GetEnumerator();
        while( enumerator.MoveNext() )
        {
            KeyValuePair<string,object> pair = enumerator.Current;
            if( properties.ContainsKey(pair.Key) )
            {
                properties[pair.Key] = pair.Value;
            }
        }
    }
};

Exception x  = null;
EventHandler<WorkflowTerminatedEventArgs> terminatedHandler = null;
terminatedHandler = delegate(object o, WorkflowTerminatedEventArgs e)
{
    if (e.WorkflowInstance.InstanceId == instanceId) // 2. instanceId is a Guid
    {
        x = e.Exception;
        Debug.WriteLine(e.Exception);
    }
};

workflowRuntime.WorkflowCompleted += completedHandler;
workflowRuntime.WorkflowTerminated += terminatedHandler;

manualScheduler.RunWorkflow(instance.InstanceId);
// 3. Both delegates are now released
workflowRuntime.WorkflowTerminated -= terminatedHandler;
workflowRuntime.WorkflowCompleted -= completedHandler;

There are two changes. First, both delegates use the
instanceId variable instead of the
instance. Since instanceId is a Guid,
which is a struct, not a class, the delegates capture a copy of the
value rather than a reference to the workflow instance. So, they
no longer keep the instance alive. Secondly, both delegates are
unsubscribed at the end of the workflow execution, thus releasing both
references.
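The difference between capturing the object and capturing only its Guid can be seen in isolation. The following is a minimal sketch, not Dropthings or Workflow Foundation code: Publisher and Instance are hypothetical stand-ins for the long-lived WorkflowRuntime and for a WorkflowInstance. Each helper subscribes an anonymous delegate to the publisher’s event and returns a WeakReference, so we can check after a garbage collection whether the object is still reachable.

```csharp
using System;
using System.Runtime.CompilerServices;

// Hypothetical stand-in for a long-lived event source such as WorkflowRuntime.
public class Publisher
{
    public event EventHandler Completed;

    public void Raise()
    {
        EventHandler handler = Completed;
        if (handler != null) handler(this, EventArgs.Empty);
    }
}

// Hypothetical stand-in for a heavy object such as WorkflowInstance.
public class Instance
{
    public Guid InstanceId = Guid.NewGuid();
}

public static class LeakDemo
{
    // The delegate captures the whole Instance, so the publisher's event
    // keeps it reachable for as long as the handler stays subscribed.
    [MethodImpl(MethodImplOptions.NoInlining)]
    public static WeakReference SubscribeCapturingObject(Publisher publisher)
    {
        var instance = new Instance();
        publisher.Completed += delegate { Console.WriteLine(instance.InstanceId); };
        return new WeakReference(instance);
    }

    // The delegate captures only a copy of the Guid, so nothing keeps the
    // Instance reachable once this method returns.
    [MethodImpl(MethodImplOptions.NoInlining)]
    public static WeakReference SubscribeCapturingId(Publisher publisher)
    {
        var instance = new Instance();
        Guid instanceId = instance.InstanceId;
        publisher.Completed += delegate { Console.WriteLine(instanceId); };
        return new WeakReference(instance);
    }

    public static void Main()
    {
        var publisher = new Publisher();
        WeakReference byObject = SubscribeCapturingObject(publisher);
        WeakReference byId = SubscribeCapturingId(publisher);

        GC.Collect();
        GC.WaitForPendingFinalizers();
        GC.Collect();

        Console.WriteLine("Captured object, still alive: " + byObject.IsAlive);
        Console.WriteLine("Captured Guid only, still alive: " + byId.IsAlive);

        GC.KeepAlive(publisher); // the publisher itself outlives the collection
    }
}
```

On a typical run the first line should report that the object is still alive and the second that it is not, although the exact timing of garbage collection is up to the runtime. Unsubscribing the handler, as the corrected code does after RunWorkflow, removes the remaining reference in the first case as well.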

In Dropthings, I am using the famous CallWorkflow Activity by Jon Flanders, which
is widely used to execute one Workflow from another synchronously.
There’s a CallWorkflowService class which is
responsible for synchronously executing another workflow, and it
has a similar memory leak problem. The original code of the service
is as follows:

public class CallWorkflowService : WorkflowRuntimeService
{
    #region Methods

    public void StartWorkflow(Type workflowType,Dictionary<string,object> inparms, 
           Guid caller,IComparable qn)
    {
        WorkflowRuntime wr = this.Runtime;
        WorkflowInstance wi = wr.CreateWorkflow(workflowType,inparms);
        wi.Start();
        ManualWorkflowSchedulerService ss = 
             wr.GetService<ManualWorkflowSchedulerService>();
        if (ss != null)
            ss.RunWorkflow(wi.InstanceId);
        EventHandler<WorkflowCompletedEventArgs> d  = null;
        d = delegate(object o, WorkflowCompletedEventArgs e)
        {
            if (e.WorkflowInstance.InstanceId ==wi.InstanceId)
            {
                wr.WorkflowCompleted -= d;
                WorkflowInstance c = wr.GetWorkflow(caller);
                c.EnqueueItem(qn, e.OutputParameters, null, null);
            }
        };
        EventHandler<WorkflowTerminatedEventArgs> te = null;
        te = delegate(object o, WorkflowTerminatedEventArgs e)
        {
            if (e.WorkflowInstance.InstanceId == wi.InstanceId)
            {
                wr.WorkflowTerminated -= te;
                WorkflowInstance c = wr.GetWorkflow(caller);
                c.EnqueueItem(qn, new Exception("Called Workflow Terminated", 
                  e.Exception), null, null);
            }
        };
        wr.WorkflowCompleted += d;
        wr.WorkflowTerminated += te;
    }

    #endregion Methods
}

As you see, it has the same problem of a delegate holding a reference to the
instance object. Moreover, there’s some queue handling
there, which requires the caller and qn
parameters passed to the StartWorkflow function. So, it is
not a straightforward fix.

I rewrote the whole CallWorkflowService so
that it does not need two delegates to be created per Workflow.
I moved the delegates out of StartWorkflow, so there’s no chance of
a closure holding references to unwanted objects. The result looks
like this:

public class CallWorkflowService : WorkflowRuntimeService
{
    #region Fields

    private EventHandler<WorkflowCompletedEventArgs> _CompletedHandler = null;
    private EventHandler<WorkflowTerminatedEventArgs> _TerminatedHandler = null;
    private Dictionary<Guid, WorkflowInfo> _WorkflowQueue = 
       new Dictionary<Guid, WorkflowInfo>();

    #endregion Fields

    #region Methods

    public void StartWorkflow(Type workflowType,Dictionary<string,object> inparms,
        Guid caller,IComparable qn)
    {
        WorkflowRuntime wr = this.Runtime;
        WorkflowInstance wi = wr.CreateWorkflow(workflowType,inparms);
        wi.Start();

        var instanceId = wi.InstanceId;
        _WorkflowQueue[instanceId] = new WorkflowInfo { Caller = caller, qn = qn };

        ManualWorkflowSchedulerService ss = 
           wr.GetService<ManualWorkflowSchedulerService>();
        if (ss != null)
            ss.RunWorkflow(wi.InstanceId);
    }

    protected override void OnStarted()
    {
        base.OnStarted();

        if (null == _CompletedHandler)
        {
            _CompletedHandler = delegate(object o, WorkflowCompletedEventArgs e)
            {
                var instanceId = e.WorkflowInstance.InstanceId;
                if (_WorkflowQueue.ContainsKey(instanceId))
                {
                    WorkflowInfo wf = _WorkflowQueue[instanceId];
                    WorkflowInstance c = this.Runtime.GetWorkflow(wf.Caller);
                    c.EnqueueItem(wf.qn, e.OutputParameters, null, null);
                    _WorkflowQueue.Remove(instanceId);
                }
            };
            this.Runtime.WorkflowCompleted += _CompletedHandler;
        }

        if (null == _TerminatedHandler)
        {
            _TerminatedHandler = delegate(object o, WorkflowTerminatedEventArgs e)
            {
                var instanceId = e.WorkflowInstance.InstanceId;
                if (_WorkflowQueue.ContainsKey(instanceId))
                {
                    WorkflowInfo wf = _WorkflowQueue[instanceId];
                    WorkflowInstance c = this.Runtime.GetWorkflow(wf.Caller);
                    c.EnqueueItem(wf.qn, 
                      new Exception("Called Workflow Terminated", e.Exception), 
                      null, null);
                    _WorkflowQueue.Remove(instanceId);
                }
            };

            this.Runtime.WorkflowTerminated += _TerminatedHandler;
        }
    }

    protected override void OnStopped()
    {
        _WorkflowQueue.Clear();

        base.OnStopped();
    }

    #endregion Methods

    #region Nested Types

    private struct WorkflowInfo
    {
        #region Fields

        public Guid Caller;
        public IComparable qn;

        #endregion Fields
    }

    #endregion Nested Types
}

After fixing the problem, another Memory Profile result showed
the leak is gone:

As you see, the numbers vary, which means there’s no
consistent leak. Moreover, the types that remain in
memory look more like metadata than instances of
classes. So, they are basically cached instances of metadata,
not instances allocated during workflow execution which are
supposed to be freed. So, we solved the memory leak!

Now you know how to write anonymous delegates without leaking
memory and how to run workflows without leaking them. Basically, the
principle is: if you are referencing some outside
object from an anonymous delegate, make sure that
object is not holding a reference to the delegate in
some way, either directly or via some child objects of its
own, because then you have a circular reference. If possible, do
not access objects declared outside the delegate, e.g. instance, from inside an
anonymous delegate. Instead, capture
simple values like int, DateTime or Guid, which are value types and
are copied, or a small immutable string, which does not drag a larger
object graph along with it. So, instead of
referencing an object, declare local variables, e.g.
instanceId, that take the value of properties (e.g.
instance.InstanceId) from the object, and then use
those local variables inside the anonymous delegate.

Optimize ASP.NET Membership Stored Procedures for greater speed and scalability

Last year at Pageflakes,
when we were getting millions of hits per day, we were having query
timeouts due to lock timeouts and transaction deadlock errors. These
locks came from the aspnet_Users and
aspnet_Membership tables. Since both of these tables
are very high read (almost every request causes a read on these
tables) and high write (every anonymous visit creates a row in
aspnet_Users), there were just way too many locks
created on these tables per second. SQL counters showed thousands
of locks per second being created. Moreover, we had queries that
would frequently select thousands of rows from these tables and
thus held more locks for longer periods, forcing other queries
to time out and throw errors on the website.

If you have read my last blog post, you know why such locks
happen. Basically, every table that grows to hold millions of
records and becomes popular goes through this trouble. It’s
just a part of the scalability problem that is common to databases. But
we rarely guard against it in our early design.

The solution is simple: use either WITH (NOLOCK) or SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED on your SELECT queries. Either one will do.
They tell SQL Server not to take shared locks on the table while it is
reading the table. If some row is locked by another transaction while the read is
happening, the read does not wait for that lock; it reads the row as it currently is,
even if the change is not committed yet (a dirty read). So, use these options only where
reading uncommitted data is acceptable. When you are reading a
table thousands of times per second without these options, you are
issuing locks in many places around the table thousands of times per
second. This not only makes reads from the table slower, but so many
locks also prevent inserts, updates and deletes from happening in time, and thus
queries time out. If you have queries like “show the users who were
online in the last hour based on the
LastActivityDate field”, they are going to issue
such a wide lock that even other harmless select queries will
time out. And did I tell you that there’s no index on
LastActivityDate on the aspnet_Users
table?

Now, don’t blame yourself for not putting either of these
options in every stored proc and every dynamically generated
SQL statement from the very first day. The ASP.NET developers made the same
mistake. You won’t see either of these used in any of the
stored procs used by ASP.NET Membership. For example, the following
stored proc gets called whenever you access the Profile
object:

ALTER PROCEDURE [dbo].[aspnet_Profile_GetProperties]
    @ApplicationName      nvarchar(256),
    @UserName             nvarchar(256),
    @CurrentTimeUtc       datetime
AS
BEGIN

    DECLARE @ApplicationId uniqueidentifier
    SELECT  @ApplicationId = NULL
    SELECT  @ApplicationId = ApplicationId FROM 
      dbo.aspnet_Applications WHERE LOWER(@ApplicationName) = LoweredApplicationName
    IF (@ApplicationId IS NULL)
        RETURN

    DECLARE @UserId uniqueidentifier
    DECLARE @LastActivityDate datetime
    SELECT  @UserId = NULL

    SELECT @UserId = UserId, @LastActivityDate = LastActivityDate
    FROM   dbo.aspnet_Users 
    WHERE  ApplicationId = @ApplicationId AND LoweredUserName = LOWER(@UserName)

    IF (@UserId IS NULL)
        RETURN
    SELECT TOP 1 PropertyNames, PropertyValuesString, PropertyValuesBinary
    FROM         dbo.aspnet_Profile
    WHERE        UserId = @UserId

    IF (@@ROWCOUNT > 0)
    BEGIN
        UPDATE dbo.aspnet_Users
        SET    LastActivityDate=@CurrentTimeUtc
        WHERE  UserId = @UserId
    END
END

There are two
SELECT operations that hold locks on two very high read tables,
aspnet_Users and aspnet_Profile.
Moreover, there’s a nasty UPDATE statement. It
updates the LastActivityDate of a user whenever you
access the Profile object for the first time within an http
request.

This stored proc alone is enough to bring your site down. It did
for us, because we were using the Profile Provider
everywhere. This stored proc was called around 300 times/sec. We
were having nightmarishly slow performance on the website and many
lock timeouts and transaction deadlocks. So, we added the
transaction isolation level and we also modified the UPDATE
statement to only perform an update when the
LastActivityDate is more than an hour old. This means the
same user’s LastActivityDate won’t be
updated again if the user hits the site within the same hour.

So, after the modifications, the stored proc looked like
this:

ALTER PROCEDURE [dbo].[aspnet_Profile_GetProperties]
    @ApplicationName      nvarchar(256),
    @UserName             nvarchar(256),
    @CurrentTimeUtc       datetime
AS
BEGIN
    -- 1. Please no more locks during reads
    SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;

    DECLARE @ApplicationId uniqueidentifier
    --SELECT  @ApplicationId = NULL
    --SELECT  @ApplicationId = ApplicationId FROM dbo.aspnet_Applications 
    --WHERE LOWER(@ApplicationName) = LoweredApplicationName
    --IF (@ApplicationId IS NULL)
    --    RETURN
    
    -- 2. No more call to Application table. We have only one app dude!
    SET @ApplicationId = dbo.udfGetAppId()

    DECLARE @UserId uniqueidentifier
    DECLARE @LastActivityDate datetime
    SELECT  @UserId = NULL

    SELECT @UserId = UserId, @LastActivityDate = LastActivityDate
    FROM   dbo.aspnet_Users 
    WHERE  ApplicationId = @ApplicationId AND LoweredUserName = LOWER(@UserName)

    IF (@UserId IS NULL)
        RETURN
    SELECT TOP 1 PropertyNames, PropertyValuesString, PropertyValuesBinary
    FROM         dbo.aspnet_Profile
    WHERE        UserId = @UserId

    IF (@@ROWCOUNT > 0)
    BEGIN
        -- 3. Do not update the same user within an hour
        IF DateDiff(n, @LastActivityDate, @CurrentTimeUtc) > 60
        BEGIN
            -- 4. Use ROWLOCK to lock only a row since we know this query
            -- is highly selective
            UPDATE dbo.aspnet_Users WITH(ROWLOCK)
            SET    LastActivityDate=@CurrentTimeUtc
            WHERE  UserId = @UserId
        END
    END
END

The changes I
made are numbered and commented, so they need little further explanation.
The only tricky thing here is that I have eliminated the call to the Application
table that was made just to get the ApplicationID from the ApplicationName. Since
there’s only one application in a database (ever heard of
multiple applications storing their users separately in the same
database and the same table?), we don’t need to look up the
ApplicationID on every call to every Membership stored proc. We can
just get the ID and hard code it in a function.

CREATE FUNCTION dbo.udfGetAppId()
RETURNS uniqueidentifier
WITH EXECUTE AS CALLER
AS
BEGIN
    RETURN CONVERT(uniqueidentifier, 'fd639154-299a-4a9d-b273-69dc28eb6388')
END;

This UDF returns the ApplicationID that I have
hardcoded by copying it from the Application table. Thus it eliminates
the need for querying the Application table.

Similarly, you should make the same changes in all the other stored
procedures that belong to the Membership Provider. All the stored procs
are missing proper locking hints, issue aggressive locks during updates
and update more frequently than is practically needed. Most of them also try
to resolve the ApplicationID from the ApplicationName, which is unnecessary
when you have only one web application per database. Make these
changes and enjoy lock contention free super performance from the
Membership Provider!

Linq to SQL solve Transaction deadlock and Query timeout problem using uncommitted reads

When your database tables start accumulating thousands of rows
and many users start working on the same table concurrently, SELECT
queries on those tables start producing lock contention and
transaction deadlocks. This is a common problem in any high volume
website. As soon as several concurrent users hit your website and
trigger SELECT queries on a large table like aspnet_users that is
also being updated very frequently, you end up with one of these
errors:

Transaction (Process ID ##) was deadlocked on lock resources
with another process and has been chosen as the deadlock victim.
Rerun the transaction.

Or,

Timeout Expired. The Timeout Period Elapsed Prior To Completion
Of The Operation Or The Server Is Not Responding.

The solution to these problems is to use proper indexes on
the table and to use the transaction isolation level Read
Uncommitted or WITH (NOLOCK) in your SELECT queries. So,
if you had a query like this:

SELECT * FROM aspnet_users
WHERE ApplicationID = 'xxx' AND LoweredUserName = 'someuser'

You are likely to hit one of the above errors under high
load. There are two ways to solve this:

SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;

SELECT * FROM aspnet_Users
WHERE ApplicationID = 'xxx' AND LoweredUserName = 'someuser'

Or use the WITH (NOLOCK):

SELECT * FROM aspnet_Users WITH (NOLOCK)
WHERE ApplicationID = 'xxx' AND LoweredUserName = 'someuser'

The reason for these errors is that aspnet_users is
a high-read and high-write table: during a read the table is
partially locked, and during a write it is locked as well. So, when the
locks from several queries overlap, and especially
when there’s a query that’s trying to read a large
number of rows and thus locking a large number of rows, some of the
queries either time out or deadlock.

Linq to Sql does not produce queries with the WITH
(NOLOCK) option nor does it use READ UNCOMMITTED. So, if
you are using Linq to SQL queries, you are going to end up with any
of these problems on production pretty soon when your site becomes
highly popular.

For example, here’s a very simple query:

using (var db = new DropthingsDataContext())
{
    var user = db.aspnet_Users.First();
    var pages = user.Pages.ToList();
}

DropthingsDataContext is a DataContext built from Dropthings database.

When you attach SQL Profiler, you get this:

You see none of the queries have READ UNCOMMITTED or WITH (NOLOCK).

The fix is to do this:

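using (var db = new DropthingsDataContext2())
{
    // Open the connection explicitly so that the SET statement and the
    // queries below all run on the same connection.
    db.Connection.Open();
    db.ExecuteCommand("SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;");

    var user = db.aspnet_Users.First();
    var pages = user.Pages.ToList();
}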

This will result in the following profiler output

As you see, both queries execute within the same connection and
the isolation level is set before the queries execute. So, both
queries enjoy the isolation level.

Now there’s a catch, the connection does not close. This
seems to be a bug in the DataContext that when it is disposed, it
does not dispose the connection it is holding onto.

In order to solve this, I have made a child class of the
DropthingsDataContext named DropthingsDataContext2
which overrides the Dispose method and closes the
connection.

class DropthingsDataContext2 : DropthingsDataContext, IDisposable
{
    // Hides DataContext.Dispose(); re-listing IDisposable makes a
    // "using" block call this version.
    public new void Dispose()
    {
        if (base.Connection != null
            && base.Connection.State != System.Data.ConnectionState.Closed)
        {
            base.Connection.Close();
            base.Connection.Dispose();
        }

        base.Dispose();
    }
}

This solved the connection problem.

There you have it, no more transaction deadlock or lock
contention from Linq to SQL queries. But remember, this is only to
eliminate such problems when your database already has the right
indexes. If you do not have the proper index, then you will end up
having lock contention and query timeouts anyway.

There’s one more catch: READ UNCOMMITTED will return rows
from transactions that have not completed yet. So, you might be
reading rows from transactions that will roll back. Since
that’s generally an exceptional scenario, you are more or
less safe with uncommitted reads, but not in financial applications
where transaction rollback is a common scenario. In such cases, go
for read committed or repeatable read.

There’s another way you can achieve the same, which seems
to work, that is using .NET Transactions. Here’s the code
snippet:

// Requires a reference to System.Transactions.
using (var transaction = new TransactionScope(
    TransactionScopeOption.RequiresNew,
    new TransactionOptions()
    {
        IsolationLevel = IsolationLevel.ReadUncommitted,
        Timeout = TimeSpan.FromSeconds(30)
    }))
{
    using (var db = new DropthingsDataContext())
    {
        var user = db.aspnet_Users.First();
        var pages = user.Pages.ToList();
        transaction.Complete();
    }
}

Profiler shows a transaction begins and ends:

The downside is that it wraps your calls in a transaction. So, you
are unnecessarily creating transactions even for SELECT operations.
When you do this a hundred times per second in a web application,
it’s a significant overhead.

Some really good examples of deadlocks are given in this
article:

http://www.code-magazine.com/article.aspx?quickid=0309101&page=2

I highly recommend it.

Strongly typed workflow input and output arguments

When you run a Workflow using Workflow Foundation, you
pass arguments to the workflow as a Dictionary whose
type is Dictionary<string, object>.
This means you miss the strong typing features of .NET languages.
You have to know what arguments the workflow expects by looking at
the Workflow’s public properties. Moreover, there’s no
way to make arguments required. You pass some parameters and expect it to
run; if it throws an exception, you pass more arguments and hope it works
now. Similarly, if you are running the workflow synchronously using
ManualWorkflowSchedulerService, you expect return arguments
from the Workflow immediately, but there again, you have to rely on
the Dictionary’s key and value pairs. No strong typing there
either.

In order to solve this, so that you could pass Workflow
arguments as strongly typed classes, you can establish a format
that every Workflow has only two arguments named
“Request” and “Response” and none other. Whatever
needs to be passed to the Workflow and expected out of it,
must be passed via Request and must be expected via Response
properties. Now the type of these arguments can be workflow
specific, it can be any class with one or more parameters. This
way, you could write code like this:

The advantages of this strongly typed approach are:

Compile time validation of input parameters passed to the workflow.
No risk of passing an unexpected object in the Dictionary’s object type value.
Enforce required values by creating Request objects with a non-default constructor.
Establish a fixed contract for Workflow input and output via the strongly typed Request and Response classes or interfaces.
Validate input arguments for the Workflow directly from the Request class, without going through the overhead of running a workflow.
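
To make the points about required values and up-front validation concrete, here is a minimal sketch of a Request/Response pair. The names NewUserRequest and NewUserResponse and their properties are hypothetical, chosen only for illustration; they are not the classes used in Dropthings.

```csharp
using System;

// Hypothetical request type: the constructor makes UserName a required value.
public class NewUserRequest
{
    public string UserName { get; private set; }
    public bool IsAnonymous { get; set; }   // optional, defaults to false

    public NewUserRequest(string userName)
    {
        // Validation happens here, before any workflow is created or run.
        if (string.IsNullOrEmpty(userName))
            throw new ArgumentException("userName is required", "userName");

        UserName = userName;
    }
}

// Hypothetical response type: everything the workflow returns, in one object.
public class NewUserResponse
{
    public Guid UserId { get; set; }
    public int PageCount { get; set; }
}
```

With a class like this, forgetting the user name is a compile error, and passing an empty one fails immediately with an ArgumentException instead of somewhere inside the workflow.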

If we follow this approach, we create workflows with only two
DependencyProperty members, one for Request and one for
Response. Here is an example from my open source project
Dropthings, which uses Workflow for the entire
Business Layer. Below you see the Workflow that executes when a new
user visits Dropthings.com; it creates a new user and sets up all the
pages and widgets for the user. It has only two Dependency
properties: Request and Response.

The Request parameter is of type
IUserVisitWorkflowRequest. So, you can pass any class that
implements the interface as the Request argument.

Here I have used fancy inheritance to create Request object
hierarchy. You don’t need to do that. Just remember, you can
pass any class. You don’t even need to use interface for
Request parameter. It can be a class directly. I use all these
interfaces in order to facilitate Dependency Inversion.

Similarly, the Response object is also a class.

The Response returns quite some properties. So, it’s kinda
handy to wrap them all in one property.

So, there you have it, strongly typed Workflow arguments. You
can attach properties of the Request object to any activity
directly from the designer:

There’s really no compromise to make in this approach.
Everything works as before.

In order to make workflow execution simpler, I use a helper
method like the following, that takes the Request and Response
object and creates the Dictionary for me. This Dictionary always
contains one “Request” and one “Response”
entry.
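
A minimal sketch of what such a helper could look like is shown below. It is an illustration only, not the Dropthings code: the names WorkflowHelper, BuildArguments and Run are hypothetical, and it assumes the runtime has a ManualWorkflowSchedulerService registered and that the workflow exposes public Request and Response properties.

```csharp
using System;
using System.Collections.Generic;
using System.Workflow.Runtime;
using System.Workflow.Runtime.Hosting;

public static class WorkflowHelper
{
    // The only place where the string keys "Request" and "Response" appear.
    public static Dictionary<string, object> BuildArguments<TRequest, TResponse>(
        TRequest request, TResponse response)
    {
        var arguments = new Dictionary<string, object>();
        arguments["Request"] = request;
        arguments["Response"] = response;
        return arguments;
    }

    // Runs the workflow synchronously and returns its Response object.
    public static TResponse Run<TWorkflow, TRequest, TResponse>(
        WorkflowRuntime runtime, TRequest request, TResponse response)
        where TResponse : class
    {
        var scheduler = runtime.GetService<ManualWorkflowSchedulerService>();
        WorkflowInstance instance = runtime.CreateWorkflow(
            typeof(TWorkflow), BuildArguments(request, response));

        TResponse result = null;
        Exception failure = null;

        EventHandler<WorkflowCompletedEventArgs> onCompleted = (sender, e) =>
        {
            if (e.WorkflowInstance.InstanceId == instance.InstanceId)
                result = e.OutputParameters["Response"] as TResponse;
        };
        EventHandler<WorkflowTerminatedEventArgs> onTerminated = (sender, e) =>
        {
            if (e.WorkflowInstance.InstanceId == instance.InstanceId)
                failure = e.Exception;
        };

        runtime.WorkflowCompleted += onCompleted;
        runtime.WorkflowTerminated += onTerminated;
        try
        {
            instance.Start();
            // With the manual scheduler, the workflow runs on the calling thread.
            scheduler.RunWorkflow(instance.InstanceId);
        }
        finally
        {
            runtime.WorkflowCompleted -= onCompleted;
            runtime.WorkflowTerminated -= onTerminated;
        }

        if (failure != null)
            throw new InvalidOperationException("Workflow terminated", failure);

        return result;
    }
}
```

A caller then writes Run<TWorkflow, TRequest, TResponse>(runtime, request, response) and gets a typed Response back, without ever touching the dictionary.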

This way, I can run Workflow in strongly typed fashion:

Here I can specify the Request, Response and Workflow type using
strong typing. This way I get a strongly typed return object as well
as pass a strongly typed Request object. There’s no dictionary
building, no risky string key and object type value passing.
You can ignore the ObjectContainer.Resolve() stuff, because
that’s just returning me an existing reference to the
WorkflowRuntime.

Hope you like this approach.

99.99% available ASP.NET and SQL Server SaaS Production Architecture

You have a hot ASP.NET+SQL Server product, growing at a thousand
users per day, and you have hit the limit of your own garage hosting
capability. Now that you have enough VC money in your pocket, you
are planning to go out and host on some real hosting facility,
maybe a colocation or managed hosting. So, you are thinking, how to
design a physical architecture that will ensure performance,
scalability, security and availability of your product? How can you
achieve four-nine (99.99%) availability? How do you securely let
your development team connect to production servers? How do you
choose the right hardware for web and database server? Should you
use Storage Area Network (SAN) or just local disks on RAID? How do
you securely connect your office computers to production
environment?

Here I will answer all these questions. Let me first show you a
diagram that I made for Pageflakes where we ensured we get
four-nine availability. Since Pageflakes is a Level 3
SaaS, it’s absolutely important that we build a high
performance, highly available product that can be used from
anywhere in the world 24/7 and end-user gets quick access to their
content with complete personalization and customization of content
and can share it with others and to the world. So, you can take
this production architecture as a very good candidate for Level 3
SaaS:

Here’s a CodeProject article that explains all the
ideas:

99.99% available ASP.NET and SQL Server SaaS Production
Architecture

Hope you like it. Appreciate your vote.

Linq to SQL: Delete an entity using Primary Key only

Linq to Sql does not come with a function like .Delete(ID) which allows you to
delete an entity using its primary key. You have to first
get the object that you want to delete and then call .DeleteOnSubmit(obj) to queue
it for deletion. Then you have to call DataContext.SubmitChanges() to
run the delete queries against the database. So, how do you delete objects
without fetching them from the database first, and so avoid a database
roundtrip?

You call this function as DeleteByPK(10, dataContext), supplying
two generic type arguments.

The first type argument is the entity type and the second one is the type of the
primary key. If your object’s primary key is a Guid field, specify
Guid instead of int.

How it works:

It figures out the table name and the primary key field name from the entity.
Then it uses the table name and primary key field name to build a DELETE query.

Figuring out the table name and primary key field name is a bit
hard. There’s some reflection involved. The GetTableDef()
returns the table name and primary key field name for an
entity.

Every Linq Entity class is decorated with a Table attribute that has the
table name:

Then the primary key field is decorated with a Column attribute with
IsPrimaryKey = true.

So, using reflection we can figure out the table name and the
primary key property and the field name.

Here’s the code that does it:

Before you scream “Reflection is SLOW!!!!”, note that the
definition is cached. So, reflection is used only once per
appDomain per entity. Each subsequent call is just a dictionary lookup
away, which is as fast as it can get.
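
Putting the steps described above together, a minimal sketch might look like the following. This is a reconstruction for illustration, not the published code linked at the end of this post: the TableDef class and the LinqDelete class name are hypothetical, and it assumes attribute-based mapping with a single-column primary key.

```csharp
using System;
using System.Collections.Generic;
using System.Data.Linq;
using System.Data.Linq.Mapping;
using System.Reflection;

public static class LinqDelete
{
    // Hypothetical holder for the cached mapping of one entity type.
    public sealed class TableDef
    {
        public string TableName;
        public string PrimaryKeyColumn;
    }

    private static readonly Dictionary<Type, TableDef> Cache =
        new Dictionary<Type, TableDef>();
    private static readonly object CacheLock = new object();

    // Reflection runs once per entity type; later calls are a dictionary lookup.
    public static TableDef GetTableDef<TEntity>()
    {
        Type type = typeof(TEntity);
        lock (CacheLock)
        {
            TableDef def;
            if (Cache.TryGetValue(type, out def))
                return def;

            var table = (TableAttribute)Attribute.GetCustomAttribute(
                type, typeof(TableAttribute));
            if (table == null)
                throw new InvalidOperationException(type.Name + " has no Table attribute");

            foreach (PropertyInfo property in type.GetProperties())
            {
                var column = (ColumnAttribute)Attribute.GetCustomAttribute(
                    property, typeof(ColumnAttribute));
                if (column != null && column.IsPrimaryKey)
                {
                    def = new TableDef
                    {
                        // Name is optional on both attributes; fall back to the CLR name.
                        TableName = table.Name ?? type.Name,
                        PrimaryKeyColumn = column.Name ?? property.Name
                    };
                    Cache[type] = def;
                    return def;
                }
            }

            throw new InvalidOperationException(type.Name + " has no primary key column");
        }
    }

    // Deletes one row by primary key without loading the entity first.
    public static int DeleteByPK<TEntity, TKey>(TKey id, DataContext context)
        where TEntity : class
    {
        TableDef def = GetTableDef<TEntity>();
        // Table and column names come from mapping attributes, not user input;
        // the key value itself is sent as a SQL parameter via {0}.
        string sql = "DELETE FROM " + def.TableName +
                     " WHERE [" + def.PrimaryKeyColumn + "] = {0}";
        return context.ExecuteCommand(sql, id);
    }
}
```

Because the delete goes straight to the database, any copy of the entity already loaded in the DataContext is not updated; this sketch suits fire-and-forget deletes.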

You can also delete a collection of objects without ever fetching
any of them. Use the following function to delete a whole bunch
of objects:
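
A minimal sketch of such a bulk delete is shown below, under the assumption that the table name and primary key column are already known (for example from the cached definition described earlier). The names BulkDelete, BuildSql and DeleteByPKs are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Data.Linq;
using System.Linq;

public static class BulkDelete
{
    // Builds "DELETE FROM t WHERE [pk] IN ({0}, {1}, ...)" with one
    // placeholder per key, so every key is sent as a SQL parameter.
    public static string BuildSql(string tableName, string pkColumn, int keyCount)
    {
        if (keyCount <= 0)
            throw new ArgumentOutOfRangeException("keyCount");

        string[] placeholders = Enumerable.Range(0, keyCount)
                                          .Select(i => "{" + i + "}")
                                          .ToArray();

        return "DELETE FROM " + tableName +
               " WHERE [" + pkColumn + "] IN (" + string.Join(", ", placeholders) + ")";
    }

    // Deletes all rows whose primary key is in ids, in a single statement.
    // Note: SQL Server limits the number of parameters per command, so very
    // large key lists would need to be split into batches.
    public static int DeleteByPKs<TKey>(
        DataContext context, string tableName, string pkColumn, IEnumerable<TKey> ids)
    {
        object[] keys = ids.Cast<object>().ToArray();
        if (keys.Length == 0)
            return 0;

        return context.ExecuteCommand(BuildSql(tableName, pkColumn, keys.Length), keys);
    }
}
```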

The code is available here:

http://code.msdn.microsoft.com/DeleteEntitiesLinq

How to convince developers and management to use automated unit test for AJAX websites

Everyone agrees that unit testing is a good thing, we should all
write unit tests. We read articles and blogs to keep us up-to-date
on what’s going on in the unit test world so that we can
sound cool talking to peers at lunch. But when we really sit down
and try to write unit tests ourselves – “Naaah, this is
waste of time, let’s ask my QA to test it; that’s much
more reliable and guaranteed way to test this. What’s the
point testing these functions when there are so many other
functions that we should unit test first?” Had such a moment
yourself or with someone else? Read on.

I had a conversation with our development lead Mike (using a
highly generic name since my
last post caused some trouble), who runs “the show”
in our engineering team. As usual, there were reservations about
introducing unit tests into the regular development schedule. Mike also
had valid points about the lack of powerful tools for unit testing
AJAX websites. He was also unsure about ‘what’ and
‘how’ to unit test in our code so that we aren’t
just testing database failures but real user actions that execute
both business and rendering logic. So, the discussion has a lot of
useful information that will help you make the right decision when
you want to sell unit testing to your ASP.NET and/or AJAX development
team and finally to higher management, so that you can buy enough
time for the effort.

Friday, Jan 2007 – hallway
Omar: Hey Mike, we need to start doing unit testing, at
least on our web services. We are wasting way too much time on
manual QA. Since we are an AJAX shop, unit testing all our web
services should give us pretty good coverage.

Mike: Sure, that sounds fun. I will do a
feasibility check and see how we can fit this into our next
sprint.

Friday, Feb 2007 – washroom
Omar: Hey Mike, let’s start doing unit
tests. I haven’t seen any tests last month. Can we start from
this sprint?

Mike: Sure, we can surely start from this
sprint. Let me find out which tool is the right one for us.

Friday, March 2007 – meeting room
Omar: Hey Mike, haven’t seen any unit tests
in the solution so far. Let’s seriously start writing unit
tests. Did you make any plan how you want to start unit testing the
webservices?

Mike: Yeah, I did some digging around and found
some tools. But most of them are for non-AJAX sites where you can
programmatically hit a URL or programmatically do HTTP POST on a
URL. You can also record button clicks and form posts from the
browser. There’s Visual
Studio’s Web Test, which does a pretty good job of recording
regular ASP.NET sites, but a poor one on AJAX sites. Moreover, you need to
buy the Team Suite edition to get that Web Test feature. Besides,
recording tests and playing them back really does not help us
because all those tests contain hard coded data. We can’t
repeat a particular step many times with random data, at least not
using any off-the-shelf tools. We need to test things carefully and
systematically using random data set and sometimes use real data
from database. For example, a common scenario is loading 100 random
user accounts from database and programmatically log those users
into their portal and test whether the portal shows those
users’ personalized data. All these need to be done from
AJAX, without using any browser redirect or form post, because
there’s one page that allows user to login using Ajax call
and then dynamically renders the portal on the same page after
successful login. The UI is rendered by Javascript, so only a real
browser can render it and we have to test the output looking at the
browser.

Omar: I see, so you can’t use Visual
Studio Web Test to run unit tests on a browser because it does not
let you access the html that the browser renders. You can only test the
html that’s returned by the webserver. As we are an AJAX website,
most of our work is done by Javascript: it calls the
webservices and renders the UI. Hmm, thinking how we can do this
using VS. We can at least hit the webservices and see if they are
returning the right JSON. This way we can pretty much test the
entire webservice, business and data access layers. But it does not
really replace the need for manual QA, since there’s a
lot of rendering logic in Javascript.

Mike: Now there’s a new project called
Watin that seems promising. You
can write C# code to instruct a browser to do things like click on
a button or run some javascript, and then you can check what the
browser rendered in its DOM and run your tests. But still,
it’s in its infancy. So, there’s really no good tool
for unit testing AJAX sites. Let’s stick to manual QA, which
is proven to be more accurate than anything developers can come up
with. We can hand over a set of data to QA and ask them to enter it and
check the result.

Omar: We definitely need to figure out ways to
reduce our dependency on manual QA. It simply does not scale. Every
sprint, we have to freeze code and then hand over to QA. They run
their gigantic test scripts for a whole day. Then the next day, we get
bug reports to fix. If there’s a severe regression bug, we have
to either cancel the sprint or work the whole night to fix it and run
overnight QA to meet the deployment date. For the last year, every
sprint we ended up having some bug that made dev and QA work
overnight. We have to empower our developers with an automated unit test
tool so that they can run the whole regression test script
automatically.

Mike: You are talking about a very long project
then. Writing so many unit tests for complete regression test is
going to be more than a month long project. We have to find the
right set of tool, plan what areas to unit test and how, then
engage both dev and QA to work together and prepare the right
tests. And then we have to keep the test suite up-to-date after
every sprint to catch the new bugs and features.

Omar: Yes, this is certainly a complex project.
We have to get to a stage that can empower a developer to run
automated unit tests and not ask QA to test every task for
regression bugs. In fact, we should have automated build that runs
all unit tests and does the regression test for us automatically
after every checkin.

Mike: We have automated build and deploy. So,
that’s done. We need to add automated unit test to it.
Seriously, given our product size, this is absolutely impossible to
engage in writing so many unit tests so that we can do the entire
regression test automatically. It’s not worth the time and
money. Our QA team is doing fine. They can take one day leave after
deployment when they do overnight work.

Omar: Actually QA team is at the edge of
quitting. They seem to have endless work load. After deployment,
they have to do manual regression test on production site to ensure
nothing broke on production. While they are at it, they have to
participate in sprint initiation meetings and write test plans.
When they are about to complete that, devs check in their work and ask
for regression tests of different modules. Before they can finish
that, we reach code freeze and they have to finish all those task
level tests as well as the entire regression test. So, they end up
working round-the-clock several days every sprint. They simply
can’t take it anymore.

Mike: How is it different from our life? After
spending a sleepless night on the deployment date, the next day we have
to attend an 8 hour long sprint planning meeting. Then we have to
immediately start working on the tasks from the next day and have
to reach code-freeze within a week. Then QA comes up with so many
bugs at the last moment. We have to work round-the-clock for the last 3
days of the sprint to get those bugs fixed. Then after a nerve-wracking
deployment day, we have to stay up at night to wait for QA to
report any critical bug and fix it immediately on production. We
are at the brink of destruction as well.

Omar: That’s understood. The whole team
is surely getting pushed to their limit. So, that’s why we
urgently need automated test so that it addresses the problems of
both dev and QA team. Dev will get tests done at a faster rate so
that they don’t get bug reports at the very end and then work
over-night to fix them. Similarly, we offload QA team’s
continuous overwork by letting the system do the bulk of their
test.

Mike: This is going to kill the team for sure.
We have so many product features and bug fixes to do every sprint.
Now, if we ask everyone to start writing unit tests for every task
they do, it’s a lot of burden. We can’t do both at the
same time.

Omar: Agree. We have to cut down product
features or bug fixes. We have to make room in every sprint to
write unit tests.

Mike: Good luck with that. Let’s see how
you convince product team.

Omar: First let me convince you. Are you
convinced that we should do it?

Mike: Not yet. I don’t really see its
fruit in near future, even after two months. There’s so many
features we have to do and so many customers to ship to, we just
can’t do enough unit tests that will really shed off QA load.
It’ll just be a distraction and delay in every sprint, heck,
in every task.

Omar: Let me show you a graph which I believe
is going to make an impact:

So, you see the more automated tests we write, the less time is
spent on manual QA. That time can be spent on doing new tests or
task level tests, which increases the quality of every new feature shipped
and drastically reduces new bugs shipped to production. Thus we get
fewer and fewer bugs after every successful sprint.

Mike: Ya, I get it, you don’t need to
convince me of this. But I don’t see the benefit from an
overall gain perspective. Are we shipping a better product faster
over the next two months? We aren’t. We are shipping fewer
features and bug fixes by spending a lot of time on writing unit
tests that have no impact on the end-user.

Omar: Let me see if your assumption is
correct:

You see here, the more automated tests kick in, the more time QA
can spend on new features or new bugs. I agree that the speed of
testing new features/bugs decreases for the first one or two sprints, but
then it gradually picks up and gets even better. In the
beginning, there’s a big overhead of getting started with
automated tests. But as sprints go by, the number of unit tests to
write gradually stabilizes and soon it becomes proportional to new
features/bugs. No more time is spent on writing tests for old stuff.
So, the number of unit tests you write after four sprints is
exactly what is needed for the new tasks you did in that sprint.

Mike: Let’s see what happens if we just
don’t do any automated tests and keep things manual. What does
the graph look like?

Omar: The future looks quite gloomy. We will be
spending so much time on regression test as we keep adding stuffs
to the product that at some point QA will end up doing regression
test full time. They will not spend time on new features and we
will end up having a lot more new bugs slipped from QA to
production due to lack of attention from QA.

Mike: OK, how do we start?

Omar: First step is to get the regression tests
done so that we can get rid of that 24 hour long marathon QA period
end of every sprint. Moreover, I see too many devs asking QA to do
regression test here and there after they commit some tasks. So, QA
is always doing regression tests from the beginning to the end of
each sprint. They should only test new things for which automated
test is not yet written and let the automated test do the existing
tests.

Mike: This will be hard to sell to management.
We are going to say “Look for next one month, we will be half
productive because we want to spend time automating our QA process
so that from second month, we can do tests automatically and QA can
have more free time.”

Omar: No, we say it like this, “We are
going to spend 50% of our time automating QA for the next one month so
that QA can spend 50% more time on testing new features. This will
prevent 50% of new bugs from occurring every sprint. This will give
developers 50% more time to build new features after one
month.” We show them this graph:

Mike: Seems like this will sell. But for the first
couple of sprints, we will be so dead slow that some of us might
get fired. Think about it, from management’s point of view, the
development team has suddenly become half as productive. They
are building only a few new features and bugs are not
getting fixed at as fast a rate. Customers are screaming, investors are
asking for their money back. It’s going to get really dirty. Do you
want to take this risk?

Omar: I can see that this decision is a very
hard decision to take. I know what CEO will say, “We need to
be double productive from tomorrow, otherwise we might as well pack
our bags and go home. Tell me something that will make us double
productive from tomorrow, not half productive.” But you can
see what will happen after couple of months. Situation will be so
bad that doing this after couple of months will be out of question.
We won’t be in a position to even propose this. Now, at least
we can argue and they still have the mind to listen to long term
ideas. But in future, when our QA team is doing full time
regression test, new buggy features going to production, ratio of
new bugs increasing after every release, more customers screaming,
half baked features running on the production – we might have
to shut down the company to save our life.

Mike: We should have started doing automated
tests from day one.

Omar: Yes, unfortunately we haven’t and the more we delay,
the harder it is going to get. I am sure we will write automated
tests from day one in our next project, but we have to rescue this
project.

Mike: OK, I am sold. How do we start? We surely
need to unit test the business and data access layer. Do we start
writing unit test for every function in DAL and Business layer?

Omar: Writing unit test for DAL seems pointless
to me. Remember, we have very little time. We will get max two
sprints to automate unit tests. After that, we won’t get the
luxury to spend half of our time writing unit tests. We will have
to go back to our feature and bug fix mode. So, let’s spend
the time wisely. How about we only test the business layer
function?

Mike: So, we test functions like
CreateCustomer, EditCustomer, DeleteCustomer, AddNewOrder in
business layer?

Omar: Is that the final layer in business
layer? Is there another high level layer that aggregates such CRUD
like functions?

Mike: For many areas, it’s like CRUD, a
dumb wrapper on DAL with some minor validation and exception
handling. But there are places where there are complex functions
that do a lot of different DAL call. For example,
UpdateCustomerBalance – that calls a lot of DAL classes to
figure out customer’s current balance.

Omar: Do the webservices call multiple business
classes? Do they act like another level that aggregates the business
layer?

Mike: Yes, webservices are called mostly from
user actions and they generally call multiple business layer
classes to get the job done.

Omar: Where’s the caching done?

Mike: Webservice layer.

Omar: That sounds like a good place to start
unit testing. We will write small number of unit tests and still
test majority of business layer and data access classes and we
ensure validation, caching, exception handling code are working
fine.

Mike: But there are other tools and services
that call the business layer. For example, we have a windows
service running that directly calls the business layer.

Omar: Can we refactor it to call webservices
instead?

Mike: No, that’ll be like creating 10
more webservices. A lot more development effort.

Omar: OK, let’s write unit tests for
those business layer classes separately then. I suppose there will
be some overlap. Some webservice call will test those business
classes as well. But that’s fine. We *should be* unit testing
from business layer. But we don’t have time, so we are
starting from one level up. Webservices aren’t really
“unit” but you have to do what you have to do. At least
testing webservices will give us guarantee that we covered all user
actions under unit test.

Mike: Yes, testing webservices will at least
ensure user actions are tested. The background windows service is
not much of our headache. Now how do we test presentation logic? We
have ASP.NET pages and there’s all those Javascript rendering
code.

Omar: Let’s use Watin for that.

Mike: How to make that part of a unit test
suite?

Omar: Watin integrates nicely with NUnit,
mbUnit. mbUnit is pretty good. I used it before. It has more test
attributes and Assert functions than NUnit.

Mike: OK, so how do we unit test UI? A test
function will click on Login link, fill up the email, password box
and click “OK”. Then wait for one sec and then see if
Javascript has rendered the UI correctly?

Omar: Something like that. We can discuss later
exactly how we test it. But how do you test if UI is rendered
correctly?

Mike: We check from browser’s DOM for
user’s data like name, email, balance etc are available in
browser’s HTML.

Omar: Does that really test presentation logic?
What if the data is misplaced? What if, due to a CSS error, it does
not render correctly?

Mike: Well, there’s really no way to
figure it out if things are rendered correctly. We can ask the QA
guys to keep watching the UI while Watin runs the tests on the
browser. You can see on the browser what Watin is doing.

Omar: OK, that’s one way and certainly
faster than QA doing the whole step. But can it be done
automatically like matching browser’s screen with some
screenshot?

Mike: Yeah, we need AI for that.

Omar: Seriously, can we write a simple UI
capture and comparison tool? Say we take a screenshot of correct
output and then clear up some areas which can vary. Then Watin runs
the test, it takes the screenshot of current browser’s view
and then matches with some screenshot? Here’s the idea:

Say this is a template screenshot that we want to match with the
browser. We are testing Google’s search result page to ensure
the page always returns a particular result when we provide some
predefined query. So, when Watin runs the test and takes the browser to the
Google search result page, it takes a screenshot and ignores
whatever is in those gray areas. Then it does a pixel by pixel match
on the rest of the template. So, no matter what the search query is
and no matter what ad Google serves on top of results, as long as
the first result is the one we are looking for, test passes.

Mike: As I said, this is AI stuff. Some highly
sophisticated being will be matching two screenshots to say, Yah,
they more or less match, test pass.

Omar: I think a pretty dumb bitmap matching
will work in many cases. Just an idea, think about it. This way we
can test if CSS is giving us pixel perfect results. QA takes a
screenshot of the expected output and then lets the automated test
match it against the browser’s actual output.
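
To make the “dumb bitmap matching” idea concrete, here is a minimal sketch that compares a template screenshot with the actual one while skipping the areas allowed to vary. It is an illustration only, not a tool we built: the ScreenshotMatcher name is hypothetical, it assumes both screenshots have the same size, and it uses the slow but simple GetPixel from System.Drawing.

```csharp
using System.Collections.Generic;
using System.Drawing;

public static class ScreenshotMatcher
{
    // Returns true when every pixel outside the ignored regions is identical.
    public static bool Matches(Bitmap template, Bitmap actual, IList<Rectangle> ignoredRegions)
    {
        if (template.Width != actual.Width || template.Height != actual.Height)
            return false;

        for (int y = 0; y < template.Height; y++)
        {
            for (int x = 0; x < template.Width; x++)
            {
                // Skip the "gray" areas where content is allowed to vary.
                if (IsIgnored(x, y, ignoredRegions))
                    continue;

                if (template.GetPixel(x, y).ToArgb() != actual.GetPixel(x, y).ToArgb())
                    return false;
            }
        }

        return true;
    }

    private static bool IsIgnored(int x, int y, IList<Rectangle> regions)
    {
        foreach (Rectangle region in regions)
        {
            if (region.Contains(x, y))
                return true;
        }
        return false;
    }
}
```

An exact match like this breaks on anti-aliasing or font rendering differences between machines, so a real tool would probably need a small per-pixel tolerance.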

Mike: OK, all good ideas. Let’s see how
much we can do. We will be starting from webservice unit testing.
Then we will gradually move to Watin based testing. Now it’s
time to sell this proposal to product team and then to management
team.

Omar: Yep, at least get the webservices tested,
that will catch a lot of bugs before QA spends time on testing.
The goal is to get as much testing as possible done by developers, really fast and
automatically, rather than letting QA spend time on it. Also we can
run those webservice unit tests in a load test suite and load test
the entire webservice layer. That’ll give us a guarantee that our
code is production quality and can survive high traffic.

Mike: Understood, see ya.

. . .

March 2008, Friday – The Code Freeze Day

Omar: Hey Mike, how are we doing this
sprint?

Mike: Pretty good. 3672 unit tests out of 3842
passed. We know why some of them failed. We can get them fixed
pretty soon and run the complete regression tests once during lunch
and once before we leave. QA has completed testing new features
pretty well yesterday and they can check again today. We got some
of the new features covered by unit tests as well. Rest we can
finish next sprint, no worries.

Omar: Excellent. Enjoy your weekend. See you on
Monday.

——————————