tips – Omar AL Zabir Blog

A Manager’s Checklist

Managing people is hard. Many books have been written, lectures given, and articles published on how to manage people effectively. Remembering all those good lessons while working under pressure is difficult. Things tend to slip here and there. Sparks flare. Faces get grumpier. People quit… This checklist should help keep things in order.

I have compiled this list from various resources that I have come across over the years. I hope it helps other managers out there. Credit goes to those amazing authors and speakers whose ideas have changed my life and given birth to this checklist. I am just a compiler.

Google sheet: http://bit.ly/mgrchklst

Download PDF for print: ManagerChecklist

Any feedback welcome!

Safely deploying changes to production servers

When you deploy incremental changes to a production server that is live and running all the time, you sometimes see error messages like “Compiler Error Message: The Type ‘XXX’ exists in both…”. Sometimes you find the Application_Start event not firing although you shipped a new class, DLL or web.config. Sometimes you find static variables not getting initialized, and so on. Many weird things happen on web servers when you incrementally deploy changes and the server has been up and running for several weeks. So, I came up with foolproof housekeeping steps that we always follow whenever we deploy an incremental change to our websites. These steps ensure that the websites are properly recycled, caches are cleared, and all the data stored at Application level is initialized.

First of all, you should have multiple web servers behind a load balancer. This way you can take one server out of production traffic, do your deployment and housekeeping tasks like restarting IIS, and then put it back. Then you can do the same for the second server and so on. This ensures there’s no outage for customers. If you can do it reasonably fast, hopefully customers won’t notice the discrepancy between servers, some having new code and some having old code. You should only do this when your changes aren’t drastic, for example, when you aren’t delivering a completely revamped UI. In that case, some users hitting server1 with the latest UI would suddenly get a completely different experience, and then on the next page refresh they might hit server2 with old code and get a totally different experience again. This works for incremental, non-dramatic changes only.

During deployment you should follow these steps:

Take server X out of load balancer so that it does not get any traffic.
Stop all your .NET windows services on the server.
Stop IIS.
Delete the Temporary ASP.NET Files folders of all .NET versions, in case you have multiple .NET versions running. You can follow this link.
Deploy the changes.
Flush any distributed cache you have, for example, Velocity or Memcached.
Start IIS.
Start your .NET windows services on the server.
Warm up all websites by hitting major URLs on the websites. You should have some automated script to do this. You can use tinyget to hit some major URLs, especially pages that take a lot of time to compile. Read my post on keeping websites warm with zero coding.
Put server X back to load balancer so that it starts receiving traffic.

That’s it. It should give you a clean deployment and prevent unexpected errors. You should print these steps and hang them at the desks of your deployment guys so that they never forget them under deployment pressure.

Doing all these steps manually is risky. Under deployment time pressure, your production guys can make mistakes and screw up a server for good. So, I always prefer having a batch file that takes a server out and makes it ready for deploying code and then after the deployment is done, use another batch file to put the server back into load balancer traffic rotation after the server is warmed up.
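The ordering of the steps can also be sketched in a few lines of Python. This is only an illustration of the sequence, not the batch files I use: the step names, the rolling_deploy function and the run_step callback are all made up for this example, and run_step is assumed to do the real work (call a script, a load balancer API, and so on) and to raise an exception when a step fails.

```python
# Hypothetical step names, mirroring the checklist above.
DEPLOYMENT_STEPS = [
    "take out of load balancer",
    "stop windows services",
    "stop IIS",
    "delete Temporary ASP.NET Files",
    "deploy changes",
    "flush distributed cache",
    "start IIS",
    "start windows services",
    "warm up websites",
    "put back into load balancer",
]


def rolling_deploy(servers, run_step):
    """Run every step on one server before touching the next one.

    run_step(server, step) is a caller-supplied function (an assumption of
    this sketch). If it raises, the rollout stops and the remaining servers
    keep running the old code.
    """
    completed = []
    for server in servers:
        for step in DEPLOYMENT_STEPS:
            run_step(server, step)
            completed.append((server, step))
    return completed
```

Because a failure aborts the loop, a broken deployment on the first server never reaches the second one, which keeps at least one healthy server in rotation.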

Generally, load balancers are configured to hit some page on your website and keep the server in rotation if that page returns an HTTP 200. If not, the load balancer assumes the server is dead and takes it out of rotation. For example, say you have an alive.txt file on your website which the load balancer keeps an eye on. If it’s gone, the server is taken out of rotation. In that case, you can create batch files that take the server out, wait for a couple of seconds to ensure the in-flight requests complete, and then stop IIS, delete the temporary ASP.NET files and make the server ready for deployment. Something like this:

serverout.bat
=====================
Ren alive.txt dead.txt
typeperf "\ASP.NET Applications(__Total__)\Requests Executing" -sc 30
iisreset /stop
rmdir /q /s "C:\WINDOWS\Microsoft.NET\Framework64\v1.1.4322\Temporary ASP.NET Files"
rmdir /q /s "C:\WINDOWS\Microsoft.NET\Framework64\v2.0.50727\Temporary ASP.NET Files"
md "C:\WINDOWS\Microsoft.NET\Framework64\v1.1.4322\Temporary ASP.NET Files"
md "C:\WINDOWS\Microsoft.NET\Framework64\v2.0.50727\Temporary ASP.NET Files"
xcacls "C:\WINDOWS\Microsoft.NET\Framework64\v1.1.4322\Temporary ASP.NET Files" /E /G MYMACHINE\IIS_WPG:F /Q
xcacls "C:\WINDOWS\Microsoft.NET\Framework64\v2.0.50727\Temporary ASP.NET Files" /E /G MYMACHINE\IIS_WPG:F /Q

Similarly you should have a batch file that starts IIS, warms up some pages, and then puts the server back into load balancer.

serverin.bat
============
SET TINYGET=C:\Program Files (x86)\IIS Resources\TinyGet\tinyget.exe
iisreset /start
"%TINYGET%" -srv:localhost -uri:http://localhost/ -status:200
ren dead.txt alive.txt
typeperf "\ASP.NET Applications(__Total__)\Requests Executing" -sc 30

Always try to automate these kinds of admin chores. It’s difficult to do them right every time manually under deployment pressure.

Ten Caching Mistakes that Break your App

Caching frequently used objects that are expensive to fetch from the source makes an application perform faster under high load. It helps scale an application under concurrent requests. But some hard-to-notice mistakes can make the application suffer under high load instead of performing better, especially when you are using distributed caching, where a separate cache server or cache application stores the items. Moreover, code that works fine with an in-memory cache can fail when the cache is made out-of-process. Here I will show you some common distributed caching mistakes, which will help you make better decisions about when to cache and when not to cache.

Here are the top 10 mistakes I have seen:

Relying on .NET’s default serializer.
Storing large objects in a single cache item.
Using cache to share objects between threads.
Assuming items will be in cache immediately after storing them.
Storing entire collection with nested objects.
Storing parent-child objects together and also separately.
Caching Configuration settings.
Caching Live Objects that have an open handle to a stream, file, registry, or network.
Storing same item using multiple keys.
Not updating or deleting items in cache after updating or deleting them on persistent storage.
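To make two of these concrete, here is a minimal Python sketch that avoids the fourth and the last mistake: it never assumes an item is in the cache, and it removes the cached copy whenever the stored record changes. FakeCache, UserRepository and the dictionary standing in for the database are all invented for this illustration; a real distributed cache client will have its own API.

```python
class FakeCache:
    """In-process stand-in for a distributed cache client (hypothetical)."""

    def __init__(self):
        self._items = {}

    def get(self, key):
        return self._items.get(key)  # None means a cache miss

    def set(self, key, value):
        self._items[key] = value

    def delete(self, key):
        self._items.pop(key, None)


class UserRepository:
    def __init__(self, cache, database):
        self.cache = cache
        self.database = database  # a dict standing in for persistent storage

    def get_user(self, user_id):
        key = f"user:{user_id}"
        user = self.cache.get(key)
        if user is None:
            # Never assume the item is cached: fall back to the source.
            user = self.database.get(user_id)
            if user is not None:
                self.cache.set(key, dict(user))
        return user

    def update_user(self, user_id, **changes):
        self.database[user_id] = {**self.database[user_id], **changes}
        # Drop the stale copy so the next read reloads the fresh record.
        self.cache.delete(f"user:{user_id}")
```

Deleting the item on update, rather than overwriting it, keeps the write path simple: the next reader repopulates the cache from storage.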

Let’s see what they are and how to avoid them.

http://www.codeproject.com/KB/web-cache/cachingmistakes.aspx

Please vote if you find this useful.

How to make screencasts in optimized animated GIF for free

I have been using animated GIFs to show short screencasts in my blogs and articles. Animated GIF is supported by all browsers and works on virtually any website in the world, even where Flash is blocked. A picture is worth a thousand words, and an animation is worth a thousand multiplied by [frames in animation] words. So, I have been looking for a completely free solution for capturing screencasts, converting them to animated GIF and then heavily compressing them.

First use CamStudio to capture the screencast into an AVI. Before you capture, you need to set the CamStudio video recording setting to one frame per second, otherwise there will be too many frames in your animated GIF. You can set it to 2 or more frames per second if you are recording frequent changes on the screen.

This will put one frame per second in the animated GIF. Since animated GIFs get pretty large due to their primitive lossless compression, you need to put as few frames in them as possible.

Now you can record the screencast using CamStudio and save it in an AVI file.

Once you have the AVI file, you need to open the AVI using Microsoft GIF animator.

Then you need to click the “Select all” button and go to Image tab and put 100 on the Duration. This will set each frame delay to 1 second, exactly what you have set in the CamStudio Video Options. If you have set 2 frames per second in CamStudio, then you need to set 50 in Microsoft GIF Animator.
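The arithmetic behind that Duration value is simple enough to sketch in Python. This assumes, as the numbers above imply, that the Duration field is measured in hundredths of a second; the function name is made up for this example.

```python
def frame_duration_hundredths(frames_per_second):
    """Duration to enter per frame, in 1/100ths of a second."""
    if frames_per_second <= 0:
        raise ValueError("frames_per_second must be positive")
    return round(100 / frames_per_second)
```

For example, frame_duration_hundredths(1) gives 100 and frame_duration_hundredths(2) gives 50, matching the two settings mentioned above.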

Now you can save the file as an animated GIF and use it wherever you like.

I would highly recommend you further optimize the animated GIF and eliminate duplicate frames and use some advanced compression. For this you can use the ImageMagick utility. You will find various ways to optimize animated GIF on this page. I just use the following command line and it gives me pretty good output:

c:\Program Files (x86)\ImageMagick-6.6.3-Q16>convert SourceImage.gif -layers OptimizePlus DestImage.gif

This optimizes animated GIFs pretty well. I have seen an average 60% reduction on screen captures that have a white background and no translucent areas (e.g. Windows Vista/7 title bars).

Building High Performance Queue in Database for storing Orders, Notifications, Tasks

We have Queues everywhere. There are queues for asynchronously sending notifications like email and SMS in most websites. E-Commerce sites have queues for storing orders, processing and dispatching them. Factory Assembly line automation systems have queues for running tasks in parallel, in a certain order. A queue is a widely used data structure that sometimes has to be created in a database instead of using specialized queue technologies like MSMQ. Running a high performance and highly scalable queue using database technologies is a big challenge, and it’s hard to maintain when the queue starts to get millions of rows queued and dequeued per day. Let me show you some common design mistakes made in designing Queue-like tables and how to get maximum performance and scalability from a queue implemented using simple database features.

Let’s first identify the challenges you have in such queue tables:

The table is both read from and written to. Thus queuing and dequeuing impact each other and cause lock contention, transaction deadlocks, IO timeouts etc. under heavy load.
When multiple receivers try to read from the same queue, they randomly get duplicate items picked out of the queue, thus resulting in duplicate processing. You need to implement some kind of high performance row lock on the queue so that the same item never gets picked up by concurrent receivers.
The Queue table needs to store rows in a certain order and read them in a certain order, which is an index design challenge. It’s not always first in, first out. Sometimes Orders have higher priority and need to be processed regardless of when they are queued.
The Queue table needs to store serialized objects in XML or binary form, which becomes a storage and index rebuild challenge. You can’t rebuild the indexes on the Queue table online because it contains text and/or binary fields. Thus the table keeps getting slower and slower every day, and eventually queries start timing out until you take a downtime and rebuild the indexes.
During dequeue, a batch of rows is selected, updated and then returned for processing. You have a “State” column that defines the state of the items. During dequeue, you select items of a certain state. Now State only has a small set of values, e.g. PENDING, PROCESSING, PROCESSED, ARCHIVED. As a result, you cannot create a useful index on the “State” column because it does not give you enough selectivity. There can be thousands of rows having the same state. As a result, any dequeue operation results in a clustered index scan that’s both CPU and IO intensive and produces lock contention.
During dequeue, you cannot just remove the rows from the table because that causes fragmentation in the table. Moreover, you need to retry orders/jobs/notifications N times in case they fail on the first attempt. This means rows are stored for a longer period, indexes keep growing and dequeue gets slower day by day.
You have to archive processed items from the Queue table to a different table or database, in order to keep the main Queue table small. That means moving a large number of rows of some particular status to another database. Such large data removal leaves the table highly fragmented, causing poor queue/dequeue performance.
You have a 24×7 business. You have no maintenance window where you can take a downtime and archive a large number of rows. This means you have to continuously archive rows without affecting production queue-dequeue traffic.
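To illustrate the second and third challenges only, here is a small Python sketch that uses the standard sqlite3 module as a stand-in database. It is not the SQL Server design from the article: the queue_item table, its columns and the function names are assumptions made for this example. The idea it shows is that a receiver selects a batch in priority order and marks it PROCESSING inside one write transaction, so a second receiver cannot pick up the same rows.

```python
import sqlite3


def create_queue(conn):
    # Hypothetical schema: higher priority is dequeued first, then oldest first.
    conn.execute(
        """CREATE TABLE queue_item (
               id INTEGER PRIMARY KEY AUTOINCREMENT,
               priority INTEGER NOT NULL DEFAULT 0,
               state TEXT NOT NULL DEFAULT 'PENDING',
               payload TEXT NOT NULL)"""
    )


def enqueue(conn, payload, priority=0):
    conn.execute(
        "INSERT INTO queue_item (priority, payload) VALUES (?, ?)",
        (priority, payload),
    )


def dequeue(conn, batch_size):
    """Claim up to batch_size pending items and return (id, payload) rows."""
    # BEGIN IMMEDIATE takes the write lock up front, so the select and the
    # update below are seen by other connections as a single step.
    conn.execute("BEGIN IMMEDIATE")
    try:
        rows = conn.execute(
            "SELECT id, payload FROM queue_item WHERE state = 'PENDING' "
            "ORDER BY priority DESC, id LIMIT ?",
            (batch_size,),
        ).fetchall()
        conn.executemany(
            "UPDATE queue_item SET state = 'PROCESSING' WHERE id = ?",
            [(row[0],) for row in rows],
        )
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise
    return rows
```

The connection is expected to be opened with isolation_level=None, for example sqlite3.connect("queue.db", isolation_level=None), so that the explicit BEGIN and COMMIT statements control the transaction. SQLite locks the whole database for a write, which is much coarser than the row-level locking a busy production queue needs, so treat this purely as a model of the claim-then-process pattern.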

If you have implemented such queue tables, you might have suffered from one or more of the above challenges. Let me give you some tips on how to overcome these challenges and how to design and maintain a high performance queue table.

Read the article for details:

http://www.codeproject.com/KB/database/fastqueue.aspx

Please vote if you find this useful.

Exporting normalized relational data from database to flat file format

Sometimes you need to export relational normalized data into flat files where a single row comes from various tables. For example, say you want to export all customer records along with their work and home address, and primary phone number in a single row. But the address and contact information come from different tables, and there can be multiple rows in those tables for a single customer. Sometimes there can be no row available in the address/phone table for a customer. In such a case, neither INNER JOIN, nor LEFT JOIN/OUTER JOIN will work. How do you do it?

The solution is to use OUTER APPLY.

Consider some tables like this:

Customer Table

CustomerID	FirstName	LastName	DOB
1	Scott	Guthrie	1/1/1950
2	Omar	AL Zabir	1/1/1982

Contact table

CustomerID	ContactType	ContactValue	IsPrimary
1	WorkAddress	Microsoft	TRUE
1	HomeAddress	Seattle	FALSE
1	Phone	345345345	FALSE
1	Phone	123123123	TRUE
2	WorkAddress	London	TRUE
2	Phone	1312123123	FALSE

We need to create a flat file export from this where the output needs to look like:

CustomerID	FirstName	LastName	DOB	HomeAddress	WorkAddress	PrimaryPhone	IsPhonePrimary
1	Scott	Guthrie	1/1/1950	Seattle	Microsoft	123123123	Yes
2	Omar	AL Zabir	1/1/1982	No Home Address	London	1312123123	No

There are some complex requirements in the output:

If a customer has multiple phones, then it needs to select the one which is flagged as primary.
If a customer has no home address, then it needs to show “No Home Address” instead of NULL.
It needs to tell if the phone number we got is the primary phone or not.

The query to generate this will be:

SELECT 
    c.CustomerID,
    c.FirstName,
    c.LastName,
    c.DOB,

    'HomeAddress' = 
    CASE 
        WHEN home.ContactValue IS NULL THEN 'No Home Address'
        ELSE home.ContactValue
    END,
    work.ContactValue AS WorkAddress,
    phone.ContactValue as PrimaryPhone,
    'IsPhonePrimary' = 
    CASE 
        WHEN phone.IsPrimary = 1 THEN 'Yes'
        ELSE 'No'
    END
FROM Customer c

OUTER APPLY (
    SELECT TOP 1 ContactValue from Contact WHERE CustomerID = c.CustomerID
    AND ContactType = 'HomeAddress'
    ORDER BY IsPrimary DESC
) AS home

OUTER APPLY (
    SELECT TOP 1 ContactValue from Contact WHERE CustomerID = c.CustomerID
    AND ContactType = 'WorkAddress'
    ORDER BY IsPrimary DESC
) AS work

OUTER APPLY (
    SELECT TOP 1 ContactValue, IsPrimary from Contact WHERE CustomerID = c.CustomerID
    AND ContactType = 'Phone'
    ORDER BY IsPrimary DESC
) AS phone

All the tricks are in the OUTER APPLY blocks. Each OUTER APPLY runs its subquery once per customer row and selects the row whose values appear in the output columns after the customer table fields. When the subquery finds no row, the customer row is still returned, with NULL in those columns.

The primary contact is selected by ordering the rows selected from the Contact table by the IsPrimary field in descending order. Thus the rows having True come first.
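If it helps to see the same logic outside SQL, here is a small Python sketch that mimics what each OUTER APPLY block does, using the sample rows from the tables above. The function names top_contact and export_rows are made up for this illustration; this is a model of the query's behavior, not how SQL Server executes it.

```python
customers = [
    (1, "Scott", "Guthrie", "1/1/1950"),
    (2, "Omar", "AL Zabir", "1/1/1982"),
]

# (CustomerID, ContactType, ContactValue, IsPrimary)
contacts = [
    (1, "WorkAddress", "Microsoft", True),
    (1, "HomeAddress", "Seattle", False),
    (1, "Phone", "345345345", False),
    (1, "Phone", "123123123", True),
    (2, "WorkAddress", "London", True),
    (2, "Phone", "1312123123", False),
]


def top_contact(customer_id, contact_type):
    """Mimic one OUTER APPLY block: TOP 1 ordered by IsPrimary DESC, or None."""
    matches = [c for c in contacts if c[0] == customer_id and c[1] == contact_type]
    matches.sort(key=lambda c: c[3], reverse=True)  # True sorts before False
    return matches[0] if matches else None


def export_rows():
    rows = []
    for customer_id, first_name, last_name, dob in customers:
        home = top_contact(customer_id, "HomeAddress")
        work = top_contact(customer_id, "WorkAddress")
        phone = top_contact(customer_id, "Phone")
        rows.append((
            customer_id, first_name, last_name, dob,
            home[2] if home else "No Home Address",
            work[2] if work else None,
            phone[2] if phone else None,
            "Yes" if phone and phone[3] else "No",
        ))
    return rows
```

Every customer produces exactly one output row, whether or not a matching contact exists, which is the property that makes OUTER APPLY a good fit for flat file exports.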

User story is worthless, Behavior is what we need

A User Story is suitable for describing what the user needs, but not what the user does and how the system reacts to user actions within different contexts. It basically gives the product team a way to quantify their output and let their boss know that they are doing their job. As a developer, you can’t write code from user stories because you have no clue what the sequence of user actions and system reactions is, what the validations are, what APIs to call and so on. As a QA, you can’t test the software from user stories because they do not capture the context, the sequence of events, or all possible system reactions. User stories add little value to the dev lifecycle. They only help the product team understand how much work they eventually have to do, and they help the finance team get a view of how much money people are talking about. But to UI designers, solution designers and developers, they are nothing but blobs of highly imprecise statements that leave room for hundreds of questions to be answered. The absence of “Context” and “Cause and Effect”, and the imprecise way of saying “As a…I want… so that…”, leaves room for so many misinterpretations that there’s no way a development team can produce software from just user stories without spending significant time analysing the user stories all over again. Software, and the universe eventually, is all about Cause and Effect. The Cause and Effect is not described in a user story.

Unlike user stories, the “Behavior” suggested by Behavior Driven Development (BDD) is a much better approach because the format of a behavior (Given context, When event, Then outcome), when used correctly, lets you think in terms of a sequence of events, where the context, event and outcome are captured for each and every action the user or system performs, and thus works as a definite spec for designing the UI and architecture. It follows the Cause and Effect model, and thus can explain how the world (or your software) works. It can be so precise that sometimes a behavior works as a guideline for a developer to write a single function! Not just the developers, even the QA team can clearly capture what action they need to perform and how the system should respond. However, to get the real fruit out of behaviors, you need to write them properly, following the right format. So, let me give you some examples of how you can write good behaviors for UI, business layer, services and even functions, and thus eliminate the repeated requirement analysis that usually happens throughout the user-story driven development lifecycle.
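As a rough illustration of how precise a behavior can get, here is one written as a plain Python test, with the Given, When and Then parts as comments. The LoginService class and its three-attempt lockout rule are invented for this sketch and are not taken from the article.

```python
class LoginService:
    """Hypothetical system under test, invented for this illustration."""

    MAX_FAILED_ATTEMPTS = 3

    def __init__(self, passwords):
        self._passwords = passwords  # user name -> password
        self._failed = {}            # user name -> consecutive failures

    def login(self, user, password):
        if self._failed.get(user, 0) >= self.MAX_FAILED_ATTEMPTS:
            return "locked"
        if self._passwords.get(user) == password:
            self._failed[user] = 0
            return "welcome"
        self._failed[user] = self._failed.get(user, 0) + 1
        return "invalid"


def test_account_locks_after_three_failed_attempts():
    # Given a registered user who has already failed to log in three times
    service = LoginService({"omar": "secret"})
    for _ in range(3):
        service.login("omar", "wrong")

    # When the user submits the correct password
    outcome = service.login("omar", "secret")

    # Then the system refuses the login because the account is locked
    assert outcome == "locked"
```

The context, the event and the outcome are each stated once, so a developer can write the function from it and QA can run the same steps by hand.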

Read more about why user stories suck, and how using behaviors throughout the development lifecycle can greatly reduce repeated requirement analysis effort and make communication between the product, design, development and QA teams much more effective:

http://www.codeproject.com/KB/architecture/userstorysucks.aspx

If you like it, vote for it!

Keep website and webservices warm with zero coding

If you want to keep your websites or webservices warm and save users from seeing the long warm-up time after an application pool recycle, IIS restart, new code deployment or even a Windows restart, you can use the tinyget command line tool, which comes with the IIS Resource Kit, to hit the site and services and keep them warm. Here’s how:

First get tinyget from here. Download and install the IIS 6.0 Resource Kit on some PC. Then copy tinyget.exe from “C:\Program Files (x86)\IIS Resources\TinyGet” to the server where your IIS 6.0 or IIS 7 is running.

Then create a batch file that will hit the pages and webservices. Something like this:

SET TINYGET=C:\Program Files (x86)\IIS Resources\TinyGet\tinyget.exe

"%TINYGET%" -srv:dropthings.omaralzabir.com -uri:http://dropthings.omaralzabir.com/ -status:200
"%TINYGET%" -srv:dropthings.omaralzabir.com -uri:http://dropthings.omaralzabir.com/WidgetService.asmx?WSDL -status:200

Save this in a batch file and run it as a scheduled task at some interval like 10 minutes and your website will always remain nice and warm.

First I am hitting the homepage to keep the webpage warm. Then I am hitting the webservice URL with the ?WSDL parameter, which makes ASP.NET compile the service if it is not already compiled, walk through all the operations and reflect on them. This loads all related DLLs into memory and reduces the warm-up time when the service is actually hit.

Tinyget takes the server’s name or IP in the -srv parameter and the actual URI in the -uri parameter. I have specified the HTTP response code to expect in the -status parameter. It ensures the site is alive and is returning an HTTP 200 code.
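If tinyget is not available on a machine, the same warm-up check can be approximated with a few lines of Python and its standard urllib module. This is a substitute sketch, not part of the tinyget approach: the function names fetch_status and warm_up are made up, and the fetch parameter exists only so the check can be tried without a network.

```python
from urllib.request import urlopen


def fetch_status(url, timeout=30):
    """Return the HTTP status code for url (urlopen raises on 4xx/5xx)."""
    with urlopen(url, timeout=timeout) as response:
        return response.status


def warm_up(urls, expected_status=200, fetch=fetch_status):
    """Hit every URL once and return the ones that did not respond as expected."""
    failures = []
    for url in urls:
        try:
            status = fetch(url)
        except OSError:  # URLError and HTTPError are subclasses of OSError
            status = None
        if status != expected_status:
            failures.append(url)
    return failures
```

Calling warm_up with the two URLs from the batch file above and scheduling the script every 10 minutes would play the same role as the tinyget batch file; an empty return value means every page answered with the expected status.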

Besides just warming up a site, you can do some load test on the site. Tinyget can run in multiple threads and run loops to hit some URL. You can literally blow up a site with commands like this:

"%TINYGET%" -threads:30 -loop:100 -srv:google.com -uri:http://www.google.com/ -status:200

Tinyget is also pretty useful for running automated tests. You can record HTTP posts in a text file and then use it to make HTTP posts to some page. Then you can add a matching clause to check for a certain string in the output to ensure the correct response is given. Thus with some simple command line commands, you can warm up the site, do some transactions, validate that the site is giving the correct response, and run a load test to ensure the server is performing well. It’s a very cheap way to get a lot done.

Rescue overdue offshore projects and convince management to use automated tests

I have published two articles on CodeProject recently. The first is a story about an offshore project that was two months overdue: my friend who runs it was paying the team from his own pocket and drowning in an ever-increasing number of change requests. It describes how we brainstormed together to get out of that situation.

Tips and Tricks to rescue overdue projects

The next one is about convincing management to go for automated tests and give developers extra time per sprint, at the cost of reduced productivity for a couple of sprints. It’s hard to negotiate this even with dev leads, let alone managers. Whenever you tell them that fewer features and bug fixes will be delivered for the next 3 or 4 sprints because you want to automate the tests and reduce manual QA effort, everyone gets furious and kicks you out of the meeting. Especially in a startup, where every sprint is jam-packed with new features and priority bug fixes to satisfy various stakeholders, including the VCs, it’s very hard to communicate the benefits of automated tests across the board. Let me tell you a story from one of my startups where I had the pleasure of arguing this and came out victorious.

How to convince developers and management to use automated test instead of manual test

If you like these, please vote for me!