ReadLine on Binary Stream

When you are reading data from a binary stream, like a NetworkStream or FileStream, and you need to read binary chunks as well as one text line at a time, you are on your own: neither BinaryReader nor Stream supports ReadLine. You can use StreamReader to do ReadLine, but it does not let you read chunks of bytes; there is no Read(byte[], int, int) on StreamReader.

Here’s a subclass of BinaryReader that adds ReadLine over a binary stream, so you can read byte chunks and text lines from the same stream.

using System;
using System.Diagnostics;
using System.IO;
using System.Text;

public class LineReader : BinaryReader
{
  private Encoding _encoding;
  private Decoder _decoder;

  const int bufferSize = 1024;
  private char[] _LineBuffer = new char[bufferSize];

  public LineReader(Stream stream, int bufferSize, Encoding encoding)
    : base(stream, encoding)
  {
    this._encoding = encoding;
    this._decoder = encoding.GetDecoder();
  }

  public string ReadLine()
  {
    int pos = 0;

    char[] buf = new char[2];

    StringBuilder stringBuffer = null;
    bool lineEndFound = false;

    while (base.Read(buf, 0, 2) > 0)
    {
      if (buf[1] == '\r')
      {
        // grab buf[0]
        this._LineBuffer[pos++] = buf[0];
        // consume the '\n' that follows the '\r'
        char ch = base.ReadChar();
        Debug.Assert(ch == '\n');

        lineEndFound = true;
      }
      else if (buf[0] == '\r')
      {
        // the '\n' is already in buf[1]; nothing more to consume
        lineEndFound = true;
      }
      else
      {
        this._LineBuffer[pos] = buf[0];
        this._LineBuffer[pos+1] = buf[1];
        pos += 2;

        if (pos >= bufferSize)
        {
          stringBuffer = new StringBuilder(bufferSize + 80);
          stringBuffer.Append(this._LineBuffer, 0, bufferSize);
          pos = 0;
        }
      }

      if (lineEndFound)
      {
        if (stringBuffer == null)
        {
          if (pos > 0)
            return new string(this._LineBuffer, 0, pos);
          else
            return string.Empty;
        }
        else
        {
          if (pos > 0)
            stringBuffer.Append(this._LineBuffer, 0, pos);
          return stringBuffer.ToString();
        }
      }
    }

    if (stringBuffer != null)
    {
      if (pos > 0)
        stringBuffer.Append(this._LineBuffer, 0, pos);
      return stringBuffer.ToString();
    }
    else
    {
      if (pos > 0)
        return new string(this._LineBuffer, 0, pos);
      else
        return null;
    }
  }

}
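
For completeness, here is a minimal usage sketch. This is my own illustration, not part of the original class; the file name, the 256-byte payload size and the encoding are assumptions, standing in for whatever your protocol defines as a text header followed by a binary chunk.

using (var stream = File.OpenRead("message.bin"))   // any Stream works, e.g. a NetworkStream
using (var reader = new LineReader(stream, 1024, Encoding.ASCII))
{
  // read one text line, then a raw binary chunk from the same stream
  string header = reader.ReadLine();
  byte[] payload = reader.ReadBytes(256);
  Console.WriteLine("{0}: {1} bytes", header, payload.Length);
}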

Enjoy.

Scaling ASP.NET websites from thousands to millions–LIDNUG

Here’s a recent presentation I gave at LIDNUG on scaling ASP.NET websites from thousands to millions of users. The action starts at 0:02:05.

Scaling ASP.NET websites from thousands to millions of users by Omar AL Zabir

Here are the slides.

Browse internet faster and save power using a smart HOSTS file

The internet is full of Flash ads nowadays that make pages load slower, render slower and consume more CPU, and thus power. If you can browse without any Flash ads, or in fact any ads, loaded and without any of the tracking scripts, you can browse much faster, scroll through pages much more smoothly and get more hours from your battery. Nowadays, most websites use scripts from various analytics sites that track your browsing habits and use IFRAMEs to load tracking and social networking widgets. All of these add considerable delay to page loading and make the browser consume more CPU and bandwidth. If you can turn all of them off, browsing the internet feels a lot smoother and faster, and you get more work hours while running on battery.

Moreover, you don’t get distracted by the flashy ads, and you save your children and young family members from seeing foul things.

If we could get 10% of the total internet users (2bn as of Jan 2011) to save 10% CPU, power and bandwidth while browsing every day, we could save megawatts of power every day throughout the world!

Using this solution, you can block ads and tracking scripts, and keep your PC away from malicious and porn websites.

How bad is it?

Let’s take a popular website as an example. The red boxes are Flash ads (read: power suckers).

image

Once we disable all ads and tracking scripts, here’s how it looks:

image

Statistics:

                      Before    After
Total Requests        111       100
Total Download Size   1.2 MB    0.98 MB
Page load time        4.34s     3.64s

 

It’s not just during page loading: even while you are on the page doing nothing but reading, the browser continuously consumes CPU.

Before:

image

After:

image

Before disabling the ads and tracking scripts, CPU usage stays around 20-25%. After disabling them, it stays around 8-10%. The more the CPU works, the more power it consumes. If you are running on battery, you can get at least 20% more time out of it. If you keep many tabs open all the time, you can save even more.

Here’s how to save CPU, bandwidth and power

Go to this website and download the HOSTS file:

http://winhelp2002.mvps.org/hosts.htm

Follow the instructions to put the HOSTS file in your C:\Windows\System32\drivers\etc folder.

Now go to Start Menu, type Notepad but do not hit enter. Right click on Notepad and select Run As Administrator.

image

Go to File menu and click Open.

Copy and paste this into the File Name and click OK.

c:\windows\system32\drivers\etc\HOSTS

image

Now go to the Edit menu and select Replace. Put 127.0.0.1 in the Find box and 255.255.255.0 in the Replace box. Click Replace All.

image

Once done, change the first entry, localhost, back to 127.0.0.1.

image

Remember, localhost cannot be 255.255.255.0.

When you have done this correctly, it will look like this.

image

Save the file and exit Notepad.

Then go to Start menu and type: services.msc

From the service list, double click on “DNS Client”.

image

First click “Stop” to stop the service.

Then from the Startup Type dropdown, select Disabled.

Click OK.

image

Close all your browsers and reopen them. I highly recommend restarting your PC.

You are ready to browse faster, smarter, cheaper! 

I also highly recommend that everyone use OpenDNS. You can save yourself from landing on malicious sites and being ripped off of your bank balance, property, spouse and children. Just go to www.opendns.com and follow the instructions. It is the best thing that has happened to the internet since Wikipedia!

How does the HOSTS file trick work?

Here’s how the internet works. You type www.something.com and your PC goes and finds out the IP address for that domain. First, Windows checks a file called HOSTS. If the domain is not defined there, Windows asks the DNS server configured for your network to give it the IP for the domain so that it can connect to the webserver. If you put a fake IP in the HOSTS file, Windows hands that fake IP to the browser and the browser tries to connect to the fake IP. Thus, by mapping a domain to an invalid IP, we prevent the browser, or any application running on your PC, from reaching the ads, tracker, malicious and porn websites.
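
For illustration, a couple of entries from such a HOSTS file might look like this after the Replace All step (the blocked host names below are made-up examples, not entries from the actual MVPS list):

127.0.0.1       localhost                      # must stay 127.0.0.1
255.255.255.0   ads.example-adnetwork.com      # resolves to an unreachable IP
255.255.255.0   tracker.example-analytics.net  # so the browser never loads it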

Don’t forget to share this with your friends and families!

Get Dropthings license by donating to charity

Now you no longer pay me for a Dropthings license; instead, you donate the money to a charity and I give you the license. In case you don’t know what Dropthings is, it is a Web 2.0 personalizable dashboard framework that you can use to build Web 2.0 personalizable websites and enterprise dashboards. It is built using ASP.NET AJAX, jQuery, Silverlight, .NET 3.5, Entity Framework and SQL Server. It is in use at big companies like BT, Intel, Microsoft and Thomson Reuters, and at many government organizations like State Police and Canada Border Protection.

Since it is a state-of-the-art .NET 3.5 codebase, it is sometimes used as a starting point for an application, with all the best practices already in place for building an N-tier web app using popular technologies, design patterns and testing methods. Dropthings helps you build a web app utilizing the extensive performance and scalability research I have done to scale websites to millions of users. It also helps you build a codebase that is highly testable, and it shows you how to test AJAX applications using automated test tools like WatiN. It has a business layer and a data access layer that are fully unit testable, with nearly 100% test coverage, and it uses the Inversion of Control pattern to the fullest.

You can find details about the Project here: http://code.google.com/p/dropthings/

There are two CodeProject articles that show you how it was built, tested and deployed, and the production challenges I had to overcome to scale it to millions of requests per day:

Build Google IG like Portal in 7 days

Web 2.0 AJAX Portal using jQuery, ASP.NET 3.5, Silverlight, Linq to SQL, WF and Unity

Finally, there’s a book on it that takes you from the initial idea to design, coding and testing, all the way to purchasing the right production hardware, deployment and production troubleshooting. It is a complete end-to-end guide for a developer or startup CTO to take an idea from design to a VC-funded, successful startup used by millions. I have captured many of the lessons I learnt during my startup years at Pageflakes, which I co-founded and where I was the founding CTO.

Building a Web 2.0 Portal with ASP.NET 3.5 from O’Reilly.

Let’s build great web apps and save the world at the same time!

MVP Open Day 2011 at Cambridge

Microsoft Research arranged MVP Open Day 2011 at Cambridge on Oct 14, 2011. Beautiful university; it made me feel like giving up my job and going back to study. Amazing research work is going on there, highly thought provoking. The session on DNA programming was out of this world. The most surprising thing I learnt was that a 10cm-long DNA strand can hold 10TB worth of digitally encoded data, and that cells are a thousand times more robust as computing systems than silicon-based chips. Moreover, cells are self-powered, super energy-efficient microprocessors, a hundred years ahead of Intel processors.

Can’t wait for the day when we will be able to use C# to program DNA:

protected void CancerCell_Found(object body, CellEventArgs e) { this.Attack(e.TargetCell); }

Here’s my presentation slide. Nothing NDA or DNA in this, feel free to distribute.

Prevent ASP.NET cookies from being sent on every css, js, image request

ASP.NET generates some large cookies if you are using the ASP.NET membership provider. Especially if you are also using the anonymous provider, a typical site will send the following cookies with every request when a user is logged in, whether the request is for a dynamic page or for a static resource:

.DBANON=w3kYczsH8Wvzs6MgryS4JYEF0N-8ZR6aLRSTU9KwVaGaydD6WwUHD7X9tN8vBgjgzKf3r3SJHusTYFjU85y
YfnunyCeuExcZs895JK9Fk1HS68ksGwm3QpxnRZvpDBAfJKEUKee2OTlND0gi43qwwtIPLeY1;
ASP.NET_SessionId=bmnbp155wilotk45gjhitoqg; DBAUTH12=2A848A8C200CB0E8E05C6EBA8059A0DBA228FC5F6EDD29401C249D2
37812344C15B3C5C57D6B776037FAA8F14017880E57BDC14A7963C58B0A0B30229
AF0123A6DF56601D814E75525E7DCA9AD4A0EF200832B39A1F35A5111092F0805B
0A8CD3D2FD5E3AB6176893D86AFBEB68F7EA42BE61E89537DEAA3279F3B576D0C
44BA00B9FA1D9DD3EE985F37B0A5A134ADC0EA9C548D

That’s 517 bytes of worthless data being sent with every css, js and image request from the browser to your webserver!

You might think 517 bytes is peanuts. Do the math:

  • Avg page has 40 requests to server. 40 x 517 bytes = 20 KB per page view.
  • 1M page views = 20 GB
  • That’s 20GB of data getting uploaded to your server for just 1M page views. It does not take millions of users to produce 1M page views. Around 100k+ users using your site every day can produce 1M page views every day.

Here’s how to prevent this:

  • Setup a new website and map a different subdomain to it. If your main site is www.yoursite.com then map static.yoursite.com to it.
  • Manually change all the <link>, <script>, <img> tags and css url(…) references, and prefix each resource with http://static.yoursite.com
  • If you don’t want to do it manually, use a solution I have built before.
  • Add a Global.asax and in the EndRequest do this trick:
    HttpContext context = HttpContext.Current;
    if (context.Request.Url.ToString().StartsWith("http://static.yoursite.com"))
    {
      List<string> cookiesToClear = new List<string>();
      foreach (string cookieName in context.Request.Cookies)
      {
        HttpCookie cookie = context.Request.Cookies[cookieName];
        cookiesToClear.Add(cookie.Name);
      }
    
      foreach (string name in cookiesToClear)
      {
        HttpCookie cookie = new HttpCookie(name, string.Empty);
        cookie.Expires = DateTime.Today.AddYears(-1);
    
        context.Response.Cookies.Set(cookie);
      }
    }

    This code reads all the cookies it receives from the request and expires them so that the browser does not send them again. If by any chance ASP.NET cookies get injected into the static.yoursite.com domain, this code will take care of removing them.
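
To place the snippet in context, here’s a minimal sketch of the Global.asax handler it would live in. This is my own framing of what the post describes; only the body shown in the bullet above is the actual trick, and the host name is the same example domain used earlier.

protected void Application_EndRequest(object sender, EventArgs e)
{
  HttpContext context = HttpContext.Current;
  if (context.Request.Url.ToString().StartsWith("http://static.yoursite.com"))
  {
    // ... expire every cookie received, exactly as shown above ...
  }
}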

Tweaking WCF to build highly scalable async REST API

At 9 AM, during the peak traffic for your business, you get an emergency call that the website you built is no more. It’s not responding to any request. Some people can see some pages after waiting for a long time, but most can’t. So, you think it must be some slow query or the database might need some tuning. You do the regular checks like looking at CPU and Disk on the database server. You find nothing wrong there. Then you suspect it must be the webserver running slow. So, you check CPU and Disk on the webservers. You find no problem there either. Both web servers and database servers have very low CPU and Disk usage. Then you suspect it must be the network. So, you try a large file copy from webserver to database server and vice versa. Nope, the file copies perfectly fine; the network has no problem. You also quickly check RAM usage on all servers but find RAM usage is perfectly fine. As a last resort, you run some diagnostics on the Load Balancer, Firewall, and Switches but find everything in good shape. Yet your website is down. Looking at the performance counters on the webserver, you see a lot of requests getting queued, very high request execution time, and high request wait time.

image001

So, you do an IIS restart. Your website comes back online for a couple of minutes and then goes down again. After restarting several times, you realize it’s not an infrastructure issue; you have a scalability issue in your code. All the things you have read about scalability and dismissed as fairy tales that would never happen to you are now happening right in front of you. You realize you should have made your services async.
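
For reference, making a WCF REST service async means exposing its operations through the Begin/End asynchronous pattern. Here is a minimal, hypothetical sketch; the service name, operation and URI template are my own, not from the article.

using System;
using System.ServiceModel;
using System.ServiceModel.Web;

// Hypothetical DTO used by the contract below.
public class Customer
{
  public string Id { get; set; }
}

[ServiceContract]
public interface ICustomerService
{
  // The Begin/End pair marks this REST operation as asynchronous on the server side.
  [OperationContract(AsyncPattern = true)]
  [WebGet(UriTemplate = "customers/{id}")]
  IAsyncResult BeginGetCustomer(string id, AsyncCallback callback, object state);

  Customer EndGetCustomer(IAsyncResult result);
}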

However, just converting your sync services to async mode does not solve the scalability problem. WCF has a bug due to which it cannot serve requests as fast as you would like it to. The thread pool it uses to handle the async calls cannot start threads as fast as requests come in; it only adds a new thread to the pool every 500ms. As a result, you get a slow ramp-up of threads:

image018

Read my article to learn details on how WCF works for async services and how to fix this bug to make your async services truly async and scale under heavy load.

http://www.codeproject.com/KB/webservices/fixwcf_for_restapi.aspx

Don’t forget to vote.

Build truly RESTful API and website using same ASP.NET MVC code

A truly RESTful API means you have unique URLs that uniquely represent entities and collections, and there is no verb/action in the URL. You cannot have URLs like /Customers/Create, /Customers/John/Update or /Customers/John/Delete, where the action is part of the URL that represents the entity. A URL can only represent the state of an entity: /Customers/John represents the state of John, a customer, and allows GET, POST, PUT, DELETE on that very URL to perform CRUD operations. The same goes for a collection, where /Customers returns the list of customers and a POST to that URL adds new customer(s). Usually we create separate controllers to deal with the API part of the website, but I will show you how you can create both a RESTful website and API using the same controller code, working over the exact same URL that a browser can use to browse the website and a client application can use to perform CRUD operations on the entities.

I have tried Scott Gu’s examples on creating RESTful routes, this MSDN Magazine article, Phil Haack’s REST SDK for ASP.NET MVC, and various other examples. But they have all made the same classic mistake: the action is part of the URL. You end up with URLs like http://localhost:8082/MovieApp/Home/Edit/5?format=Xml to edit a certain entity and to define the format, e.g. xml, that you need to support. They aren’t truly RESTful since the URL does not uniquely represent the state of an entity; the action is part of the URL. When you put the action in the URL, it is straightforward to do with ASP.NET MVC. Only when you take the action out of the URL and have to support CRUD over the same URL, in three different formats (html, xml and json), does it become tricky, and you need some custom filters to do the job. It’s not very tricky though; you just need to keep in mind that your controller actions serve multiple formats and design your website in a way that makes it API friendly. You make the website URLs look like API URLs.

The example code has a library of ActionFilterAttribute and ValueProvider classes that make it possible to serve and accept html, json and xml over the same URL. A regular browser gets html output, an AJAX call expecting json gets a json response, and an XmlHttp call gets an xml response.

You might ask why not use WCF REST SDK? The idea is to reuse the same logic to retrieve models and emit html, json, xml all from the same code so that we do not have to duplicate logic in the website and then in the API. If we use WCF REST SDK, you have to create a WCF API layer that replicates the model handling logic in the controllers.

The example shown here offers the following RESTful URLs:

  • /Customers – returns a list of customers. A POST to this URL adds a new customer.
  • /Customers/C0001 – returns details of the customer having id C0001. Update and Delete are supported on the same URL.
  • /Customers/C0001/Orders – returns the orders of the specified customer. Post to this adds new order to the customer.
  • /Customers/C0001/Orders/O0001 – returns a specific order and allows update and delete on the same URL.

All these URLs support GET, POST, PUT, DELETE. Users can browse to these URLs and get html page rendered. Client apps can make AJAX calls to these URLs to perform CRUD on these. Thus making a truly RESTful API and website.

Customers
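
To make the idea concrete, here is a minimal sketch (my own, not the article’s code) of how all four verbs can be routed to the same /Customers/{id} URL in ASP.NET MVC by sharing one action name; the article’s actual implementation additionally uses the ActionFilterAttribute and ValueProvider library mentioned above to emit and accept html, json and xml.

using System.Web.Mvc;

// Route registration (Global.asax), mapping the item URL to a single action name:
// routes.MapRoute("Customer", "Customers/{id}",
//   new { controller = "Customers", action = "Item" });

public class CustomersController : Controller
{
  [HttpGet, ActionName("Item")]
  public ActionResult GetCustomer(string id)
  {
    // return html, json or xml depending on what the caller asked for
    return View();
  }

  [HttpPost, ActionName("Item")]
  public ActionResult UpdateCustomer(string id) { /* update and return */ return View(); }

  [HttpPut, ActionName("Item")]
  public ActionResult ReplaceCustomer(string id) { /* replace and return */ return View(); }

  [HttpDelete, ActionName("Item")]
  public ActionResult DeleteCustomer(string id) { /* delete and return */ return View(); }
}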

They also support verbs over POST in case PUT and DELETE are not allowed on your webserver or through firewalls; they are often disabled by default as a common security practice. In that case, you can use POST and pass the verb as a query string, for example /Customers/C0001?verb=Delete to delete the customer. This does not break RESTfulness, since the URL /Customers/C0001 still uniquely identifies the entity; you are only passing additional context on the URL. Query strings are also used for filtering and sorting operations on REST URLs. For example, /Customers?filter=John&sort=Location&limit=100 tells the server to return a filtered, sorted, and paged collection of customers.
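
One way such a verb override might be honored server-side is sketched below; this is a hypothetical helper of my own, not the article’s code, shown only to illustrate the idea.

// Inside a controller: resolve the effective verb, honoring ?verb= overrides sent over POST.
private string GetEffectiveVerb()
{
  string overrideVerb = Request.QueryString["verb"];
  return string.IsNullOrEmpty(overrideVerb)
    ? Request.HttpMethod                 // normal case: use the real HTTP method
    : overrideVerb.ToUpperInvariant();   // e.g. POST /Customers/C0001?verb=Delete -> "DELETE"
}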

Read my CodeProject article for details:

http://www.codeproject.com/KB/aspnet/aspnet_mvc_restapi.aspx

The source code is available here:

http://code.msdn.microsoft.com/Build-truly-RESTful-API-194a6253

Enjoy!

Automatic Javascript, CSS versioning to refresh browser cache

When you update javascript or css files that are already cached in users’ browsers, many users won’t get the new files for some time because of caching at the browser or intermediate proxy(s). You need some way to force the browser and proxy(s) to download the latest files. There’s no way to do that effectively across all browsers and proxies from the webserver by manipulating cache headers, unless you change the file name or change the URL of the files by introducing a unique query string, so that browsers/proxies interpret them as new files. Most web developers use the query string approach, adding a version suffix to send the new file to the browser. For example,

<script src="someJs.js?v=1001" ></script>
<link href="someCss.css?v=2001"></link>

In order to do this, developers have to go through all the html, aspx, ascx and master pages, find all references to static files that have changed, and then increase the version number. If you forget to do this on some page, that page may break because the browser uses the old cached script. So, it requires a lot of regression testing to find out whether changing some css or js breaks anything anywhere in the entire website.

Another approach is to run some build script that scans all files and updates the reference to the javascript and css files in each and every page in the website. But this approach does not work on dynamic pages where the javascript and css references are added at run-time, say using ScriptManager.

If you have no way to know what javascript and css will get added to the page at run-time, the only option is to analyze the page output at runtime and then change the javascript, css references on the fly.

Here’s an HttpFilter that can do that for you. This filter intercepts any ASPX hit and automatically appends the last modification date time of javascript and css files inside the emitted html. It does so without storing the whole generated html in memory and without doing any string operations, because those would cause high memory and CPU consumption on the webserver under high load. The code works with character buffers and response streams directly so that it’s as fast as possible. I have done enough load testing to ensure that even if you hit an aspx page a million times per hour, it won’t add more than 50ms to each page’s response time.

First, you set the filter, called StaticContentFilter, in the Global.asax file’s Application_BeginRequest event handler:

protected void Application_BeginRequest(object sender, EventArgs e)
{
  Response.Filter = new Dropthings.Web.Util.StaticContentFilter(
    Response,
    relativePath =>
      {
        var physicalPath = Server.MapPath(relativePath);
        if (Context.Cache[physicalPath] == null)
        {
          var version = "?v=" +
            new System.IO.FileInfo(physicalPath).LastWriteTime
            .ToString("yyyyMMddhhmmss");
          // cache the version string briefly to avoid file I/O on every page view
          Context.Cache.Add(physicalPath, version, null,
            DateTime.Now.AddMinutes(1), TimeSpan.Zero,
            CacheItemPriority.Normal, null);
          return version;
        }
        else
        {
          return Context.Cache[physicalPath] as string;
        }
      },
    "http://images.mydomain.com/",
    "http://scripts.mydomain.com/",
    "http://styles.mydomain.com/",
    baseUrl,
    applicationPath,
    folderPath);
}

The only tricky part here is the delegate, which is fired whenever the filter detects a script or css link and asks you to return the version string for that file. Whatever you return gets appended right after the original URL of the script or css. So, here the delegate is producing the version as “?v=yyyyMMddhhmmss” using the file’s last modified date time. It also caches the version per file to make sure it does not make a file I/O request on each and every page view just to get the file’s last modified date time.

For example, the following script and css references in the html:

<script type="text/javascript" src="scripts/jquery-1.4.1.min.js" ></script>
<script type="text/javascript" src="scripts/TestScript.js" ></script>
<link href="Styles/Stylesheet.css" rel="stylesheet" type="text/css" />

will be emitted as:

<script type="text/javascript" src="scripts/jquery-1.4.1.min.js?v=20100319021342" ></script>
<script type="text/javascript" src="scripts/TestScript.js?v=20110522074353" ></script>
<link href="Styles/Stylesheet.css?v=20110522074829" rel="stylesheet" type="text/css" />

As you can see, a query string is generated from each file’s last modified date time. The good thing is you don’t have to worry about bumping a version number after changing a file; the filter takes the last modified date, which changes only when the file changes.

The HttpFilter shown here can not only append a version suffix, it can also prepend anything you want to image, css and script URLs. You can use this feature to load images or scripts from a different domain, benefit from browsers’ parallel loading, and increase page load performance. For example, the following tags can have any URL prepended to them:

<script src="some.js" ></script>
<link href="some.css" />
<img src="some.png" />

They can be emitted as:

<script src="http://javascripts.mydomain.com/some.js" ></script>
<link href="http://styles.mydomain.com/some.css" />
<img src="http://images.mydomain.com/some.png" />

Loading javascript, css and images from different domains can significantly improve page load time, since browsers load only two files from a given domain at a time. If you load javascript, css and images from three different subdomains and the page itself from the www subdomain, you can load 8 files in parallel instead of only 2.

Read here to learn how this works:

http://www.codeproject.com/KB/aspnet/autojscssversion.aspx

Appreciate your feedback.