At 9 AM in the morning, during the peak traffic for your business, you get an emergency call that the website you built is no more. It’s not responding to any request. Some people can see some page after waiting for long time but most can’t. So, you think it must be some slow query or the database might need some tuning. You do the regular checks like looking at CPU and Disk on database server. You find nothing is wrong there. Then you suspect it must be webserver running slow. So, you check CPU and Disk on webservers. You find no problem there either. Both web servers and database servers have very low CPU and Disk usage. Then you suspect it must be the network. So, you try a large file copy from webserver to database server and vice versa. Nope, file copies perfectly fine, network has no problem. You also quickly check RAM usage on all servers but find RAM usage is perfectly fine. As the last resort, you run some diagnostics on Load Balancer, Firewall, and Switches but find everything to be in good shape. But your website is down. Looking at the performance counters on the webserver, you see a lot of requests getting queued, and there’s very high request execution time, and request wait time.
So, you do an IIS restart. Your websites comes back online for couple of minutes and then it goes down again. After doing restart several times you realize it’s not an infrastructure issue. You have some scalability issue in your code. All the good things you have read about scalability and thought that those were fairy tales and they will never happen to you is now happening right in front of you. You realize you should have made your services async.
However, just converting your sync services to async mode does not solve the scalability problem. WCF has a bug due to which it cannot serve requests as fast as you would like it to. The thread pool it uses to handle the async calls cannot start threads as requests come in. It only adds a new thread to the pool every 500ms. As a result, you get slow rampup of threads:
Read my article to learn details on how WCF works for async services and how to fix this bug to make your async services truly async and scale under heavy load.
http://www.codeproject.com/KB/webservices/fixwcf_for_restapi.aspx
Don’t forget to vote.
My takeaway from this is that WCF is a low-quality framework (needlessly complex for relatively simple tasks) and it’s better to avoid using it…