Simple way to cache objects and collections for greater performance and scalability

Caching frequently used data greatly increases the
scalability of your application, since you avoid repeated
queries to the database, file system or web services. When objects
are cached, they can be retrieved from the cache, which is a lot faster
and more scalable than loading them from a database, file or web service.
However, implementing caching is tricky and monotonous when you
have to do it for many classes. Your data access layer gets a whole
lot of code that deals with caching objects and collections,
updating the cache when objects change or get deleted, expiring
collections when a contained object changes or gets deleted, and so
on. The more code you write, the more maintenance overhead you add.
Here I will show you how you can make caching a lot easier
using Linq to SQL and my library AspectF. It’s a
library that helps you get rid of thousands of lines of repeated
code from a medium-sized project and eliminates plumbing code (logging,
error handling, retrying, etc.) completely.

Here’s an example of how caching significantly improves the
performance and scalability of applications. Without caching, Dropthings,
my open source Web 2.0 AJAX portal, can serve only
about 11 requests/sec with 10 concurrent users on a dual-core 64-bit
PC. Here data is loaded from the database as well as from external
sources. The average page response time is 1.44 sec.


Load Test Without Cache

After implementing caching, it became significantly faster,
serving around 32 requests/sec. The average page load time dropped
to only 0.41 sec. During the
load test, CPU utilization was around 60%.


Load Test With In-Memory Cache

This clearly shows the significant difference caching can make to your
application. If you are suffering from poor page load performance
and high CPU or disk activity on your database and application
servers, then caching the top 5 most frequently used objects in your
application can solve that problem right away. It’s a quick
win that makes your application a lot faster without complex
re-engineering.
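For reference, the two load-test results reported above work out to the following ratios (plain arithmetic on those numbers, no additional measurements):

```latex
\frac{32\ \text{requests/s}}{11\ \text{requests/s}} \approx 2.9 \quad \text{(throughput gain)}
\qquad
\frac{1.44\ \text{s}}{0.41\ \text{s}} \approx 3.5 \quad \text{(reduction factor in average page response time)}
```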

Common approaches to caching objects and collections

Sometimes caching is simple, for example when caching a
single object that does not belong to a collection and does not
have child collections that are cached separately. In such a case,
you write simple code that follows this logic:

  • Is the object being requested already in cache?
    • Yes, then serve it from cache.
    • No, then load it from database and then cache it.
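The three steps above amount to a get-or-load helper. The following C# is a minimal sketch, not the Dropthings or AspectF code: `SingleObjectCache.GetOrLoad` is a hypothetical name, and a `ConcurrentDictionary<string, object>` stands in for a real cache, so there is no expiry or eviction.

```csharp
using System;
using System.Collections.Concurrent;

// Hypothetical helper for illustration only. A ConcurrentDictionary stands in
// for a real cache (no expiry, no eviction).
public static class SingleObjectCache
{
    public static T GetOrLoad<T>(ConcurrentDictionary<string, object> cache, string key, Func<T> load)
        where T : class
    {
        // Is the object being requested already in cache? Yes: serve it from cache.
        if (cache.TryGetValue(key, out object cached) && cached is T hit)
            return hit;

        // No: load it from the database (or any other source) and then cache it.
        T loaded = load();
        if (loaded != null)
            cache[key] = loaded;
        return loaded;
    }
}
```

A call might look like `SingleObjectCache.GetOrLoad(cache, "User_" + id, () => LoadUserFromDatabase(id))`, where the key format and `LoadUserFromDatabase` are hypothetical. Two threads that miss at the same time may both hit the database; the last one to finish simply overwrites the cache entry, which is usually acceptable for read-mostly data.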

On the other hand, when you are dealing with a cached collection
where each item in the collection is also cached separately,
the caching logic is not so simple. For example, say you have
cached a User collection, but each User
object is also cached separately because you need to load
individual User objects frequently. Then the caching
logic gets more complicated:

  • Is the collection being requested already in cache?
    • Yes. Get the collection. For each object in the collection:
      • Is that object individually available in cache?
        • Yes: get the individual object from cache and update it in the
          collection.
        • No: discard the whole collection from cache and continue with
          the "No" step below.
      • If every object was found in cache, return the updated collection.
    • No. Load the collection from the source (e.g. the database) and cache
      each item in the collection separately. Then cache the
      collection.
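The decision tree above can be sketched as one generic helper. This is a minimal illustration, not the Dropthings or AspectF code: `CollectionCache.GetOrLoadList` is a hypothetical name, a `ConcurrentDictionary<string, object>` stands in for the real cache (no expiry), and `itemKey` is a caller-supplied function that builds each item’s own cache key.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;

// Hypothetical helper for illustration only. A ConcurrentDictionary stands in
// for a real cache; itemKey builds the separate cache key of each item.
public static class CollectionCache
{
    public static List<T> GetOrLoadList<T>(
        ConcurrentDictionary<string, object> cache,
        string listKey,
        Func<T, string> itemKey,
        Func<List<T>> loadFromSource) where T : class
    {
        // Is the collection being requested already in cache?
        if (cache.TryGetValue(listKey, out object cachedObj) && cachedObj is List<T> cachedList)
        {
            var refreshed = new List<T>(cachedList.Count);
            bool stale = false;
            foreach (T item in cachedList)
            {
                // Is that object individually available in cache?
                if (cache.TryGetValue(itemKey(item), out object itemObj) && itemObj is T fresh)
                {
                    refreshed.Add(fresh);   // Yes: use the individually cached (latest) copy.
                }
                else
                {
                    stale = true;           // No: the collection can no longer be trusted.
                    break;
                }
            }

            if (!stale)
            {
                cache[listKey] = refreshed; // Store the updated collection back.
                return refreshed;
            }
            cache.TryRemove(listKey, out _); // Discard the whole collection.
        }

        // Load from the source, cache each item separately, then cache the collection.
        List<T> loaded = loadFromSource();
        foreach (T item in loaded)
            cache[itemKey(item)] = item;
        cache[listKey] = loaded;
        return loaded;
    }
}
```

A call might look like `CollectionCache.GetOrLoadList<Page>(cache, "PagesOfUser_" + userGuid, p => "Page_" + p.ID, () => LoadPagesFromDatabase(userGuid))`, where the key formats and `LoadPagesFromDatabase` are hypothetical. Re-reading every item through its own key means an item updated in cache replaces the copy held in the collection, and an item expired from cache forces a reload of the whole collection.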

You might wonder why we need to read each individual
item from cache, and why we need to cache each item in the collection
separately, when the whole collection is already in cache. There are
two scenarios you need to address when you cache a collection and
individual items in that collection are also cached separately:

  • An individual item has been updated and the updated item is in
    cache, but the collection that contains all those individual
    items has not been refreshed. So, if you get the collection from
    cache and return it as is, you will get stale individual items
    inside that collection. This is why each item needs to be retrieved
    from cache separately.
  • An item in the collection may have been forcibly expired in cache,
    for example because something changed in the object or the object has
    been deleted. You expire it in cache so that on the next retrieval it
    comes from the database. If you load the collection from cache only,
    the collection will still contain the stale object.

If you are doing it the conventional way, you will be writing a
lot of repeated code in your data access layer. For example, say
you are loading a Page collection that belongs to a
user, and you want to cache the collection of Pages for
that user as well as each individual Page object, so
that each Page can be retrieved from the cache directly.
Then you need to write code like this:

public List<Page> GetPagesOfUserOldSchool(Guid userGuid)
{
    ICache cache = Services.Get<ICache>();
    bool isCacheStale = false;
    string cacheKey = CacheSetup.CacheKeys.PagesOfUser(userGuid);
    var cachedPages = cache.Get(cacheKey) as List<Page>;
    if (cachedPages != null)
    {
        var resultantPages = new List<Page>();
        // If any item in the collection is no longer in cache, invalidate the collection
        // and load it again.
        foreach (Page cachedPage in cachedPages)
        {
            var individualPageInCache = cache.Get(CacheSetup.CacheKeys.PageId(cachedPage.ID)) as Page;
            if (null == individualPageInCache)
            {
                // Some item is missing in cache. So, the collection is stale.
                isCacheStale = true;
            }
            else
            {
                resultantPages.Add(individualPageInCache);
            }
        }

        cachedPages = resultantPages;
    }

    if (cachedPages == null || isCacheStale)
    {
        // Collection not cached, or one of its items has expired. Load the collection from database and then cache it.
        var pagesOfUser = _database.GetList<Page, Guid>(...);
        pagesOfUser.Each(page =>
        {
            page.Detach();
            cache.Add(CacheSetup.CacheKeys.PageId(page.ID), page);
        });
        cache.Add(cacheKey, pagesOfUser);
        return pagesOfUser;
    }
    else
    {
        return cachedPages;
    }
}

Imagine writing this kind of code over and over again for each
and every entity that you want to cache. This becomes a maintenance
nightmare as your project grows.

Here’s how you could do it using AspectF:

public List<Page> GetPagesOfUser(Guid userGuid)
{
    return AspectF.Define
        .CacheList<Page, List<Page>>(Services.Get<ICache>(),
            CacheSetup.CacheKeys.PagesOfUser(userGuid),
            page => CacheSetup.CacheKeys.PageId(page.ID))
        .Return<List<Page>>(() => _database.GetList<Page, Guid>(...).Select(p => p.Detach()).ToList());
}

Instead of 42 lines of code, you can do it in 5 lines!
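How can a fluent chain like `AspectF.Define ... .Return(...)` replace all that code? AspectF’s own source is not shown here, so the following C# is only a minimal sketch of the general technique under stated assumptions: `MiniAspect`, `Combine` and `Cache` are hypothetical names, a `ConcurrentDictionary<string, object>` stands in for `ICache`, and only a single-key cache step is shown. Each call in the chain wraps the work that follows it, and `Return` finally runs the wrapped delegate. A list-caching step along the lines of `CacheList` would put the collection logic described earlier inside its wrapper instead.

```csharp
using System;
using System.Collections.Concurrent;

// Hypothetical, simplified fluent wrapper for illustration; NOT AspectF's implementation.
public class MiniAspect
{
    // The chain receives the "rest of the work" and decides whether and how to run it.
    private Func<Func<object>, object> _chain = work => work();

    // Each access starts a fresh chain.
    public static MiniAspect Define => new MiniAspect();

    // Wraps a new step inside the existing chain: steps defined first run outermost.
    public MiniAspect Combine(Func<Func<object>, object> step)
    {
        Func<Func<object>, object> inner = _chain;
        _chain = work => inner(() => step(work));
        return this;
    }

    // Cache step: return the cached value for key, or run the rest of the chain and cache it.
    public MiniAspect Cache(ConcurrentDictionary<string, object> cache, string key) =>
        Combine(work => cache.GetOrAdd(key, _ => work()));

    // Runs the whole chain around the actual work and returns its result.
    public T Return<T>(Func<T> work) => (T)_chain(() => work());
}
```

For example, `MiniAspect.Define.Cache(cache, "answer").Return(() => ComputeAnswer())` runs the hypothetical `ComputeAnswer` only on the first call; later calls with the same key are served from the dictionary. Note that `ConcurrentDictionary.GetOrAdd` can invoke its value factory more than once if two threads race on the same key.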

Read my article “Simple way to cache objects and collections for greater
performance and scalability” on CodeProject and learn about:

  • Caching Linq to SQL entities
  • Handling update and delete scenarios
  • Expiring dependent objects and collections in cache
  • Handling objects that are cached with multiple keys
  • Avoiding database query optimizations when you cache sets of
    data

Enjoy. Don’t forget to vote for me!

7 thoughts on “Simple way to cache objects and collections for greater performance and scalability”


  2. Is using LINQ a good idea now? I am asking this because, at a conference, you told us not to use LINQ in our product code.
