



You may not have heard this name before. In fact, I Googled it and got 0 responses. However the pattern is very important and is very well publicised in Azure circles. Since there seems to be no actual name for this pattern, I’m seeking to give it a name, so that we can all speak a common language. First I’ll explain the pattern in concept, then I’ll explain it in the context of Azure.
The Asynchronous Work Queue Pattern allows workers to pull work items that are guaranteed to be unique from a robust, redundant queuing mechanism, in a fashion that is ignorant to leasing and locking of the work items provided. In other words, the leasing and locking functionality is removed from the worker which can concentrate on the work to be done, and the queue guarantees that no work item enqueued will ever be dequeued more than once.
As a developer I’ve worked on a lot of large systems and have often found myself dealing with the problem of resource contention. Whether multiple threads are trying to access a resource for the same reason, or perhaps two different tiers want the same piece of information for two different reasons, sharing resources can be hard.
Imagine an event that occurs as the result of some interaction with a website. That event might require some data to be saved, an email to be sent, a log service to be called, and a bunch of other things. We never want this to happen all in the original request; we like our UI to be responsive, otherwise the user is just going to press the submit button again right?
To get around this problem we create a WorkItem class and our UI thread now saves a work item. Running on a Windows Service in the background (probably on another server) we have a work processor whose job it is to check the work item table every 15 minutes and pickup any work items that need doing and process them. We sit back and put up our feet, comfortable that our separation of work items from the UI has made our application extremely responsive, and added some robustness to boot.
A month goes by and all of a sudden the sales team lands three massive clients and our site traffic has increased ten fold! Our web front end is doing great though, especially since it can just hand off work items and return control to the user very quickly. However the backend Windows Service is choking under the pressure and work items are coming in faster than it can process them.
No problem, we decide to add a second server and install the Windows Service there as well. But wait; this won’t work will it? Both services hit the database and get the next work item; they get the SAME work item! So now we need to consider taking a lease over a certain number of work items to indicate that they are being processed. We pick an arbitrary number; each service will pick off ten work items and put a flag next to them saying they are being processed. In doing so we realise we are polluting our data schema with information about how the data is being used, but we really have no other choice.
Of course we then ponder; what happens if they both query and get work items at the same time? They might not have the flag and could therefore still process the same records. So now we need a double verification. The complexity grows.
By providing a queue implementation that ensures that the ‘next’ work item cannot be dequeued by more than one requester, the workers can focus on the work that needs to be done and can remove complex code pollution required when worrying about leases.
The primary advantage of this approach is that it scales extremely well. In the problem scenario depicted above, adding another 20 windows services will result in each other service slowing down because more lease checking occurs when looking for free work items to process, and as a result less work gets done. But in the queue scenario, 20 times more services will mean 20 times more productivity.
In Azure the equivalent scenario would be worker roles with multiple instances. Windows Azure Storage provides a highly scalable Queue that is accessible via REST. In essence, the Windows Azure Storage Queue service was designed specifically for asynchronous work distribution/consumption. Each retrieval of a work item from the queue is guaranteed to be unique, except where the worker fails to notify the queue of successful processing, in which case the work item is automatically re-enqueued after a certain amount of time. This ensures the work item is not lost due to a worker failure. Also, Azure Queues are at least three times redundant, ensuring no work item is ever lost.
In some example code I’ve posted from previous presentations, there is an application that searches for images based on their colour content. The Asynchronous Work Queue Pattern is applied in this example application; a search is made against the Flickr API for a specific keyword and a bunch of results are returned. Each result is placed into a queue, and multiple workers listen at the other end, waiting to pick up an image that they will chop up and analyse for colour content.
To be honest I don’t care what its called; its just important that you are aware of the pattern and know why to use it.
It was just pointed out to me by Paul Stovell that this fits very closely with the Message Dispatcher pattern as identified in the book Enterprise Integration Patterns. The key difference is that the dispatcher pushes work to the consumers whereas workers will pull the work load from the queue instead. The message dispatcher pattern makes the assumption that the message channel is in fact dumb, whereas Windows Azure Storage Queues are smart and can ensure no message duplication, as well as reliable messaging, even in the event of consumer failure. Currently I believe these are still separate patterns however am happy to have the discussion over a beer or two.




According to MSDN, Windows Azure Storage “provides persistent, redundant storage in the cloud”. Microsoft’s goal is to create storage that is durable and secure, scalable and efficient all at once. In the current CTP of Azure you can store your data in Windows Azure 3 different ways:
I’ll aim to dive into each of these in more detail individually over the coming weeks but for the rest of this post will discuss an overview of the storage mechanisms and methods of accessing the storage.
Windows Azure Storage allows you to store data for any length of time and to store any amount of data. Currently there is a lock at 50Gb of storage but for a CTP this should be pretty sufficient. In the future this will scale (at a cost obviously) to as much as you need.
Windows Azure Storage can be geo located, meaning you can choose which region it can be hosted. As previously discussed, you can also associate your data with your Azure services through an ‘Affinity Group’ which helps the Azure Fabric Controller deploy your services and data into a similar place within the data centre. The Azure data-centre’s are so large that having data and services located near each other (network hops) can drastically improve performance (lower communication times).
Its important to ensure you understand that Windows Azure Storage is not the same as SQL Data Services (SDS). If you want a relational data store in the cloud, you should be looking at SDS. Windows Azure Storage is about delivering quick access storage directly to your services, intercommunication between services, and state representation for your application.
Windows Azure Storage is a type of project that you can add in your Azure portal. In the current CTP, you can only add two storage projects. As with adding a service project, you get to select a name that will resolve the services for your storage.
This will result in a set of services to access blob, queue and table storage. You will also be provided with a key that can be used to authenticate you to your storage, allowing you to safely insert, update, and delete data from your applications in a RESTful manner.
Over the next few posts I’ll attempt to break down each of the types of storage, and how to access them through the REST API.


More Options ...

Categories
Tag Cloud
Blog RSS
Comments RSS

Void
Life
Earth
Wind « Default
Water
Fire
Light 