



You may not have heard this name before. In fact, I Googled it and got zero results. However, the pattern is very important and is well publicised in Azure circles. Since there seems to be no actual name for this pattern, I’m seeking to give it one, so that we can all speak a common language. First I’ll explain the pattern in concept, then I’ll explain it in the context of Azure.
The Asynchronous Work Queue Pattern allows workers to pull work items that are guaranteed to be unique from a robust, redundant queuing mechanism, without knowing anything about the leasing and locking of those work items. In other words, the leasing and locking functionality is removed from the worker, which can concentrate on the work to be done, while the queue guarantees that no enqueued work item is handed to more than one worker at a time.
As a developer I’ve worked on a lot of large systems and have often found myself dealing with the problem of resource contention. Whether multiple threads are trying to access a resource for the same reason, or perhaps two different tiers want the same piece of information for two different reasons, sharing resources can be hard.
Imagine an event that occurs as the result of some interaction with a website. That event might require some data to be saved, an email to be sent, a log service to be called, and a bunch of other things. We never want all of this to happen in the original request; we like our UI to be responsive, otherwise the user is just going to press the submit button again, right?
To get around this problem we create a WorkItem class, and our UI thread now simply saves a work item. Running as a Windows Service in the background (probably on another server) we have a work processor whose job is to check the work item table every 15 minutes, pick up any work items that need doing, and process them. We sit back and put up our feet, comfortable that separating work items from the UI has made our application extremely responsive, and added some robustness to boot.
A month goes by and all of a sudden the sales team lands three massive clients and our site traffic has increased tenfold! Our web front end is doing great though, especially since it can just hand off work items and return control to the user very quickly. However, the backend Windows Service is choking under the pressure and work items are coming in faster than it can process them.
No problem, we decide to add a second server and install the Windows Service there as well. But wait; this won’t work, will it? Both services hit the database and get the next work item; they get the SAME work item! So now we need to consider taking a lease over a certain number of work items to indicate that they are being processed. We pick an arbitrary number; each service will pick off ten work items and put a flag next to them saying they are being processed. In doing so we realise we are polluting our data schema with information about how the data is being used, but we really have no other choice.
Of course we then ponder: what happens if both services query for work items at the same time? Neither has set its flag yet, so both could still pick up and process the same records. So now we need a double verification. The complexity grows.
By providing a queue implementation that ensures that the ‘next’ work item cannot be dequeued by more than one requester, the workers can focus on the work that needs to be done and can drop the complex lease-handling code that would otherwise pollute them.
The primary advantage of this approach is that it scales extremely well. In the problem scenario depicted above, adding another 20 Windows Services will slow every service down, because more lease checking occurs when looking for free work items to process, and as a result less work gets done. But in the queue scenario, 20 times more services will mean roughly 20 times more productivity.
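To make the idea concrete, here is a minimal in-process sketch of several workers pulling from one shared queue. It is only an illustration: it uses .NET’s ConcurrentQueue in place of a real distributed queue, and the WorkQueueDemo class and its Run method are names made up for this example.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;

public static class WorkQueueDemo
{
    // Runs workerCount threads that all pull from one shared queue and
    // returns, for each work item, how many times it was processed.
    public static Dictionary<int, int> Run(int itemCount, int workerCount)
    {
        var queue = new ConcurrentQueue<int>();
        for (int i = 0; i < itemCount; i++) queue.Enqueue(i);

        var processed = new ConcurrentDictionary<int, int>();
        var workers = new List<Thread>();

        for (int w = 0; w < workerCount; w++)
        {
            var thread = new Thread(() =>
            {
                int item;
                // TryDequeue is atomic: each item goes to exactly one caller,
                // so the worker needs no lease or flag logic of its own.
                while (queue.TryDequeue(out item))
                {
                    processed.AddOrUpdate(item, 1, (key, count) => count + 1);
                }
            });
            workers.Add(thread);
            thread.Start();
        }

        foreach (var thread in workers) thread.Join();
        return new Dictionary<int, int>(processed);
    }
}
```

Calling WorkQueueDemo.Run(1000, 4) should report every item as processed exactly once, however the four threads interleave. The worker body contains nothing but the dequeue and the work itself.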
In Azure the equivalent scenario would be worker roles with multiple instances. Windows Azure Storage provides a highly scalable Queue that is accessible via REST. In essence, the Windows Azure Storage Queue service was designed specifically for asynchronous work distribution/consumption. A work item retrieved from the queue is hidden from other workers, so each retrieval is unique, except where the worker fails to notify the queue of successful processing (by deleting the item), in which case the work item automatically becomes available again after a certain amount of time. This ensures the work item is not lost due to a worker failure, but it also means an item can occasionally be processed twice, so the work itself should be safe to repeat. Also, Azure Queues are at least three times redundant, which protects work items against loss.
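The reappear-after-a-timeout behaviour can be sketched in memory. The LeasedQueue class below is a hypothetical stand-in written for this post, not the Azure Storage client API. It takes its clock as a parameter so the timeout can be exercised without actually waiting.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical in-memory stand-in for a queue with a visibility timeout.
public class LeasedQueue<T>
{
    private class Entry { public T Item; public DateTime VisibleAt; public Guid Receipt; }

    private readonly List<Entry> entries = new List<Entry>();
    private readonly object gate = new object();
    private readonly TimeSpan visibilityTimeout;
    private readonly Func<DateTime> clock;

    public LeasedQueue(TimeSpan visibilityTimeout, Func<DateTime> clock)
    {
        this.visibilityTimeout = visibilityTimeout;
        this.clock = clock;
    }

    public void Enqueue(T item)
    {
        lock (gate) { entries.Add(new Entry { Item = item, VisibleAt = clock() }); }
    }

    // Hands out the first visible item and hides it for the timeout period.
    public bool TryDequeue(out T item, out Guid receipt)
    {
        lock (gate)
        {
            DateTime now = clock();
            foreach (var entry in entries)
            {
                if (entry.VisibleAt <= now)
                {
                    entry.VisibleAt = now + visibilityTimeout;
                    entry.Receipt = Guid.NewGuid();
                    item = entry.Item;
                    receipt = entry.Receipt;
                    return true;
                }
            }
        }
        item = default(T);
        receipt = Guid.Empty;
        return false;
    }

    // Called by the worker after successful processing; removes the item for good.
    // A receipt from an earlier, expired retrieval no longer matches and is ignored.
    public bool Delete(Guid receipt)
    {
        if (receipt == Guid.Empty) return false;
        lock (gate)
        {
            return entries.RemoveAll(e => e.Receipt == receipt) > 0;
        }
    }
}
```

A worker that crashes simply never calls Delete, and the item becomes visible to the next worker once the timeout passes. The worker code itself never deals with flags or leases.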
In some example code I’ve posted from previous presentations, there is an application that searches for images based on their colour content. The Asynchronous Work Queue Pattern is applied in this example application; a search is made against the Flickr API for a specific keyword and a bunch of results are returned. Each result is placed into a queue, and multiple workers listen at the other end, waiting to pick up an image that they will chop up and analyse for colour content.
To be honest I don’t care what it’s called; it’s just important that you are aware of the pattern and know why to use it.
It was just pointed out to me by Paul Stovell that this fits very closely with the Message Dispatcher pattern as identified in the book Enterprise Integration Patterns. The key difference is that the dispatcher pushes work to the consumers, whereas here the workers pull the work load from the queue instead. The Message Dispatcher pattern makes the assumption that the message channel is in fact dumb, whereas Windows Azure Storage Queues are smart and can ensure no message duplication, as well as reliable messaging, even in the event of consumer failure. Currently I believe these are still separate patterns, but I am happy to have the discussion over a beer or two.




I’ve seen a few demos of Windows Azure and one of the common themes I see around the worker role is that people want to demonstrate scalability through increasing the number of instances in their service definition. That’s fine, but we also need to remember that we are adding an entire virtual machine each time we increase our instance count, and maybe that just isn’t necessary when we consider parallelization instead.
When we create a new worker role, we get some boilerplate code that guides us into overriding the “Start” and “GetHealthStatus” methods. The latter is irrelevant to this discussion; the former is more important. The default template constructs the Start method in a way that indicates it should do some work and then Sleep.
public override void Start()
{
    while (true)
    {
        // Do Work
        Thread.Sleep(10000);
    }
}
The problem with this convention is twofold. First, it assumes that one worker should be performing one role, but that’s not the case a lot of the time. Second, it doesn’t leverage parallel processing, in particular multi-threading.
The next obvious step is to break up your Start method and kick off multiple threads, each working on their own little piece of the application. But this kind of code is quite common, and I thought, wouldn’t it be better if we could just focus on what our individual application roles are without needing to think about how many workers/instances we need, and whether or not each of those roles should be run multi-threaded?
Thus, the need for a framework (although I shudder at the term) to help us separate that logic, and to provide some common classes to aid with parallelism. I’m not great with names, but for this release, I’ll call it: “WorkSharing”.
The WorkSharing assemblies let you focus more on the roles of your application, the work that actually needs to be done, and it keeps this separate from when/where the work occurs.
You begin as usual by creating a Cloud application with a worker role. For the moment we don’t need to think about the worker role itself; instead we want to focus on what actual work our application needs to perform. We create a new class for every unit of work we need to do. In this example we’ll save time and create just one worker who will be instantiated a few different times.
public class MyCustomWorker : IWorker
{
    private readonly string name;

    // Shared by all instances; Random is not thread-safe, so access is locked.
    private static readonly Random random = new Random();

    public MyCustomWorker(string name)
    {
        this.name = name;
    }

    public bool HasWorkToDo() { return true; }

    public void DoWork()
    {
        RoleManager.WriteToLog("Information", name + ": Before sleep");
        int wait;
        lock (random) { wait = random.Next(700, 1200); }
        Thread.Sleep(wait);
        RoleManager.WriteToLog("Information", name + ": After sleep");
    }
}
The key is implementing the IWorker interface – this forces your class to expose two methods: “HasWorkToDo” and “DoWork”, both fairly self-explanatory. The actual work that our class performs is two log entries with a random period of sleep in between. This will help us show the async work being performed later on.
Here is where you would define all the workers for your application. These don’t have to be in the same project as your cloud worker role – in fact they usually won’t be. If the roles are small but numerous, I’d put them in their own project. If they are each very large, you’d probably have one project per role. Either way, we’ve focused our attention on the work that needs to be done without coupling it with the “where” and “when”.
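For reference, the IWorker contract described above would look something like the following. This is a sketch inferred from the two method names, not a copy of the framework source, and CountdownWorker is a made-up example of a worker whose HasWorkToDo eventually returns false.

```csharp
using System;

// Assumed shape of the interface, inferred from the two methods described above.
public interface IWorker
{
    bool HasWorkToDo();
    void DoWork();
}

// Hypothetical worker that has work only while its counter is above zero.
public class CountdownWorker : IWorker
{
    private int remaining;

    public CountdownWorker(int remaining) { this.remaining = remaining; }

    public int Remaining { get { return remaining; } }

    public bool HasWorkToDo() { return remaining > 0; }

    public void DoWork() { remaining--; }
}
```

In a real worker, HasWorkToDo might check a queue or a table for pending items rather than a counter.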
Heading back to our Azure worker role, we need to make some changes to the default template code. Strip out the override methods (although you can leave the health status override if you have special logic that needs to go there). We then change the inheritance of the worker role. Currently it inherits from ‘RoleEntryPoint’ but we will change this to ‘WorkSharingRole’ instead – this is a new class in the WorkSharing assemblies.
We want a parameterless constructor for our worker role as well, and it is in here that we define what it needs to do and how to do it. Here is some code illustrating sample usage of our custom worker:
public class WorkerRole : WorkSharingRole
{
    public WorkerRole()
    {
        base.SleepTime = 1000;
        base.PrimaryWorker = new MyCustomWorker("Custom");
    }
}
Let’s step through the code. First, you can see that our worker inherits from ‘WorkSharingRole’ as mentioned before, and we have a parameterless constructor where we refer to two properties on the base class. ‘SleepTime’ is essentially the period (in milliseconds) between “executions” (a term I will explain further in a moment). The ‘PrimaryWorker’ property is of type IWorker which means we can assign our custom worker to it. That’s all that’s needed to get up and running, and when we fire it up, we can see in our development fabric the output from our worker role:
07/13/2009 03:48:38:648,Event=Information,Level=Info,ThreadId=4316,=Custom: Before sleep
07/13/2009 03:48:39:695,Event=Information,Level=Info,ThreadId=4316,=Custom: After sleep
07/13/2009 03:48:40:695,Event=Information,Level=Info,ThreadId=4316,=Working
07/13/2009 03:48:40:695,Event=Information,Level=Info,ThreadId=4316,=Custom: Before sleep
07/13/2009 03:48:41:744,Event=Information,Level=Info,ThreadId=4316,=Custom: After sleep
Ok, so there’s nothing special there just yet; we haven’t really changed much of the face of our app except that we’ve hidden away the loop. However, there are a number of other IWorker implementations that are part of the WorkSharing assemblies, and it is these guys that will let you easily parallelize your work. Consider this change to the constructor:
public WorkerRole()
{
    base.SleepTime = 1000;
    base.PrimaryWorker = new WeightedWorker()
        .Add(new MyCustomWorker("AAA"), 2)
        .Add(new WaitingAsyncWorker()
            .Add(new MyCustomWorker("BBB"))
            .Add(new MyCustomWorker("CCC")));
}
Here we’ve introduced two new classes: WeightedWorker and WaitingAsyncWorker. The WeightedWorker allows us to state how often the worker role should be focusing on a particular task. It acts much like a ratio. In the above example, the WeightedWorker accepts a custom worker called ‘AAA’ with a weight of “2”, and a WaitingAsyncWorker with a weight of “1”. This means the custom worker will be called twice as often.
The WaitingAsyncWorker is one of 2 kinds of async workers that let you parallelize your work. In this case, it takes two custom workers, ‘BBB’ and ‘CCC’ which it will execute side-by-side. The WaitingAsyncWorker will then wait for both its children to finish executing.
Here’s the output you would see in the development fabric:
07/13/2009 02:37:06:688,Event=Information,Level=Info,ThreadId=5624,=AAA: Before sleep
07/13/2009 02:37:07:755,Event=Information,Level=Info,ThreadId=5624,=AAA: After sleep
07/13/2009 02:37:07:755,Event=Information,Level=Info,ThreadId=3528,=BBB: Before sleep
07/13/2009 02:37:07:755,Event=Information,Level=Info,ThreadId=5452,=CCC: Before sleep
07/13/2009 02:37:08:606,Event=Information,Level=Info,ThreadId=5452,=CCC: After sleep
07/13/2009 02:37:08:947,Event=Information,Level=Info,ThreadId=3528,=BBB: After sleep
07/13/2009 02:37:08:947,Event=Information,Level=Info,ThreadId=5624,=AAA: Before sleep
07/13/2009 02:37:10:061,Event=Information,Level=Info,ThreadId=5624,=AAA: After sleep
07/13/2009 02:37:11:062,Event=Information,Level=Info,ThreadId=5624,=AAA: Before sleep
07/13/2009 02:37:12:236,Event=Information,Level=Info,ThreadId=5624,=AAA: After sleep
07/13/2009 02:37:12:236,Event=Information,Level=Info,ThreadId=3528,=BBB: Before sleep
07/13/2009 02:37:12:236,Event=Information,Level=Info,ThreadId=5452,=CCC: Before sleep
07/13/2009 02:37:13:291,Event=Information,Level=Info,ThreadId=3528,=BBB: After sleep
07/13/2009 02:37:13:413,Event=Information,Level=Info,ThreadId=5452,=CCC: After sleep
07/13/2009 02:37:13:413,Event=Information,Level=Info,ThreadId=5624,=AAA: Before sleep
07/13/2009 02:37:14:245,Event=Information,Level=Info,ThreadId=5624,=AAA: After sleep
As you can see, the ‘AAA’ task executes twice in each execution, whilst tasks ‘BBB’ and ‘CCC’ can finish in either order, and each starts before the other finishes.
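If you are curious how two workers like these could be built, here is a minimal sketch. It is not the framework’s source: the class bodies are my guess at the simplest thing that behaves as described, and RecordingWorker is a made-up helper that stands in for MyCustomWorker. The round-by-round ordering in WeightedWorker is an assumption; it happens to give the AAA, BBB/CCC, AAA order seen in the log, but the real implementation may interleave differently.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;

public interface IWorker
{
    bool HasWorkToDo();
    void DoWork();
}

// Sketch: runs every child on its own thread and blocks until all have finished.
public class WaitingAsyncWorker : IWorker
{
    private readonly List<IWorker> children = new List<IWorker>();

    public WaitingAsyncWorker Add(IWorker child) { children.Add(child); return this; }

    public bool HasWorkToDo() { return children.Any(c => c.HasWorkToDo()); }

    public void DoWork()
    {
        var threads = children.Where(c => c.HasWorkToDo())
                              .Select(c => new Thread(c.DoWork))
                              .ToList();
        threads.ForEach(t => t.Start());
        threads.ForEach(t => t.Join());
    }
}

// Sketch: calls each child as many times as its weight, one call per round.
public class WeightedWorker : IWorker
{
    private readonly List<KeyValuePair<IWorker, int>> children =
        new List<KeyValuePair<IWorker, int>>();

    public WeightedWorker Add(IWorker child) { return Add(child, 1); }

    public WeightedWorker Add(IWorker child, int weight)
    {
        children.Add(new KeyValuePair<IWorker, int>(child, weight));
        return this;
    }

    public bool HasWorkToDo() { return children.Any(c => c.Key.HasWorkToDo()); }

    public void DoWork()
    {
        int rounds = children.Count == 0 ? 0 : children.Max(c => c.Value);
        for (int round = 0; round < rounds; round++)
        {
            foreach (var child in children)
            {
                // A child takes part in a round only while it has weight left.
                if (round < child.Value && child.Key.HasWorkToDo()) child.Key.DoWork();
            }
        }
    }
}

// Hypothetical worker that records its name each time it runs.
public class RecordingWorker : IWorker
{
    private readonly string name;
    private readonly List<string> log;

    public RecordingWorker(string name, List<string> log) { this.name = name; this.log = log; }

    public bool HasWorkToDo() { return true; }

    public void DoWork() { lock (log) { log.Add(name); } }
}
```

Because both Add methods return the worker itself, the constructor code shown earlier composes these sketches in exactly the same way as the real classes.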
There are 2 other predefined workers in the WorkSharing framework. The ContinuousAsyncWorker also operates in an asynchronous fashion, however where the WaitingAsyncWorker will wait for all its children to finish before itself finishing, the ContinuousAsyncWorker will continuously keep working all of its children over and over. When each child finishes its work, it will kick off a new thread to start it again. This makes it a little different from all the other workers because it never really finishes its DoWork method – it just keeps kicking off its children over and over. For this reason, the DoWork method will return immediately after being called.
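A rough sketch of that behaviour follows. It simplifies in two ways that are my own choices rather than the framework’s: each child loops on one long-lived background thread instead of getting a fresh thread per run, and a Stop method is added so the sketch can be shut down. CountingWorker is a made-up helper.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;

public interface IWorker
{
    bool HasWorkToDo();
    void DoWork();
}

// Sketch: keeps every child running in a loop on its own background thread.
public class ContinuousAsyncWorker : IWorker
{
    private readonly List<IWorker> children = new List<IWorker>();
    private readonly object gate = new object();
    private bool started;
    private volatile bool stopping;   // Stop() is an addition for this sketch

    public ContinuousAsyncWorker Add(IWorker child) { children.Add(child); return this; }

    public bool HasWorkToDo() { return children.Any(c => c.HasWorkToDo()); }

    // Starts the child loops on the first call and returns immediately;
    // later calls do nothing because the children are already running.
    public void DoWork()
    {
        lock (gate)
        {
            if (started) return;
            started = true;
        }
        foreach (var child in children)
        {
            var worker = child;   // local copy for the closure
            var thread = new Thread(() =>
            {
                while (!stopping)
                {
                    if (worker.HasWorkToDo()) worker.DoWork();
                    else Thread.Sleep(10);   // avoid spinning while idle
                }
            });
            thread.IsBackground = true;
            thread.Start();
        }
    }

    public void Stop() { stopping = true; }
}

// Hypothetical worker that counts how often it has run.
public class CountingWorker : IWorker
{
    private int runs;

    public int Runs { get { return Interlocked.CompareExchange(ref runs, 0, 0); } }

    public bool HasWorkToDo() { return true; }

    public void DoWork() { Interlocked.Increment(ref runs); Thread.Sleep(1); }
}
```

The important property is that DoWork does not block: the caller gets control back straight away while the children carry on in the background.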
The final worker worth mentioning (or perhaps not really) is the LinearWorker which simply performs a number of tasks in the order defined. You can think of it as similar to the WeightedWorker except that it only fires off its children once each.
The workers offer a fluent interface, though a meagre one: only the Add method returns the same object, letting you chain your adds together. More important here is that you can make all workers children of other workers. In the above example we saw a WeightedWorker own a WaitingAsyncWorker. You could have any number of workers owning any others, as many levels deep as you want. It also lets you be creative – you might need to dynamically discover what work this worker will perform.
Let’s take the WaitingAsyncWorker from our example. The total amount of work that needs to be done by this worker is to run ‘BBB’ and ‘CCC’ side by side and wait for both to finish.
After this, it has completed its DoWork method, which means it has completed its ‘Execution’. It may or may not get fired again. Now consider the WeightedWorker – its total sum of work is to run ‘AAA’ twice and the WaitingAsyncWorker once,
… after which it has completed its execution. However in this example, its execution includes the execution of the WaitingAsyncWorker and all its children.
The top of the hierarchy is always the PrimaryWorker property of the WorkSharingRole class. Therefore its execution includes the execution of both the WeightedWorker and the WaitingAsyncWorker and all their children.
Between each execution of the PrimaryWorker there is a user-defined pause. This is simply a Thread.Sleep call and defaults to 0, which means it won’t sleep between executions (which is a perfectly valid scenario).
When the base WorkSharingRole class starts an execution, it calls the HasWorkToDo method on the PrimaryWorker. If the result is true then it proceeds to call the DoWork method. In the case of the prebaked workers, all calls to HasWorkToDo will actually forward the checks onto their children. If any child has work to do, it returns true – this is important because these workers don’t have any work of their own to do, they just call out to their children.
Likewise, when DoWork is called on one of the prebaked workers, it will in turn call the DoWork methods on its children (as per the descriptions given of each earlier).
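Put together, the loop at the heart of the role can be sketched as below. WorkSharingLoop is a made-up name: the real WorkSharingRole derives from the Azure role base class and loops forever, whereas this version runs a fixed number of executions so that it can finish. CountdownWorker is again a hypothetical helper.

```csharp
using System;
using System.Threading;

public interface IWorker
{
    bool HasWorkToDo();
    void DoWork();
}

// Sketch of the execution loop described above, detached from Azure.
public class WorkSharingLoop
{
    public int SleepTime { get; set; }          // milliseconds; defaults to 0
    public IWorker PrimaryWorker { get; set; }

    // Runs the given number of executions and returns how many did work.
    public int Run(int executions)
    {
        int worked = 0;
        for (int i = 0; i < executions; i++)
        {
            // Ask first, then work: DoWork is skipped when nothing is pending.
            if (PrimaryWorker != null && PrimaryWorker.HasWorkToDo())
            {
                PrimaryWorker.DoWork();
                worked++;
            }
            if (SleepTime > 0) Thread.Sleep(SleepTime);
        }
        return worked;
    }
}

// Hypothetical worker that has work only while its counter is above zero.
public class CountdownWorker : IWorker
{
    private int remaining;

    public CountdownWorker(int remaining) { this.remaining = remaining; }

    public bool HasWorkToDo() { return remaining > 0; }

    public void DoWork() { remaining--; }
}
```

With a CountdownWorker(2) as the primary worker, Run(5) performs work on the first two executions and skips DoWork on the remaining three.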
You can download the framework here (full source and example applications including an Azure worker and a console app). It’s free to use but I’d like to know how you think it can be improved. So far here is the list of things I’d like to add:
Please let me know what you think and have fun playing.

