



Is this the end of the road for this blog?
Windows Azure:
Sql Azure:
AppFabric:
To summarise: I’m read-only, disabled, and inactive.
I’m now officially taking votes for the next technology I should start evangelising. Perhaps I’ll become a functional programmer, or perhaps write DSLs instead. Or maybe switch to Ruby; I’ve heard you become rich overnight if you write a cool Rails app.
However, word on the street is that Windows Azure is now productised and has been handed to hosting providers to ‘trial’. Well, I own seven (working) machines at my house; surely I can build my own platform as a service?
The alternative is to start advertising on this blog and generate some revenue so that I can continue to explore the Windows Azure Platform via my wallet. Could you, the reader, handle in-your-face advertisements?
I’m open to suggestions.




There are three kinds of storage in Windows Azure: Tables, Blobs, and Queues. Blobs are binary large objects, Queues are robust, enterprise-level communication queues, and Tables are a non-relational entity storage mechanism. All storage is replicated three times and is available via REST (Table storage uses the ATOM format).
Tables can store multiple entities with different shapes. That is to say, you can safely store two entities in a table that look completely different. For example, a Product entity might have name and category properties, whereas a User entity might have name, date of birth, and login name properties. Despite the differences between these two entities, they can both be stored in the same table in Windows Azure Table Storage.
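To make the idea concrete without the Azure SDK, here is a toy in-memory model. The ToyTable class is purely hypothetical and is neither the StorageClient API nor how the storage service works internally: each entity is just a bag of named properties identified by a partition key and a row key (both covered below), so a product and a user can sit side by side in the same table.

```csharp
using System.Collections.Generic;

// Toy, in-memory model of a schemaless table. Hypothetical: this is neither the
// StorageClient API nor how the storage service stores data internally.
public class ToyTable
{
    // Each entity is a property bag identified by its (PartitionKey, RowKey) pair.
    private readonly Dictionary<(string, string), Dictionary<string, object>> _entities =
        new Dictionary<(string, string), Dictionary<string, object>>();

    public int Count => _entities.Count;

    // No schema check: any set of properties is accepted.
    // Dictionary.Add throws ArgumentException if the key pair is already in use.
    public void Insert(string partitionKey, string rowKey, Dictionary<string, object> properties)
    {
        _entities.Add((partitionKey, rowKey), properties);
    }

    public Dictionary<string, object> Get(string partitionKey, string rowKey)
    {
        return _entities[(partitionKey, rowKey)];
    }
}
```

The only thing the two entities have in common is the pair of keys; everything else is free-form, which is the property that lets differently shaped objects share one table.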
One of the reasons we prefer Table storage over other database mechanisms (such as Sql Azure) is that it is optimised for performance and scalability. It achieves this through an innate partitioning mechanism based on an extra property assigned to each entity, called the ‘Partition Key’. Entities that share a partition key are kept together, while entities with different partition keys can be spread across different storage nodes; five otherwise identical entities with five different partition keys could, for example, be served by five different nodes.
There are a number of ways we can get entities into and out of table storage, and this article addresses three of them. However, before we investigate those scenarios, we need to set up our table and the entities that will go into it.
We are going to create a table called ‘Products’ for the purposes of this article. Here is some code that demonstrates how (this could go in the Session or Application start events, or anywhere you see fit):
var account = CloudStorageAccount.FromConfigurationSetting("ProductStorage");
var tableClient = account.CreateCloudTableClient();
tableClient.CreateTableIfNotExist("Products");
The above code assumes you have a connection string configured already called ‘ProductStorage’ which points to your Windows Azure Storage account (local development storage works just as well for testing purposes).
For the purposes of this article we are going to put an entity called ‘Product’ into the table. That entity can be a simple POCO (plain old CLR object), although only its public properties will be persisted and retrieved. Let’s define a simple product entity with a name, a category, and a price. However, every entity stored in a table must also have a row key, a partition key, and a timestamp, otherwise we will get errors when we try to persist it. Here’s our Product class:
// The WCF Data Services client needs to know which properties form the entity key
[DataServiceKey("PartitionKey", "RowKey")]
public class Product
{
    // Required
    public DateTime Timestamp { get; set; }
    public string PartitionKey { get; set; }
    public string RowKey { get; set; }

    // Optional
    public string Name { get; set; }
    public string Category { get; set; }
    public double Price { get; set; }
}
Pretty simple eh? However we can clean up some of the code; because the Timestamp, PartitionKey, and RowKey are all required for every single table entity, we could pull those properties out into a base entity class. However we don’t have to; there already exists one in the StorageClient namespace called ‘TableServiceEntity’. It has the following definition:
[CLSCompliant(false)]
[DataServiceKey(new string[] { "PartitionKey", "RowKey" })]
public abstract class TableServiceEntity
{
    protected TableServiceEntity(string partitionKey, string rowKey);
    protected TableServiceEntity();

    public DateTime Timestamp { get; set; }
    public virtual string PartitionKey { get; set; }
    public virtual string RowKey { get; set; }
}
It makes sense for us to inherit from this class instead. We’ll also follow the convention of having a partition key and row key injected in the constructor on our Product class, while also leaving a parameterless constructor for serialisation reasons:
public class Product : TableServiceEntity
{
    public Product() { }

    public Product(string partitionKey, string rowKey)
        : base(partitionKey, rowKey)
    {
    }

    public string Name { get; set; }
    public string Category { get; set; }
    public double Price { get; set; }
}
Done. We’ll use this Product entity from now on. All scenarios below will use a test Product with the following information:
var testProduct = new Product("PK", "1") { Name = "Idiots Guide to Azure", Category = "Book", Price = 24.99 };
The easiest way to get started with basic CRUD methods for our entity is by using a specialised ‘Data Service Context’. The Data Service Context is a special class belonging to the WCF Data Services client namespace (System.Data.Services.Client) and relates to a specific technology for exposing and consuming entities in a RESTful fashion. Read more about WCF Data Services here.
In a nutshell, a Data Service Context lets us consume a REST-based entity (or list of entities), and that logic is given to us for free in the ‘DataServiceContext’ class, found in the aforementioned System.Data.Services.Client namespace (you’ll probably need to add a reference to it). Consuming RESTful services is not an Azure-specific thing, which is why we need to import this new namespace.
Because table storage entities behave exactly like other RESTful resources, we can use a data service context to interact with our entities. Tables and their entities have a few additional requirements (such as credential information, like the storage account key needed to access table storage), so we need to be able to include this information with our data context. The Azure SDK makes this easy by providing a class derived from DataServiceContext called ‘TableServiceContext’. You’ll notice that to instantiate one of these we need to pass it a base address (our storage account) and some credentials.
If you review some of the original code above, you’ll notice we created a CloudTableClient based on connection string information in our configuration file. That same table client instance has the ability to create our TableServiceContext, using the code below:
var context = tableClient.GetDataServiceContext();
That’s it! All that explanation for just one line of code, eh? Hopefully you now understand what’s happening when we get that context: it generates a TableServiceContext, which inherits from DataServiceContext, which contains all the smarts for communicating with our storage table. Simple.
Now we can call all sorts of methods to create/delete/update our products. We’ll use the ‘testProduct’ defined earlier:
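// Create
context.AddObject("Products", testProduct);
context.SaveChanges();

// Update
testProduct.Price = 21.99;
context.UpdateObject(testProduct);
context.SaveChanges();

// Read: AddQueryOption returns a new query rather than modifying the
// original, so keep its result; "$filter" is the OData filter option.
var query = context.CreateQuery<Product>("Products")
                   .AddQueryOption("$filter", "RowKey eq '1'");
var result = query.Execute().FirstOrDefault();

// Delete
context.DeleteObject(result);
context.SaveChanges();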
The methods being called here only know about ‘object’, not ‘Product’ and are therefore not type safe. We’ll look at a more type safe example in the next scenario.
In the previous example we saw that the Table Service Context was a generic way to get going with table entities quickly. This works well because we can put any type of entity into the table via the same ‘AddObject’ method. However sometimes in code we like to be more type safe than that and want to enforce that a particular table only accepts certain objects. Or perhaps we want unique data access classes for our different entity types so that we can put some validation in.
Either way, this is relatively easy to achieve by creating our own Data Service Context class. We still need to wrap up table storage credentials, so it’s actually easier if we inherit from TableServiceContext, as follows:
public class ProductDataContext : TableServiceContext
{
    public ProductDataContext(string baseAddress, StorageCredentials credentials)
        : base(baseAddress, credentials)
    {
    }

    // TODO
}
The base constructor of TableServiceContext requires us to supply a base address and credentials, so we simply pass on this requirement. Our constructor doesn’t need to do anything else though.
The next step is to start adding methods to this new class that perform the CRUD operations we require. Let’s start with a simple query:
public IQueryable<Product> Products
{
    get { return CreateQuery<Product>("Products"); }
}
This will give us a ‘Products’ property on our ProductDataContext that will allow us to query against the product set using LINQ. We’ll see an example of that in a minute. For now, we’ll add in some strongly typed wrappers for the other CRUD behaviours:
public void Add(Product product)
{
    AddObject("Products", product);
}

public void Delete(Product product)
{
    DeleteObject(product);
}

public void Update(Product product)
{
    UpdateObject(product);
}
Nothing very special there, but at least we can enforce a particular type now. Let’s see how this might work in code to make calls to our new data context. As before we’ll assume the table client has already been created from configuration (see ‘Setting Up The Table’ above) and we’ll use the same test product as before:
var context = new ProductDataContext(
    tableClient.BaseUri.ToString(),
    tableClient.Credentials);

// Create
context.Add(testProduct);
context.SaveChanges();

// Update
testProduct.Price = 21.99;
context.Update(testProduct);
context.SaveChanges();

// Read with LINQ: Where followed by a parameterless FirstOrDefault
var result = context.Products
    .Where(x => x.RowKey == "1")
    .FirstOrDefault();

// Delete
context.Delete(result);
context.SaveChanges();
You can see the key differences from the weakly typed scenario mentioned earlier. We now use the new ProductDataContext; however, we can’t automatically create it like we can with the generic table context, so we need to instantiate it ourselves, passing the base URI and credentials from the table client. We also use our more explicitly typed methods for our CRUD operations, but you might notice there is a big change in the way we query data. The ‘Products’ property returns IQueryable<Product>, which means we can use LINQ to query the table store. Careful, though: not all operations are supported by the LINQ provider. For example, this will fail:
var result = context.Products.FirstOrDefault(x => x.RowKey == "1");
This fails because the table storage LINQ provider does not support FirstOrDefault with a predicate; filter with Where first, as in the earlier example. Even so, this new query API is much nicer and allows us to do a lot more than we could when the entity type was unknown to the data context.
Before reading on you might want to familiarise yourself with the concepts of these patterns. To prevent blog duplication, please refer to this article that someone smarter than me wrote:
Implementing Repository and Specification patterns using Linq.
The goal is to create a repository class that takes a generic type parameter representing the entity we want to work with. Such a repository class will be reusable for all types of entities but still be strongly typed. We also want to abstract it behind an interface so that we are never concerned with the concrete implementation. For more information on why this is good practice, please refer to the SOLID principles.
We also want to use the specification pattern to provide filter/search information to our repository. We want to leverage the goodness of LINQ but also explicitly define those filters as specifications so that they are easily identifiable.
I usually find it easiest to start with the interface and worry about the implementation later. Let’s define an interface for a repository that will take any kind of table entity:
public interface IRepository<T> where T : TableServiceEntity
{
    void Add(T item);
    void Delete(T item);
    void Update(T item);
    IEnumerable<T> Find(params Specification<T>[] specifications);
    void SubmitChanges();
}
Seems simple enough; however, you might notice that our Find method is less flexible than in scenario 2, where we could just use LINQ directly against our data service. We want to provide the flexibility of LINQ yet still provide explicitness and reusability of those very same queries. We could add a bunch of methods for each query we want to do. For example, to retrieve a single product, we could create an extra method called ‘GetSingle(string rowkey)’. However, that only applies to products, and may not apply to other entity types. Likewise, if we want to get all Products over $15, we can’t do that in our repository because it makes no sense to get all User entities that are over $15.
That’s where the specification pattern comes in. A specification is a piece of information about how to refine our search. Think of it as a search object, except that it contains a LINQ expression. We’ll see an example soon, but let’s first define our specification class and adjust the Find method on our IRepository<T> interface:
IEnumerable<T> Find(params Specification<T>[] specifications);

...

public abstract class Specification<T>
{
    public abstract Expression<Func<T, bool>> Predicate { get; }
}
Our Find method has been adjusted to Find entities that satisfy the specifications provided. And a specification is just a wrapper around a predicate. Oh, and a predicate is just a fancy word for a condition. For example, consider this code:
if (a < 3) a++;
The part that says “a < 3” is the predicate. We can effectively change that same code to the following:
Func<int, bool> predicate = someInt => someInt < 3;
if (predicate(a)) a++;
It might seem like code bloat in such a simple example, but the ability to reuse a ‘condition’ in many places will be a life saver when your systems start to grow. In our case we care about predicates because LINQ is full of them. For example, the Where method takes a predicate: a Func<T, bool> when filtering an in-memory IEnumerable<T>, or an Expression<Func<T, bool>> when querying an IQueryable<T> such as our table (the expression form is what lets the LINQ provider translate the condition into a REST query). That is exactly why our specification exposes its predicate as an Expression<Func<T, bool>>. Each specification represents some kind of filter. For example:
Products.Where(x => x.RowKey == "1")
The part that says x.RowKey == "1" is a predicate, and can be made reusable as a specification. You’ll see it in action in the final code below, but for now we’ll move on to our Repository implementation. Just keep in mind that we will be reusing those ‘conditions’ and storing them in their own classes.
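To see why Specification<T> exposes an Expression<Func<T, bool>> rather than a plain Func<T, bool>, here is a small sketch in plain .NET with no Azure types (PredicateDemo is a made-up name). It reuses the ‘a < 3’ condition from earlier: once as an expression tree passed to IQueryable<T>.Where, and once compiled into an ordinary delegate.

```csharp
using System;
using System.Linq;
using System.Linq.Expressions;

public static class PredicateDemo
{
    // A delegate: it can be invoked, but its logic is opaque compiled code.
    public static readonly Func<int, bool> IsSmall = someInt => someInt < 3;

    // An expression tree describing the same condition. IQueryable<T>.Where
    // accepts this form, which is what allows a LINQ provider to inspect the
    // condition and translate it into a query against a remote store.
    public static readonly Expression<Func<int, bool>> IsSmallExpression = someInt => someInt < 3;

    // Reuse the condition as a query filter...
    public static int[] SmallValues(int[] values)
    {
        return values.AsQueryable().Where(IsSmallExpression).ToArray();
    }

    // ...and, compiled into a delegate, as an ordinary in-memory check.
    public static bool IsSmallCompiled(int value)
    {
        return IsSmallExpression.Compile()(value);
    }
}
```

Here the in-memory LINQ provider simply compiles the tree, but the table storage provider walks it instead, which is why a specification can travel all the way to the storage service.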
We’ll focus first on the definition of the repository class and its constructor:
public class TableRepository<T> : IRepository<T> where T : TableServiceEntity
{
    private readonly string _tableName;
    private readonly TableServiceContext _dataContext;

    public TableRepository(string tableName, TableServiceContext dataContext)
    {
        _tableName = tableName;
        _dataContext = dataContext;
    }

    // TODO CRUD methods
}
Our table repository implements our interface and most importantly takes a TableServiceContext as one of its constructor parameters. And to complete the interface contract we must also ensure that all generic types used in this repository inherit from TableServiceEntity. Next we’ll add in the Add/Update/Delete methods since they are the easiest:
public void Add(T item)
{
    _dataContext.AddObject(_tableName, item);
}

public void Delete(T item)
{
    _dataContext.DeleteObject(item);
}

public void Update(T item)
{
    _dataContext.UpdateObject(item);
}
Simple enough, since we have the generic table service context at our disposal. Likewise we can add in the SubmitChanges() method:
public void SubmitChanges()
{
    _dataContext.SaveChanges();
}
We could just call SaveChanges whenever we add or delete an item, but that makes batch operations more difficult. For example, we might want to add five products and then submit them all in a single SaveChanges call. (By default SaveChanges still sends one request per entity; passing SaveChangesOptions.Batch sends them as a single entity group transaction, which table storage only accepts when all the entities share the same partition key.) This method lets us submit whenever we like, which is in keeping with the approach used when creating your own TableServiceContext or using the default one.
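If you do want to batch, the entities in one entity group transaction must share a partition key, and a single batch is capped at 100 entities. The helper below is a hypothetical sketch (BatchPlanner is not part of the StorageClient library) of how pending entities could be grouped to respect those two rules before each group is submitted; partitionKeyOf is a caller-supplied function, so the sketch does not depend on TableServiceEntity.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: splits pending entities into groups that respect the
// entity group transaction rules (same PartitionKey, at most 100 entities).
public static class BatchPlanner
{
    public const int MaxEntitiesPerBatch = 100;

    public static List<List<T>> PlanBatches<T>(IEnumerable<T> pending, Func<T, string> partitionKeyOf)
    {
        var batches = new List<List<T>>();

        // GroupBy keeps the order in which each partition key first appears.
        foreach (var partition in pending.GroupBy(partitionKeyOf))
        {
            var current = new List<T>();
            foreach (var item in partition)
            {
                if (current.Count == MaxEntitiesPerBatch)
                {
                    batches.Add(current);   // this batch is full; start another
                    current = new List<T>();
                }
                current.Add(item);
            }
            if (current.Count > 0)
            {
                batches.Add(current);
            }
        }
        return batches;
    }
}
```

Each resulting group could then be added to the context and submitted with its own SaveChanges call.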
Finally, we need to define our Find method which takes zero or more specifications:
public IEnumerable<T> Find(params Specification<T>[] specifications)
{
    IQueryable<T> query = _dataContext.CreateQuery<T>(_tableName);
    foreach (var spec in specifications)
    {
        query = query.Where(spec.Predicate);
    }
    return query.ToArray();
}
Every specification must have a predicate (refer to the initial definition and you will see the property is defined as ‘abstract’, which means it must be overridden). That predicate is an Expression<Func<T, bool>>, and its T is the same type as our repository’s. Therefore we can simply chain all the predicates together by calling the .Where() extension method on the query once for each specification; chained Where calls combine the filters with a logical AND, and nothing is sent to table storage until ToArray() executes the query. At the end of the day the code is really quite small.
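The chaining loop in Find can be tried out without table storage at all. This sketch (ChainedWhereDemo is a hypothetical name) runs the same loop over an in-memory IQueryable<int>, using raw expressions in place of Specification<T> objects, to show that each extra Where narrows the results rather than widening them.

```csharp
using System;
using System.Linq;
using System.Linq.Expressions;

public static class ChainedWhereDemo
{
    // Mirrors the loop inside TableRepository<T>.Find: each predicate adds
    // another Where call, so the filters combine with a logical AND.
    public static IQueryable<T> ApplyAll<T>(IQueryable<T> query, params Expression<Func<T, bool>>[] predicates)
    {
        foreach (var predicate in predicates)
        {
            query = query.Where(predicate);
        }
        // Still deferred: nothing runs until the query is enumerated.
        return query;
    }
}
```

With no predicates the query is returned unchanged, which matches Find being called with zero specifications.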
And that’s all the framework-like code for setting up the Repository and Specification patterns against table storage. To show you how it works we first need a specification that allows us to get a product back based on its row key. Here’s an example:
public class ByRowKeySpecification : Specification<Product>
{
    private readonly string _rowkey;

    public ByRowKeySpecification(string rowkey)
    {
        _rowkey = rowkey;
    }

    public override Expression<Func<Product, bool>> Predicate
    {
        get { return p => p.RowKey == _rowkey; }
    }
}
In this specification, we take a row key in the constructor, and use that in the predicate that gets returned. The predicate simply says: “For any product, only return those products that have this row key”. We can use this specification along with our repository to perform CRUD operations as follows:
var context = tableClient.GetDataServiceContext();
IRepository<Product> productRepository =
    new TableRepository<Product>("Products", context);

// Create
productRepository.Add(testProduct);
productRepository.SubmitChanges();

// Update
testProduct.Price = 21.99;
productRepository.Update(testProduct);
productRepository.SubmitChanges();

// Read, using the reusable specification
var byRowkey = new ByRowKeySpecification("1");
var results = productRepository.Find(byRowkey);
var result = results.FirstOrDefault();

// Delete
productRepository.Delete(result);
productRepository.SubmitChanges();
Tada! We now have a strongly typed repository that will work on any entity type you want to use. And the great thing about repositories is that because we have an IRepository abstraction we can implement an ‘in memory’ version of the repository which is very useful for unit testing.
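As a sketch of that idea, here is a hypothetical InMemoryRepository<T> backed by a List<T>. To keep it compilable without the Azure SDK it makes a few assumptions: the TableServiceEntity constraint is replaced by ‘class’, and Product and ByRowKeySpecification are cut-down stand-ins for the article’s classes. Like the table-backed version, pending changes only become visible to Find after SubmitChanges.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

public abstract class Specification<T>
{
    public abstract Expression<Func<T, bool>> Predicate { get; }
}

// Same shape as the article's interface, but constrained to 'class' instead of
// TableServiceEntity so that it compiles without the Azure SDK (an assumption).
public interface IRepository<T> where T : class
{
    void Add(T item);
    void Delete(T item);
    void Update(T item);
    IEnumerable<T> Find(params Specification<T>[] specifications);
    void SubmitChanges();
}

public class InMemoryRepository<T> : IRepository<T> where T : class
{
    private readonly List<T> _committed = new List<T>();
    private readonly List<T> _pendingAdds = new List<T>();
    private readonly List<T> _pendingDeletes = new List<T>();

    public void Add(T item) { _pendingAdds.Add(item); }

    public void Delete(T item) { _pendingDeletes.Add(item); }

    // Items are held by reference, so an update needs no extra bookkeeping.
    public void Update(T item) { }

    public IEnumerable<T> Find(params Specification<T>[] specifications)
    {
        // Same predicate chaining as TableRepository<T>.Find, over a List<T>.
        IQueryable<T> query = _committed.AsQueryable();
        foreach (var spec in specifications)
        {
            query = query.Where(spec.Predicate);
        }
        return query.ToArray();
    }

    public void SubmitChanges()
    {
        _committed.AddRange(_pendingAdds);
        foreach (var item in _pendingDeletes)
        {
            _committed.Remove(item);
        }
        _pendingAdds.Clear();
        _pendingDeletes.Clear();
    }
}

// Cut-down stand-ins for the article's Product and ByRowKeySpecification.
public class Product
{
    public string RowKey { get; set; }
    public string Name { get; set; }
    public double Price { get; set; }
}

public class ByRowKeySpecification : Specification<Product>
{
    private readonly string _rowKey;

    public ByRowKeySpecification(string rowKey) { _rowKey = rowKey; }

    public override Expression<Func<Product, bool>> Predicate
    {
        get { return p => p.RowKey == _rowKey; }
    }
}
```

Code written against IRepository<Product> can be handed this class in a unit test and the TableRepository<Product> in production, without changing a line.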
As we progressed through the three options the amount of code grew, but I think we also got closer to true object-oriented programming by the end. Personally I like to always use repositories and specifications because they let us write our code in a way that makes the persistence mechanism irrelevant. We could easily decide to move products into our Sql Azure database and use a SqlRepository<T> in place of the TableRepository<T>.
Hopefully you’ll find the concept useful and aim to start with scenario 3 in all cases. To help you get started, I’ve assembled all 3 options into a reusable library for you, downloadable from here:
In each of the scenario folders you’ll find a single starter class that inherits from the TableStorageTest abstract class; you can look at that class to work out how the particular scenario works.
In the near future I will be looking to create a number of these basic classes as a reusable library to help Windows Azure developers get up and running faster with their applications. But in the meantime, happy coding.




Last night I delivered another Windows Azure Platform talk, this time at my local user group. I think this is one of the best sessions I’ve done so far; I had a good vibe going with the audience. And I appreciate that so many of you stuck around for the second talk; I think we went for a good 2 hours at least! Plus another 2 hours at the pub afterwards…
Anyway, here’s the slide deck.




Earlier this year Eric Nelson from Microsoft put out the call to Azure authors to build a community eBook about the Windows Azure Platform. Today the book was finally released to the web, and I have two articles included: Auto-scaling Azure, and Building Highly Scalable Applications in the Cloud.
I won’t bang on about it here, just check it out for yourself:




Earlier last week I had the privilege of presenting at Remix10, a two-day web conference held in Melbourne, Australia. I presented in front of a packed crowd, and the topic seemed so popular that I had to deliver it a second time the next day.
I really enjoyed the “love fest” and while I didn’t get to see much of the other presentations, the keynote was awesome, demonstrating some great integration points with Microsoft Surface, Slate, and Windows Phone 7.
My presentation topic was “Architecting for the Cloud” with a focus on how to build highly scalable applications, leveraging aspects of the Windows Azure Platform.
The presentation began with a quick overview of the platform, then followed by identifying some key aspects to highly scalable applications such as minimising state, caching mechanisms and messaging patterns.
As with my talk at the Windows Azure launch in the Philippines, I decided to take some photos of the crowd at the start. I got to speak to a lot of these individuals during the mixer drinks that evening and it sounds like the Azure platform is starting to gain momentum in Australia.
I really enjoyed the event and hope to be back next year!




On the 8th June 2010 Brisbane will have its first ever CloudCamp event. I’m very excited about this. I’ve had my head in the Azure space for so long I’ve somewhat neglected what everyone else is doing.
Register for CloudCamp Brisbane here.
Here’s the line direct from the website:
CloudCamp is an unconference where early adopters of Cloud Computing technologies exchange ideas. With the rapid change occurring in the industry, we need a place where we can meet to share our experiences, challenges and solutions. At CloudCamp, you are encouraged to share your thoughts in several open discussions, as we strive for the advancement of Cloud Computing. End users, IT professionals and vendors are all encouraged to participate.
And the tentative schedule:
The event will be held at Griffith University’s Nathan Campus. Three rooms have been allocated to us (for free, thanks to the School of Computing and Information Technology).
As you can see from the map, the University is right in the middle of Toohey Forest.
Getting to and from the University is relatively easy. The two main options are driving and buses (there is no train station close by).
Parking will cost. Campus security is very liberal with handing out fines so be prepared to pay for the full time period.
Buses are available from the city or from Garden City shopping centre.
Start prepping your talk ideas and come along. I look forward to meeting anyone else with their head in the clouds. Tell your friends!




Just the other day the Microsoft Azure team shifted the CDN functionality into release mode and offered up a pricing model. I reflected on the release in my post ‘Azure CDN Pricing’ however some of the information I provided was in fact incorrect.
This article is to correct some of those mistakes, and also offer a few more insights into the Microsoft CDN. I spoke with Jason Sherron from the Azure CDN team this morning and clarified a number of points and learnt more about Microsoft’s CDN offering.
Some years ago it was true that Microsoft used partner networks like Akamai and Limelight for content delivery. The team responsible for managing the global CDN was the “Edge Computing Network” (ECN) team.
Microsoft also had/has extensive co-location spaces; racks of servers sitting in a 3rd party data centre. Usually these have dedicated backbones and fibre. This localised service has extended over the years to include Microsoft.com, MSN.com, Bing Maps, etc.
Slowly Microsoft has started removing dependencies on 3rd party providers and moving on to their own infrastructure. This article gives some indication of where Microsoft are going with their services. Jason indicated to me that today, Microsoft serves 60% of edge content themselves.
Jason went on further to explain that as of this launch of the Azure CDN, Microsoft are now hosting 100% of your blob storage at those edges around the world. Your data does not sit with Akamai or Limelight, as I previously indicated.
Currently Microsoft has a single point of presence in Australia. The goal of any edge is to be located close to key egress and ingress points in the local area of internet exchange. For us, this is in Sydney, and while Microsoft doesn’t have an entire data centre here, they partner with someone who does, yet the racks are all Microsoft servers. Hopefully we’ll see more presence in the future in other capital cities.
We discussed briefly “what next” with the CDN. While nothing is bedded down completely, the team is investigating the possibility of Silverlight smooth streaming (apparently one of the most requested features) and also the potential of ‘compute at the edge’. How this latter service would differ from an implementation of the Azure Fabric is beyond me at this stage, and Jason was not at liberty to provide further information. I’m certainly very interested to see what this is about though.
In yesterday’s article I indicated you pay twice for CDN retrievals. This is partially true but really should be clarified.
The first time your data gets requested at the edge, the node has to retrieve the blob from Azure storage. You pay at the Azure storage data centre (normal Azure bandwidth charges) and then you pay again when it is delivered from the edge to the user (CDN charges). The content is of course cached at that point. Subsequent requests will hit the cache, which means only one charge.
Essentially if your data is “hot” then you only pay once. If you are constantly finding that your data is “cold” then perhaps CDN isn’t for you.
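The ‘pay twice on a miss, once on a hit’ rule can be made concrete with a toy simulation. This is a sketch under simplifying assumptions that are not from the CDN documentation: a single edge node, a single blob, a fixed time-to-live, and request times supplied in ascending order. It only counts billable transfers and deliberately leaves out prices, since no rates are given here; CdnBillingSketch is a hypothetical name.

```csharp
using System.Collections.Generic;

public static class CdnBillingSketch
{
    // Returns how many times the edge had to fetch the blob from Azure storage
    // (billed at storage rates) and how many times it delivered the blob to a
    // user (billed at CDN rates). requestTimesSeconds must be in ascending order.
    public static (int OriginFetches, int EdgeDeliveries) Simulate(
        IEnumerable<double> requestTimesSeconds, double ttlSeconds)
    {
        int originFetches = 0;
        int edgeDeliveries = 0;
        double cachedUntil = double.NegativeInfinity; // nothing cached yet

        foreach (double t in requestTimesSeconds)
        {
            if (t >= cachedUntil)
            {
                // Cache miss ("cold"): pull from storage and cache for the TTL.
                originFetches++;
                cachedUntil = t + ttlSeconds;
            }
            // Hit or miss, the user is served from the edge.
            edgeDeliveries++;
        }

        return (originFetches, edgeDeliveries);
    }
}
```

Three requests ten seconds apart cost one storage fetch with a 60-second TTL (hot data), but three storage fetches with a 5-second TTL (cold data), while the edge delivers three times either way.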
You have the option of either specifying the time-to-live for your blob object or you can rely on the heuristics of the cache network to determine the best time-to-live. More information can be found in this article: Delivering High-Bandwidth Content with the Windows Azure CDN.
Thanks to Jason for giving up some time to chat with me today, and for putting up with my follow-up emails. It’s important to remember that the authoritative source of your content is still your blob storage account, and the CDN cache is just a copy. The cache expiry will also affect the ‘freshness’ of your content, so keep this in mind if you have content that changes frequently.




* Update: Please note this article is now redundant. Please refer instead to this clarification: Azure CDN Updated
The pricing structure for the CDN aspect of Windows Azure has just been announced. You may remember that I previously wrote about Global Foundation Services and the mechanism that Microsoft uses to globally distribute its own content. Since CDN might become more interesting to you now that it is officially released, I thought I’d summarise two important points.
I did cover this in greater detail in the previous post about GFS mentioned above; Microsoft does not have a CDN of their own and 3rd party services are utilised to achieve this. While I don’t see it generally being a problem, this might bother some people, mostly the fact that they can’t be sure where their data is actually sitting when it is cached in a CDN node.
CDN nodes are not part of the Microsoft network; therefore you will pay for outbound data transactions and bandwidth from the Azure Storage service, as well as for delivery from the CDN node. This means you are really paying a premium for this service. To quote the original release:
Any data transfers and storage transactions incurred to get data from Windows Azure Storage to the CDN will be charged separately at our normal Windows Azure Storage rates.
You have been warned!




I was just putting together a doc for some Azure quick links that will be content handed out to delegates at the upcoming Australian Remix conference. I figured I would also post the list here. If you’re into Azure, you should know about all these destination addresses.
Windows Azure Platform – Home
http://www.microsoft.com/windowsazure/
Windows Azure Platform – Developer Centre
http://msdn.microsoft.com/en-au/azure/default.aspx
PDC Videos 08/09
http://channel9.msdn.com/posts/pdc2008/tags/Azure/
http://microsoftpdc.com/Sessions#/tags/WindowsAzurePlatform
Windows Azure Platform Training Kit
http://www.microsoft.com/downloads/details.aspx?FamilyID=413e88f8-5966-4a83-b309-53b7b77edf78&displaylang=en
Blogs
http://blogs.msdn.com/b/windowsazure/
http://blogs.msdn.com/b/windowsazurestorage/
http://blogs.msdn.com/b/sqlazure/
http://blogs.msdn.com/b/netservices/
http://blog.smarx.com/
http://blogs.msdn.com/b/jnak/
http://www.davidaiken.com/
http://azure.snagy.name/blog
Other
http://www.cerebrata.com/products/cloudstoragestudio/
http://lokadcloud.codeplex.com/
http://azuresecurity.codeplex.com/
http://wag.codeplex.com/
Community Support:
http://www.codify.com/lists/ozazure
http://social.msdn.microsoft.com/forums/en-US/windowsazure/threads/




You may not have heard this name before. In fact, I Googled it and got 0 responses. However the pattern is very important and is very well publicised in Azure circles. Since there seems to be no actual name for this pattern, I’m seeking to give it a name, so that we can all speak a common language. First I’ll explain the pattern in concept, then I’ll explain it in the context of Azure.
The Asynchronous Work Queue Pattern allows workers to pull work items from a robust, redundant queuing mechanism, in a fashion that is ignorant of the leasing and locking of those work items. In other words, the leasing and locking functionality is removed from the worker, which can concentrate on the work to be done, and the queue guarantees that a work item is never handed to more than one worker at a time.
As a developer I’ve worked on a lot of large systems and have often found myself dealing with the problem of resource contention. Whether multiple threads are trying to access a resource for the same reason, or perhaps two different tiers want the same piece of information for two different reasons, sharing resources can be hard.
Imagine an event that occurs as the result of some interaction with a website. That event might require some data to be saved, an email to be sent, a log service to be called, and a bunch of other things. We never want this to happen all in the original request; we like our UI to be responsive, otherwise the user is just going to press the submit button again right?
To get around this problem we create a WorkItem class, and our UI thread now saves a work item. Running as a Windows Service in the background (probably on another server), we have a work processor whose job is to check the work item table every 15 minutes, pick up any work items that need doing, and process them. We sit back and put our feet up, comfortable that separating work items from the UI has made our application extremely responsive, and added some robustness to boot.
A month goes by and all of a sudden the sales team lands three massive clients and our site traffic has increased tenfold! Our web front end is doing great though, especially since it can just hand off work items and return control to the user very quickly. However, the backend Windows Service is choking under the pressure, and work items are coming in faster than it can process them.
No problem, we decide to add a second server and install the Windows Service there as well. But wait; this won’t work will it? Both services hit the database and get the next work item; they get the SAME work item! So now we need to consider taking a lease over a certain number of work items to indicate that they are being processed. We pick an arbitrary number; each service will pick off ten work items and put a flag next to them saying they are being processed. In doing so we realise we are polluting our data schema with information about how the data is being used, but we really have no other choice.
Of course we then ponder; what happens if they both query and get work items at the same time? They might not have the flag and could therefore still process the same records. So now we need a double verification. The complexity grows.
By providing a queue implementation that ensures that the ‘next’ work item cannot be dequeued by more than one requester, the workers can focus on the work that needs to be done and can remove complex code pollution required when worrying about leases.
The primary advantage of this approach is that it scales extremely well. In the problem scenario depicted above, adding another 20 Windows Services would slow down each existing service, because more lease checking occurs when looking for free work items to process, and as a result less work gets done. In the queue scenario, by contrast, 20 times more services can mean close to 20 times more throughput, since the workers never contend for leases.
In Azure the equivalent scenario would be a worker role with multiple instances. Windows Azure Storage provides a highly scalable queue that is accessible via REST; in essence, the Queue service was designed specifically for asynchronous work distribution and consumption. Each retrieval of a work item from the queue is unique, except where the worker fails to notify the queue of successful processing (by deleting the message), in which case the work item becomes visible on the queue again after a configurable timeout. This ensures the work item is not lost due to a worker failure; the flip side is that a slow or failed worker’s item may be processed again, so the work should tolerate the occasional repeat. Also, Azure Queues are at least three times redundant, ensuring work items are not lost.
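The scaling argument rests on the queue, not the workers, deciding who gets each item. The following in-process sketch uses .NET’s ConcurrentQueue<T> as a stand-in for a storage queue (an assumption for illustration only; it has none of the durability of an Azure queue, and WorkQueueSketch is a hypothetical name). Each worker simply loops on TryDequeue with no lease flags or double checks, and the code records which worker handled each item to confirm that nothing is handled twice.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading.Tasks;

public static class WorkQueueSketch
{
    // Returns a map of work item -> id of the worker that processed it.
    public static ConcurrentDictionary<int, int> Run(int itemCount, int workerCount)
    {
        var queue = new ConcurrentQueue<int>(Enumerable.Range(0, itemCount));
        var processedBy = new ConcurrentDictionary<int, int>();

        Task[] workers = Enumerable.Range(0, workerCount)
            .Select(workerId => Task.Run(() =>
            {
                // TryDequeue hands each item to exactly one caller, so the
                // worker contains no leasing or locking logic of its own.
                while (queue.TryDequeue(out int item))
                {
                    // The "work" is just recording who did it; TryAdd would
                    // fail if another worker had already processed this item.
                    if (!processedBy.TryAdd(item, workerId))
                    {
                        throw new InvalidOperationException($"Item {item} was processed twice.");
                    }
                }
            }))
            .ToArray();

        Task.WaitAll(workers);
        return processedBy;
    }
}
```

Adding workers only changes workerCount; the worker body stays the same, which is the property the pattern is after.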
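The ‘unique unless the worker fails’ behaviour comes from a visibility timeout: a dequeued message is hidden rather than removed, the worker deletes it once processing succeeds, and if that never happens the message becomes visible again. The toy class below imitates those semantics in a single process. VisibilityQueue<T> and its members are hypothetical names, not the StorageClient API, and the caller passes the current time explicitly so the behaviour is easy to follow. A fresh receipt is issued on every dequeue, so in this sketch a worker that stalled past its timeout can no longer delete a message that has since been handed to someone else.

```csharp
using System;
using System.Collections.Generic;

// Toy, single-process imitation of visibility-timeout semantics (hypothetical API).
public class VisibilityQueue<T>
{
    private class Message
    {
        public T Value;
        public DateTime VisibleAt;   // hidden from dequeuers until this time
        public Guid PopReceipt;      // replaced on every dequeue
    }

    private readonly List<Message> _messages = new List<Message>();
    private readonly object _gate = new object();

    public void Enqueue(T value, DateTime now)
    {
        lock (_gate)
        {
            _messages.Add(new Message { Value = value, VisibleAt = now, PopReceipt = Guid.Empty });
        }
    }

    // Hands out the first visible message and hides it for visibilityTimeout.
    public bool TryDequeue(DateTime now, TimeSpan visibilityTimeout, out T value, out Guid popReceipt)
    {
        lock (_gate)
        {
            foreach (var message in _messages)
            {
                if (message.VisibleAt <= now)
                {
                    message.VisibleAt = now + visibilityTimeout;
                    message.PopReceipt = Guid.NewGuid();
                    value = message.Value;
                    popReceipt = message.PopReceipt;
                    return true;
                }
            }
        }
        value = default(T);
        popReceipt = Guid.Empty;
        return false;
    }

    // Called after successful processing; fails if the receipt is stale or empty.
    public bool Delete(Guid popReceipt)
    {
        if (popReceipt == Guid.Empty) return false;
        lock (_gate)
        {
            return _messages.RemoveAll(m => m.PopReceipt == popReceipt) > 0;
        }
    }
}
```

If a worker crashes after dequeuing, nothing calls Delete, so once the timeout passes the same message is handed to the next worker that asks.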
In some example code I’ve posted from previous presentations, there is an application that searches for images based on their colour content. The Asynchronous Work Queue Pattern is applied in this example application; a search is made against the Flickr API for a specific keyword and a bunch of results are returned. Each result is placed into a queue, and multiple workers listen at the other end, waiting to pick up an image that they will chop up and analyse for colour content.
To be honest I don’t care what it’s called; it’s just important that you are aware of the pattern and know why to use it.
It was just pointed out to me by Paul Stovell that this fits very closely with the Message Dispatcher pattern as identified in the book Enterprise Integration Patterns. The key difference is that the dispatcher pushes work to the consumers, whereas here the workers pull the workload from the queue. The Message Dispatcher pattern assumes that the message channel is dumb, whereas Windows Azure Storage Queues are smart: they hide a message from other consumers while it is being processed and make it available again if the consumer fails, giving reliable messaging even in the event of consumer failure. Currently I believe these are still separate patterns, but I’m happy to have the discussion over a beer or two.

