24 Jul 2010 @ 3:03 PM 

Overview Of Tables

There are three kinds of storage in Windows Azure: Tables, Blobs, and Queues. Blobs are binary large objects and Queues are robust enterprise level communication queues. Tables are non-relational entity storage mechanisms. All storage is replicated three times and is accessible through a REST API; table entities are sent and received in the AtomPub (ATOM) format.

Tables can store multiple entities with different kinds of shapes. That is to say, you can safely store 2 objects in a table that look completely different. For example, a Product entity might have a name and a category, whereas a User entity might have name, date of birth, and login name properties. Despite the difference between these two objects, they can both be stored in the same table in Windows Azure Table Storage.

One reason to prefer Table storage over other database options (such as SQL Azure) is that it is optimised for performance and scalability. It achieves this through a built-in partitioning mechanism based on an extra property assigned to each entity, called ‘PartitionKey’. Entities with different partition keys belong to different partitions, and the service is free to spread those partitions across different storage nodes, so five otherwise identical entities with five different partition keys can be served by five different nodes.

There are a number of ways we can get entities into and out of table storage, and this article will address a few. However, before we investigate some scenarios, we need to set up our table and the entities that will go into it.

Setting Up The Table

Before we look at the ways we can interact with entities in a table, we must first set up the table. We are going to create a table called ‘Products’ for the purposes of this article. Here is some code that demonstrates how (this could go in the Session or Application start events, or anywhere you see fit):

// Requires: using Microsoft.WindowsAzure; using Microsoft.WindowsAzure.StorageClient;
var account = CloudStorageAccount.FromConfigurationSetting("ProductStorage");
var tableClient = account.CreateCloudTableClient();
tableClient.CreateTableIfNotExist("Products");

The above code assumes you have a connection string configured already called ‘ProductStorage’ which points to your Windows Azure Storage account (local development storage works just as well for testing purposes).

Setting Up The Entity

For the purposes of this article we are going to put an entity called ‘Product’ into the table. That entity can be a simple POCO (plain old CLR object); however, only its public properties will be persisted and retrieved. Let’s define a simple product entity with a name, a category and a price. All objects stored in tables must also have a row key, a partition key, and a timestamp, otherwise we will get errors when we try to persist the item. Here’s our product class:

public class Product
{
   // Required
   public DateTime Timestamp { get; set; }
   public string PartitionKey { get; set; }
   public string RowKey { get; set; }
   // Optional
   public string Name { get; set; }
   public string Category { get; set; }
   public double Price { get; set; }
}

Pretty simple, eh? However, we can clean up some of the code: because the Timestamp, PartitionKey, and RowKey properties are required for every single table entity, we could pull them out into a base entity class. We don’t even have to write it ourselves; one already exists in the Microsoft.WindowsAzure.StorageClient namespace called ‘TableServiceEntity’. It has the following definition:

[CLSCompliant(false)]
[DataServiceKey(new string[] {"PartitionKey", "RowKey"})]
public abstract class TableServiceEntity
{
   protected TableServiceEntity(string partitionKey, string rowKey);
   protected TableServiceEntity();
   public DateTime Timestamp { get; set; }
   public virtual string PartitionKey { get; set; }
   public virtual string RowKey { get; set; }
}

It makes sense for us to inherit from this class instead. We’ll also follow the convention of having a partition key and row key injected in the constructor on our Product class, while also leaving a parameterless constructor for serialisation reasons:

public class Product : TableServiceEntity
{
   public Product() { }
   public Product(string partitionKey, string rowKey)
       : base(partitionKey, rowKey) {}
   public string Name { get; set; }
   public string Category { get; set; }
   public double Price { get; set; }
}

Done. We’ll use this Product entity from now on. All scenarios below will use a test Product with the following information:

var testProduct = new Product("PK", "1")
{
    Name = "Idiots Guide to Azure",
    Category = "Book",
    Price = 24.99
};

Scenario 1: Weakly Typed Table Service Context

The easiest way to get started with basic CRUD methods for our entity is by using a specialised ‘Data Service Context’. The Data Service Context is a class belonging to the WCF Data Services client namespace (System.Data.Services.Client), a technology for exposing and consuming entities in a RESTful fashion.

In a nutshell, a Data Service Context lets us consume a REST based entity (or list of entities), and that logic is given to us for free in the ‘DataServiceContext’ class, found in the aforementioned System.Data.Services.Client namespace (you’ll probably need to add a reference to the System.Data.Services.Client assembly). Consuming RESTful services is not an Azure specific thing, which is why we need to import this new namespace.

Because table storage exposes its entities like any other RESTful service, we can use a data service context to interact with them. Tables and their entities have a few additional requirements (such as the storage account credentials used to sign each request), so we need to be able to include this information with our data context. The Azure SDK makes this easy by providing a class derived from DataServiceContext called ‘TableServiceContext’. You’ll notice that to instantiate one of these we need to pass it a base address (our storage account) and some credentials.

If you review some of the original code above, you’ll notice we created a CloudTableClient based on connection string information in our configuration file. That same table client instance has the ability to create our TableServiceContext, using the code below:

var context = tableClient.GetDataServiceContext();

That’s it! All the explanation above just for one line of code, eh? Hopefully you now understand what’s happening when we get that context: the table client generates a TableServiceContext, which inherits from DataServiceContext and contains all the smarts for communicating with our storage table. Simple.

Now we can call all sorts of methods to create/delete/update our products. We’ll use the ‘testProduct’ defined earlier:

context.AddObject("Products", testProduct);
context.SaveChanges();

testProduct.Price = 21.99;
context.UpdateObject(testProduct);
context.SaveChanges();

var query = context.CreateQuery<Product>("Products");
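// AddQueryOption returns a new query rather than modifying this one,
// and the filter must be passed as an OData "$filter" expression
query = query.AddQueryOption("$filter", "RowKey eq '1'");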
var result = query.Execute().FirstOrDefault();

context.DeleteObject(result);
context.SaveChanges();

The methods being called here only know about ‘object’, not ‘Product’ and are therefore not type safe. We’ll look at a more type safe example in the next scenario.

Scenario 2: Strongly Typed Table Service Context

In the previous example we saw that the Table Service Context was a generic way to get going with table entities quickly. This works well because we can put any type of entity into the table via the same ‘AddObject’ method. However sometimes in code we like to be more type safe than that and want to enforce that a particular table only accepts certain objects. Or perhaps we want unique data access classes for our different entity types so that we can put some validation in.

Either way, this is relatively easy to achieve by creating our own Data Service Context class. We still need to wrap up the table storage credentials, so it’s actually easier if we inherit from TableServiceContext, as follows:

public class ProductDataContext : TableServiceContext
{
    public ProductDataContext(string baseAddress,
                              StorageCredentials credentials)
        : base(baseAddress, credentials)
    { }
    // TODO
}

The base constructor of TableServiceContext requires us to supply a base address and credentials, so we simply pass on this requirement. Our constructor doesn’t need to do anything else though.

The next step is to start adding methods to this new class that perform the CRUD operations we require. Let’s start with a simple query:

public IQueryable<Product> Products
{
   get { return CreateQuery<Product>("Products"); }
}

This will give us a ‘Products’ property on our ProductDataContext that will allow us to query against the product set using LINQ. We’ll see an example of that in a minute. For now, we’ll add in some strongly typed wrappers for the other CRUD behaviours:

public void Add(Product product)
{
   AddObject("Products", product);
}

public void Delete(Product product)
{
   DeleteObject(product);
}

public void Update(Product product)
{
   UpdateObject(product);
}

Nothing very special there, but at least we can enforce a particular type now. Let’s see how this might work in code to make calls to our new data context. As before we’ll assume the table client has already been created from configuration (see ‘Setting Up The Table’ above) and we’ll use the same test product as before:

var context = new ProductDataContext(
   tableClient.BaseUri.ToString(),
   tableClient.Credentials
);

context.Add(testProduct);
context.SaveChanges();

testProduct.Price = 21.99;
context.Update(testProduct);
context.SaveChanges();

var result = context.Products
   .Where(x => x.RowKey == "1")
   .FirstOrDefault();

context.Delete(result);
context.SaveChanges();

You can see the key differences from the weakly typed scenario earlier. We now use the new ProductDataContext, but we can’t get it from the table client the way we get the generic table context, so we instantiate it ourselves, passing the base URI and credentials from the table client. We also use our more explicitly typed methods for the CRUD operations. The biggest change, though, is the way we query data: the ‘Products’ property returns IQueryable<Product>, which means we can use LINQ to query the table store. Be careful, because not all operations are supported by the LINQ provider. For example, this will fail:

var result = context.Products
   .FirstOrDefault(x => x.RowKey == "1");

This fails because FirstOrDefault is not supported with a predicate; filter with Where first and then call FirstOrDefault, as in the earlier example. Even so, this query API is much nicer and allows us to do a lot more than we could when the entity type was unknown to the data context.

Scenario 3: Using The Repository and Specification Patterns

Before reading on you might want to familiarise yourself with the concepts of these patterns. To prevent blog duplication, please refer to this article that someone smarter than me wrote:

Implementing Repository and Specification patterns using Linq.

The goal is to create a repository class that takes a generic type parameter representing the entity we want to work with. Such a repository class will be reusable for all types of entities but still be strongly typed. We also want it abstracted behind an interface so that we are never concerned with the concrete implementation. For more information on why this is good practice, please refer to the SOLID principles.

We also want to use the specification pattern to provide filter/search information to our repository. We want to leverage the goodness of LINQ but also explicitly define those filters as specifications so that they are easily identifiable.

I usually find it easiest to start with the interface and worry about the implementation later. Let’s define an interface for a repository that will take any kind of table entity:

public interface IRepository<T> where T : TableServiceEntity
{
   void Add(T item);
   void Delete(T item);
   void Update(T item);
   IEnumerable<T> Find(params Specification<T>[] specifications);
   void SubmitChanges();
}

Seems simple enough, however you might note that our find process is less flexible than in scenario 2 where we could just use LINQ directly against our data service. We want to provide the flexibility of LINQ yet still provide explicitness and reusability of those very same queries. We could add a bunch of methods for each query we want to do. For example, to retrieve a single product, we could create an extra method called ‘GetSingle(string rowkey)’. However that only applies to products, and may not apply to other entity types. Likewise, if we want to get all Products over $15, we can’t do that in our repository because it makes no sense to get all User entities that are over $15.

That’s where the specification pattern comes in. A specification is a piece of information about how to refine our search. Think of it as a search object, except it contains a LINQ expression. We’ll see an example soon, but first let’s define the Specification<T> class that the Find method on our IRepository<T> interface accepts:

IEnumerable<T> Find(params Specification<T>[] specifications);
...
public abstract class Specification<T>
{
    public abstract Expression<Func<T, bool>> Predicate { get; }
}

The Find method returns the entities that satisfy all of the specifications provided, and a specification is just a wrapper around a predicate. Oh, and a predicate is just a fancy word for a condition. For example, consider this code:

if (a < 3) a++;

The part that says “a < 3” is the predicate. We can effectively change that same code to the following:

Func<int, bool> predicate = someInt => someInt < 3;
if (predicate(a)) a++;

It might seem like code bloat in such a simple example, but the ability to reuse a ‘condition’ in many places will be a life saver when your systems start to grow. In our case we care about predicates because LINQ is full of them. For example, the Where method on IEnumerable<T> takes a predicate in the form of Func<T, bool>. The Where method on IQueryable<T>, which is what runs against table storage, takes the same predicate wrapped in an Expression<Func<T, bool>>, so that the LINQ provider can inspect the condition and translate it into a query for the storage service. This is exactly why our specification exposes its predicate as an Expression. Each specification represents some kind of filter. For example:

Products.Where(x => x.RowKey == "1")

The part that says x.RowKey == "1" is a predicate, and can be made reusable as a specification. You’ll see it in action in the final code below, but for now we’ll move on to our Repository implementation. Just keep in mind that we will be reusing those ‘conditions’ and storing them in their own classes.

We’ll focus first on the definition of the repository class and its constructor:

public class TableRepository<T> : IRepository<T>
       where T : TableServiceEntity
{
   private readonly string _tableName;
   private readonly TableServiceContext _dataContext;
   public TableRepository(string tableName,
                          TableServiceContext dataContext)
   {
      _tableName = tableName;
      _dataContext = dataContext;
   }
   // TODO CRUD methods
}

Our table repository implements our interface and most importantly takes a TableServiceContext as one of its constructor parameters. And to complete the interface contract we must also ensure that all generic types used in this repository inherit from TableServiceEntity. Next we’ll add in the Add/Update/Delete methods since they are the easiest:

public void Add(T item)
{
   _dataContext.AddObject(_tableName, item);
}
public void Delete(T item)
{
   _dataContext.DeleteObject(item);
}
public void Update(T item)
{
   _dataContext.UpdateObject(item);
}

Simple enough, since we have the generic table service context at our disposal. Likewise we can add in the SubmitChanges() method:

public void SubmitChanges()
{
   _dataContext.SaveChanges();
}

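We could just call SaveChanges whenever we add or delete an item, but that makes batch operations more difficult. For example, we might want to add 5 products and then submit them all in one go. This method lets us submit whenever we like, which is in keeping with the approach used when creating your own TableServiceContext or using the default one. (Note that a plain SaveChanges() call still sends one request per pending change; passing SaveChangesOptions.Batch sends them as a single entity group transaction, which table storage only accepts when all the entities share the same partition key.)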

Finally, we need to define our Find method which takes zero or more specifications:

public IEnumerable<T> Find(params Specification<T>[] specifications)
{
   IQueryable<T> query = _dataContext.CreateQuery<T>(_tableName);
   foreach (var spec in specifications)
   {
      query = query.Where(spec.Predicate);
   }
   return query.ToArray();
}

Every specification must have a predicate (refer to the initial definition and you will see the property is declared ‘abstract’, which means it must be overridden). That predicate is an Expression<Func<T, bool>>, and its T is the same type as our repository’s. Therefore we can simply chain all the predicates together by calling the .Where() extension method on the query once for each specification; each call narrows the results further, so an entity is returned only if it satisfies every specification. At the end of the day the code is really quite small.
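To convince yourself that repeated Where calls combine as a logical AND, here is a small standalone sketch of the same loop, run over an in-memory IQueryable<int> instead of table storage. The ChainingDemo class and its ApplyAll method are hypothetical names, not part of the article’s library.

```csharp
using System;
using System.Linq;
using System.Linq.Expressions;

public static class ChainingDemo
{
    // Mirrors the loop in TableRepository<T>.Find: each predicate is applied with another
    // Where call, so an element survives only if it satisfies every predicate.
    // With no predicates, the whole sequence is returned unfiltered.
    public static T[] ApplyAll<T>(IQueryable<T> query, params Expression<Func<T, bool>>[] predicates)
    {
        foreach (var predicate in predicates)
        {
            query = query.Where(predicate);
        }
        return query.ToArray();
    }
}
```

For example, ApplyAll<int>(Enumerable.Range(1, 10).AsQueryable(), x => x > 3, x => x % 2 == 0) keeps only the even numbers greater than 3.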

And that’s all the framework-like code for setting up the Repository and Specification patterns against table storage. To show you how it works we first need a specification that allows us to get a product back based on its row key. Here’s an example:

public class ByRowKeySpecification : Specification<Product>
{
   private readonly string _rowkey;

   public ByRowKeySpecification(string rowkey)
   {
      _rowkey = rowkey;
   }

   public override Expression<Func<Product, bool>> Predicate
   {
      get { return p => p.RowKey == _rowkey; }
   }
}

In this specification, we take a row key in the constructor, and use that in the predicate that gets returned. The predicate simply says: “For any product, only return those products that have this row key”. We can use this specification along with our repository to perform CRUD operations as follows:
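Earlier, a repository method for ‘all Products over $15’ made no sense for other entity types; as a specification it becomes just another small class. The sketch below is hypothetical (PriceOverSpecification is not part of the article’s library) and re-declares Specification<T> plus a cut-down Product stand-in so it compiles on its own.

```csharp
using System;
using System.Linq;
using System.Linq.Expressions;

// Copied from the article so this sketch is self-contained.
public abstract class Specification<T>
{
    public abstract Expression<Func<T, bool>> Predicate { get; }
}

// Cut-down stand-in for the article's Product (the real one derives from TableServiceEntity).
public class Product
{
    public string RowKey { get; set; }
    public string Name { get; set; }
    public double Price { get; set; }
}

// Hypothetical specification: matches products whose price is above a threshold.
public class PriceOverSpecification : Specification<Product>
{
    private readonly double _minimumPrice;

    public PriceOverSpecification(double minimumPrice)
    {
        _minimumPrice = minimumPrice;
    }

    public override Expression<Func<Product, bool>> Predicate
    {
        get { return p => p.Price > _minimumPrice; }
    }
}
```

Because Find chains every specification it receives, a call such as productRepository.Find(new ByRowKeySpecification("1"), new PriceOverSpecification(15)) would return the product only if both conditions hold.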

var context = tableClient.GetDataServiceContext();

IRepository<Product> productRepository =
   new TableRepository<Product>("Products", context);

productRepository.Add(testProduct);
productRepository.SubmitChanges();

testProduct.Price = 21.99;
productRepository.Update(testProduct);
productRepository.SubmitChanges();

var byRowkey = new ByRowKeySpecification("1");
var results = productRepository.Find(byRowkey);
var result = results.FirstOrDefault();

productRepository.Delete(result);
productRepository.SubmitChanges();

Tada! We now have a strongly typed repository that will work on any entity type you want to use. And the great thing about repositories is that because we have an IRepository abstraction we can implement an ‘in memory’ version of the repository which is very useful for unit testing.
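Here is a minimal sketch of what such an in-memory repository might look like. InMemoryRepository<T> is a hypothetical name, and TableServiceEntity, Specification<T>, IRepository<T> and Product are re-declared as simplified stand-ins so the block compiles without the StorageClient assembly; in a real test project you would reference the actual types instead. Adds and deletes are staged until SubmitChanges, loosely mimicking the data context.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

// --- Simplified stand-ins so the sketch compiles without the StorageClient assembly ---
public abstract class TableServiceEntity
{
    protected TableServiceEntity() { }
    protected TableServiceEntity(string partitionKey, string rowKey)
    {
        PartitionKey = partitionKey;
        RowKey = rowKey;
    }
    public DateTime Timestamp { get; set; }
    public virtual string PartitionKey { get; set; }
    public virtual string RowKey { get; set; }
}

public abstract class Specification<T>
{
    public abstract Expression<Func<T, bool>> Predicate { get; }
}

public interface IRepository<T> where T : TableServiceEntity
{
    void Add(T item);
    void Delete(T item);
    void Update(T item);
    IEnumerable<T> Find(params Specification<T>[] specifications);
    void SubmitChanges();
}

public class Product : TableServiceEntity
{
    public Product() { }
    public Product(string partitionKey, string rowKey) : base(partitionKey, rowKey) { }
    public string Name { get; set; }
    public string Category { get; set; }
    public double Price { get; set; }
}

// --- Hypothetical in-memory repository for unit tests ---
public class InMemoryRepository<T> : IRepository<T> where T : TableServiceEntity
{
    private readonly List<T> _stored = new List<T>();
    private readonly List<T> _pendingAdds = new List<T>();
    private readonly List<T> _pendingDeletes = new List<T>();

    // Adds and deletes are staged until SubmitChanges is called.
    public void Add(T item) { _pendingAdds.Add(item); }
    public void Delete(T item) { _pendingDeletes.Add(item); }

    // Entities are held by reference, so property changes are already visible;
    // this simplified version records nothing for an update.
    public void Update(T item) { }

    public IEnumerable<T> Find(params Specification<T>[] specifications)
    {
        // Same chaining as TableRepository<T>.Find, but over LINQ to Objects.
        IQueryable<T> query = _stored.AsQueryable();
        foreach (var spec in specifications)
        {
            query = query.Where(spec.Predicate);
        }
        return query.ToArray();
    }

    public void SubmitChanges()
    {
        _stored.AddRange(_pendingAdds);
        foreach (var item in _pendingDeletes)
        {
            _stored.Remove(item);
        }
        _pendingAdds.Clear();
        _pendingDeletes.Clear();
    }
}
```

Because the code under test only sees IRepository<Product>, a unit test can hand it an InMemoryRepository<Product> instead of a TableRepository<Product> and never touch storage.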

Summary

As we progressed through the three options the amount of code got larger, but I think we also got closer to true object oriented programming by the end. Personally I like to always use repositories and specifications because they let us write our code in a way that makes the persistence mechanism irrelevant. We could easily decide to move products into a SQL Azure database and use a SqlRepository<T> instead of the TableRepository<T>.

Hopefully you’ll find the concept useful and aim to start with scenario 3 in all cases. To help you get started, I’ve assembled all 3 options into a reusable library for you, downloadable from here:

In each of the scenario folders you’ll find a single starter class that inherits from the TableStorageTest abstract class; you can look at that class to work out how the particular scenario works.

In the near future I will be looking to create a number of these basic classes as a reusable library to help Windows Azure developers get up and running faster with their applications. But in the meantime, happy coding.

Categories: Azure
Posted By: Steven Nagy
Last Edit: 24 Jul 2010 @ 03:40 PM

 03 May 2009 @ 8:42 PM 

According to MSDN, Windows Azure Storage “provides persistent, redundant storage in the cloud”. Microsoft’s goal is to create storage that is durable and secure, scalable and efficient all at once. In the current CTP of Azure you can store your data in Windows Azure 3 different ways:

  • Blobs – Large binary data
  • Queues – Service Communication abstraction
  • Tables – Service state and user data

I’ll aim to dive into each of these in more detail individually over the coming weeks but for the rest of this post will discuss an overview of the storage mechanisms and methods of accessing the storage.

Windows Azure Storage allows you to store data for any length of time and to store any amount of data. Currently storage is capped at 50 GB, but for a CTP this should be pretty sufficient. In the future this will scale (at a cost, obviously) to as much as you need.

Windows Azure Storage can be geo-located, meaning you can choose the region in which it is hosted. As previously discussed, you can also associate your data with your Azure services through an ‘Affinity Group’, which helps the Azure Fabric Controller deploy your services and data into a similar place within the data centre. The Azure data centres are so large that having data and services located near each other (fewer network hops) can drastically improve performance (lower communication times).

It’s important to understand that Windows Azure Storage is not the same as SQL Data Services (SDS). If you want a relational data store in the cloud, you should be looking at SDS. Windows Azure Storage is about delivering quick access storage directly to your services, intercommunication between services, and state representation for your application.

Windows Azure Storage is a type of project that you can add in your Azure portal. In the current CTP, you can only add two storage projects. As with adding a service project, you get to choose a name, which becomes part of the addresses of your storage services.


This will result in a set of services to access blob, queue and table storage. You will also be provided with a key that can be used to authenticate you to your storage, allowing you to safely insert, update, and delete data from your applications in a RESTful manner.


Over the next few posts I’ll attempt to break down each of the types of storage, and how to access them through the REST API.

Categories: Azure
Posted By: Steven Nagy
Last Edit: 03 May 2009 @ 08:42 PM
