Wednesday, 16 February 2011

Working with reference data: Lookups part 2 - A generic solution for reference data

In part 1 I talked about certain objects that are often present in an application’s model that are static lists of reference data that can be identified as a Lookup which can have three purposes:
  1. A means of defining entities through configurable attributes, such as the DeliveryOption class described in part 1.
  2. A way of grouping entities to a certain type; perhaps a TeamMember’s RoleType or a Product’s Category.
  3. An object that has no meaning in the domain and serves no other purpose than providing information to users or reports; a Person’s Title for example.
Lookups have a common structure to them. Each item has an ID, description, code (optionally) and a set of attributes. Applications often have a table for each lookup,  i.e. a Title table or a DeliveryOption table. But given that all lookups have the same structure it makes more sense to create a generic data structure, meaning:
  1. The data access code only needs to be written once, rather than for each type of lookup.
  2. New lookups can be added with ease simply by adding data rather than schema changes and new data access code.
  3. All lookups can be configured by the users in one central screen; we wouldn’t have to create a config screen for each one.
  4. Given that reference data rarely changes it makes sense to cache the data. With a generic solution, lookups can be cached at one central access point as opposed to a cache for each type.
In part one I explained why code should reference ‘attributes’ rather than a row ID or string literal. This was achieved by referencing a specific boolean column that represented the piece of logic that requires a decision to be made. An example of this was the ‘IsRoyalMail’ column, whereby the value was used to decide whether the  order should be sent to the post office or a courier. Since we’re trying to create a generic solution, it’s impractical to keep adding new columns to our generic lookup item table – for a large system there could be hundreds/thousands of columns that represent points of logic. Instead we can create a new table called ‘Attributes’ that will represent these points of logic. Consider the following data structure:

  • Lookup – This table is the grouping table for the individual items. There would be one record per type of lookup, i.e. Title, DeliveryOption, RoleType
  • LookupItemThis table stores the individual items. i.e. “Mr”, “Mrs” “Miss”, “Doctor”.
  • Attribute – This table simply stores a list of points of logic. i.e. “IsRoyalMail”, “IsCourier”.
  • LookupItemAttribute – The link table that stores which lookup items have which attributes. i.e. The “Royal Mail” DeliveryOption has the “IsRoyalMail” attribute.
In part one, the ‘attribute’ columns contained either true or false. You may however wish to store other information against a particular attribute. Perhaps you wish the UI screen to change colour depending on the gender of the person entity. The link table ‘LookupItemAttribute’ has a ‘Value’ column, making this possible. By defining the attribute ‘GenderScreenColour’ and linking this to both the male and female gender lookup items you are able to specify different values for the colour you want. For maximum flexibility the ‘Value’ column could be an image/binary column meaning you can serialize any type of object that represents configurable values; such as a colour.
Once we have this structure in place, managing reference data becomes a piece of cake, particularly with the use of generics:
1:        ILookupService lookupService = ServiceLocator.Current.GetInstance<ILookupService>();  
2:        //Get all roles  
3:        ILookup<TeamMemberRoleType> roles = lookupService.GetLookup<TeamMemberRoleType>();  
4:        //Get specific role by ID  
5:        TeamMemberRoleType role = lookupService.GetLookupItem<TeamMemberRoleType>(command.RoleTypeID);  
6:        //Get the developer role  
7:        TeamMemberRoleType developerRole = TeamMemberRoleType.GetDeveloperRole(roles);  
Here is an example of how we can now define our lookup types:
1:    [Serializable]  
2:    public class TeamMemberRoleType : LookupItem<TeamMemberRoleType>  
3:    {  
4:      internal static readonly Guid DeveloperAttributeID = new Guid("89FD406C-034E-4F12-938C-8DF5C783F153");  
5:      internal static readonly Guid TesterAttributeID = new Guid("8D768B47-93DA-40D9-8581-743501C0E1B6");  
6:      internal static readonly Guid ProductManagerAttributeID = new Guid("EA16D97A-DC2A-4468-96F7-8796FC0085EB");  
7:    
8:      public bool IsDeveloper  
9:      {  
10:        get { return HasAttribute(DeveloperAttributeID); }  
11:      }  
12:    
13:      public bool IsTester  
14:      {  
15:        get { return HasAttribute(TesterAttributeID); }  
16:      }  
17:    
18:      public bool IsProductManager  
19:      {  
20:        get { return HasAttribute(ProductManagerAttributeID); }  
21:      }  
22:    }  
The base type LookupItem<T> provides most of the standard properties that we need for our lookup (such as description and list of attributes it has). All we are left to do is add our type specific properties which the application can make decisions on. At the top of the class you’ll notice that the attribute ID’s are hard coded. You may be thinking “but I thought he said don’t code against IDs?”. Which is true, I did. The difference here is that attribute ID’s represent a logic point in the application, NOT a configurable database record that the user may wish to remove or add items to. The ID 89FD406C-034E-4F12-938C-8DF5C783F153 is an identifier for the application signifying a developer role. Even if the attribute record is accidentally removed from the database, within the application code that ID remains as an identifier that represents the developer role. By having these ID’s we can then have public properties on TeamMemberRoleType that indicate the type of role without the need to have a boolean column on our table for each one.
For the Gender example we could implement a GetColourFromAttributeValue method which would deserialise the attribute value to a Color object (or perhaps the value would store the RGB value), and then provide a property on our Gender lookup such as:
1:      public Color ScreenColor  
2:      {  
3:        get { return GetColorFromAttributeValue(GenderScreenColorAttributeID); }  
4:      }  

Friday, 11 February 2011

Working with reference data: Lookups Part 1 - Program against attributes, not Ids.

In a few of projects I've worked on I have noticed a common problem that has such a simple solution. Very often in business applications various entities have associations to objects that represent things like a type/status/category. I'm talking about the objects that are typically displayed in the UI as a dropdown list. These are not entities that are searched for; they are static lists that only change with configuration; reference data. For example a Person entity has a Title which has items such “Mr”, “Mrs”, “Miss”, “Dr” etc. Likewise perhaps an Order entity has a Delivery Option (i.e. “Royal Mail First Class”, “Royal Mail Second Class”, “24 Hour Courier”). These can be termed ‘Lookups’, and can be defined as an object that's used to help define entities but doesn't have any associations itself to other entities; entities can only have associations to them. From a DDD perspective, they fall under the value object category.
Sometimes these objects serve nothing more than an information purpose; they mean nothing to the domain, but mean something to the user. An example of this is a person's title. Which ever item is selected for the person, it has no bearing on how the application functions. However, more often than not the application makes decisions based on which of these items has been chosen. In an order system the Order entity’s DeliveryOption is used to decide where packages are sent to; either to the post office or the courier service. Generally, the database would have a table called 'DeliveryOption' with a schema of three columns; a primary key identifier, a description and possibly a code. Problems arise with the developer’s decision to use one of these three columns to switch the logic. Perhaps the developer chooses to switch the logic by the Delivery Option’s ID:
1:      public void DispatchOrder(Order order)
2:      {
3:        if (order.DeliveryOption.ID == 1 || order.DeliveryOption.ID == 2)
4:          DispatchOrderToRoyalMail(order);
5:        if (order.DeliveryOption.ID == 3)
6:          DispatchOrderToCourierService(order);
7:     } 
Or maybe the code property:
1:      public void DispatchOrder(Order order)
2:      {
3:        if (order.DeliveryOption.Code == "RM1"
4:          || order.DeliveryOption.Code == "RM2")
5:        {
6:          DispatchOrderToRoyalMail(order);
7:        }
8:        if (order.DeliveryOption.Code == "CS")
9:          DispatchOrderToCourierService(order);
10:     }
If the database uses integers for primary keys, another common solution is to use an enum to represent the items, and then cast the database’s integer value to the enum item and base the decision on this value, as this seems to be a slightly more elegant solution:
1:      public enum DeliveryOption
2:      {
3:        RoyalMailFirstClass = 1,
4:        RoyalMailSecondClass = 2,
5:        CourierService = 3
6:     }
7:      public void DispatchOrder(Order order)
8:      {
9:        if (order.DeliveryOption == DeliveryOption.RoyalMailFirstClass
10:          ||
11:          order.DeliveryOption == DeliveryOption.RoyalMailSecondClass)
12:        {
13:          DispatchOrderToRoyalMail(order);
14:        }
15:        if (order.DeliveryOption == DeliveryOption.CourierService)
16:        {
17:          DispatchOrderToCourierService(order);
18:        }
19:     }
I've seen all three of these of these techniques used in projects. Neither of these techniques are particularly clever. In fact, they are spectacularly unintelligent and completely oppose the principles of agile development.
The problem is obvious: The code is not susceptible to change; the switches on the Ids/codes are hard coded. These items should be configurable allowing the customer to add/remove/change items without breaking the application. The likelihood is that new delivery options will need to be added at some point; perhaps a Royal Mail airmail option is required, or a 48 hour European courier service. When this requirement is realised the method ‘DispatchOrder’ becomes out of date and requires a code change. If additional items are added with new IDs or codes, they won’t get dispatched anywhere because the code won’t recognise the values.
So how could it be designed so such additions to the delivery option wouldn’t require a code change? The answer is simple – attributes. Consider the revised DeliveryOption table with data:



ID

Description

Code

IsRoyalMail

IsCourier
1 Royal Mail 1st Class RM1 True False
2 Royal Mail 2nd Class RM2 True False
3 Courier Service CS False True


By updating the DeliveryOption class such as:

1:      public class DeliveryOption
2:      {
3:        public int ID { get; set; }
4:        public string Description { get; set; }
5:        public string Code { get; set; }
6:        public bool IsRoyalMail { get; set; }
7:        public bool IsCourierService { get; set; }
8:     }
The DispatchOrder method can now be:
1:      public void DispatchOrder(Order order)
2:      {
3:        if (order.DeliveryOption.IsRoyalMail)
4:          DispatchOrderToRoyalMail(order);
5:        if (order.DeliveryOption.IsCourierService)
6:          DispatchOrderToCourierService(order);
7:      }
This means the users can chop and change the delivery options as much as they like. For example, they may decide that they don’t want to use the generic description of ‘Courier Service’ and wish to change it to the name of the courier company they have contracted with. Or perhaps they wish to offer a Royal Mail recorded delivery instead of 2nd class:


ID

Description

Code

IsRoyalMail

IsCourier
1 Royal Mail 1st Class RM1 True False
2 Royal Mail 2nd Class RM2 True False
3 Courier Service CS False True
4 Royal Mail Recorded Delivery RMRD True False
5 UPS International UPSI False True

No problem, we have programmed against the attributes so there will be no adverse affects to the application.

It may be the case that a different decision needs to be made for every single item. Imagine in the same order system where upon the order status changing the system needs to perform a certain action. With 5 different statuses the table would look like this:

ID Description IsPending IsPicking IsAwaitingDispatch IsInTransit IsDelivered
1 Pending Processing True False False False False
2 In Picking Queue False True False False False
3 Awaiting Dispatch False False True False False
4 In Transit False False False True False
5 Delivered False False False False True

At first this may seem strange, or perhaps overkill, but it’s not. The attributes represent the switches for the application’s functionality. They will always be there, set to true or false. Any ID, description or code becomes irrelevant; the ID only serves as a way for other entities to reference the items, and the description & code only represents information that means something to the user, not to the application. The data is completely configurable:


ID

Description

IsPending

IsPicking

IsAwaitingDispatch

IsInTransit

IsDelivered
1 Payment Pending  True False False False False
2 In Picking - Wharehouse 1 False True False False False
3

Awaiting Dispatch False False True False False
4 Dispatched False False False True False
5 Complete False False False False True
6 In Picking - Wharehouse 2 False True False False False