SQL Server Index fragmentation

Now this was a new one for me…

If you have had to deal with big systems, you know that when a table reaches a couple of million rows and counting, things tend to get a tad slow unless you plan on doing some heavy optimization. Even then, there’s always room to push things further when you start handling that much data.

One of the first logical things you check on a database is that each table has the correct indexes: if you’re searching over one field, that field should be indexed if it can be, so the search gets a bit faster. An index is quite a good thing to work with because it basically stores an index (duh!) of how to access the records (rows) of the table, keyed by precisely the field you chose. This article is a very good and quick way to learn how indexes work. If you’re a developer and don’t know, please take 5 minutes of your time to read it. Don’t worry, I’ll wait for you…

Back? Very good, give yourself a pat on the back and let’s move on. As you now know, indexes are stored in trees, and as you add or delete (fundamentally delete) rows you create gaps in the tree that would be way too costly to fix as part of those insert or delete operations. Those gaps are called fragmentation. SQL Server, being such a fan-fricking-tastic tool (no, Microsoft does not pay me), lets you query which of a database’s indexes are fragmented and how fragmented each one is.

The key is the sys.dm_db_index_physical_stats function:

SELECT * FROM sys.dm_db_index_physical_stats (DB_ID(N'MY_DB'), OBJECT_ID(N'MY_TABLE'), 1, NULL, NULL)

However, that provides information for just one index, so with a bit of SQL Magic, we get this far:

SELECT DISTINCT frag.* FROM
	( SELECT 
			TableName = t.name, IndexName = ind.name, IndexId = ind.index_id, ColumnId = ic.index_column_id, ColumnName = col.name,
			(
				-- MAX() keeps this scalar when an index spans several partitions
				SELECT MAX(a.avg_fragmentation_in_percent)
				FROM sys.dm_db_index_physical_stats (DB_ID(), t.object_id, ind.index_id, NULL, NULL) AS a
				-- the value is a percentage (0-100), so 0.1 means 0.1 percent
				WHERE a.avg_fragmentation_in_percent >= 0.1
			) AS Fragmentation
	FROM 
			sys.indexes ind 
	INNER JOIN sys.index_columns ic ON  ind.object_id = ic.object_id and ind.index_id = ic.index_id 
	INNER JOIN sys.columns col ON ic.object_id = col.object_id and ic.column_id = col.column_id 
	INNER JOIN sys.tables t ON ind.object_id = t.object_id 
	WHERE 
			ind.is_primary_key = 0 
			AND ind.is_unique = 0 
			AND ind.is_unique_constraint = 0 
			AND t.is_ms_shipped = 0
	) frag
ORDER BY frag.Fragmentation DESC

Whoa! That’s a lot bigger, isn’t it? Well, yes, because this one goes through all the indexes on our database, calls that function for each, and returns everything in a nice little table that tells you whether any indexes deserve your attention.

Defragmenting the bad ones.

OK, so now you know which index(es) are misbehaving on your database. Next we need to defragment them, and for that we’re going to use a simple ALTER statement and be done with it. As a quick sample, if we had an index called MY_INDEX_I on a table called MY_TABLE, the ALTER statement would be

ALTER INDEX MY_INDEX_I ON MY_TABLE REBUILD

Now this is a bit extreme, because it drops the stored index and rebuilds the whole thing, which can take quite some time depending on your data and the number of rows you have. There is another way to defragment indexes, but because it didn’t work with SQL Azure and this particular investigation was done for a server running that version, I didn’t look into it; the call is not too different, though. On top of that, a full rebuild is overkill for any table with more than 10% fragmentation, but it will always work, so it’s a very safe bet.
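As a rough illustration of choosing between the two approaches, here is a small Python sketch that builds the ALTER INDEX statement for one index. It assumes the "other way" mentioned above is ALTER INDEX ... REORGANIZE, and it borrows the 5% and 30% cut-offs from Microsoft's long-standing rule of thumb; those thresholds, and the function names, are assumptions for the example rather than something measured in this post.

```python
def quote_name(identifier: str) -> str:
    """Bracket-quote a SQL Server identifier, doubling any closing bracket."""
    return "[" + identifier.replace("]", "]]") + "]"


def defrag_statement(table: str, index: str, fragmentation: float,
                     reorganize_from: float = 5.0, rebuild_from: float = 30.0):
    """Return an ALTER INDEX statement, or None when the index is left alone.

    fragmentation is a percentage on a 0-100 scale, like
    avg_fragmentation_in_percent.
    """
    if fragmentation < reorganize_from:
        return None  # not worth touching
    action = "REBUILD" if fragmentation >= rebuild_from else "REORGANIZE"
    return f"ALTER INDEX {quote_name(index)} ON {quote_name(table)} {action}"


print(defrag_statement("MY_TABLE", "MY_INDEX_I", 42.0))
```

On a server where REORGANIZE is not available, setting rebuild_from equal to reorganize_from makes every qualifying index go through REBUILD.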

Now we have a way to do it manually. But what happens on a production database with maybe 90 or 100 indexes that get heavily fragmented every few weeks? You just can’t do it manually, in case you were wondering. For that we’re going to use a very handy stored procedure called sp_executesql that lets us run dynamically created SQL. All we need is a cursor to go through the indexes and generate the SQL to defragment each one, and voilà. It goes a bit something like this:

 

DECLARE @MySQL NVARCHAR(4000)
DECLARE @Now DATETIME
DECLARE @Message NVARCHAR(4000)
DECLARE @SchemaName SYSNAME
DECLARE @TableName SYSNAME
DECLARE @IndexName SYSNAME
DECLARE @Fragmentation FLOAT

-- One row per index: joining sys.index_columns here would rebuild
-- a multi-column index once per column.
DECLARE myCursor CURSOR FOR SELECT frag.* FROM
	( SELECT 
			SchemaName = SCHEMA_NAME(t.schema_id), TableName = t.name, IndexName = ind.name,
			(
				-- MAX() keeps this scalar when an index spans several partitions
				SELECT MAX(a.avg_fragmentation_in_percent)
				FROM sys.dm_db_index_physical_stats (DB_ID(), t.object_id, ind.index_id, NULL, NULL) AS a
				-- the value is a percentage (0-100), so 0.1 means 0.1 percent
				WHERE a.avg_fragmentation_in_percent >= 0.1
			) AS Fragmentation
	FROM 
			sys.indexes ind 
	INNER JOIN sys.tables t ON ind.object_id = t.object_id 
	WHERE 
			ind.is_primary_key = 0 
			AND ind.is_unique = 0 
			AND ind.is_unique_constraint = 0 
			AND t.is_ms_shipped = 0
			AND ind.index_id > 0 -- skip heaps, which have no index to rebuild
	) frag
WHERE frag.Fragmentation IS NOT NULL -- skip indexes below the threshold
ORDER BY frag.Fragmentation DESC
OPEN myCursor

FETCH NEXT FROM myCursor INTO @SchemaName, @TableName, @IndexName, @Fragmentation

WHILE @@FETCH_STATUS = 0
BEGIN
	SET @Now = GETDATE()
	-- QUOTENAME guards against names containing spaces or brackets
	SET @MySQL = 'ALTER INDEX ' + QUOTENAME(@IndexName) + ' ON ' + QUOTENAME(@SchemaName) + '.' + QUOTENAME(@TableName) + ' REBUILD'

	SET @Message = 'Reindex: ' + @IndexName + ' ON ' + @TableName + ' (' + CONVERT(NVARCHAR(28), @Fragmentation) + '%): ' + CONVERT(NVARCHAR(28), @Now, 21) + ' (' + @MySQL + ')'
	PRINT @Message

	EXEC sp_executesql @MySQL

	FETCH NEXT FROM myCursor INTO @SchemaName, @TableName, @IndexName, @Fragmentation
END

CLOSE myCursor
DEALLOCATE myCursor

I know, right? But this works like a charm, and because it’s fully automated you can call it from any of your systems as a scheduled job and stop worrying about fragmented indexes. Did you try it? Well, give me a shout and let me know how you got on.

DTOs and why you should be using them

If you’ve worked on any modern (decent-sized) application, you know that the de facto standard is a layered design, where operations are grouped into layers corresponding to certain functionality: for example, a Data Access Layer, which is nothing but an implementation of your repository using NHibernate, Entity Framework, etc. While that is a very good idea for most scenarios, it brings a bit of a problem: you need to pass lots of calls between layers, and sometimes that’s not just calling a DLL inside your solution but calling a service hosted somewhere over the network.

The problem

If your app calls services and receives data from them (obviously?) then you might find something like this in your service:
public Person AddPerson(string name, string lastName, string email)
Now, let’s first look at the parameters and why this is probably not a very good definition.
In this method you have 3 arguments: name, lastName and email. What happens if somebody needs a telephone number? Well, we just add another argument! Dead easy! Yeah, no. Suppose we make it more interesting and say we have Workers and Customers, both inheriting from Person. We would then have something like this:
public Person AddWorker(string name, string lastName, string email)
public Person AddCustomer(string name, string lastName, string email)
If you need to add that telephone number now and go for that extra parameter, you have to add code in two locations, so you touch more code. And what happens when we touch more code? Simple, we introduce more bugs.

The Good

Now, what happens if you have this?
public Worker AddWorker(Worker worker)
public Customer AddCustomer(Customer customer)
DTO stands for Data Transfer Object, and that is precisely what these classes do: we use them to transfer data to and from our services. For one, the code is much simpler to read now! But there is another thing: if Worker and Customer inherit from Person, as they should considering they are both a Person, then we can safely add that telephone number to Person without having to change the signature of the service. Yes, our service will now receive an extra piece of data, but we don’t have to change the service signature in the code, just the DTO it receives.
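To make that concrete, here is a minimal sketch of the idea in Python (chosen only to keep the example short; the shape is the same in C#). The class and function names are hypothetical stand-ins for the Worker and Customer DTOs above, and the service functions do nothing but hand the object back.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PersonDTO:
    name: str
    last_name: str
    email: str
    # Added later: existing callers and service signatures are untouched.
    telephone: Optional[str] = None


@dataclass
class WorkerDTO(PersonDTO):
    pass


@dataclass
class CustomerDTO(PersonDTO):
    pass


def add_worker(worker: WorkerDTO) -> WorkerDTO:
    # Hypothetical service method; persistence is out of scope here.
    return worker


def add_customer(customer: CustomerDTO) -> CustomerDTO:
    # Same signature as before the telephone field existed.
    return customer


saved = add_worker(WorkerDTO("Ada", "Lovelace", "ada@example.com", telephone="555-0100"))
print(saved.telephone)
```

The telephone field was added in one place, and both services picked it up without any change to their parameter lists.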
Now, more on the common use for DTOs. As Martin Fowler states, a DTO is
An object that carries data between processes in order to reduce the number of method calls.
Now, it’s fairly obvious that using DTOs for input arguments is good, but what about output arguments? Well, it’s a similar story, with a small twist. Many people today use ORMs to access the database, so it’s very likely that you already have a Worker, a Customer and a Person class, either because they are part of your domain model or because they were created by Linq To Sql (not a huge fan, but many people still use it). So, should you return those entities from your services? Not a very good idea, and I have some reasons for it.

One very simple reason is that the objects generated by these frameworks usually are not serialization friendly, because they are on top of proxy classes which are a pain to serialize for something that outputs JSON or XML. Another potential problem is when your entity doesn’t quite fit the response you want to give, what happens if your service has something like this?

public Salary CalculateWorkerSalary(Worker worker)

You could have a very simple method just returning a double, but let’s think of a more convoluted solution to illustrate the point, imagine salary being like this:

public class Salary
{
     public double FinalSalary {get;}
     public double TaxDeducted {get;}
     public double Overtime {get;}
}

So, this is our class, and Overtime means it’s coupled to a user because not everybody does the same amount of overtime. So, what happens now if we also need the Tax code for that salary? Or the overtime rate for the calculation? That is assuming these are not stored on the salary table. More importantly, what happens if we don’t want whoever is calling the API to see the Overtime the Worker is doing? Well, the entity is not fit for purpose and we need a DTO where we can put all of these, simple as that.

The Bad

However, DTOs are not all glory. There is a problem with them: they bloat your application, especially if it’s a large one with many entities. If that’s the case, it’s up to you to decide when a DTO is worth it and when it’s not. Like many things in software design, there is no rule of thumb and it’s very easy to get it wrong. But for most things where you pass complex data, you should be using DTOs.

The Ugly

There is another problem with DTOs, and it’s the fact you end up having a lot of code like this:

var query = _workerRepository.GetAll();
var workers = query.Select(ConvertWorkerDTO).ToList();
return workers;

Where ConvertWorkerDTO is just a method looking pretty much like this:

public WorkerDTO ConvertWorkerDTO(Worker worker)
{
    return new WorkerDTO() {
        Name = worker.Name,
        LastName = worker.LastName,
        Email = worker.Email
    };
}

Wouldn’t it be cool if you could do something without a mapping method, like this:

var query = _workerRepository.GetAll();
var workers = query.Select(x => BaseDTO.BuildFromEntity<Worker, WorkerDTO>(x))
                   .ToList();
return workers;

Happily, there is a simple way to achieve a result like this one by combining two very powerful tools: inheritance and reflection. Just have a BaseDTO class that all of your DTOs inherit from and give it a method like that one, which handles the conversion with a property-to-property mapping. A fairly simple, yet fully working, version could be this:

public static TDTO BuildFromEntity<TEntity, TDTO>(TEntity entity)
{
    var dto = Activator.CreateInstance<TDTO>();
    var dtoProperties = typeof (TDTO).GetProperties();
    var entityProperties = typeof (TEntity).GetProperties();

    foreach (var property in dtoProperties)
    {
        // Read-only DTO properties cannot receive a value
        if (!property.CanWrite)
            continue;

        // Match on both name and type; anything else keeps its default value
        var entityProp =
            entityProperties.FirstOrDefault(x => x.Name == property.Name && x.PropertyType == property.PropertyType);

        if (entityProp == null)
            continue;

        if (!property.PropertyType.IsAssignableFrom(entityProp.PropertyType))
            continue;

        // null index arguments: these are plain (non-indexed) properties
        var propertyValue = entityProp.GetValue(entity, null);
        property.SetValue(dto, propertyValue, null);
    }

    return dto;
}
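The same property-to-property idea can be sketched in a few lines of Python, using dataclass field metadata in place of .NET reflection. This is only an illustration of the technique, not a port of the method above; Worker, WorkerDTO and build_from_entity are hypothetical names, and the DTO is assumed to have a default for every field so it can be created empty.

```python
from dataclasses import dataclass, fields


@dataclass
class Worker:  # stands in for the ORM entity
    id: int
    name: str
    last_name: str
    email: str
    password_hash: str


@dataclass
class WorkerDTO:  # only the fields we are happy to expose
    name: str = ""
    last_name: str = ""
    email: str = ""


def build_from_entity(entity, dto_type):
    """Copy every field whose name and declared type match on both sides."""
    entity_fields = {f.name: f.type for f in fields(entity)}
    dto = dto_type()
    for f in fields(dto_type):
        if entity_fields.get(f.name) == f.type:
            setattr(dto, f.name, getattr(entity, f.name))
    return dto


worker = Worker(1, "Ada", "Lovelace", "ada@example.com", "not-for-export")
print(build_from_entity(worker, WorkerDTO))
```

Fields that exist only on the entity (id and password_hash here) never reach the DTO, which is exactly the filtering you want on a service boundary.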

And Finally…

The bottom line is that, like everything, you can over-engineer your way into adding far too many DTOs to your system, but ignoring them is not a very good solution either. And adding one or two to a project with more than 15 entities just to feel you’re using them is about as good as using one interface to say you build decoupled systems.

What’s your view on this? Do you agree? Disagree? Share what you think on the comments!

EDIT: As a side note, it’s worth checking this article, which talks a lot about the subject.

Empower your lambdas!

If you’ve used generic repositories, you will have encountered one particular problem: matching items using dynamic property names isn’t easy. However, using generic repositories has always been a must for me, as it saves me from writing a lot of boilerplate code for saving, updating and so forth. Not long ago I was fetching entities from a web service and writing them to the database, and given that these entities had relationships, I couldn’t retrieve the same entity and save it twice, so I had a problem.
Whenever my code fetched the entities from the service, it had to work out whether each entity had been loaded previously and, instead of saving it twice, just modify the last updated time and any properties that may have changed. To begin with, I had some simple code on a base web service consumer class like this.

var client = ServiceUtils.CreateClient();
var request = ServiceUtils.CreateRequest(requestUrl);
var resp = client.ExecuteAsGet(request, "GET");
var allItems = JsonConvert.DeserializeObject<List<T>>(resp.Content);

This was all very nice and, so far, I had a very generic approach (using DeserializeObject<T>). However, I had to check whether the item had been fetched previously. An item’s identity could be determined by one or more properties, and my internal Id was meaningless in this context for deciding whether an object already existed. So, I had to come up with another approach. I created a basic attribute and called it IdentityProperty; whenever a property defined the identity of an object externally, I would annotate it with that attribute, so I ended up with entities like this:

public class Person: Entity
{
    [IdentityProperty]
    public string PassportNumber { get; set; } 
    
    [IdentityProperty] 
    public string SocialSecurityNumber { get; set; }

    public string Name { get; set; }
}

This would mark all properties that defined identity on the context of web services. So far, so good, my entities now know what defines them on the domain, now I need my generic service consumer to find them on the database so I don’t get duplicates. Now, considering that all my entities fetched from a web service have a Cached and a Timeout property, ideally, I would have something like this:

foreach (var item in allItems)
{
    var calculatedLambda = CalculateLambdaMatchingEntity(item);
    var match = repository.FindBy(calculatedLambda);

    if (match == null) {
        item.Cached = DateTime.Now;
        item.Timeout = cacheControl;
    }
    else {
        var timeout = match.Cached.AddSeconds(match.Timeout);
        if (DateTime.Now > timeout) {
            //Update Entity using reflection
            item.Cached = DateTime.Now;
        }
    }
}

Well, actually, this is what I have, but the good stuff is in the CalculateLambdaMatchingEntity method. The idea behind that method is to calculate a lambda to pass to the FindBy method using only the properties that carry the IdentityProperty attribute. So, my method looks like this:

private Expression<Func<T, bool>> CalculateLambdaMatchingEntity<T>(T entityToMatch)
{
    var properties = typeof (T).GetProperties();
    var expressionParameter = Expression.Parameter(typeof (T));
    Expression resultingFilter = null;

    foreach (var propertyInfo in properties) {
        var hasIdentityAttribute = propertyInfo.GetCustomAttributes(typeof (IdentityPropertyAttribute), false).Any();

        if (!hasIdentityAttribute)
            continue;

        var propertyCall = Expression.Property(expressionParameter, propertyInfo);

        var currentValue = propertyInfo.GetValue(entityToMatch, null);
        // Typing the constant keeps Equal valid even when the value is null
        var comparisonExpression = Expression.Constant(currentValue, propertyInfo.PropertyType);

        var component = Expression.Equal(propertyCall, comparisonExpression);

        // Combine the comparison bodies; the lambda itself is built once, at the end
        resultingFilter = resultingFilter == null
            ? component
            : Expression.AndAlso(resultingFilter, component);
    }

    // No identity properties means there is nothing to match on
    if (resultingFilter == null)
        return null;

    return Expression.Lambda<Func<T, bool>>(resultingFilter, expressionParameter);
}

Fancy code apart, what this does is just iterate through the properties of the object and construct a lambda matching the object received as a sample. So for our sample class Person, if our service retrieves a person with passport number “SAMPLE” and social security number “ANOTHER”, the generated lambda would be the equivalent of issuing a query like

repository.FindBy(person => person.PassportNumber == "SAMPLE" && person.SocialSecurityNumber == "ANOTHER")
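For readers who want to see the idea stripped of the expression-tree machinery, here is a rough Python analogue. It marks identity fields with dataclass metadata instead of an attribute and returns a plain predicate function rather than an expression tree, so it only filters in memory and cannot be translated to SQL by an ORM. The names identity, Person and matching_predicate are made up for the example.

```python
from dataclasses import dataclass, field, fields


def identity():
    """Mark a field as part of the entity's external identity."""
    return field(metadata={"identity": True})


@dataclass
class Person:
    passport_number: str = identity()
    social_security_number: str = identity()
    name: str = ""


def matching_predicate(sample):
    """Build a predicate that is true for objects sharing sample's identity."""
    wanted = {f.name: getattr(sample, f.name)
              for f in fields(sample) if f.metadata.get("identity")}
    if not wanted:
        raise ValueError("no identity fields to match on")

    def predicate(candidate):
        # Every identity field must be equal, like the ANDed comparisons above
        return all(getattr(candidate, n) == v for n, v in wanted.items())

    return predicate


stored = [Person("SAMPLE", "ANOTHER", "Old name"),
          Person("OTHER", "ANOTHER", "Someone else")]
is_match = matching_predicate(Person("SAMPLE", "ANOTHER", "New name"))
print([p.name for p in stored if is_match(p)])
```

Note that name plays no part in the match: only the two fields marked as identity are compared.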

Performance you say?

If you’ve read the about section of my blog, you’ll know that I work for a company that cares about performance, so once I did this, I knew the next step was benchmarking the process. It doesn’t matter that it was for a personal project; I had to know the performance made it a viable idea. So I ran a set of basic tests benchmarking the total time the update foreach would take, and I came up with these results:

Scenario               Matching data   Ticks     Faster?
Lambda calculation     Yes             5570318   Yes
No Lambda calculation  Yes             7870450
Lambda calculation     No              1780102   No
No Lambda calculation  No              1660095

These are actually quite simple to explain. When no data matches, the overhead of calculating a lambda makes it lose its edge, because no items match the query. When there are matching items, however, the power of lambdas shows: the compiler doesn’t have to build the expression tree from an expression; it receives a previously built tree instead, so it’s faster to execute. So, back to the initial title: empower your lambdas!
If you have any other point of view on these ideas, feel free to leave a comment, even if you are going to prove me wrong with it. I’ve always said that nobody knows everything, so I might be very mistaken here. On the other hand, if this helps, then my job here is done.

Common method for saving and updating on Entity Framework

This problem has been bugging me for some time now. One of the things that I miss the most from NHibernate when I’m working with EF is the SaveOrUpdate methods. Once you lose that, you realize just how much you loved it in the first place. So, I set out to make my EF repositories to use one of those. My initial approach was rather simple and really close to what you can find here or here, so I basically came out with this:

public T SaveOrUpdate(T item)
{
 if (item == null)
  return default(T);

 var entry = _internalDataContext.Entry(item);

 if (entry.State == EntityState.Detached)
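  if (item.Id != null)
  {
   TypeDbSet.Attach(item);
   // Attach leaves the entity Unchanged; mark it Modified so SaveChanges issues an UPDATE
   entry.State = EntityState.Modified;
  }
  else 
   TypeDbSet.Add(item);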
 
 _internalDataContext.SaveChanges();
 return item;
}

This is a neat idea and it works for most of the cases, with one tiny issue. I was working with an external API and I was caching the objects received on my calls and since these objects had their own keys, I was using those keys on my DB. So, I had a Customer class, but the Id property was set when I was about to insert and since our method uses the convention that if it has an Id, it was already saved, then the repo would just attach it to the change tracker but the object was never saved! Boo! Well, no panic, my repo also has a method called GetOne which receives an Id and returns that object, so I added that into the soup and got this:

public T SaveOrUpdate(T item)
{
 if (item == null)
  return default(T);

 var entry = _internalDataContext.Entry(item);

 if (entry.State == EntityState.Detached)
 {
  if (item.Id != null)
  {
   var exists = GetOne(item.Id) != null;

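   if (exists)
   {
    TypeDbSet.Attach(item);
    // Attach leaves the entity Unchanged; mark it Modified so SaveChanges issues an UPDATE
    entry.State = EntityState.Modified;
   }
   else
    TypeDbSet.Add(item);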
  }
  else 
   TypeDbSet.Add(item);
 }
 
 _internalDataContext.SaveChanges();

 return item;
}

Now, if you think about it, how would you update an object?

  • Check if the object already exists on the DB
  • If it’s there.. update it!
  • If it’s not there.. insert it!

As you can see, Check involves GetOne. Now, if you are thinking that you don’t want an extra DB call, there is always a solution…

public T SaveOrUpdate(T item, bool enforceInsert = false)
{
 if (item == null)
  return default(T);

 var entry = _internalDataContext.Entry(item);

 if (entry.State == EntityState.Detached)
 {
  if (item.Id != null)
  {
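   // enforceInsert skips the lookup: the caller guarantees the row is not there yet
   var exists = !enforceInsert && GetOne(item.Id) != null;

   if (exists)
   {
    TypeDbSet.Attach(item);
    // Attach leaves the entity Unchanged; mark it Modified so SaveChanges issues an UPDATE
    entry.State = EntityState.Modified;
   }
   else
    TypeDbSet.Add(item);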
  }
  else 
   TypeDbSet.Add(item);
 }
 
 _internalDataContext.SaveChanges();

 return item;
}

Granted, it’s not fancy, but it gets the job done and doesn’t require many changes. If you pass the enforceInsert flag, it means you are certain that the object you’re saving requires an insert: it has an Id, but you know it’s not there yet. Just what I was doing!
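The decision logic on its own, without Entity Framework around it, can be sketched like this in Python. InMemoryRepository is a made-up stand-in for the repository, items are plain dictionaries, and the lookups counter only exists to show when the extra "GetOne" round trip is skipped.

```python
class InMemoryRepository:
    """Toy repository: a dict plays the part of the database table."""

    def __init__(self):
        self._rows = {}
        self._next_id = 1
        self.lookups = 0  # counts simulated GetOne round trips

    def get_one(self, item_id):
        self.lookups += 1
        return self._rows.get(item_id)

    def save_or_update(self, item, enforce_insert=False):
        """Store item and report whether it was an 'insert' or an 'update'."""
        if item is None:
            return None
        item_id = item.get("id")
        if item_id is None:
            # No key yet: always an insert, and we hand out a key
            item["id"] = self._next_id
            self._next_id += 1
            action = "insert"
        elif enforce_insert or self.get_one(item_id) is None:
            # Externally keyed item that is not stored yet
            action = "insert"
        else:
            action = "update"
        self._rows[item["id"]] = dict(item)
        return action


repo = InMemoryRepository()
print(repo.save_or_update({"id": "ext-7", "name": "Acme"}, enforce_insert=True))
print(repo.save_or_update({"id": "ext-7", "name": "Acme Ltd"}))
```

The first call trusts the flag and never queries the store; the second one pays for the lookup and finds the row, so it becomes an update.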

Do you have any other way of doing this? Do you think this is wrong? Feel free to comment and let me know!

Consuming web services and notifying your app about it on Objective C

Since almost the beginning of my exploits as an iOS developer I’ve been working on apps that consume web services, and one big problem has been notifying different areas of my app that a certain event has happened. My first genius idea was to create my own home brew of notifications using the observer pattern. It wasn’t all that bad, but a while later I realized that I was reinventing the wheel, so I resorted to the one and only NSNotificationCenter.

Enter NSNotificationCenter

According to Apple on the docs for the notification center, this is the definition:

An NSNotificationCenter object (or simply, notification center) provides a mechanism for broadcasting information within a program. An NSNotificationCenter object is essentially a notification dispatch table.

So, this was my observer! How does it work you say? Let’s get to it! But before, let’s get into context. What I have is a class called ServiceBase which is the base class (duh!) for all classes consuming services. The interface definition for the class looks a bit like this…

 @interface ServiceBase : NSObject<ASIHTTPRequestDelegate>
  - (void) performWebServiceRequest: (NSString*) serviceUrl;
  - (void) triggerNotificationWithName: (NSString*) notificationName andArgument: (NSObject*) notificationArgument;
  - (NSString*) getServiceBaseUrl;
 @end
 

The class has been simplified, and the actual class has a few other things that depend more on how I work, but you get the point. Given the idea of this post, I’m going to concentrate on the notification side of the class. We do need some sort of example to work with, though, so let’s take a look at the performWebServiceRequest method.

- (void) performWebServiceRequest: (NSString*) serviceUrl
{
    if (!self.queue) {
        self.queue = [[NSOperationQueue alloc] init];
    }
    
    NSURL *url = [NSURL URLWithString: serviceUrl];
    ASIHTTPRequest *request = [ASIHTTPRequest requestWithURL:url];
    [request addRequestHeader:@"accept" value:@"text/json"];
 
 [request setCompletionBlock: ^{
  //this will keep the self object reference alive until the request is done
  [self requestFinished: request];
 }];
 
    [self.queue addOperation: request];
}
 

Now, we have this simplified method that creates a request, sets the requestFinished method as the completion block and queues up the request. Now, I said I would focus on the notifications, but one thing to consider here:

 [request setCompletionBlock: ^{
  //this will keep the self object reference alive until the request is done
  [self requestFinished: request];
 }];
 

Keep in mind that this statement preserves the reference to self until the request is finished, so it’s not released by ARC. However, the way I use services in my app, each service works as a singleton (or quite close to that), so keeping the reference is not a problem: you are not creating a new instance of each service class every time you make a request. This also solves an issue with ASIHTTPRequest losing the reference to the delegate before the request is complete, but that’s a story for another day. Now, moving on to the end of the request…

- (void)requestFinished:(ASIHTTPRequest *)request
{
    JSONDecoder* decoder = [[JSONDecoder alloc] init];
    NSData * data = [request responseData];
    NSArray* dictionary = [decoder objectWithData: data];

    for (NSDictionary* element in dictionary) {
  [self triggerNotificationWithName: @"ItemLoaded" andArgument: element];
    }
}
 

When the request finishes, it only converts the data received (notice that this is a simple scenario) and posts a notification that an item has been loaded, using the triggerNotificationWithName:andArgument: method. Now, on to the actual notification method…

- (void) triggerNotificationWithName: (NSString*) notificationName andArgument: (NSObject*) notificationArgument
{
    NSNotificationCenter * notificationCenter = [NSNotificationCenter defaultCenter];
   
 if ( notificationArgument == nil )
 {
  [notificationCenter postNotificationName: notificationName  object: nil];
 }
 else
 {
  NSMutableDictionary * arguments = [[NSMutableDictionary alloc] init];
  [arguments setValue: notificationArgument forKey: @"Value"];
  [notificationCenter postNotificationName: notificationName  object:self userInfo: arguments];
 }
}
 

Now, we only need to subscribe to a notification and retrieve the value which is very simple, take this example inside a UIViewController:

- (void) viewDidLoad
{
 [super viewDidLoad];

 // Subscribe to the same notification name that ServiceBase posts
 NSNotificationCenter * notificationCenter = [NSNotificationCenter defaultCenter];
 [notificationCenter addObserver: self selector: @selector(itemLoadedNotificationReceived:) name:@"ItemLoaded" object: nil];
}

- (void) itemLoadedNotificationReceived: (NSNotification*) notification
{
 NSDictionary* itemLoaded = [notification.userInfo valueForKey: @"Value"];
    // Do something with the item you just loaded
}
 

In the itemLoadedNotificationReceived method the app will receive a notification when each item is loaded. This may not be the best example, because when you’re loading several items, they normally go into a cache to be loaded from a UITableView afterwards, but this idea should get you going.
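If the dispatch-table idea is new to you, the whole mechanism fits in a handful of lines. The following Python sketch is a toy model of what a notification center does, not a description of how NSNotificationCenter is implemented; the class and method names are invented for the example.

```python
from collections import defaultdict


class NotificationCenter:
    """Toy dispatch table: notification name -> list of callbacks."""

    def __init__(self):
        self._observers = defaultdict(list)

    def add_observer(self, name, callback):
        self._observers[name].append(callback)

    def post(self, name, sender=None, user_info=None):
        # Copy the list so observers can subscribe during delivery
        for callback in list(self._observers[name]):
            callback(name, sender, user_info or {})


center = NotificationCenter()
loaded = []

# The view controller's side: remember every item that arrives
center.add_observer("ItemLoaded", lambda name, sender, info: loaded.append(info["Value"]))

# The service's side: one notification per decoded element
for element in [{"id": 1}, {"id": 2}]:
    center.post("ItemLoaded", user_info={"Value": element})

print(loaded)
```

The poster and the observer only share a notification name and the "Value" key, which is the same loose coupling the Objective-C code above relies on.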

Do you use a different approach? Do you normally use it like this? Well, if you have anything at all to say, feel free to leave it in the comments!

The status of Lucene2Objects

After some time without being able to work on it, I’ve managed to put some time into Lucene2Objects again. The first thing I did in my last session was to separate the attributes from the actual Lucene2Objects project, for a very simple reason that was brought to my attention by a fellow user of the library. Currently, if you want to annotate the entities in your domain project, you have to import the Lucene2Objects library into that project, thus adding a dependency on the library, on Lucene .NET, and on the Lucene Contrib project (which is used for importing several analyzers), plus any other dependencies these might bring along. For a domain project, which is supposed to have as few dependencies as possible, this is very heavy duty, hence the need for a separation (of concerns, if you will).

The basic idea I followed in this new update was to separate the project into 2 different libraries: a very light one containing the attributes, with no dependencies at all, and the actual library. Obviously this means I have to create another package, which I will do very soon, but it will hopefully let people integrate with Lucene2Objects more easily.

My next step is adding collection support to Lucene2Objects. I have a few ideas on this and I hope a new version will be done soon, but there is nothing worth pushing now. Hopefully I will manage to put more time into this from now on, so feel free to let me know if there’s something you’d like to see in Lucene2Objects!

Back in business!

I haven’t been able to write for some time, given that I’ve moved to the UK. Now that all the arrangements have been taken care of and I can consider myself settled, I’ll continue with a couple of new posts soon. Also, I’m planning on moving my development on Lucene2Objects to the new features that I wanted to get working for version 2.

Thanks to all the folks that reached out to me to ask about the status of Lucene2Objects!

David

Testing S#arp Lite Repositories with Moq

One pending matter I’ve always had is improving my testing skills. There, I said it. I test, but not as much as I should. When I say test, I mean unit test, not just launching the application and poking at it. One thing I found to be a really outstanding idea in S#arp Lite is that repositories eliminate many complications. If you have a repository and need to run a query against it, just call GetAll and throw some Linq at it. Granted, this assumes that the Linq provider for the underlying data model is mature, but with NHibernate and Entity Framework always being my two ORMs of choice, that seems like a fair assumption.

However, this has a downside: I tried to test the repositories and had a really rough time testing a repo that used an underlying IQueryable. It became clearer with time, though, and now I can test my repos. Let’s build a fairly simple test scenario. Let’s assume I have a User class with a few standard properties, pretty much like this one:

public class User : Entity
{
 public virtual string Password { get; set; }

 public virtual string Email { get; set; }

 public virtual bool Blocked { get; set; }

 public virtual int LoginCount { get; set; }
} 

Now, I have a class called Membership that handles my membership logic, that is, logging users in, blocking them after a couple of bad logins, etc. That class should look like this:

public class Membership
{
 IRepository<User> _usersRepository;

 public Membership( IRepository<User> usersRepository )
 {
  _usersRepository = usersRepository;
 }
 
 public bool IsValidUser( string email, string password )
 {
  //create test first!
  return false;
 }
} 

Now, we need to create a test case. Let’s call it, MembershipTests

[TestFixture]
public class MembershipTests
{
 [TestFixtureSetUp]
 public void SetupTestEnvironment()
 {
 
 }
}

Now, I want to create a mock repository to pass along to the Membership class under test, but it has to simulate the data backend without touching my actual data or getting too slow. Obviously we need a list, but not just any list: we need a list that can pose as a repository, or at least fake it. That’s why we need to create this sort of list, a QueryableList:

public class QueryableList<T, TId> : List<T>, IQueryable<T> where T : EntityWithTypedId<TId>
{
 #region Constructors
 public QueryableList()
 { }

 public QueryableList(IEnumerable<T> source)
  : base(source)
 { } 
 #endregion

 #region IQueryable<T> implementation
 public Expression Expression
 {
  get { return ToArray().AsQueryable().Expression; }
 }

 public Type ElementType
 {
  get { return typeof(T); }
 }

 public IQueryProvider Provider
 {
  get { return ToArray().AsQueryable().Provider; }
 }
 #endregion

 public void UpdateEntity(T entity)
 {
  var index = -1;

  for (var i = 0; i < Count; i++)
   if (this[i].Equals(entity))
    index = i;

  if (index == -1)
   Add(entity);
  else
   this[index] = entity;
 }
} 
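As a quick sanity check of the idea, here is a stripped-down sketch that compiles on its own. SimpleQueryableList and QueryableListDemo are hypothetical names of mine, and I dropped the EntityWithTypedId constraint so it does not need S#arp Lite; the IQueryable members are the same as in the class above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

// Hypothetical, simplified stand-in for QueryableList: same IQueryable<T>
// members, but without the S#arp Lite entity constraint.
public class SimpleQueryableList<T> : List<T>, IQueryable<T>
{
    public SimpleQueryableList(IEnumerable<T> source) : base(source) { }

    // Each access wraps a fresh copy of the list's current contents.
    public Expression Expression
    {
        get { return ToArray().AsQueryable().Expression; }
    }

    public Type ElementType
    {
        get { return typeof(T); }
    }

    public IQueryProvider Provider
    {
        get { return ToArray().AsQueryable().Provider; }
    }
}

public static class QueryableListDemo
{
    public static int CountLongWords()
    {
        IQueryable<string> words =
            new SimpleQueryableList<string>(new[] { "repo", "mock", "queryable" });

        // Goes through Queryable.Where / Queryable.Count, just like a query
        // against a real repository's GetAll() would.
        return words.Where(w => w.Length > 4).Count();
    }
}
```

One thing worth keeping in mind: because Expression copies the list with ToArray(), a query works on the contents the list had when the query was built, not on items added afterwards.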

Voilà! We have a List that directly implements IQueryable. It is not hard to do, but it will help us a lot. The class takes both the entity type and the type of its Id as generic parameters to keep it as generic as possible, so when we need to test a repo of entities with typed ids, we won’t have to rewrite much. The UpdateEntity method mimics the SaveOrUpdate method we have on our repo, using the Equals method to invoke the equality comparer provided by S#arp Lite. Now we need to set up our mocks. Let’s go back to the test fixture and set up our environment:

[TestFixture]
public class MembershipTests
{
 private Membership _membership;
 
 [TestFixtureSetUp]
 public void SetupTestEnvironment()
 {
  var usersMockedRepo = new Mock<IRepository<User>>();
  
  var users = new List<User> { new User{ Blocked = false, Email = "david@someplace.com", Password = "a password" } };
  var list = new QueryableList<User, int>(users);
  
  //Mock GetAll
  usersMockedRepo.Setup(x => x.GetAll()).Returns(list);
  
  //Mock the Get
  usersMockedRepo.Setup( x => x.Get( It.IsAny<int>() ))
      .Returns( (int id) => list.AsQueryable()
      .SingleOrDefault(x => x.Id.Equals(id)));
  
  //Mock the SaveOrUpdate using our own
  usersMockedRepo.Setup(x => x.SaveOrUpdate(It.IsAny<User>()))
      .Callback((User entity) => list.UpdateEntity(entity));
      
  //Mock the delete
  usersMockedRepo.Setup(x => x.Delete(It.IsAny<User>())).Callback((User entity) => list.Remove(entity));
  _membership = new Membership(usersMockedRepo.Object);
 }
}

Now we have set up our very own mocked repository. Next we need a test for the IsValidUser method we left unfinished before. Let’s write a simple test case:

[Test]
public void CheckBasicAuthentication()
{
 //Use the same email we seeded the mocked repository with
 var checkValidUser = _membership.IsValidUser("david@someplace.com", "a password");
 var checkInvalidUser = _membership.IsValidUser("david@someplace.com", "another password");

 Assert.IsFalse(checkInvalidUser);
 Assert.IsTrue(checkValidUser);
}
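With the test written, IsValidUser can finally be filled in. What follows is only a minimal sketch of one way to do it, not the real thing: IReadOnlyRepository and InMemoryUsers are hypothetical stand-ins so the code compiles without S#arp Lite or Moq, User is trimmed down, and rejecting blocked users is my own assumption about the rules.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Trimmed-down stand-in for the User entity (no Entity base class).
public class User
{
    public string Email { get; set; }
    public string Password { get; set; }
    public bool Blocked { get; set; }
}

// Hypothetical minimal contract; the real IRepository<T> has more members.
public interface IReadOnlyRepository<T>
{
    IQueryable<T> GetAll();
}

// List-backed fake, playing the role of the mocked repository.
public class InMemoryUsers : IReadOnlyRepository<User>
{
    private readonly List<User> _users;

    public InMemoryUsers(IEnumerable<User> users)
    {
        _users = users.ToList();
    }

    public IQueryable<User> GetAll()
    {
        return _users.AsQueryable();
    }
}

public class Membership
{
    private readonly IReadOnlyRepository<User> _usersRepository;

    public Membership(IReadOnlyRepository<User> usersRepository)
    {
        _usersRepository = usersRepository;
    }

    public bool IsValidUser(string email, string password)
    {
        // Plain-text comparison only because the test data stores plain text;
        // a real system would compare password hashes instead.
        // Assumption: a blocked user never counts as valid.
        return _usersRepository.GetAll()
            .Any(u => u.Email == email && u.Password == password && !u.Blocked);
    }
}
```

The whole method is one LINQ query over GetAll, which is exactly the kind of code the list-backed repository lets us test.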

And that’s it! We have our own test, and we can now create as many test cases as we want, all relying on a simple structure like a list. There is one final thought here, which came to mind while reading this StackOverflow post: put the setup into a helper method, so we can reuse it in different test scenarios:

Please note that the following code can induce headaches 🙂

public static class MockExtensions
{
 public static void SetupIQueryableTypedRepository<T, TId>
  (this Mock<IRepositoryWithTypedId<T, TId>> mockObject, IEnumerable<T> source)
  where T : EntityWithTypedId<TId> where TId : IComparable
 {
  var list = new QueryableList<T, TId>(source);

  mockObject.Setup(x => x.GetAll()).Returns(list);
  mockObject.Setup(x => x.Get(It.IsAny<TId>())).Returns((TId id) => list.AsQueryable().SingleOrDefault(x => x.Id.Equals(id)));

  mockObject.Setup(x => x.SaveOrUpdate(It.IsAny<T>())).Callback((T entity) => list.UpdateEntity(entity));
  mockObject.Setup(x => x.Delete(It.IsAny<T>())).Callback((T entity) => list.Remove(entity));
 }

 public static void SetupIQueryableRepository<T>(this Mock<IRepository<T>> mockObject, IEnumerable<T> source)
  where T : Entity
 {
  var list = new QueryableList<T, int>(source);

  mockObject.Setup(x => x.GetAll()).Returns(list);
  //Look the entity up by its Id, not by its position in the list
  mockObject.Setup(x => x.Get(It.IsAny<int>())).Returns((int id) => list.AsQueryable().SingleOrDefault(x => x.Id.Equals(id)));
  
  mockObject.Setup(x => x.SaveOrUpdate(It.IsAny<T>())).Callback( (T entity) => list.UpdateEntity(entity) );
  mockObject.Setup(x => x.Delete(It.IsAny<T>())).Callback((T entity) => list.Remove(entity));
 }
}

Now, we can reduce our Setup method to this:

[TestFixtureSetUp]
public void SetupTestEnvironment()
{
 var usersMockedRepo = new Mock<IRepository<User>>();
 var users = new List<User> { new User{ Blocked = false, Email = "david@someplace.com", Password = "a password" } };
 
 usersMockedRepo.SetupIQueryableRepository(users);
 _membership = new Membership(usersMockedRepo.Object);
}

If you just want to be able to test your S#arp Lite repositories this way, add this extension to your code, set up your mocked repositories as shown, and you’re done! If you have any other thoughts, let me know in the comments!

Lucene2Objects roadmap

Since the release of Lucene2Objects I’ve seen a few people downloading it. Some have left a couple of comments, some have emailed me directly, and others even found me on BitBucket and sent me a message over there. I think that is awesome, which is why I plan to devote a bit more of my time to getting more cool stuff into Lucene2Objects. There is also the fact that I actively use it on a couple of projects 🙂

So far, only one thing comes to mind that we need to release in L2O: self-tracking objects, that is, objects able to track any changes made to them and update the index accordingly. I’ve been devoting some time to this matter and I’ve come up with a couple of ideas, but to get there, there are a few basic principles I need to keep in mind:

  • The Lucene2Objects tracking system needs to be as unobtrusive as possible (and not because the word is trendy)
  • As has been the case so far, getting the tracker up and running should be easy for anyone not interested in going too deep
  • If somebody is indeed interested in going deep, it also needs to be easier than going into Lucene.NET (which is not easy at all!!)

But if you think about it, these are just basic design principles: the first one is basically YAGNI and KISS, and the other two are just plain old Bertrand Meyer’s Open/Closed Principle. Although these should rule most development projects, the actual panorama is not like that. Anyway, I digress…

The other thing I think is necessary for self-tracking entities (and collections, which are, after all, entities) is that the entities to be tracked need some sort of primary key, or something that distinguishes them. Having said that, I can only think of forcing the use of a primary key on any entity to be tracked, for instance using the [PrimaryKey] attribute. The alternative is generating a key somehow, but since you don’t always (actually almost never…) know which fields identify an entity, there is no way of guaranteeing that this is a safe or unique procedure.
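To make the [PrimaryKey] idea a bit more concrete, here is a rough sketch of how a tracker could read the key from an entity through reflection. Everything in it is an assumption for illustration: PrimaryKeyAttribute is declared here as a bare marker attribute, KeyReader and Article are made-up names, and none of this is the actual L2O implementation.

```csharp
using System;
using System.Linq;
using System.Reflection;

// Hypothetical marker attribute; the real L2O attribute may look different.
[AttributeUsage(AttributeTargets.Property)]
public class PrimaryKeyAttribute : Attribute { }

public static class KeyReader
{
    // Returns the value of the single property tagged with [PrimaryKey].
    public static object GetKey(object entity)
    {
        if (entity == null) throw new ArgumentNullException("entity");

        var keyProperties = entity.GetType()
            .GetProperties(BindingFlags.Public | BindingFlags.Instance)
            .Where(p => Attribute.IsDefined(p, typeof(PrimaryKeyAttribute)))
            .ToList();

        // Refuse to guess: zero or several tagged properties is an error.
        if (keyProperties.Count != 1)
            throw new InvalidOperationException(
                entity.GetType().Name + " must have exactly one [PrimaryKey] property to be tracked.");

        return keyProperties[0].GetValue(entity, null);
    }
}

// Example entity used only for illustration.
public class Article
{
    [PrimaryKey]
    public int Id { get; set; }

    public string Title { get; set; }
}
```

Failing loudly when the key is missing or ambiguous fits the point above: the library cannot safely guess which fields identify an entity, so the entity has to say so explicitly.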

There is an idea for S#arp Lite users, which is using the [DomainSignature] attribute to auto-generate the so-called key. For those of you who don’t know what S#arp Lite is, it is a really nice framework, made mostly by Billy McCafferty, for developing .NET apps, and although it has a couple of detractors, it’s a really good way to start any small or mid-sized project. So, for all of you developers reading this post and interested in how L2O progresses: is there something you want to see in L2O? Feel free to comment or email me; I’m interested in hearing your thoughts.