This project is read-only.

lazy loading relationships scalability problems

Topics: Developer Forum, Project Management Forum
Jan 6, 2008 at 7:18 PM
I was trying to speed up the Glitz site and found a fairly significant
bottle neck (I think we had discussed this previously). Anyway, lazy loading
the properties for objects in a collection ends up not being very scalable.

Current:

ProductCollection products = ProductCollection.FetchX(...);

foreach (Product product in products)
{
// doing something here that triggers a hit to the db
// can be very bad and a big perf hit

if (product.Categories.SomethingHere(...))
{
// ...
}
}

Proposed (similar to what they do in LINQ to SQL):

ProductCollection products = ProductCollection.FetchX(...,
PreloadCategories);

foreach (Product product in products)
{
// categories have been already been loaded for each product
// in the current ProductCollection context...

if (product.Categories.SomethingHere(...))
{
// ....
}
}

Doing this, of course, is pretty complex. I believe that LINQ to SQL allows
you to take in a Func<T> to describe the items that need to be preloaded.
That Func<T> then gets translated into SQL. Crazy stuff! But, with
NBusiness, I think there are some things you can come up with in way of code
generating the caching options with, perhaps, a Flags enum that is available
in all Fetch operations. This allows the caller to be in control of the
preloading operation -- as having NBusiness try to infer that would probably
be impossible.
Jan 6, 2008 at 7:49 PM
Edited Jan 6, 2008 at 8:22 PM
There is definitely some room for improvement in this system and I"ll put some good thought in it for v3.0 because it would be a good idea but there are currently two ways to handle this two improve performance, one is only sometimes applicable though.

Use a cached collection template
There is a builtin template called CachedCollectionTemplate built into NBusiness that is useful for entities that rarely, if ever change. In this situation, if your categories were largely static then if you used this template it would not hit the database when loading the collection but would instead get it out of the cache. This of course is only applicable for collections that do not really change.

Use a custom fetch handler
By default you will get your fetch methods generated for you and in those default factories you will get all of your products lazy loaded for you, just as you mention in the above post, but you can do a little bit of custom code in the ProductCollection.cs custom partial class to get the desired effects with better performance.

So the real problem is that sometimes you want to get the categories with the products but sometimes you do not, the lazy loading is there because if you didn't have it you would end up fetch the entire database everytime you loaded your products, which is bad. BUT you can add some custom code to specifically tell the entity when to load certain other data explicitly for certain use cases (or all usecases if you wanted). The solution I would reccommend is to first create a new factory method, with the method name patterned as [Entity]Collection.FetchBy[X]With[Relationship]();

The "With[Relationship]" is the relationship implied to be lazy loaded. Or perhaps "For[UseCase]" would be more appropriate? Anyway, what this factory method would do would be to call the constructor with a new custom Criteria object which you will use to indicate that you would like that relationship to be explicitly loaded at fetch time. You will create that critieria object and have it inherit from the CriteriaBase. Something like:

private class FetchAllWithCategoriesCriteria : CriteriaBase
{
   //...
}
 
public static ProductCollection FetchAllWithCategories()
{
     return new ProductCollection(new FetchAllWithCategoriesCriteria);
}

Next you will need to add a handler to the Fetched event in the ProductCollection class and the code to do the forced loading:

public override Initialize()
{
     this.Fetched += new EventHandler<FetchEventArgs>(Products_Fetching);
}
 
private void Products_Fetching(object sender, FetchEventArgs e)
{
     if(e.Criteria is FetchAllWithCategoriesCriteria)  //look for our custom criteria!
     {
          //FetchAll as normal using the singleton Database object. Omitted for berevity.
 
          // Next explicitly load the categories
          foreach(Product p in this)
          {
               p.LoadCategories(e.Connection); //Use the same connection to load the category objects.
          }
     }
}

And finally all we need is the LoadCategories method:

private void LoadCategories(DbConnection cn)
{
     _Categories = CategoryCollection.FetchByProduct(cn, _productId);
}

In this situation you will be making 1 query to fetch all of the products and then one more query for each product in your collection to fetch their applicable categories. You could imagine that in the LoadCategories method you could potentially call another "With" factory to cause another layer of explicit loading of a relationship of categories as well. I don't think there is anyway to end up with being able to get away with doing less queries for situations like this... other than to perhaps create a stored procedure that uses a cursor to iterate through the selection of products and fetch their relatives then pass the DbCommand object around instead of the DbConnection so you can call NextResult() and load the categories that way. You might actually be able to do that right now by handling the Loaded method of the entity rather than the Fetched method. I cannot remember off hand but if not then that is a good feature request candidate for v3.0 (or 2.2). This would allow you to do one query to fetch all of the needed data, which would be very nice.

Does this make sense? It seems to be a fairly involved solution but if you understand the lifecycle of the entity it's not that bad and the amount of code really isn't that much either.

You could probably do the SQL as something like this:
  CREATE PROCEDURE ProductCollectionFetchAllWithCategories
  {
    SELECT * FROM [Product];
 
    SELECT PC.ProductId, C.* FROM [Category] AS C
    INNER JOIN [ProductCategory] AS PC ON PC.[CategoryId]=C.[CategoryId]; 
 
    -- the second select can be accessed by calling NextResult() in the DataReader object
  }

Then just sort out which categories get loaded from the second collection in code using the ProductId value. I like this idea actually.