U4-7558 - Fix GetAll repository level caching for repositories that return a query result based on the result of GetAll

Created by Shannon Deminick 17 Dec 2015, 11:02:52 Updated by Stefan Kip 28 Jan 2016, 11:25:58

Is duplicated by: U4-6503

Is duplicated by: U4-7333

Relates to: U4-7807

Currently the GetAll repository level cache will cache individual entities with their key, this is so that we are not double caching entities if we are resolving them by Id. This works ok for repositories that don't rely on the result of GetAll, or don't have GetAll called very often, but for the ones that do, this means that we are doing a ton of string comparisons when returning the result of GetAll.

The main problem there is that we have a single/global runtime cache (single dictionary). The number of keys in here could grow to massive amounts so the amount of string comparisons to find items in the cache could be in the millions. We need to allow repositories to have isolated caches (their own dictionaries).

Repositories that rely on the result of GetAll for all repository queries - these now use a custom caching policy to cache the full data set (GetAll) into a single collection item.

  • DomainRepository
  • LanguageRepository
  • PublicAccessRepository
  • TemplateRepository

Other changes:

  • Added an IsolatedCache manager
  • Changed the RepositoryBase to always return an Isolated runtime cache based on it's entity type, this means that each repository has it's own dictionary
  • Created new Cache Policies so that repositories have full control over how things are put into cache and taken out of it.
  • The above 4 repo's now use a custom Cache Policy: FullDataSetRepositoryCachePolicy which caches the result of GetAll in a single collection and doesn't cache individual items
  • The FullDataSetRepositoryCachePolicy utilizes a new custom DeepCloneableList collection that implements IDeepCloneable and IRememberBeingDirty so that it performs these cloning and clearing operations on all sub elements when the cache inserts and retrieves
  • Fixes a few cache refreshers (i.e. when deleting a language we never cleared the dictionary caches)
  • Update all of the CacheRefreshers to clear cache from the correct isolated caches instead of on the default RuntimeCache instance
  • Updates the DictionaryItem and Translation models because they have an attached ILanguage object for each translation (yikes) so if you have a lot of dictionary items, memory consumption will be high and it takes a ton of cycles to clone in and out. So now when these items are returned from the service a delegate is assigned to them which will lazily load in it's attached ILanguage (which is ignored from caching and cloning)
  • Dictionary repository has been setup with a custom policy to never cache GetAll, it only caches individual items and the methods that cache these individual items are the ones that are used all of the time by the ICultureDictionary
  • A problem with the base repository existed where things were being double cached (especially the above 4 repos) because they would do a GetAll (which was cached) from with a Get(id) which the result was cached too.
  • Template repository has been hugely simplified since it now simply works by basing everything on GetAll (full list is cached)

Comments

Lars-Erik Aabech 17 Dec 2015, 11:14:11

Any reason why each entity type can't get it's own dictionary instead of using the Key.StartsWith method?


Lars-Erik Aabech 17 Dec 2015, 11:14:50

And don't forget about the translation dictionary


Shannon Deminick 17 Dec 2015, 11:18:26

@lars-erik Yup, we can look into that too... that way the single massive dictionary is not iterated entirely (that's what you mean right ?).

One tricky part to this is that the cache currently relies on individual entities because:

  • When inserting into the cache we need to check for IDeepCloneable and IRememberBeingDirty so that when the entity goes into the cache, it is first deep cloned and then change tracking is reset
  • When coming back out of the cache we also must ensure the same process

So we could:

  • create a custom List that implements both these interfaces to be added to the single cache entry and it would take care of performing this operation on each item
  • And/Or... do what @lars-erik has mentioned and change the entire underlying cache to be dictionary of dictionaries as to not iterate the whole structure


Lars-Erik Aabech 17 Dec 2015, 11:23:17

Would be great. :)

The current repo factory injects a CacheHelper to all the repos. It could instead inject an individual caching strategy into each repo. So you could inject a DeepCloneableCache etc. for entities that use it, or just a ReallySimpleNaiveDictionaryCache for the ones that are really simple. :)

That would also allow us to change the caching behavior of individual entity types when performance converges between in-memory, SQL or other mechanisms.


Shannon Deminick 17 Dec 2015, 12:15:00

@lars-erik about the dictionary (translation) items, currently it doesn't do anything special for GetAll(), it uses the default procedure of caching individual entries if GetAll() returns < 100 items. Otherwise if it returns more, than GetAll() doesn't cache. I'm not really sure how often GetAll() is really called for dictionary items, I'd assume it shouldn't be called very often?


Lars-Erik Aabech 17 Dec 2015, 12:20:27

It might go directly to the right key, but when I profiled, 255 calls to library.GetDictionaryItem hit the deep-clone stuff 25.000 times. Stephane saw it when I showed him around. I've cached each dictionary item further out, since we hit the same key multiple times.


Shannon Deminick 17 Dec 2015, 12:24:40

Hrm... i'll need to investigate that. It's worth noting however that the front-end calls (generally) for dictionary items will end up using ICultureDictionary which can be replaced at startup with CultureDictionaryFactoryResolver . This is what library.GetDictionaryItem uses


Lars-Erik Aabech 17 Dec 2015, 12:27:45

ah, right. 'cause that should really just be a dictionary of dictionaries of dictionaries. ;)


Shannon Deminick 17 Dec 2015, 12:32:54

The other problem with the dictionary models' is that IDictionaryItem contains a collection of IDictionaryTranslation which contains an ILanguage... all of which is deep cloneable. These deep object graphs are an issue for caching for obvious reasons. In v8, we'll be changing the way these models work so we're not caching deep object graphs, we've also worked around this for a few other models but things like this still exist unfortunately. This one might be particularly ugly because of the amount of dictionary items that might exist. I'll investigate some work arounds for this too.


Lars-Erik Aabech 17 Dec 2015, 12:35:50

Great! Assign me a task if I can help during the holidays. :)


Lars-Erik Aabech 17 Dec 2015, 12:50:22

Oh, and I still feel we miss the option to query the database via some Linq provider in controllers. Currently afaik. GetAll will thrashe the cache no matter which entity you query for. There's probably lots of use-cases where you'd want to query some entity store and can't use the built-in subqueries.

Hence the suggestion to return IQueryables from GetAll, and decide based on entity count whether it should return a LINQ to PetaPoco queryable, or just a list from memory.


Shannon Deminick 17 Dec 2015, 13:06:48

Linq provider + IQueryable (properly), is v8+. I started on a prototype in my LinqPad project. It's not easy and won't support a lot of things. Doing IQueryable from a full IEnumerable (in memory) representation is fine, but that could be a ton of memory. In any case, we cannot do anything like that until v8.


Lars-Erik Aabech 17 Dec 2015, 13:14:23

v8 is fine. Wouldn't it be legit to expect that kind of support from the ORM? (iow. PetaPoco)

It wouldn't have to be the entities, it could be the petapoco dtos, or even projections of those.


Shannon Deminick 17 Dec 2015, 13:57:09

PetaPoco is a micro-ORM, it's not like NH or EntityFramework. Further more, our database doesn't support normal ORM procedures, we don't have flat tables for our entities (apart from things like Users), our tables defined schemas to then store data. So PetaPoco (or any ORM) would not be able to give us IQueryable the way we'd want to use it.

I'm just trialing a few things regarding Repository cache's, here's some more info about why one of our options is tricky:

Let's say we want to have a cache per entity type for a given repository (i.e. we have a standalone cache container for languages so we are not iterating over every key of a single application wide cache)... We can achieve this fairly easily by injecting a CacheHelper that uses a new instance of ObjectCacheRuntimeCacheProvider (which uses System.Runtime.Caching.ObjectCache). But, the problem is that we used to sort of do this a long time ago and this causes some issues because this cache needs to be invalidated by objects outside of the repositories... namely the Cache Refreshers so that load balancing works. The CacheRefreshers rely on the fact that there's a single place to clear cache (i.e. the ApplicationContext.ApplicationCache). The CacheRefreshers know how to clear entities by type, by key, by key search, etc... In order for that to be achieved, we'd need to update the CacheHelper to support having multiple IRuntimeCacheProvider assigned by types. The default RuntimeCache returned by the CacheHelper would be the global one, but we would be able to add runtime caches to the cache helper by type/name and have a method to return the IRuntimeCacheProvider from the cache helper by type/name. Then we'd have to update all of the cache refreshers to ensure they are looking up and clearing from the correct provider.


Lars-Erik Aabech 17 Dec 2015, 14:07:46

I'm sure there's ways around both these issues. Have to trial a few things myself to come up with a good idea though. Maybe that's the task for the holidays. :)

Reg. the clearing of cache - a good start would be to refactor the search by key stuff into a dictionary of dictionaries. (entity type as key, dictionary of ids/entities) So keep the interface and singleton for IRuntimeCacheProvider, but fix the inside. Means you could essentially clear the entire cache, or potentially clear less based on associations.

I wouldn't be surprised if there was some well architected open-source cache project out there...


Shannon Deminick 17 Dec 2015, 14:19:15

You can't do that unfortunately (i just tried), because the underlying HttpRuntime or ObjectCache is the thing in charge of clearing cache based on the time limits you specify. So you cannot add a Dictionary as a single item to ObjectCache, because all entities in there will be on the same time limit and if it decides it wants to eject that key, than all items are gone. This is why the only real way is to have an ObjectCache per Type - this would be an isolated dictionary that is still governed by the rules of aspnet's caching logic per key.


Lars-Erik Aabech 17 Dec 2015, 15:23:05

Sounds like that's the way to go then. Any specific reason you're using ObjectCache over HttpCache, or even static dictionaries?
Or why individual entities should be cached for individual timespans?


Shannon Deminick 17 Dec 2015, 15:26:12

Yes:

  • HttpCache is a single dictionary ... this is what is happening now and why you see this single problem with one large dictionary
  • ObjectCache is the newer version of HttpCache and applies all of the correct expiration policies that caches need to do ... but since it's isolated, it's an isolated dictionary
  • static cache - no expiration, no cache policies, this is non-managed cache which would kill memory


Shannon Deminick 17 Dec 2015, 15:35:04

I've done a bit of work today, happy to collab over the next few weeks, we need to get some fixes in for 7.4 final.

Here's a PR if you want to discuss code inline: https://github.com/umbraco/Umbraco-CMS/pull/963

This is based off of the branch temp-U4-7558, so you can just update to that branch in the core if you want to collab.

Here's what's been done:

  • Changed the RepositoryBase to always return an Isolated runtime cache based on it's entity type, this means that each repository has it's own dictionary
  • Created a new GetAll cache option: GetAllCacheAsCollection - this is used for repositories that use the result of GetAll() to do individual lookups and will cache a single List instance in cache for all entities instead of individually
  • Have applied the GetAllCacheAsCollection to Language and Domain repositories

Here's what would/could need to be done:

  • --Create a custom List collection that implements IDeepCloneable and IRememberBeingDirty so that it performs these cloning and clearing operations on all sub elements when the cache inserts and retrieves--
  • --Identify which other repositories could/should use GetAllCacheAsCollection (see issue notes, 4 repo's now use this)--
  • --Any repository that is using GetAllCacheAsCollection, need to determine the required changes for cache refreshers--
  • --Update all of the CacheRefreshers to clear cache from the correct isolated caches instead of on the default RuntimeCache instance--
  • --Look at the current state of the model IDictionaryItem and see if this could be at all possible to change so that it doesn't have such a large object graph, so that it doesn't require so much cloning of sub-elements--
  • --Investigate how the dictionary repository works now and how the lookups affect performance - perhaps it's almost fine the way it is and instead we focus on a faster readonly cache implementation for ICultureDictionary for which the back office and front-end use--
  • (I'm sure there's more)


Shannon Deminick 17 Dec 2015, 15:37:04

Based on the work I've done though, this should 'fix' a lot of the key/string comparisons and reduce the amount of individual items in the cache.


Lars-Erik Aabech 17 Dec 2015, 15:39:58

Sweet. I'll have look next week.


Shannon Deminick 06 Jan 2016, 12:37:13

@lars-erik I want to pull this in to 7.3.5 now, any chance you've done any testing with this ?


Lars-Erik Aabech 06 Jan 2016, 13:31:44

Not yet. Can try to jump through some hoops and do a test this week.


Lars-Erik Aabech 07 Jan 2016, 09:08:02

I've done a smoketest of 20 random pages now. Ran the "session" twice to see load times with and without content in the misc caches. (7.3.5 below is the u4-7558-2 branch)

7.3.2 - First pass - Avg. time: 2,003 sec 7.3.2 - Second pass - Avg. time: 0,734 sec

7.3.5 - First pass - Avg. time: 1,286 sec 7.3.5 - Second pass: Avg. time: 0,225 sec

Seems like a really good improvement, if you ask me. :)

(This is without the workaround caches I made)


Shannon Deminick 07 Jan 2016, 09:11:16

@lars-erik wonderful!! and super glad you saw the temp branch too :) I'll be working on this more today.


Lars-Erik Aabech 07 Jan 2016, 09:13:57

KUTGW! :)


Shannon Deminick 07 Jan 2016, 18:02:13

@lars-erik I've committed a TON of work today on this. There is definitely no more double caching going on and the dictionary cloning should be much faster (when there are large amounts) and the memory usage should be much lower.

If you could please test again that would be super fantastic and then we can get this reviewed and merged for 7.3.5 release.


Shannon Deminick 07 Jan 2016, 18:13:57

PR for review: https://github.com/umbraco/Umbraco-CMS/pull/1002


Lars-Erik Aabech 08 Jan 2016, 13:04:32

Did a new testrun. Minor increase in first pass time. 0.09 seconds increase on average. :P First request now averages to 1.38.

Seconds pass however average to 0.21 seconds, which is a .02 second improvement.

Overall, still thumbs up!


Shannon Deminick 08 Jan 2016, 13:13:31

The mem usage should be a bit lower too - but not sure you could definitively test that. Great new though!


Lars-Erik Aabech 08 Jan 2016, 13:16:25

I could profile a lot more, but we're in the middle of rewriting our "starter site" before starting our biggest site for 2016, so a bit short on time. :)


Stephan 11 Jan 2016, 14:14:16

merged


Shannon Deminick 11 Jan 2016, 16:58:44

@zpqrtbnk I think you forgot to delete the temp- branch from github ;)


Stefan Kip 28 Jan 2016, 11:25:58

Just for the record I'd like to mention that this is a breaking change at the moment: https://github.com/umbraco/Umbraco-CMS/commit/b91661cf65bfdf27001b702d388c9fe6b60b3249#commitcomment-15729965 Already notified @sebastiaan on Twitter: https://twitter.com/kipusoep/status/692668534325956608


Priority: Critical

Type: Bug

State: Fixed

Assignee:

Difficulty: Normal

Category:

Backwards Compatible: True

Fix Submitted:

Affected versions: 7.0.0, 7.1.0, 7.0.1, 7.0.2, 7.0.3, 7.0.4, 7.1.1, 7.2.0, 7.1.2, 7.1.3, 7.1.4, 7.1.5, 7.3.0, 7.1.6, 7.1.7, 7.1.8, 7.1.9, 7.2.1, 7.2.2, 7.2.3, 7.2.4, 7.2.5, 7.2.6, 7.2.7, 7.2.8, 7.3.1, 7.3.2, 7.3.3, 7.3.4

Due in version: 7.3.5

Sprint: Sprint 6

Story Points:

Cycle: