We have moved to GitHub Issues
Created by Shannon Deminick 17 Dec 2015, 11:02:52 Updated by Stefan Kip 28 Jan 2016, 11:25:58
Is duplicated by: U4-6503
Is duplicated by: U4-7333
Relates to: U4-7807
Currently the GetAll repository-level cache caches individual entities by their key, so that we are not double-caching entities when we resolve them by Id. This works OK for repositories that don't rely on the result of GetAll, or don't have GetAll called very often, but for the ones that do, it means we are doing a ton of string comparisons when returning the result of GetAll.
The main problem there is that we have a single/global runtime cache (single dictionary). The number of keys in here could grow to massive amounts so the amount of string comparisons to find items in the cache could be in the millions. We need to allow repositories to have isolated caches (their own dictionaries).
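To make the cost concrete, here's a minimal illustrative sketch (not Umbraco code; the `uRepo_` key prefix is made up) of why a single global cache hurts: resolving or clearing entries by key prefix means a `StartsWith` string comparison against every key in the application-wide cache, on every call.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Runtime.Caching;

class GlobalCacheScan
{
    // One StartsWith comparison per cached item, for *every* prefix lookup.
    // With millions of keys in a single global cache, this dominates GetAll.
    static IEnumerable<object> GetAllByPrefix(MemoryCache cache, string prefix)
    {
        return cache.Where(kvp => kvp.Key.StartsWith(prefix, StringComparison.Ordinal))
                    .Select(kvp => kvp.Value);
    }

    static void Main()
    {
        var cache = MemoryCache.Default;
        cache["uRepo_Language_1"] = "en-US";
        cache["uRepo_ContentType_7"] = "article";

        // Scans the whole cache even though only one entry matches.
        Console.WriteLine(GetAllByPrefix(cache, "uRepo_Language_").Count());
    }
}
```

An isolated cache per repository avoids the scan entirely, because the container itself identifies the entity type.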
Repositories that rely on the result of GetAll for all repository queries - these now use a custom caching policy to cache the full data set (GetAll) into a single collection item.
FullDataSetRepositoryCachePolicy, which caches the result of GetAll in a single collection and doesn't cache individual items.
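A hypothetical sketch of the idea behind that policy (this is not the actual Umbraco class; names and members here are illustrative): the whole GetAll result lives in the cache as one collection entry, `Get(id)` filters that collection, and no per-entity keys exist to scan.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class FullDataSetCachePolicySketch<TEntity, TId>
{
    private readonly object _lock = new object();
    private readonly Func<IEnumerable<TEntity>> _getAllFromDb;
    private readonly Func<TEntity, TId> _getId;
    private IReadOnlyList<TEntity> _all; // the single cached collection

    public FullDataSetCachePolicySketch(
        Func<IEnumerable<TEntity>> getAllFromDb, Func<TEntity, TId> getId)
    {
        _getAllFromDb = getAllFromDb;
        _getId = getId;
    }

    public IReadOnlyList<TEntity> GetAll()
    {
        lock (_lock)
            return _all ?? (_all = _getAllFromDb().ToList());
    }

    // Individual lookups filter the cached set instead of hitting
    // their own cache keys.
    public TEntity Get(TId id) =>
        GetAll().FirstOrDefault(e => Equals(_getId(e), id));

    // Called by a cache refresher: invalidate the whole set at once.
    public void Clear() { lock (_lock) _all = null; }
}
```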
Any reason why each entity type can't get its own dictionary instead of using the Key.StartsWith method?
And don't forget about the translation dictionary
@lars-erik Yup, we can look into that too... that way the single massive dictionary is not iterated entirely (that's what you mean, right?).
One tricky part to this is that the cache currently relies on individual entities because:
So we could:
Would be great. :)
The current repo factory injects a CacheHelper into all the repos. It could instead inject an individual caching strategy into each repo, so you could inject a DeepCloneableCache etc. for entities that use it, or just a ReallySimpleNaiveDictionaryCache for the ones that are really simple. :)
That would also allow us to change the caching behavior of individual entity types when performance converges between in-memory, SQL or other mechanisms.
@lars-erik about the dictionary (translation) items, currently it doesn't do anything special for GetAll(), it uses the default procedure of caching individual entries if GetAll() returns < 100 items. Otherwise, if it returns more, GetAll() doesn't cache at all. I'm not really sure how often GetAll() is actually called for dictionary items; I'd assume it shouldn't be called very often?
It might go directly to the right key, but when I profiled, 255 calls to library.GetDictionaryItem hit the deep-clone stuff 25,000 times. Stephane saw it when I showed him around. I've cached each dictionary item further out, since we hit the same key multiple times.
Hrm... I'll need to investigate that. It's worth noting however that the front-end calls (generally) for dictionary items will end up using ICultureDictionary, which can be replaced at startup with CultureDictionaryFactoryResolver. This is what library.GetDictionaryItem uses.
ah, right. 'cause that should really just be a dictionary of dictionaries of dictionaries. ;)
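A sketch of that "dictionary of dictionaries" shape for culture dictionary lookups (type and member names here are illustrative, not actual Umbraco APIs): culture maps to item key maps to value, so a front-end lookup is two hash lookups with no deep cloning at all.

```csharp
using System.Collections.Generic;

class CultureDictionarySketch
{
    // culture -> item key -> translated value
    private readonly Dictionary<string, Dictionary<string, string>> _byCulture
        = new Dictionary<string, Dictionary<string, string>>();

    public void Set(string culture, string key, string value)
    {
        if (!_byCulture.TryGetValue(culture, out var items))
            _byCulture[culture] = items = new Dictionary<string, string>();
        items[key] = value;
    }

    // Two O(1) lookups, no object graph, nothing to clone.
    public string Get(string culture, string key)
        => _byCulture.TryGetValue(culture, out var items)
           && items.TryGetValue(key, out var value) ? value : null;
}
```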
The other problem with the dictionary models is that IDictionaryItem contains a collection of IDictionaryTranslation, which contains an ILanguage... all of which are deep-cloneable. These deep object graphs are an issue for caching for obvious reasons. In v8 we'll be changing the way these models work so we're not caching deep object graphs; we've also worked around this for a few other models, but things like this still exist unfortunately. This one might be particularly ugly because of the number of dictionary items that might exist. I'll investigate some workarounds for this too.
Great! Assign me a task if I can help during the holidays. :)
Oh, and I still feel we miss the option to query the database via some LINQ provider in controllers. Currently, AFAIK, GetAll will thrash the cache no matter which entity you query for. There are probably lots of use cases where you'd want to query some entity store and can't use the built-in subqueries.
Hence the suggestion to return IQueryables from GetAll, and decide based on entity count whether it should return a LINQ to PetaPoco queryable, or just a list from memory.
A LINQ provider + IQueryable (done properly) is v8+. I started on a prototype in my LINQPad project. It's not easy and won't support a lot of things. Doing IQueryable from a full in-memory IEnumerable representation is fine, but that could be a ton of memory. In any case, we cannot do anything like that until v8.
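For reference, here's a minimal sketch of the "IQueryable over an in-memory set" half of that trade-off (sample data is made up): `AsQueryable()` on a materialized list gives LINQ-to-Objects semantics, so queries compose fine, but the entire entity set must already be loaded into memory before anything can be filtered.

```csharp
using System;
using System.Linq;

class InMemoryQueryableSketch
{
    static void Main()
    {
        // The whole set is materialized here -- this is the memory cost.
        var allItems = Enumerable.Range(1, 1000)
            .Select(i => new { Id = i, Name = "item" + i })
            .ToList();

        // Composable queries work, but execute in memory over the full list.
        IQueryable<string> query = allItems.AsQueryable()
            .Where(x => x.Id % 100 == 0)
            .Select(x => x.Name);

        Console.WriteLine(query.Count()); // 10
    }
}
```

A real LINQ-to-PetaPoco provider would instead translate the expression tree to SQL, which is the hard part the prototype is tackling.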
v8 is fine. Wouldn't it be legit to expect that kind of support from the ORM (in other words, PetaPoco)?
It wouldn't have to be the entities; it could be the PetaPoco DTOs, or even projections of those.
PetaPoco is a micro-ORM; it's not like NHibernate or Entity Framework. Furthermore, our database doesn't support normal ORM procedures: we don't have flat tables for our entities (apart from things like Users); our tables define schemas that are then used to store data. So PetaPoco (or any ORM) would not be able to give us IQueryable the way we'd want to use it.
I'm just trialing a few things regarding repository caches; here's some more info about why one of our options is tricky:
Let's say we want to have a cache per entity type for a given repository (i.e. a standalone cache container for languages, so we are not iterating over every key of a single application-wide cache)... We can achieve this fairly easily by injecting a CacheHelper that uses a new instance of ObjectCacheRuntimeCacheProvider (which uses System.Runtime.Caching.ObjectCache). But the problem is that we used to sort of do this a long time ago, and it caused issues because this cache needs to be invalidated by objects outside of the repositories, namely the cache refreshers, so that load balancing works.

The CacheRefreshers rely on the fact that there's a single place to clear cache (i.e. the ApplicationContext.ApplicationCache). The CacheRefreshers know how to clear entities by type, by key, by key search, etc. In order for that to be achieved, we'd need to update the CacheHelper to support having multiple IRuntimeCacheProvider instances assigned by type. The default runtime cache returned by the CacheHelper would be the global one, but we would be able to add runtime caches to the CacheHelper by type/name, and have a method to return the IRuntimeCacheProvider from the CacheHelper by type/name. Then we'd have to update all of the cache refreshers to ensure they are looking up and clearing from the correct provider.
I'm sure there are ways around both these issues. I'll have to trial a few things myself to come up with a good idea though. Maybe that's the task for the holidays. :)
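The CacheHelper change described above could look something like this sketch (the interface here is a simplified stand-in I've invented for the purpose of illustration, not Umbraco's actual IRuntimeCacheProvider): a global default cache plus isolated caches registered by type, which cache refreshers resolve before clearing.

```csharp
using System;
using System.Collections.Concurrent;

// Simplified stand-in for IRuntimeCacheProvider.
interface IRuntimeCacheSketch
{
    object GetCacheItem(string key, Func<object> factory);
    void ClearAllCache();
}

class CacheHelperSketch
{
    private readonly ConcurrentDictionary<Type, IRuntimeCacheSketch> _isolated
        = new ConcurrentDictionary<Type, IRuntimeCacheSketch>();

    // The global default cache (the current behavior).
    public IRuntimeCacheSketch RuntimeCache { get; }

    public CacheHelperSketch(IRuntimeCacheSketch globalCache)
    {
        RuntimeCache = globalCache;
    }

    // Repositories fetch (or lazily create) their own cache per entity type.
    public IRuntimeCacheSketch IsolatedCache<T>(Func<IRuntimeCacheSketch> factory)
        => _isolated.GetOrAdd(typeof(T), _ => factory());

    // Cache refreshers clear the isolated cache when one is registered,
    // otherwise fall back to the global cache.
    public void ClearCacheFor(Type entityType)
    {
        if (_isolated.TryGetValue(entityType, out var cache))
            cache.ClearAllCache();
        else
            RuntimeCache.ClearAllCache();
    }
}
```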
Regarding the clearing of cache: a good start would be to refactor the search-by-key stuff into a dictionary of dictionaries (entity type as key, dictionary of ids/entities as value). So keep the interface and singleton for IRuntimeCacheProvider, but fix the inside. That means you could essentially clear the entire cache, or potentially clear less based on associations.
I wouldn't be surprised if there was some well architected open-source cache project out there...
You can't do that unfortunately (I just tried), because the underlying HttpRuntime or ObjectCache is the thing in charge of clearing cache based on the time limits you specify. So you cannot add a Dictionary as a single item to ObjectCache, because all entities in there will be on the same time limit, and if it decides it wants to eject that key, then all items are gone. This is why the only real way is to have an ObjectCache per type: this gives an isolated dictionary that is still governed by the rules of ASP.NET's caching logic per key.
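A minimal sketch of that per-type arrangement, using the real System.Runtime.Caching API (MemoryCache is the concrete ObjectCache; the registry class itself is my invention): each entity type gets its own cache instance, so lookups and clears are isolated per type while every key still gets its own expiration.

```csharp
using System;
using System.Collections.Concurrent;
using System.Runtime.Caching;

static class PerTypeObjectCaches
{
    private static readonly ConcurrentDictionary<Type, ObjectCache> Caches =
        new ConcurrentDictionary<Type, ObjectCache>();

    // One MemoryCache instance per entity type: an isolated dictionary
    // that still honors per-key time limits and eviction.
    public static ObjectCache For<T>() =>
        Caches.GetOrAdd(typeof(T), t => new MemoryCache(t.FullName));
}

// Usage: items added to the Language cache never collide with, and are
// never scanned alongside, items cached for other entity types.
// PerTypeObjectCaches.For<Language>()
//     .Add("1", languageEntity, DateTimeOffset.Now.AddMinutes(5));
```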
Sounds like that's the way to go then.
Any specific reason you're using ObjectCache over HttpCache, or even static dictionaries?
Or why individual entities should be cached for individual timespans?
I've done a bit of work today; happy to collab over the next few weeks. We need to get some fixes in for 7.4 final.
Here's a PR if you want to discuss code inline: https://github.com/umbraco/Umbraco-CMS/pull/963
This is based off of the branch temp-U4-7558, so you can just update to that branch in the core if you want to collab.
Here's what's been done:
Here's what would/could need to be done:
IDictionaryItem and see if it could be at all possible to change it so that it doesn't have such a large object graph and doesn't require so much cloning of sub-elements.
Based on the work I've done though, this should 'fix' a lot of the key/string comparisons and reduce the amount of individual items in the cache.
Sweet. I'll have look next week.
@lars-erik I want to pull this in to 7.3.5 now; any chance you've done any testing with this?
Not yet. Can try to jump through some hoops and do a test this week.
I've done a smoke test of 20 random pages now. Ran the "session" twice to see load times with and without content in the misc caches. (7.3.5 below is the u4-7558-2 branch.)
7.3.2 - First pass - Avg. time: 2.003 sec
7.3.2 - Second pass - Avg. time: 0.734 sec
7.3.5 - First pass - Avg. time: 1.286 sec
7.3.5 - Second pass - Avg. time: 0.225 sec
Seems like a really good improvement, if you ask me. :)
(This is without the workaround caches I made)
@lars-erik wonderful!! and super glad you saw the temp branch too :) I'll be working on this more today.
@lars-erik I've committed a TON of work today on this. There is definitely no more double caching going on, the dictionary cloning should be much faster (when there are large amounts), and the memory usage should be much lower.
If you could please test again that would be super fantastic and then we can get this reviewed and merged for 7.3.5 release.
PR for review: https://github.com/umbraco/Umbraco-CMS/pull/1002
Did a new test run. Minor increase in first-pass time: 0.09 seconds increase on average. :P First request now averages 1.38 seconds.
Second pass, however, averages 0.21 seconds, which is a 0.02 second improvement.
Overall, still thumbs up!
The mem usage should be a bit lower too, but I'm not sure you could definitively test that. Great news though!
I could profile a lot more, but we're in the middle of rewriting our "starter site" before starting our biggest site for 2016, so a bit short on time. :)
@zpqrtbnk I think you forgot to delete the temp- branch from github ;)
Just for the record I'd like to mention that this is a breaking change at the moment: https://github.com/umbraco/Umbraco-CMS/commit/b91661cf65bfdf27001b702d388c9fe6b60b3249#commitcomment-15729965 Already notified @sebastiaan on Twitter: https://twitter.com/kipusoep/status/692668534325956608
Backwards Compatible: True
Affected versions: 7.0.0, 7.1.0, 7.0.1, 7.0.2, 7.0.3, 7.0.4, 7.1.1, 7.2.0, 7.1.2, 7.1.3, 7.1.4, 7.1.5, 7.3.0, 7.1.6, 7.1.7, 7.1.8, 7.1.9, 7.2.1, 7.2.2, 7.2.3, 7.2.4, 7.2.5, 7.2.6, 7.2.7, 7.2.8, 7.3.1, 7.3.2, 7.3.3, 7.3.4
Due in version: 7.3.5
Sprint: Sprint 6