U4-10756 - Performance of UmbracoHelper TypedContent is real slow with guids / udis

Created by Dennis Boonk 13 Dec 2017, 09:23:50 Updated by Sebastiaan Janssen 18 Apr 2018, 11:53:47

Tags: Gold partner

Relates to: U4-10833

Relates to: U4-10854

Relates to: U4-11208

Hi guys,

we're currently building a website with a lot of articles and when we've imported a lot of articles last week (8000+ articles) our website became pretty slow. Some pages took more than 8 seconds to render, and we didn't have these problems when we had only around 100 articles. We looked at the problem and found out the following:

  • The performance of using the UmbracoHelper.TypedContent with a guid or udi is real slow in a bigger site (umbraco.config size > 40mb, nodes count > 8000). The code:

Umbraco.TypedContent("55e4bd60-4fff-4f3a-9af8-ec9fa4bd2fee")

or

Umbraco.TypedContent("umb://document/55e4bd60-4fff-4f3a-9af8-ec9fa4bd2fee")

takes about 40ms to execute, if it’s a guid of a node near the end of the umbraco.config. This is because in the core the following xpath query is executed:

"//* [@isDoc and @key=$guid]”

This way it is traversing the complete tree to find the content with a given key/guid.

If we try to get the same content, but look it up via the id (instead of the UDI) is takes less than 1 ms:

Umbraco.TypedContent(8066)

If we use the Umbraco.TypedContent for 100 items the difference is 100ms (100x1ms) with ids vs 4000ms (100x40ms) with guids/udis.

So the loadtime by searching by UDI is almost 40 times as slow!

In our case we've used a MultiNodeTreePicker, and TypedContent is used on the output of that. The given site has a complex structure with a lot of multilingual nodes and MultiNodeTreePickers.

In our opinion this is something that should be fixed in Umbraco Core, but I hope you can shine your light on it!

Kind regards, Dennis

Comments

Stephan 15 Dec 2017, 07:51:41

What happens is... all we have is this big Xml document. When you get-by-id (integer id) we can use the find-node-by-id feature of Xml documents which is somewhat optimized (no idea how, but ids are a special thing in Xml so I can imagine the document having some sort of index). When you get-by-guid there is nothing we can use, so we rely on an XPath query which has to be slower, yes.

It is definitively something that you would expect Core to manage.

We would need to build an additional cross-reference map of guid-to-id somewhere in the content cache, above the Xml document. So... I'd say feature request, which makes total sense.


Jeffrey Schoemaker 15 Dec 2017, 08:04:06

Hi @zpqrtbnk, thanks for your input. But a feature request sounds like 'not soon in core'. Could you recommend anything for a shortterm hack/fix/work-around?

Thanks, Jeffrey


Stephan 15 Dec 2017, 11:12:13

Only proper solution I can see is to implement your own guid-to-id map, which means hook into "published" events and register guid-to-id in some sort of dictionary. Conversely, handle "unpublished" events and clear the dictionary. Then, instead of getting-by-guid, do a quick dictionary lookup and then get-by-id.

Making sense?


Asbjørn Riis-Knudsen 20 Dec 2017, 12:00:58

@zpqrtbnk What about v8? With no XML cache, I'd imagine things will look different...


Stephan 20 Dec 2017, 12:37:05

v8 will have both an id-to-content and a guid-to-content lookup indexes, so perfs should be identical


Dennis Boonk 22 Dec 2017, 07:34:32

Hi @zpqrtbnk, Thanks for your input. Is it possible that we can inject our own implementation of the umbracohelper so we can override the TypedContent method and use a guid-to-id-map? Therefore, every property value converter and piece of code who uses the umbracohelper uses the new class. So we can tackle the problem at one place.

Alternatively, inject by default a custom ContentQuery in the umbracohelper. I am hoping to fix the problem at one place for every piece of code that uses the guid to content method. A fix for umbraco, property value converters, razor and our custom code.

What do you think is the best approach to fix the performance problem?

Thanks, Dennis


Daniël Knippers 29 Dec 2017, 09:55:40

@zpqrtbnk Since all built-in property editors now use UDIs and thus suffer from this issue, would it not make sense to implement this guid-to-id map in Core as soon as possible in v7? This is a serious performance issue, which is definitely noticable whenever anything is done with lists of IPublishedContent where each item references additional IPublishedContent items through e.g. MultiNodeTreePicker properties, as that triggers many lookups with GUIDs for each item through the new property value converters. While v8 seems really cool so far (also enjoyed your demo at UK Fest this year), it might still take a while to be released and a fix for v7 in the meantime would be greatly appreciated :)


Vince Scordino 02 Jan 2018, 00:29:07

Agree with @dknippers, we are currently at a standstill and have had to implement some terrible bandaid-caching to get us by until we begin the build of the suggested guid-to-id mapping solution. I would raise the priority of this to Critical. Also worthy to note, waiting and upgrading to v8 may introduce a load of other bugs that we may not be ready for just yet.


Peter Bridger 02 Jan 2018, 16:38:28

This issue is the primary performance bottleneck on a website that I've worked on and recently launched. It makes heavy use of multinode tree pickers to reference multiple pieces of content from another piece of content.

We've had to implement our own caching to mask the issue. Which is turn has created more issues, due to cache invalidation and other considerations.

I am very keen to see this addressed within 7.x


Lee Kelleher 03 Jan 2018, 11:33:12

A quick chime on this one (from a current XML perspective), I spent some time this morning looking into if the <!ATTLIST ... could support multiple ID attributes, but unfortunately it can not.


Andy Butland 03 Jan 2018, 11:42:38

Sharing in case useful - for anyone else looking to work-around the issue or if any use for core: https://github.com/AndyButland/UmbracoUdiToIdCache/

Have put this together working with Peter who commented above - currently untested more than a quick check over with a local starter kit, so if you do want to use it please do so your own risk, and check carefully on larger sites if it's actually helpful, due to the fact that it'll use up memory and will have a start-up hit due to the database call. Wouldn't normally put something out with so little testing but, if you take that caveat, it may help if someone is running into urgent performance issues and is happy to take on the testing task.


Niels Hartvig 03 Jan 2018, 11:51:44

@leekelleher funny - I did the same.

We will definitely prioritise this for an upcoming v7.minor during this (Q1 2018) quarter


Niels Hartvig 04 Jan 2018, 13:18:12

An update: We'll be rolling out a small performance fix for next patch that should improve performance by up to 30%. Then we'll be implementing a GUID->ID lookup table for 7.9 that should really increase performance (7.8 is being wrapped up for release so it won't be able to make it for that release).


Vince Scordino 17 Jan 2018, 02:46:31

Thanks @hartvig, great news! Is there an ETA on the patch?


Niels Hartvig 17 Jan 2018, 06:40:03

The patch is a part of vLatest. 7.9 is end of Q1.


Sebastiaan Janssen 17 Jan 2018, 07:02:20

Added a related issue for the 7.7.9 release notes U4-10833


Lee Kelleher 18 Jan 2018, 11:51:08

FWIW, I've proposed a temporary solution here: https://github.com/umbraco/Umbraco-CMS/pull/2398

I've no expectations of it being merged in as is - feel free to use it as a discussion point.


Sebastiaan Janssen 22 Feb 2018, 14:47:14

Unfortunately we didn't have enough time to put this into 7.9, I'm moving this to 7.10 as that is the first opportunity to add new database tables.


Jeffrey Schoemaker 26 Feb 2018, 07:35:22

Hi @sebastiaan, that is indeed really really unfortunate :(... Do you have any clue about the release date of 7.10?


Stephan 06 Mar 2018, 15:27:05

So... let's get this done.

Umbraco already has an id/guid cache which was added for Deploy because Deploy runs exclusively with guids and needs to do a lot of id/guid mapping. The idea is to reuse this cache. Which, technically, is not very different from the package @abutland proposed. And, it has the benefit of working for documents and medias.

Indeed, we will need to deal with medias. The solution by @leekelleher which was merged in 7.7 and relies on the Xml cache, only supports documents;

Until now the cache started empty, and each cache miss triggered a database hit. I have modified the cache to fully load every know id/guid pairs for documents and medias on the first query. New documents and medias still cause a cache miss and a database query.

So, compared to @leekelleher solution, this might be slower on the first query. But it deals with medias, which we cannot do with Xml and XPath only. Things will be different in v8 where the cache knows about published and unpublished documents, and medias.

PR https://github.com/umbraco/Umbraco-CMS/pull/2505

So I guess this is a Request for Comment: what do ppl think about the solution?


Lee Kelleher 07 Mar 2018, 10:06:23

@zpqrtbnk Looks like a fine solution to me! :-)


Stephan 07 Mar 2018, 13:06:18

review: on any site, ideally with some content, create a view that does TypedContent(), it should work and be as fast as TypedContent() except for the first query.


Shannon Deminick 12 Mar 2018, 07:24:35

So the big difference betweeen @leekelleher's approach and this one is that this one requires a DB hit whereas Lee's doesn't since it just uses the already cached values from the config file.

I understand that the PR didn't address media, however media is a totally different story in the first place and would be much faster than the problem that exists for Content. The Content performance problem is due to the XPath query being slow. Media on the other hand already used the IdKMap (lazily) and then queries against lucene.

My concern here is that now the content lookups always require a DB hit whereas with Lee's PR: https://github.com/umbraco/Umbraco-CMS/pull/2398/files this is not required.

I would like to see what is faster. I suppose a rudimentary test would be to have a debug trace for the PopulateLocked method.

What i'm also very confused about is that Lee's PR was merged in this commit: https://github.com/umbraco/Umbraco-CMS/commit/41c96e6e2636c521511565456c77f0b4e9984420 ... but these changes are not actually part of the dev-v7 branch !? where did these changes go?


Lee Kelleher 12 Mar 2018, 10:47:38

@Shandem Very weird about the changed disappearing in the dev-v7 branch. I've been looking over my local repo to spot when it happened, (as GitHub didn't show me much), pinpointed it to this commit https://github.com/umbraco/Umbraco-CMS/commit/19ec26fef997347877126e418bc7b3c1870236df (again GitHub doesn't show the actual revert, but local GitExtensions does). I can't really comment on the Git merging strategy here - maybe @sebastiaan knows more?


Shannon Deminick 12 Mar 2018, 11:06:38

Yea @zpqrtbnk has figured it out, somewhere in branch merging it's disappeared but think the fix is still in releases (i'm not totally sure which ones)


Sebastiaan Janssen 12 Mar 2018, 12:33:43

It was a huge merge and it looks like I've made a mistake in the manual merge there, it has not been applied to 7.8 and 7.9, I suggest we apply Lee's interim fix to the next 7.9 again so that it's available in the next patch release.


Stephan 12 Mar 2018, 14:33:33

revisiting the PR - have unearthed other places where perfs would be slow - trying to make sense of it all


Stephan 15 Mar 2018, 10:14:35

Soooo I have updated the PR - we don't do pre-loads from database anymore, but the content cache pre-populates the idkMap with those id/keys it knows - and the rest comes from database. Back into review.


Shannon Deminick 16 Mar 2018, 03:26:20

@zpqrtbnk I don't think you pushed you code, see https://github.com/umbraco/Umbraco-CMS/pull/2505/commits


Stephan 16 Mar 2018, 06:18:41

pushed now


Shannon Deminick 21 Mar 2018, 00:42:42

All looks good, I've made some comments on the PR and pushed one change.

I've tested and stepped through the code:

  • the pre-populated cache only populates when PublishedContentCache.GetById is called
  • both PublishedContentCache.GetIdForKey and PublishedContentCache.GetKeyForId get their values from the pre-populated cache
  • when deleting a content item (from the recycle bin), the IdKMap is cleared for that item, similarly i've tested emptying the recycle bin and confirmed that for any cached values that were in there are cleared for any item now in the recycle bin
  • can confirm that if the xml cache doesn't contain the id/key, that it will fallback to the db and then cache it

Going to merge and mark as fixed but @zpqrtbnk please let me know if there's any changes I've done that you don't like :)


Gavin Faux 24 Mar 2018, 04:32:11

@Shandem Should the dependency on be included in UmbracoCms.Core Nuget package?

Running 7.10.0 on Windows Server 2008 with .net452 causes a YSOD due to missing dependency, we've resolved by manually installing package as we can't update to > .net46 in this environment.

We are migrating site to to Azure where I guess additional dependency not needed?


Shannon Deminick 26 Mar 2018, 02:43:20

Ack, yes, yes it should! I'll update asap


Priority: Major

Type: Feature (request)

State: Fixed

Assignee:

Difficulty:

Category:

Backwards Compatible: False

Fix Submitted:

Affected versions: 7.7.6, 7.7.7

Due in version: 7.10.0

Sprint: Sprint 81

Story Points: 3

Cycle: 9