U4-9487 - Optimize initialization of Xml nodes

Created by Stephan 03 Feb 2017, 18:27:11 Updated by Jeroen Breuer 31 Mar 2017, 09:36:16

From a discussion on Slack triggered by https://our.umbraco.org/forum/using-umbraco-and-getting-started/83657-slow-retrieval-with-many-children

here's the thing: when you initialize a content, just to get its properties, you indeed should not care about its children, but you still need to iterate over all xml children to find all the property data nodes (those that don't have the isDoc attribute) - meaning that yes, you iterate over all children too, and so a node with many children takes more time to initialize.

there's one way to totally improve the situation (as in, by my tests, the number of children does not have any impact and a node with many children initializes way faster) - but I'm not sure to which extend it is safe

the idea is: in the Xml document, the property data nodes come first, and then come the children content nodes, so you can stop looking at children as soon as you've hit a child which is not a property data node. tested, works, tremendous perfs, you'll like it, it's great, and by the way, absolutely great

however we have to ensure that the assumption that property data nodes always come first is true. I do believe it is, but...

1 Attachments

Comments

Stephan 03 Feb 2017, 18:29:51

PR https://github.com/umbraco/Umbraco-CMS/pull/1728

Do NOT merge this PR now, first we need to be absolutely sure it's not breaking !!!!


Dave Woestenborghs 04 Feb 2017, 09:35:30

@zpqrtbnk just checked all my sites higher then 7.3.0... in all of them props come first and then child docs.

dave


Dave Woestenborghs 04 Feb 2017, 09:47:21

@zpqrtbnk Don't know it it's possible, but maybe a xpath selector can be used to get the children. Something like this "root//node()[not (@isDoc)]" If evaluate that against the umbraco.config i only see property nodes.

If that would be possible the order in the xml doesn't matter


Stephan 04 Feb 2017, 09:58:54

@dawoe it is exactly how it worked, and it means that the query goes over ''every'' child node to figure out whether it has the isDoc attribute or not = expansive. proposal is to get rid of the xpath, manually iterate, and stop on the first child node which is not a property data node (ie which has the isDoc attribute), assuming we're 100% sure that property data nodes and children nodes never mix.

but... we (Core) do have total control of how the xml is generated, so we're going to do a code review of Core to ensure that the order is indeed respected, and then we'll be able to merge the PR.


Dave Woestenborghs 04 Feb 2017, 10:04:27

@zpqrtbnk must be that it's saturday morning...haven't read the commmit in the pr correctly.


Shannon Deminick 09 Feb 2017, 05:51:29

I decided to benchmark the new Xml Initialization updates from @zpqrtbnk , with the original way that we initialize a published content item with it's properties and the new (enhanced) way. The tests are with nodes that have 10, 100, 1000, and 10000 other child content nodes. I think the benchmarks show how good this update is:

I also enhanced the enhancement here so there's less allocations: https://github.com/Umbraco/Umbraco-CMS/commit/d0cbeb4a0aa016cb0340ced0e12fff97421c6f00

the benchmark code updates is here: 58c5618d301c5d6434c3f47a8f65dc9178c4d5fa to do that I made the method that performs the init static with lots of out params, not the prettiest but allows us to test/benchmark.


Shannon Deminick 09 Feb 2017, 05:55:10

I can confirm by looking at the code that the xml that is generated always contains the properties first, I can't see how it can be possible for the property xml to ever be pushed after the children. I'll merge this in.


Jeroen Breuer 31 Mar 2017, 09:36:16

Not sure if people realized it, but this is also a huge performance boost for preview. When you preview it will also preview all child nodes. Doing this on the root node of a website with 10.000 content nodes could take minutes to load. Now it's only a few seconds. Our content editors love it! Especially since we always need to [preview all content|http://24days.in/umbraco-cms/2016/umbraco-edge-case-stories/#preview].


Priority: Normal

Type: Bug

State: Fixed

Assignee:

Difficulty: Normal

Category:

Backwards Compatible: True

Fix Submitted:

Affected versions:

Due in version: 7.5.10

Sprint: Sprint 52

Story Points: 1

Cycle: