U4-9395 - When rebuilding content indexes that don't support unpublished content and member indexes, use the cmsContentXml table as the data source

Created by Shannon Deminick 17 Jan 2017, 06:18:00 Updated by Matt Harding 28 Mar 2017, 11:33:06

Tags: Unscheduled

Relates to: U4-9371

Relates to: U4-9393

We already do this for media since it is much, much faster, so we should be able to do the same for both content and members: both published content and member data exist serialized in the cmsContentXml table.

Initial benchmarks show that with 8500 items in the index, the current way using the normal IContentService lookups takes 6014ms; when changed to use the cmsContentXml table it takes 516ms, which is about 91% less time (roughly 11.7x faster).

Comments

Shannon Deminick 17 Jan 2017, 06:46:15

Similarly, for 10001 members, the fast lookup takes 559ms and the current slow lookup takes 9773ms, which is 94% faster.
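
For anyone wanting to reproduce the percentages, here is a minimal Python sketch of the arithmetic using the timings quoted in this issue. The helper names (percent_time_saved, speedup_factor) are made up for illustration and are not part of Umbraco or Examine.

```python
def percent_time_saved(slow_ms: float, fast_ms: float) -> float:
    """Share of the slow run's time that the fast run avoids, in percent."""
    return (slow_ms - fast_ms) / slow_ms * 100.0


def speedup_factor(slow_ms: float, fast_ms: float) -> float:
    """How many times shorter the fast run is than the slow run."""
    return slow_ms / fast_ms


# Timings in milliseconds as reported in this issue: (slow lookup, fast lookup).
benchmarks = {
    "content (8500 items)": (6014, 516),
    "members (10001 items)": (9773, 559),
}

for label, (slow_ms, fast_ms) in benchmarks.items():
    saved = percent_time_saved(slow_ms, fast_ms)
    factor = speedup_factor(slow_ms, fast_ms)
    print(f"{label}: {saved:.1f}% less time, {factor:.1f}x faster")
```

With these inputs the content rebuild works out to about 91.4% less time and the member rebuild to about 94.3%.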


Shannon Deminick 17 Jan 2017, 06:53:40

For testing follow these instructions: http://issues.umbraco.org/issue/U4-9371#comment=67-33971

Then you can test the old way by adding this attribute to your indexer: disableXmlDocLookup=true. Verify that the amount of data in the index is the same with the new way and the old way. The default is disableXmlDocLookup=false, which applies when this attribute is not specified.


Shannon Deminick 17 Jan 2017, 06:58:35

PR https://github.com/umbraco/Umbraco-CMS/pull/1692


Sebastiaan Janssen 17 Jan 2017, 10:45:53

That.. is SO much faster!


Sebastiaan Janssen 17 Jan 2017, 10:46:32

I have a site that went from taking 8 minutes to build a member index (80k members) down to 1.5 minutes.


Shannon Deminick 17 Jan 2017, 12:41:28

YES!!!! I know you've prob checked, but I want to make sure that only published content ends up in the external index. It defo should work, but I want to be sure I didn't re-revert the unfix that I did last time with descendants :P

This release is going to be so much better for Examine and Azure, and we can remove that item from the wall :)


Sebastiaan Janssen 17 Jan 2017, 15:51:38

I found while re-indexing the Our Umbraco data that we needed to do some extra checks, so the PR has been re-opened with these changes: https://github.com/umbraco/Umbraco-CMS/pull/1694/commits/931db6f0b177cd381d78dd9005b13c971dffc2a3

@Shandem can you check these additional commits?

I learned that for the 120k members in my local Our copy it took about 14 minutes for the full reindex to run:

2017-01-17 15:18:47,243 [P10440/D12/T15] INFO UmbracoExamine.DataServices.UmbracoLogService - PerformIndexAll - Start data queries - member, Provider=InternalMemberIndexer, NodeId=-1
2017-01-17 15:32:16,706 [P10440/D12/T15] INFO UmbracoExamine.DataServices.UmbracoLogService - PerformIndexAll - End data queries - member, took 809463ms, Provider=InternalMemberIndexer, NodeId=-1
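
As a quick cross-check of the "took 809463ms" figure, the two timestamps from the log lines above can be subtracted directly. This is only a sketch: the format string assumes the log's "date time,milliseconds" layout, and elapsed_ms is a hypothetical helper, not anything from UmbracoExamine.

```python
from datetime import datetime, timedelta

# Assumed layout of the timestamps in the log lines quoted in this comment.
LOG_TIME_FORMAT = "%Y-%m-%d %H:%M:%S,%f"


def elapsed_ms(start: str, end: str) -> int:
    """Whole milliseconds between two log timestamps."""
    delta = datetime.strptime(end, LOG_TIME_FORMAT) - datetime.strptime(start, LOG_TIME_FORMAT)
    return delta // timedelta(milliseconds=1)


ms = elapsed_ms("2017-01-17 15:18:47,243", "2017-01-17 15:32:16,706")
print(f"{ms} ms is about {ms / 60000:.1f} minutes")
```

The difference matches the logged 809463ms, which is roughly 13.5 minutes.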

Our has some more custom properties than the other test site I used, so it's not as fast as that one and I haven't tried running the reindex before the upgrade (I'll do that to see how fast/slow it goes).


Shannon Deminick 19 Jan 2017, 01:55:32

@sebastiaan I've re-opened for your review. I've committed to the dev-v7 branch here https://github.com/umbraco/Umbraco-CMS/commit/db414e8045d0a455dae568a4f67675eab6c3ccef

If you can please have a look and then test the scenarios:

  • Rebuilding the XML content cache works (i.e. delete the umbraco.config file)
  • Rebuilding the internal index works (i.e. all content and media is in there)
  • Rebuilding the member index (i.e. all member data is in there)
  • Rebuilding the external index when an ancestor is not published but its descendants are - the ancestor and descendants should not be in the index
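
The rule in the last scenario can be sketched in a few lines of Python. This only illustrates the expected behaviour and is not the Umbraco or Examine implementation; the Node type, its fields, and externally_indexable are hypothetical names chosen for the example.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional


@dataclass(frozen=True)
class Node:
    """Hypothetical stand-in for a content item; not an Umbraco type."""
    id: int
    parent_id: Optional[int]  # None for a root node
    published: bool


def externally_indexable(nodes: Iterable[Node]) -> List[int]:
    """Ids of nodes that are published and whose ancestors are all published."""
    by_id: Dict[int, Node] = {n.id: n for n in nodes}
    cache: Dict[int, bool] = {}

    def visible(node_id: int) -> bool:
        if node_id in cache:
            return cache[node_id]
        node = by_id.get(node_id)
        if node is None or not node.published:
            result = False  # unknown or unpublished nodes hide their whole subtree
        elif node.parent_id is None:
            result = True   # a published root is visible
        else:
            result = visible(node.parent_id)
        cache[node_id] = result
        return result

    return sorted(i for i in by_id if visible(i))


# Node 2 is unpublished, so it and its published child 3 must be left out.
tree = [
    Node(1, None, True),
    Node(2, 1, False),
    Node(3, 2, True),
    Node(4, 1, True),
]
print(externally_indexable(tree))
```

In this example only nodes 1 and 4 qualify, which is the outcome the scenario asks testers to confirm in the external index.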


Sebastiaan Janssen 19 Jan 2017, 09:41:07

All checked, all good now!


Matt Harding 28 Mar 2017, 11:33:06

Just for info, this change also resolves http://issues.umbraco.org/issue/U4-9677 which caused external index rebuilds not to include published items that had a newer unpublished version.


Priority: Normal

Type: Bug

State: Fixed

Assignee:

Difficulty:

Category:

Backwards Compatible: True

Fix Submitted:

Affected versions:

Due in version: 7.5.8

Sprint: Sprint 50

Story Points:

Cycle: