Created by Mark Bowser 04 Mar 2017, 00:24:39
Updated by Shannon Deminick 28 Mar 2017, 02:33:06
Tags: PR
Relates to: U4-2463
Subtask of: U4-9609
My problem is that when I reindex an indexer with supportUnpublished="false" through the examine manager, nodes that are foldered deep enough to be Level 6 get unindexed. When I save and publish those nodes, they are indexed.
I have a website with the following structure:
CONTENT
--us
--uk
--fr
--de
--Shared Resources
------Blog
----------Authors
----------Articles
------------2013
------------2014
------------2015
------------2016
------------2017
----------------01
----------------02
--------------------Blog Post Inside of Month Folder
----------------Blog Post Outside of Month Folder
When I publish the "Blog Post Inside of Month Folder", I can see that it is indexed in my External Indexer. If I attach an event handler to the external indexer's GatheringNodeData event, I can see that when I save and publish my node, my event handler is triggered. However, if I go to the examine manager and rebuild the external indexer, my GatheringNodeData event handler is never called and my blog post is removed from the external indexer. This only seems to be the case when my nodes are foldered at level 6. If I take my blog posts and move them outside of their month folders, they index as expected. Also, if I open up the /config/ExamineSettings.config file and set supportUnpublished="true", all of my nodes will index as expected regardless of how deeply they are foldered. I have reproduced this with the ExternalIndexer and our custom BlogIndexer. I have also reproduced this with heavily foldered landing pages outside of the blog area.
Strangely enough, some old blog articles don't run into this problem. Deeply foldered blog posts created around March 2015 and earlier index correctly, but when I save and publish those nodes, they start to misbehave too. This whole issue started when we upgraded from Umbraco 7.2.8 to 7.5.8. I've tried upgrading again to 7.5.10, but it didn't change anything.
There aren't any exceptions being thrown or interesting errors in my logs. I'm still able to reproduce this and play around with it, so let me know if anyone needs any details or help reproducing.
This issue seems reminiscent of U4-2463. I linked them. Hope that was the right thing to do.
I figured it out. The problem was in the ReindexWithXmlEntries method. The indexing is done in pages with a max page size of 10,000 nodes. The do-while loop in ReindexWithXmlEntries() first calls its getPagedXmlEntries function to fetch the current page. One of the things getPagedXmlEntries made sure to do was filter out any results that were the children of unpublished parent nodes. Unfortunately, the do-while loop would only move on to the next page if the number of filtered results from getPagedXmlEntries was the same as the page size. As soon as the filter removed even one entry from a full page, the loop stopped and the remaining pages were never indexed. For sites with more than 10,000 nodes where some of the nodes are unpublished, deeply nested nodes get unindexed.
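To make the paging problem easier to see, here is a minimal simulation in Python. It is not the Umbraco source (which is C#) and not necessarily what either pull request does. The node shape, the `parent_published` flag, and the functions `fetch_page`, `reindex_buggy`, and `reindex_fixed` are all hypothetical names made up for this sketch. It only assumes what is described above: each page is fetched, then filtered, and the loop continues based on a count compared with the page size.

```python
from typing import Dict, List, Tuple

# Hypothetical node shape for this sketch: an id plus a flag saying
# whether every ancestor of the node is published.
Node = Dict[str, object]


def fetch_page(nodes: List[Node], page_index: int,
               page_size: int) -> Tuple[List[Node], List[Node]]:
    """Return (raw page, page with children of unpublished parents removed)."""
    start = page_index * page_size
    raw = nodes[start:start + page_size]
    filtered = [n for n in raw if n["parent_published"]]
    return raw, filtered


def reindex_buggy(nodes: List[Node], page_size: int) -> List[int]:
    """Continue only while the FILTERED count equals the page size."""
    indexed: List[int] = []
    page_index = 0
    while True:  # emulates a do-while loop
        _, filtered = fetch_page(nodes, page_index, page_size)
        indexed.extend(n["id"] for n in filtered)
        page_index += 1
        # Bug: one filtered-out entry makes a full page look like the last one.
        if len(filtered) != page_size:
            break
    return indexed


def reindex_fixed(nodes: List[Node], page_size: int) -> List[int]:
    """One possible fix: decide whether to continue from the RAW page count."""
    indexed: List[int] = []
    page_index = 0
    while True:
        raw, filtered = fetch_page(nodes, page_index, page_size)
        indexed.extend(n["id"] for n in filtered)
        page_index += 1
        # A short (or empty) raw page means there is nothing left to fetch.
        if len(raw) != page_size:
            break
    return indexed


if __name__ == "__main__":
    # Six nodes; node 2 sits under an unpublished parent.
    nodes = [{"id": i, "parent_published": i != 2} for i in range(1, 7)]
    print(reindex_buggy(nodes, page_size=2))  # [1]
    print(reindex_fixed(nodes, page_size=2))  # [1, 3, 4, 5, 6]
```

With a page size of 2, the buggy loop stops after the first page because the filter removed one of its two entries, so nodes 3 to 6 are never indexed. Shrinking the page size like this reproduces the behaviour without needing a site of more than 10,000 nodes.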
I submitted a pull request. Let me know if I need to rework anything or submit my pull request in a different way. I'm not 100% sure I'm following the correct protocol.
I wonder if this forum post is related to this issue: https://our.umbraco.org/forum/using-umbraco-and-getting-started/84540-problem-with-examine-indexes-with-15000plus-media-items-missing-items
Proposing a slightly different approach in PR https://github.com/umbraco/Umbraco-CMS/pull/1827 (well, it's achieving the same, just differently).
Review: code review. It can also be tested by temporarily changing the 10,000 page size to e.g. 2 and re-indexing.
(if ok with the 2nd PR, don't forget to close the 1st)
Backwards Compatible: True
Fix Submitted: Pull request
Affected versions: 7.5.8, 7.5.9, 7.5.10
Due in version: 7.5.12
Sprint: Sprint 55
Story Points: 1