U4-10735 - Examine update to better manage appdomain shutdowns

Created by Shannon Deminick 06 Dec 2017, 13:00:45 Updated by Shannon Deminick 14 Dec 2017, 07:51:34

Subtask of: U4-9609

By default Examine runs its own subroutines when an appdomain shuts down. During this time it waits briefly if items are currently being added to or deleted from the queue, and then it blocks the shutdown while the queue still has items in it. That can block for a very long time if the queue is really full, for example when the index is rebuilding with a TON of data and the appdomain decides to shut down in the middle of it. This can cause problems with overlapping appdomains and Lucene file locks, which may prevent the new appdomain from starting up properly.

A new Examine version allows the shutdown sequence to be replaced and controlled externally. Umbraco has been updated to manage this using its own MainDom, which is the appdomain manager that also manages the content cache.
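
To illustrate why an unbounded wait on a full queue is a problem, here is a minimal Python sketch of a shutdown routine that drains a queue only until a deadline. This is not Examine's or Umbraco's code, and this issue does not say what the replacement shutdown sequence actually does; the function name `drain_with_timeout` and the timeout parameter are assumptions made for illustration.

```python
import queue
import time


def drain_with_timeout(pending, process, timeout_seconds):
    """Process queued items until the queue is empty or the deadline passes.

    Returns the number of items left unprocessed, so the caller can log
    what was abandoned instead of blocking the shutdown indefinitely.
    """
    deadline = time.monotonic() + timeout_seconds
    while time.monotonic() < deadline:
        try:
            item = pending.get_nowait()
        except queue.Empty:
            return 0  # nothing left, shutdown can proceed
        process(item)
    return pending.qsize()  # deadline hit, give up on the rest
```

With a generous timeout this behaves like the blocking version; with a short one the shutdown proceeds even if a rebuild has left many items queued.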

The new Examine version also allows for globally configuring a Lucene lock factory. By default Lucene/Examine uses the NativeFSLockFactory, which uses OS-level locks for the index lock. This can cause issues in a website when the appdomain is terminated and the OS still hangs on to the lock. In some rare cases the only way to clear the lock is to fully reset IIS, which is not ideal. Umbraco has been updated to configure Examine to use a SimpleFSLockFactory, which is purely based on whether the write.lock file exists or not and does not use OS-level locks.
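
The idea of a lock that is "purely based on whether the write.lock file exists" can be sketched in a few lines of Python. This is an illustration of the concept only, not Lucene.Net's SimpleFSLockFactory implementation; the class name `SimpleFileLock` and its methods are hypothetical.

```python
import os


class SimpleFileLock:
    """Illustrative lock: it is held if and only if the lock file exists."""

    def __init__(self, index_dir, name="write.lock"):
        self.path = os.path.join(index_dir, name)

    def obtain(self):
        """Try to take the lock; return True on success, False if held."""
        try:
            # O_EXCL makes creation fail if the file already exists.
            fd = os.open(self.path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
        except FileExistsError:
            return False
        os.close(fd)  # no OS-level handle is kept open
        return True

    def is_locked(self):
        return os.path.exists(self.path)

    def release(self):
        """Delete the lock file; a missing file is not an error."""
        try:
            os.remove(self.path)
        except FileNotFoundError:
            pass
```

Because no OS handle is held, a process that dies without calling `release` leaves the file behind, but anyone can clear it by deleting the file.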

The Examine dashboard has been updated to check whether the index is locked, which it should never be. If it is, pressing rebuild will forcefully unlock the index and rebuild it.
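
With a file-based lock, the "force unlock, then rebuild" behaviour amounts to deleting a stale write.lock before rebuilding. A minimal Python sketch, assuming a hypothetical `rebuild` callback and function name (the actual dashboard code is in the linked PR and is not reproduced here):

```python
import os


def force_unlock_and_rebuild(index_dir, rebuild, lock_name="write.lock"):
    """Remove a stale lock file if present, then run the rebuild callback.

    Returns True if a lock file had to be removed, False otherwise.
    """
    lock_path = os.path.join(index_dir, lock_name)
    was_locked = os.path.exists(lock_path)
    if was_locked:
        os.remove(lock_path)  # forceful unlock: just delete the file
    rebuild(index_dir)
    return was_locked
```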

Comments

Shannon Deminick 06 Dec 2017, 14:33:11

PR: https://github.com/umbraco/Umbraco-CMS/pull/2345


Shannon Deminick 06 Dec 2017, 14:39:34

Here's the Examine update: https://github.com/Shazwazza/Examine/commit/695e9cc34b1670acc8b2f7a0f59400993d4b23a0


Shannon Deminick 06 Dec 2017, 14:46:46

For testing:

  • Ensure that the site starts up normally and verify that normal searching and indexing works along with rebuilding in the examine dashboard
  • Shut down your app, then go to src\Umbraco.Web.UI\App_Data\TEMP\ExamineIndexes\Internal\Index and create a file called: write.lock - this will now trick Lucene into thinking that the index is locked
      • Start your site, go to the Examine dashboard and it will show that the index cannot be read, but now you can rebuild it since it will force unlock it
  • Have a look at the files in src\Umbraco.Web.UI\App_Data\TEMP\ExamineIndexes\Internal\Index, then bump your web.config, which will cause the appdomain to shut down. You'll notice that the write.lock file is deleted
  • We now log Examine shutdown events. You'll see something like this in your log:
 2017-12-06 15:42:01,594 [P2932/D4/T33] INFO  Umbraco.Web.ExamineStartup - Examine shutting down
 2017-12-06 15:42:01,612 [P2932/D4/T33] INFO  Umbraco.Web.ExamineStartup - Complete (took 18ms)
  • You'll also see this on startup:
2017-12-06 15:45:11,372 [P2932/D8/T57] DEBUG Umbraco.Web.ExamineStartup - Examine shutdown registered with MainDom
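
For checking the log entries above without reading the file by hand, here is a small Python sketch that extracts the shutdown durations. It assumes the line layout shown in the samples above and that each "Complete (took Nms)" line belongs to the preceding "Examine shutting down" line; the function name `shutdown_durations` is hypothetical.

```python
import re

# Matches lines shaped like the samples above:
# timestamp [process/domain/thread] LEVEL logger - message
LINE = re.compile(
    r"^\s*(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) "
    r"\[(?P<ctx>[^\]]+)\]\s+(?P<level>\w+)\s+"
    r"(?P<logger>\S+) - (?P<msg>.*)$"
)
TOOK = re.compile(r"^Complete \(took (\d+)ms\)$")


def shutdown_durations(log_text):
    """Return the durations (ms) of each logged Examine shutdown."""
    durations = []
    awaiting = False  # True after "Examine shutting down" was seen
    for raw in log_text.splitlines():
        m = LINE.match(raw)
        if not m or m.group("logger") != "Umbraco.Web.ExamineStartup":
            continue
        msg = m.group("msg").strip()
        if msg == "Examine shutting down":
            awaiting = True
        elif awaiting:
            took = TOOK.match(msg)
            if took:
                durations.append(int(took.group(1)))
                awaiting = False
    return durations
```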


Shannon Deminick 06 Dec 2017, 14:47:36

Currently the Examine release is a beta, so if all is good I'll release the full version.


Robert Copilau 13 Dec 2017, 09:49:59

Test results:

  • Everything worked
  • The locks were already there, most likely because of the previous step
      • Examine dashboard displayed that the indexes cannot be read and I was able to rebuild due to force unlock
  • The locks are being deleted when web.config is bumped
  • Log entries are present

Merging :)


Shannon Deminick 14 Dec 2017, 04:23:42

I need to make one more change before we release this since it's currently relying on an Examine beta, so I need to release the final version and update core.


Shannon Deminick 14 Dec 2017, 04:32:02

I also need to change some more core stuff, stay tuned.


Shannon Deminick 14 Dec 2017, 07:51:21

@Rrobertcopilau also, please always make sure that the Due In version is set


Priority: Normal

Type: Bug

State: Fixed

Assignee:

Difficulty: Normal

Category:

Backwards Compatible: True

Fix Submitted:

Affected versions:

Due in version: 7.7.8

Sprint: Sprint 74

Story Points: 3

Cycle: