U4-5993 - Update Examine implementation to be able to sync index storage to local temp file system

Created by Shannon Deminick 11 Dec 2014, 05:16:47 Updated by Shannon Deminick 02 Jul 2015, 11:35:42

Relates to: U4-5995

Relates to: U4-6269

Each indexer and corresponding searcher has a new optional config setting:

useTempStorage="Sync"

The value can be either "Sync" or "LocalOnly".

Specifying "Sync" means that the index stored in ~/App_Data/TEMP/ExamineIndexes/[IndexName] gets restored to the ASP.NET process's local temp folder (e.g. C:\Windows\Microsoft.NET\Framework64\v4.0.30319\Temporary ASP.NET Files\vs\7a807fb2\7d5e4942\App_Data\TEMP\ExamineIndexes\[IndexName]).

The searchers then all operate from the local storage, which is great if you are running IIS from a shared storage file system, as this reduces latency issues. Each writer then writes to both the local storage and the real file system to keep them in sync. On startup, the local temp index is restored from the main storage.

When the value is "LocalOnly", the index only ever exists in local temp storage. The index operates the same as by default, but it is stored in local temp storage. When the /bin folder or global.asax changes, this local folder gets cleared out, which means the index will be rebuilt when that happens.

It is an absolute requirement that both the indexer and its corresponding searcher have the same value. So if useTempStorage is being used, it must be set to the same value on both the indexer and the searcher, for example:

  <add name="InternalIndexer" type="UmbracoExamine.UmbracoContentIndexer, UmbracoExamine"
       supportUnpublished="true"
       supportProtected="true" 
       useTempStorage="Sync"
       analyzer="Lucene.Net.Analysis.WhitespaceAnalyzer, Lucene.Net"/>

  <add name="InternalSearcher" type="UmbracoExamine.UmbracoExamineSearcher, UmbracoExamine"
       analyzer="Lucene.Net.Analysis.WhitespaceAnalyzer, Lucene.Net"
       useTempStorage="Sync"/>
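
For comparison, a "LocalOnly" setup would presumably look the same apart from the attribute value. This is a sketch that reuses the provider names from the example above; only the useTempStorage value differs, and it has to match on both elements:

```xml
<!-- Sketch only: same providers as above, switched to LocalOnly. -->
<!-- The index then lives only in the local ASP.NET temp folder and is rebuilt when that folder is cleared. -->
<add name="InternalIndexer" type="UmbracoExamine.UmbracoContentIndexer, UmbracoExamine"
     supportUnpublished="true"
     supportProtected="true"
     useTempStorage="LocalOnly"
     analyzer="Lucene.Net.Analysis.WhitespaceAnalyzer, Lucene.Net"/>

<add name="InternalSearcher" type="UmbracoExamine.UmbracoExamineSearcher, UmbracoExamine"
     analyzer="Lucene.Net.Analysis.WhitespaceAnalyzer, Lucene.Net"
     useTempStorage="LocalOnly"/>
```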

Comments

Paul Stoker 02 Feb 2015, 12:15:32

@Shandem does this fix mean that it's not necessary to use the separate library you created for Index TempStorage? https://github.com/Shazwazza/UmbracoExamine.TempStorage


Shannon Deminick 02 Feb 2015, 12:28:28

Yup, once this version is released.


Paul Stoker 02 Feb 2015, 12:34:31

Great, thanks. We've had some index syncing issues with our production load balanced setup for some time so I'll hold out for the next release ;-)


Shannon Deminick 02 Feb 2015, 12:41:53

Would be good to hear the issues you're having, otherwise I can't tell you if I can fix them ;)


Paul Stoker 02 Feb 2015, 13:17:57

Ha, I was hoping you'd say that.

Production setup

  • Umbraco 7.1.8
  • 1 file share, 2 x web servers, 1 x load balance server
  • Load balanced website files are located on a centralized file share
  • TEMP directory configured locally on each web server
  • Custom Examine Searcher/Indexer used for searching website

Issue

Repeatedly refreshing the search results renders different results depending on which server it hits

Cause

Possibly the index files are not synchronizing and only the master web server has the correct files.

Do you think your fix (U4-5993) will clear these up?


Shannon Deminick 03 Feb 2015, 02:59:37

The master web server is of course the one that writes to the index initially. Then distributed calls are made to the other servers taking part in the load balancing scenario. Each server will then write to its own index if it needs to, based on the information found in the distributed call.

So no, I don't think this will fix your issue since it's really not any different from what you already have. Each server will always have its own index since multiple servers cannot write to a single index.

There will always be some latency writing to index files. This is even the case on a single web server due to the asynchronous nature of Lucene and Examine. When taking into account load balancing with distributed HTTP calls, this will also add some latency between servers. This latency shouldn't be very high, however.

If, however, your problem is that your master server has a valid index but the non-master server(s) become out of sync even after taking into account a small amount of latency, then the problem is that distributed calls are not working correctly and/or that the Umbraco distributed call being made/received has an issue.


Shannon Deminick 03 Feb 2015, 03:00:49

On another note, this feature might be of interest to you as well: http://issues.umbraco.org/issue/U4-5995


Paul Stoker 03 Feb 2015, 11:55:36

Thanks Shannon.

Okay, thinking about it again, the index files could potentially be different depending on how and when Lucene.Net writes/optimizes, even though their contents could be the same, just condensed into fewer files.

I'll do some more investigation. It could be that they were out of sync before the deployment was properly configured. I will just copy the indexes from the master web server to the slaves and see if it occurs again.


Paul Stoker 03 Feb 2015, 14:13:02

One more question Shannon, when you say 'i don't think this will fix your issue since it's really not any different from what you already have.' Are you assuming I was using your TempStorage repo on Github? I'm not currently, so do I need to do this?


Shannon Deminick 03 Feb 2015, 22:14:47

Are you saying your index 'files' are out of sync or the actual data is out of sync? Those are two totally different things. Lucene is 'write once', and since each server is in control of its own indexes, the files can easily get out of sync, which can be due to quite a few things.

What I meant by "it's really not any different from what you already have":

You said that you are running your indexes locally on each server by setting up vdirs in IIS locally on each server, which is where its Lucene index is stored. Is that correct? If your indexes are stored locally on each server then running TempStorage won't make any difference; TempStorage is just a way to force your indexes to run locally on each server.


Paul Stoker 04 Feb 2015, 09:22:53

Hi Shannon, the index files are out of sync, and so is the data since the following function yields different results each time when called from different nodes in the load balanced pair:

Website.Search(searchQuery, searchProvider: "WebsiteSearcher").Where(page => !page.GetPropertyValue<bool>("excludeFromSearch")).ToList()

Yes, the indexes are set up locally with virtual directories too.

If you can advise on anything I'd be very happy ;-)


Mariusz Matysek 04 Feb 2015, 15:14:53

When are you planning to release this fix?


Mariusz Matysek 04 Feb 2015, 17:30:19

Ok, I found this in release 7.2.1. Just have one question. Will this work fine with a website configured on Azure with scaled machines? I can imagine the indexes can go out of sync, but I don't think this will be a problem in my case, since more than 1 machine serving the site will be an exception. But wouldn't there be locks on the index, because 2 processes can write at the same time to the index file stored on a shared folder?

Also, with the option "Sync", will the index be restored after the app goes to sleep and then wakes on the first request?


Shannon Deminick 04 Feb 2015, 23:05:35

@Carael : The version found in the core prior to the one that will be released with 7.2.2 will NOT work properly with load balancing using a shared file system, like Azure websites. The initial release was in 7.1.9:

http://issues.umbraco.org/issue/U4-5789

However, if you use https://github.com/Shazwazza/UmbracoExamine.TempStorage with sync turned off, it will work for Azure websites, but if the index doesn't exist it will be rebuilt on startup. Similarly, if you change DLLs or your global.asax, the temp files will be cleared and your indexes will be rebuilt on startup.

With the 7.2.2 release there's the option to sync or not to sync, but you also have this option too: http://issues.umbraco.org/issue/U4-5995 so that you can have sync turned on when using a shared file system, so long as you specify to store in a machine name specific folder.
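
A minimal sketch of what a machine name specific folder could look like. It assumes the tokenized path support from U4-5995 exposes a {machinename} token and that the index set is declared in ExamineIndex.config. The token name, set name and path below are illustrative assumptions and are not taken from this issue:

```xml
<!-- ExamineIndex.config: hypothetical index set. -->
<!-- {machinename} is assumed to be the path token from U4-5995, giving each server its own folder on the shared file system. -->
<IndexSet SetName="InternalIndexSet"
          IndexPath="~/App_Data/TEMP/ExamineIndexes/{machinename}/Internal/" />
```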

@stokedout In order to figure out what is going on we'll need to find out when/why/how they get out of sync. You have any errors/warnings in logs about distributed calls, etc... ? Basically we'll need to be able to replicate the issue to fix it.


Paul Stoker 05 Feb 2015, 09:10:32

@Shandem I'll up the log priority to 'ALL' and monitor the logs. The way I tested it yesterday was by adding a new page in our authoring environment then used Umbraco Courier to push it live. I then used the site search in production which intermittently finds the page I just published. Hitting the new page directly always renders the page.


Mariusz Matysek 05 Feb 2015, 09:46:03

@Shandem Thanks for the answer. Just have one other question - will the solution also support custom indexers of type Examine.LuceneEngine.Providers.SimpleDataIndexer?


Paul Stoker 05 Feb 2015, 18:37:36

@Shandem thanks for your help, but apologies for wasting your time. It appears I hadn't followed the load balancing setup guide correctly. It's now working as it should


Matthew 05 Feb 2015, 21:20:43

@Shandem I'm trying to add UmbracoExamine.TempStorage to my application, but am not sure where to put the source files. Could I create a folder in App_Code and put it there?


Shannon Deminick 05 Feb 2015, 22:36:36

@kirschner@magnet.fsu.edu Source files? Probably better just to clone and build the solution from https://github.com/Shazwazza/UmbracoExamine.TempStorage and drop in the DLL, but I suppose you might be able to just drop them all into App_Code.

That is a very good point regarding SimpleDataIndexer! The UmbracoExamine.TempStorage library and this issue's implementation are specifically for Umbraco indexers. I'll re-open this issue and look to update the Examine core with better support for this so that it will be easy to set up other indexers. In the meantime, you'd have to create your own implementation.


Mariusz Matysek 10 Feb 2015, 09:09:34

Looking forward to the implementation for SimpleDataIndexer and the final release of version 7.2.2 then. Thanks.


Shannon Deminick 13 Feb 2015, 04:51:22

There's a new version of Examine up: https://www.nuget.org/packages/Examine/0.1.60.2941

which supports the tokenized paths. The notion of 'temp storage' isn't actually ever going to belong in the Examine core because the Examine core knows nothing about the web; it is just a Lucene engine.

So to use SimpleDataIndexer with the idea of 'temp storage', we'll either need to build that into Umbraco (which really doesn't make sense because it doesn't have much to do with Umbraco), or document the tools you'd need to achieve it. I'm still not sure which to do because, as far as Umbraco professional support goes, if someone is using SimpleDataIndexer and wants the 'temp storage' option, they'll ask support to do it anyway. However, if you want to use SimpleDataIndexer + temp storage outside of Umbraco, then you'd definitely need to do it yourself.


Shannon Deminick 13 Feb 2015, 05:11:06

Am closing this issue. Using SimpleDataIndexer + temp storage will need to be a custom implementation until further notice... perhaps I can create another standalone project for that or we can put something in Umbraco core, but it won't be for 7.2.2.


Priority: Normal

Type: Task

State: Fixed

Assignee: Shannon Deminick

Difficulty: Normal

Category:

Backwards Compatible: True

Fix Submitted:

Affected versions:

Due in version: 7.2.5

Sprint:

Story Points:

Cycle: