U4-7643 - DatabaseServerMessenger.cs causes lucene indexes to rebuild on app startup

Created by Matt Brailsford 04 Jan 2016, 17:07:28 Updated by Simon Dingley 22 Jun 2016, 09:14:16

Is duplicated by: U4-7506

Relates to: U4-8627

In DatabaseServerMessenger.cs Boot method, which runs during app startup it runs 3 methods

ReadLastSynced(); EnsureInstructions(); Initialize();

Read last synced reads the last synced id from disk, yet ensure instructions resets it to -1 if there is no instructions therefore making the initialize method think that there is no last synced ID and this causes a resync. This resync then causes all lucene indexes to rebuild.

The problem is it's that it's possible that nothing has changed between the app recycles so there is nothing to sync, yet a sync takes place anyway triggering a re index. Unfortunately I have thousands of nodes in my install and so a reindex is time consuming and costly.

There should be a way to determine that there is nothing to sync BUT that it's already in sync so doesn't need to trigger a resync.

Comments

Matt Brailsford 04 Jan 2016, 17:10:59

Forgot to add, this is if you set distributedCalls to disabled in umbracoSettings.config


Matt Brailsford 04 Jan 2016, 17:15:23

(not sure what happens if it's set to true)


Shannon Deminick 04 Jan 2016, 17:35:38

Will have a look tomorrow. You said that EnsureInstructions resets it to -1 ... what did ReadLastSynced do ? Is this a brand new server? (i.e. brand new could mean a new AppDomainAppId == new IIS website instance, but could be using a shared file store)


Shannon Deminick 04 Jan 2016, 17:36:52

I'm assuming this is happening with the azure site swap like the other issue?


Matt Brailsford 04 Jan 2016, 17:39:44

RE azure, no, it's just local (different project but dealing with lots of data in this one too).

ReadLastSynced did pull back a valid ID, but then EnsureInstructions sets it to -1 cos nothing new has been added. I'm just deving locally which requires the odd app pool recycle but this then triggers the full reindex.


Anders Bjerner 04 Jan 2016, 18:13:06

We had a similar issue for a solution running 7.2.8 with roughly 5k-6k nodes.

When updated to 7.3.1, the stage server (Azure VM + Azure DB) worked fine, but on my local computer (Win 7 + IIS + same Azure DB), Umbraco would start rebuilding both the content cache and Examine indexes after each build in Visual Studio (probably each time the IIS app was restarted). This would take quite some time given the amount of nodes.

I couldn't figure out what was going on. Entries like the following would show up after each app restart:

 2015-12-07 09:19:04,751 [P720/D3/T230] WARN  Umbraco.Core.Sync.DatabaseServerMessenger - No last synced Id found, this generally means this is a new server/install. The server will rebuild its caches and indexes and then adjust it's last synced id to the latest found in the database and will start maintaining cache updates based on that id
 2015-12-07 09:19:04,901 [P720/D3/T230] INFO  umbraco.content - Load Xml from file...
 2015-12-07 09:19:11,566 [P720/D3/T230] INFO  umbraco.content - Loaded Xml from file.
 2015-12-07 09:19:11,869 [P720/D3/T230] INFO  umbraco.content - Loading content from database...
 2015-12-07 09:28:17,104 [P720/D3/T320] INFO  umbraco.content - Save Xml to file...
 2015-12-07 09:28:20,560 [P720/D3/T252] INFO  umbraco.content - Saved Xml to file.

Eventually I tried to do a fresh pull from the project repository. This have somewhat fixed the issue, as a content cache and Examine rebuild now only happens occasionally. I have been working on the solution most of the day, but according to the log, there has only been a single rebuild today (taking roughly 8 minutes for just the content cache):

 2016-01-04 09:03:41,974 [P18532/D2/T1] WARN  Umbraco.Core.Sync.DatabaseServerMessenger - No last synced Id found, this generally means this is a new server/install. The server will rebuild its caches and indexes and then adjust it's last synced id to the latest found in the database and will start maintaining cache updates based on that id
 2016-01-04 09:03:42,141 [P18532/D2/T1] INFO  umbraco.content - Load Xml from file...
 2016-01-04 09:03:45,455 [P18532/D2/T1] INFO  umbraco.content - Loaded Xml from file.
 2016-01-04 09:03:45,558 [P18532/D2/T1] INFO  umbraco.content - Loading content from database...
 2016-01-04 09:11:10,938 [P18532/D2/T17] INFO  umbraco.content - Save Xml to file...
 2016-01-04 09:11:13,716 [P18532/D2/T29] INFO  umbraco.content - Saved Xml to file.

I'm not sure whether this is exactly the same issue as Matt's, but it started after upgrading to 7.3.x.


Shannon Deminick 04 Jan 2016, 18:26:28

Ok, will try to replicate this somehow this week. We'll be releasing another 7.3 version so we'll include the fix there too.


Shannon Deminick 05 Jan 2016, 12:25:18

Hi,

The reason why EnsureInstructions resets the last synced Id back to -1 is because it did not find the stored last synced id (from disk) in the database instructions table - therefore it assumes that something has gone wrong and it needs to 'cold boot' == rebuild all caches as if it is a new server.

So I'm trying to understand how this can occur. Do you have steps to replicate this?

I can see one way this can occur:

  • The server has stored a last synced Id (lets say 1234) after it syncs with the db
  • Someone deletes all of the instructions

Or

  • The server has stored a last synced Id (lets say 1234) after it syncs with the db
  • PruneOldInstructions is called which prunes old instructions which includes the last synced id 1234, this would be the case for servers with more than 2 day old instructions (the default) ... this last one is of course an issue which needs fixing

Do you think it's one of these scenarios causing your issues ?


Matt Brailsford 05 Jan 2016, 13:15:01

It's possible it could be the later as I did the import before the Xmas holidays and it's when I came back to it 2 weeks later that it was causing me problems.


Shannon Deminick 05 Jan 2016, 13:17:03

Yup most likely, fixing this now.


Shannon Deminick 05 Jan 2016, 13:28:26

PR for review: https://github.com/umbraco/Umbraco-CMS/pull/989

So here's the gist:

  • We will always keep the latest record in the db when PruneOldInstructions is called
  • All active servers will have synced to this latest record therefore EnsureInstructions wont reset it to -1 since it will exist in the table

Something to NOTE:

  • This will always work for a single server install
  • For a Load Balanced environment: If one of the servers has been offline for more than DaysToRetainInstructions (2 days by default), then it's last synced Id will no longer be available and it will need to cold boot = refresh/rebuild all caches.


Kevin Giszewski 05 Jan 2016, 13:46:43

I've noticed this behavior recently and I think this will be very helpful +1!


Tommy Enger 12 Jan 2016, 14:34:25

Glad to see this being fixed soon. Meanwhile, is it possible to set the value of DaysToRetainInstructions to something else? I see the value is hard-coded in DatabaseServerMessengerOptions.cs, but I don't see how to override it.


Shannon Deminick 12 Jan 2016, 15:03:43

You can change this on startup. You'd create an ApplicationEventHandler, override OnApplicationStarting - this allows you to modify resolvers before they are locked. Then change the current db server messenger... however there are callbacks that you need to maintain. For example, this is how the WebBootManager enables the BatchedDatabaseServerMessenger:

ServerMessengerResolver.Current.SetServerMessenger(new BatchedDatabaseServerMessenger(
                ApplicationContext,
                true,
                //Default options for web including the required callbacks to build caches
                new DatabaseServerMessengerOptions
                {
                    //These callbacks will be executed if the server has not been synced
                    // (i.e. it is a new server or the lastsynced.txt file has been removed)
                    InitializingCallbacks = new Action[]
                    {
                        //rebuild the xml cache file if the server is not synced
                        () => global::umbraco.content.Instance.RefreshContentFromDatabase(),
                        //rebuild indexes if the server is not synced
                        // NOTE: This will rebuild ALL indexes including the members, if developers want to target specific 
                        // indexes then they can adjust this logic themselves.                        
                        rebuildIndexes
                    }
                }));

You need to keep all of the callbacks specified there but I've just realized that this will not be possible to do without some hackery because you cannot resolve a value from a current resolver before resolution is locked. For example if you tried to do this:

var current = ServerMessengerResolver.Current.Messenger;

you would get an exception. Further more even if you could do that, you cannot get direct access to the currently assigned options. You're only real option is to copy the code that is found in the WebBootManager rebuildIndexes callback - however you will have reduced functionality (explained further down).

You could do this:

ServerMessengerResolver.Current.SetServerMessenger(new BatchedDatabaseServerMessenger(
                    ApplicationContext,
                    true,
                    //Default options for web including the required callbacks to build caches
                    new DatabaseServerMessengerOptions
                    {
                        //TODO: Put your custom days here
                        DaysToRetainInstructions = 1234,
                        //These callbacks will be executed if the server has not been synced
                        // (i.e. it is a new server or the lastsynced.txt file has been removed)
                        InitializingCallbacks = new Action[]
                        {
                            //rebuild the xml cache file if the server is not synced
                            () => global::umbraco.content.Instance.RefreshContentFromDatabase(),
                            
                            //rebuild indexes if the server is not synced
                            // NOTE: This will rebuild ALL indexes including the members, if developers want to target specific 
                            // indexes then they can adjust this logic themselves.                        
                            () =>
                            {
                                //If the developer has explicitly opted out of rebuilding indexes on startup then we 
                                // should adhere to that and not do it, this means that if they are load balancing things will be
                                // out of sync if they are auto-scaling but there's not much we can do about that.
                                if (ExamineSettings.Instance.RebuildOnAppStart == false) return;
                                //rebuild them all
                                ExamineManager.Instance.RebuildIndex();
                            }
                        }
                    }));

The only issue with doing this is that you could potentially double rebuild indexes on startup because when the server is detected as a 'cold boot', it will just rebuild all indexes... if the server is brand new and has no indexes at all, the Examine by default will rebuild on startup and then Umbraco will detect as a cold boot and rebuild again. In the Core we work around this issue but you wouldn't be able to do the same work around in your code. The alternative would be to set RebuildOnAppStart=false which will mean Examine doesn't rebuild anything on startup and then remove this check from the above callback which means that they'd only be rebuilt if the server is detected as a cold boot.


Priority: Normal

Type: Bug

State: Fixed

Assignee:

Difficulty: Normal

Category:

Backwards Compatible: True

Fix Submitted:

Affected versions: 7.3.3, 7.3.4

Due in version: 7.3.5

Sprint: Sprint 6

Story Points:

Cycle: