U4-581 - Automatic publishing not working in load balanced setup

Created by Sebastiaan Janssen 19 Aug 2012, 14:54:18 Updated by Glenn Lieberman 27 Jun 2014, 18:59:32

Hi

We have configured Umbraco for our client in a load-balanced environment consisting of one database and two web servers. Each web server has one website, which is accessed via a shared URL, and the editors access the Umbraco backend through a separate 'admin' domain, which runs on the first of the web servers.

Since being set up this way, automatic publishing no longer functions, and after some investigation it appears that we are not the only ones experiencing this issue (http://our.umbraco.org/forum/getting-started/installing-umbraco/11995-Auto-publish-doesn%27t-work-in-load-balanced-environments).

http://umbraco.codeplex.com/workitem/30204 suggests that this issue has been fixed, but we are still unable to get it to work in a load-balanced environment.

Regards

Ben

''Originally created on CodePlex by [Benedfit|http://www.codeplex.com/site/users/view/Benedfit]'' on 6/15/2012 12:47:09 PM [Codeplex ID: 30849 - Codeplex Votes: 1]

Comments

CH 07 Nov 2012, 15:11:12

We had been using 4.7 in a similar load-balanced setup, and the Publish At function would not work there either. After upgrading to 4.7.1 and then 4.7.2 we noticed that publishing worked again, but erratically: sometimes content is missing, or it is moved to another property.


CH 09 Nov 2012, 15:50:49

Also, when Umbraco is autopublishing a node on the same load-balanced setup, we get hundreds of umbracoLog errors stating "Error adding to SiteMapProvider in loadNodes(): System.InvalidOperationException: Multiple nodes with the same URL '/#' were found. XmlSiteMapProvider requires that sitemap nodes have unique URLs. at System.Web.StaticSiteMapProvider.AddNode(SiteMapNode node, SiteMapNode parentNode) at umbraco.presentation.nodeFactory.UmbracoSiteMapProvider.loadNodes(String parentId, SiteMapNode parentNode)"


CH 13 Nov 2012, 15:18:04

As pointed out in the comments of http://umbraco.codeplex.com/workitem/30204, it might be a "red herring", but the SiteMapProvider errors occur (on both single- and dual-server setups) when the PublishingService tries to publish a scheduled node.

While looking at the source code I found that the SiteMapProvider is used by the PublishingService in the method PublishNodeDo and fails because the current HttpContext is null. The PublishingService is called by a Timer on the RequestModule; the HttpContext is then passed as an argument but is never used or assigned (and UmbracoContext is probably missing as well), so all methods that refer to the current HttpRequest, such as library > NiceUrlFetch(int nodeID, int startNodeDepth, bool forceDomain), fail.

I got rid of these errors by adding this in the PublishingService > CheckPublishing method:

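                // The timer callback runs outside a request, so restore the contexts
                // that the publishing code expects. 'sender' is the HttpContext that
                // the RequestModule passes in as the timer argument.
                if (HttpContext.Current == null)
                    HttpContext.Current = (HttpContext)sender;
                if (UmbracoContext.Current == null)
                    UmbracoContext.Current = new UmbracoContext(HttpContext.Current);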


Dan Booth 21 May 2014, 14:22:50

I can confirm this doesn't work in 6.1.6 and, I'm pretty sure, any other version.

From my tests, the following happens:

  1. You set a node to publish on the master server
  2. When the time comes for publication the node is published, and a distributed call informs the other servers, and the node gets published BUT the cache is not refreshed. This leads to the "Oops: this document is published but is not in the cache (internal error)" - see U4-3145
  3. The node is then in limbo, as its status is "Published" but it has no URL etc. and isn't in the XML cache

I have found that if you run umbraco.library.RefreshContent() on each of the distributed servers then this seems to fix this issue.

There is also a related issue with scheduled publishing whereby the HttpContext.Current is not available - and this can also cause all kinds of issues, too. See U4-2724

I was looking at the Umbraco source code for the publishing service - https://github.com/umbraco/Umbraco-CMS/blob/02a4c2f4fc908e08d2cf7d1a0590c7cb719eef40/src/Umbraco.Web/umbraco.presentation/publishingService.cs

Is this what is used? It still seems to be using the old Document API rather than ContentService. Shouldn't this be using ContentService? I've found that if I manually call Services.ContentService.SaveAndPublish(content, content.WriterId, true) then this seems to work in a distributed environment, so I would imagine it could solve some issues.


Shannon Deminick 22 May 2014, 00:59:48

Thanks Dan, will get on to this on Monday.


Dan Booth 19 Jun 2014, 09:12:10

What I did as a quick hack, which seems to have worked, is to download the source of U6.1.6 and rip out the first part of the code in CheckPublishing() in publishingService.cs. I then created my own controller, which implements the same basic logic but uses the ContentService. I call this controller with an HTTP request from a console app that runs every minute using Windows Task Scheduler. It's hacky, but it does work very reliably, even in a load-balanced environment (the scheduler only runs on one of the four servers, btw). So I think the key is to replace the old Document API with ContentService, and this should work much better.


Shannon Deminick 19 Jun 2014, 10:14:59

The main problem is that the publishing is trying to take place on a background (non web) thread, but in order for a lot of cogs in the works to turn, it needs to be in a web context. One day we can make a lot more things work without a web context but for now that is how it is. So, just like you say, this scheduled task needs to be done in a request - it's kind of weird that it's not that way since all other 'scheduled tasks' are web requests.

The other problem with this is that all servers in the LB setup are running these tasks, which is incorrect for the majority of LB setups considering the only supported way is having a 'master' server. If all servers are running these tasks then there is effectively no longer a 'master', so that needs to be dealt with as well.


Shannon Deminick 19 Jun 2014, 11:08:53

Ok, so am making some good progress with this, but this does actually mean we need to add yet another config item to the umbracoSettings. We need to add a base url for the 'master' server so that we know how to call into the new scheduled publishing controller when distributed call is enabled.

This poses another problem too - how would the 'non master' server know not to process scheduled tasks? In a replicated environment, this is trivial: each node has its own umbracoSettings.config, and if distributed call is enabled but this new base url is missing then it will not attempt to run scheduled tasks.

However, when running LB in a file share environment (i.e. SAN, Azure websites, etc...) each node shares the same umbracoSettings.config. In the non-Azure case there might actually be a master server configured; then the only way I can see this working is to have another config option next to the base url that specifies the machine name of the master server, so that only that machine will execute the tasks. However, for Azure websites, where you cannot specify a master server, we will just have to specify a base url that is accessible by all instances and deal with the fact that potentially more than one instance will publish/unpublish the node at the same time.

Hope this is all making sense...


Sebastiaan Janssen 19 Jun 2014, 11:30:46

@Shandem Can we utilize the self-registering umbracoServer table and always pick the first one in the list? Or am I oversimplifying too much now? :)


Dan Booth 19 Jun 2014, 11:38:34

Could you add an attribute to server to indicate it is master? eg.

<servers>
  <server master='true'>serverone.com</server>
  <server>servertwo.com</server>
  <server>serverthree.com</server>
</servers>

Or, maybe, as Seb says, just go with a convention that the first one in the list is the master? Maybe fallback to this if none has master='true' or whatever.


Shannon Deminick 20 Jun 2014, 00:02:37

Seb - no, the underlying problem of the self-registering table is that a server doesn't know what its hostname/baseurl should be. We are also not actively using this table or a provider to use it; that was a POC that never eventuated.

Dan - exactly what I built in - the first one in the list is considered the master.

So with these changes, the first server entry can contain the following entries:

server1.mysite.com

The latter 2 (serverName and appId) are optional and are only required for the first server entry, and depend on how your LB env is set up. To explain these settings: when configuring LB with a 'master' server, each server in the env needs to know whether it is the master, since the master will be the one that runs scheduled tasks. The only way a server can know it's the master is if it can detect that its registration is the first in the list, but because a server cannot detect what host name it is configured for, it needs to detect this another way: server name or app id. The server name will suffice in most cases, since you would normally not be load balancing a site on the same machine ... but you might (I do for testing). If that is the case, you'd enter the app id, which is equal to HttpRuntime.AppDomainAppId, the unique id of the app, which doesn't change across app domains (http://msdn.microsoft.com/en-us/library/system.web.httpruntime.appdomainappid(v=vs.110).aspx); it basically ends up being the id of the IIS site. You would also never use both serverName and appId together (appId will supersede serverName).
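As a rough illustration of the settings described above, a servers list might look like the sketch below. The serverName and appId attribute names come from the description above; the second host name, the machine names WEB01 and WEB02, and the app id value are made-up placeholders, and the exact layout is an assumption that should be checked against the distributedCall docs for your version.

```xml
<servers>
  <!-- Master: must be the first entry. serverName is the machine name (placeholder value). -->
  <server serverName="WEB01">server1.mysite.com</server>
  <!-- If several sites share one machine, appId (HttpRuntime.AppDomainAppId) can be used
       instead of serverName, e.g. appId="/LM/W3SVC/2/ROOT" (placeholder value). -->
  <server serverName="WEB02">server2.mysite.com</server>
</servers>
```

Only one of serverName or appId should be set per entry, since appId supersedes serverName.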

When running LB on Azure websites (when we make that possible), there is no master server by default so these settings aren't needed but it does mean that every server will be executing scheduled tasks which is not ideal. It is possible to get the appId or serverName on WAWS so if people do use WAWS it will be recommended to try to assign a default server where possible.


Shannon Deminick 20 Jun 2014, 04:45:39

Ok, this is all working now. It's worth reading over the docs I've added to the distributedCall section:

https://github.com/umbraco/Umbraco-CMS/commit/e8f7f77bb6182168ca95fbc8b981a01ee04b4717#diff-6369ba937a67b200cfacbadf14fa2b99R226

I'll also update the LB docs soon to include this detail. Ideally, if you are using a replicated LB env, you should ensure your master is the FIRST one listed. You should also ensure that you have either serverName or appId specified for all servers (at the very least do this for the master). If you don't do it for all servers, then chances are the keepalive/ping service will not work properly - this has been the case to date, but with these updates it should also work.


Dan Booth 20 Jun 2014, 09:38:37

That's great work, Shannon, and all makes sense.


Shannon Deminick 25 Jun 2014, 02:15:25

I've updated the docs here:

https://github.com/umbraco/Umbraco4Docs/blob/master/Documentation/Installation/load-balancing.md#correct-config-for-scheduled-publishing--tasks


Glenn Lieberman 27 Jun 2014, 18:59:32

Wow, thanks so much for this timely fix and thanks for updating the 'load-balanced' doc where I read about this. I will anxiously await the 7.1.5 release as we are getting ready to go live on a load balanced setup that schedules publish and expire of content! Currently running 7.1.4.


Priority: Normal

Type: Bug

State: Fixed

Assignee: Shannon Deminick

Difficulty: Normal

Backwards Compatible: True

Affected versions: 6.0.0, 6.1.0, 7.0.0, 7.1.0, 6.0.1, 6.0.2, 6.0.3, 6.0.4, 6.1.1, 6.0.6, 6.0.5, 6.0.7, 6.1.2, 6.2.0, 6.1.3, 6.1.4, 6.1.5, 6.1.6, 7.0.1, 7.0.2, 7.0.3, 7.0.4, 7.1.1, 7.1.2, 7.1.3

Due in version: 7.1.5, 6.2.2
