U4-6992 - Need to figure out the best way for scheduled tasks to run with new Database server messenger

Created by Shannon Deminick 19 Aug 2015, 17:42:48 Updated by Shannon Deminick 31 Aug 2015, 16:03:24

Relates to: U4-7046

Currently there is no way for scheduled tasks to run correctly when using the new DatabaseServerMessenger with load balancing. This is because all servers work autonomously and don't know about each other (which was the whole point of setting this up). So currently, every server taking part in the load-balanced environment will run scheduled tasks. This might lead to some issues:

  • For scheduled publishing/unpublishing, this might actually be OK: it would just mean that each server executes the publish/unpublish logic against the database and the cache distribution, which is added overhead
  • For custom scheduled tasks, this could cause real issues, because these are user tasks that could throw exceptions if executed more than once, ie once on each of many machines

Another thing to think about is the base url that scheduled tasks execute against. There are currently a few places where we detect the base url: the URL of the first request detected when the site starts, the base url in the umbracoSettings.config file (in the new request section), and the server list when distributed calls are enabled (since each server can be assigned an address). However, in the case of the DatabaseServerMessenger, part of the point of this setup is auto-scaling, so an administrator cannot know the internal URL of every server, nor should they need to update any config when adding servers (that would negate the whole auto-scale concept). We need to come up with a solution to all of this.

Comments

Stephan 24 Aug 2015, 09:23:42

Notes on "master server":

Some background tasks (eg keepalive) need to run on every server, while others (eg scheduled publishing) need to run on the master server only. This is controlled by the task itself, which can check whether the server is a slave or the master.

Scheduled publishing does check that it does not run on a slave server, but that check runs against the settings, which must be fixed: it needs to run against the server registrar, because the way we figure out whether a server is slave or master depends on the registration method.

We can maintain a "master" status with the database registration:

  • Add an IsMaster field to the database, false by default
  • When a server is "touched", if no other server has IsMaster set, set IsMaster for that server
  • When checking for server status, if no server has IsMaster set, the current server becomes the master
  • When a server is "stale", reset IsMaster for that server

That would of course require proper database-level locking.

At the moment the "stale" timeout is 1 day, which is far too long. We should consider a server stale if it hasn't been touched for a few minutes, which would then cause another server to become the master.

Next: need to evaluate how much of this can be achieved without breaking compatibility.


Stephan 24 Aug 2015, 09:31:39

Notes on "urls":

The various config options that specify the "umbraco application url", ie the url that should be used to talk to Umbraco, exist precisely because the detection mechanism does not always work properly. So we let people configure the url manually.

At the moment the detection mechanism works by using the url of the first incoming request.

If we want true auto-scaling, then manual configuration is not an option, which means we need to fix the detection mechanism somehow. Or fix it and extend it.

Ideas:

  • The UmbracoApplicationUrl setting could support patterns, eg "http:///umbraco"
  • We could provide a hook so that people with exotic requirements can plug in their own code to figure out the url


Stephan 26 Aug 2015, 12:04:25

Have pushed 20d8656237821a1b98df4b04ea46a8d1571ea6bd, which fixes issues with server registration. Details:

Load Balancing Configuration

By default in 7.3 the new database-based load-balancing mechanism is always enabled. This is because that mechanism is needed to better handle the concurrent AppDomain situations we can face when an AppDomain restarts. That mechanism is auto-scalable, ie servers do not need to be configured nor registered, and it works with 1 or more servers.

For backward compatibility reasons, when the umbracoSettings:settings/distributedCall/@enabled attribute is set to true, the legacy web-service-based load-balancing mechanism is activated and replaces the database-based mechanism. Nothing then changes compared to 7.2. That setup should only be used when backward compatibility is required.

For new load-balanced setups, no configuration is required: the mechanism is always on. Just start a new server pointing to the same database.
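The switch described above boils down to one flag deciding which registrar/messenger pair is used. The sketch below is Python, not Umbraco's C# bootstrapping code: LoadBalancingSetup and select_load_balancing() are hypothetical, and the legacy messenger's name is a placeholder string because it is not given here.

```python
from dataclasses import dataclass


@dataclass
class LoadBalancingSetup:
    registrar: str
    messenger: str


def select_load_balancing(distributed_call_enabled: bool) -> LoadBalancingSetup:
    """Pick the mechanism from the distributedCall/@enabled flag (hypothetical helper)."""
    if distributed_call_enabled:
        # Legacy 7.2 behaviour: servers and roles come from the settings.
        return LoadBalancingSetup("ConfigServerRegistrar", "legacy web-service messenger")
    # 7.3 default: auto-scalable, nothing to configure, works with 1 or more servers.
    return LoadBalancingSetup("DatabaseServerRegistrar", "DatabaseServerMessenger")
```

Making the database pair the fallback mirrors the point above: the flag only ever opts out of the default.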

Load Balancing Master Server

It is now the responsibility of the IServerRegistrar to determine the current server role (single/slave/master). The legacy ConfigServerRegistrar determines the role based on the settings. The new DatabaseServerRegistrar uses a discovery mechanism. Whenever a server is "touched" (ie on every content request, throttled):

  • the server record is updated, and the server is marked active
  • if no server is marked as master, the server becomes master
  • stale servers are de-activated (and a stale master is not master anymore)
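The touch steps above can be sketched against a SQLite table. This is a Python illustration, not Umbraco's DatabaseServerRegistrar: the table and column names, touch(), the 120 s stale value and the step order (stale servers are cleared before the master check, so a takeover can happen within a single touch) are all assumptions. SQLite's BEGIN IMMEDIATE stands in for the database-level locking, and throttling is left out.

```python
import sqlite3

STALE_SECONDS = 120.0  # assumed stale timeout (2 min)


def create_schema(conn: sqlite3.Connection) -> None:
    # Hypothetical registration table, one row per server.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS servers ("
        " identity TEXT PRIMARY KEY,"
        " last_touched REAL NOT NULL,"
        " is_active INTEGER NOT NULL DEFAULT 1,"
        " is_master INTEGER NOT NULL DEFAULT 0)"
    )


def touch(conn: sqlite3.Connection, identity: str, now: float) -> bool:
    """Refresh this server's record; return True if it is master afterwards.

    `conn` must be opened with isolation_level=None so the explicit
    BEGIN/COMMIT below control the transaction.
    """
    # BEGIN IMMEDIATE takes the write lock up front, so two servers touching
    # at once cannot both see "no master" and both claim the role.
    conn.execute("BEGIN IMMEDIATE")
    try:
        # 1. Update (or create) this server's record and mark it active.
        conn.execute(
            "INSERT OR IGNORE INTO servers (identity, last_touched) VALUES (?, ?)",
            (identity, now),
        )
        conn.execute(
            "UPDATE servers SET last_touched = ?, is_active = 1 WHERE identity = ?",
            (now, identity),
        )
        # 2. De-activate stale servers; a stale master loses the master flag.
        conn.execute(
            "UPDATE servers SET is_active = 0, is_master = 0 WHERE last_touched < ?",
            (now - STALE_SECONDS,),
        )
        # 3. If no server is master any more, this server takes the role.
        (masters,) = conn.execute(
            "SELECT COUNT(*) FROM servers WHERE is_master = 1"
        ).fetchone()
        if masters == 0:
            conn.execute(
                "UPDATE servers SET is_master = 1 WHERE identity = ?", (identity,)
            )
        (is_master,) = conn.execute(
            "SELECT is_master FROM servers WHERE identity = ?", (identity,)
        ).fetchone()
        conn.execute("COMMIT")
        return bool(is_master)
    except Exception:
        conn.execute("ROLLBACK")
        raise
```

For example, with `conn = sqlite3.connect(":memory:", isolation_level=None)`, the first server to touch becomes master, a second server stays slave, and the second server only takes over once the first has not touched for more than 120 s.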

In a fresh environment, the first server to come up would then be master. Should that server go offline, it will be detected as stale and will no longer be master. The next server that is touched would then become master. The worst case is that the other server is touched just before the first server is detected as stale; we then have to wait for the throttle delay before the other server becomes master. In other words, for at most (stale + throttle) delay, no server is master. By default that is 2 min + 30 s, ie 2.5 min, which is acceptable.
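The worst-case arithmetic can be checked with a small simulation. This is a Python sketch, not Umbraco code: takeover_time() is hypothetical, and it assumes the other server touches at a fixed throttle interval and that a single touch both de-activates a stale master and claims the role.

```python
def takeover_time(master_last_touch: float, other_first_touch: float,
                  stale: float = 120.0, throttle: float = 30.0) -> float:
    """Return when another server becomes master after the master goes offline."""
    t = other_first_touch
    # The former master still counts as live while (t - its last touch) <= stale,
    # so each of these touches is wasted and the next one is a throttle later.
    while t - master_last_touch <= stale:
        t += throttle
    return t
```

With the master's last touch at 0 s, a touch landing exactly on the 120 s boundary is the worst case: `takeover_time(0, 120)` gives 150 s, the 2.5 min quoted above, whereas a touch at 121 s takes over immediately.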

People should be aware that they should not rely on a master server always being present. Put things in queues and wait for the master server to pick them up.

In an auto-load-balancing environment, a server could become master after having been a slave for a while, if the former master goes offline. For that reason, the server role is not static. Tasks such as ScheduledPublishing have been adapted: if the server role does not allow execution, the task is skipped for that run but remains scheduled to repeat.

Also, while Umbraco is booting and has not processed a content request yet, the server role is unknown. Tasks such as ScheduledPublishing have been adapted, and will not run as long as the role is unknown.
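A role-aware task along these lines might look like the sketch below. It is Python rather than Umbraco's C#, and ServerRole, run_scheduled_publishing() and the get_role/publish/reschedule callables are hypothetical stand-ins; treating the single-server role as allowed to run is also an assumption.

```python
from enum import Enum
from typing import Callable


class ServerRole(Enum):
    UNKNOWN = "unknown"  # still booting, no content request processed yet
    SINGLE = "single"
    SLAVE = "slave"
    MASTER = "master"


def run_scheduled_publishing(get_role: Callable[[], ServerRole],
                             publish: Callable[[], None],
                             reschedule: Callable[[], None]) -> str:
    # Ask the registrar on every run: the role can change at any time.
    role = get_role()
    if role in (ServerRole.UNKNOWN, ServerRole.SLAVE):
        # Skip this run but stay scheduled, in case this server becomes master.
        reschedule()
        return "skipped"
    publish()
    reschedule()
    return "ran"
```

The key design point is that the role is read per run rather than cached at startup, so a slave that is promoted picks up the work on its next scheduled run.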

Umbraco Application Url

That url is the url that Umbraco should use to talk to itself, eg for scheduled publishing, scheduled tasks, anything. It should look like "http://www.mysite.com/umbraco".

It is available via ApplicationContext.Current.UmbracoApplicationUrl. Before serving any request, the UmbracoModule ensures that the url is configured, using the ApplicationUrlHelper. If the url is not configured yet, that helper will:

  • try umbracoSettings:settings/web.routing/@umbracoApplicationUrl
  • try umbracoSettings:settings/scheduledTasks/@baseUrl
  • try the current IServerRegistrar
  • try the method ApplicationUrlHelper.ApplicationUrlProvider
  • use the current request

The legacy ConfigServerRegistrar can return a url (based on the settings), whereas the new DatabaseServerRegistrar cannot, since nothing is configured and everything is automatic. Hence the ApplicationUrlProvider method, which can be used to determine the url programmatically. It receives the current request as a parameter and should return a string (or null).
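The lookup order above, including the provider hook, can be sketched as a simple fallback chain. This is Python, not the actual ApplicationUrlHelper: resolve_application_url(), the dict used to represent the request (with "scheme" and "host" keys) and the "x-internal-host" header read by the sample provider are all hypothetical. Caching the result once configured is not shown.

```python
from typing import Callable, Mapping, Optional

# A provider receives the current request and returns a url string or None.
UrlProvider = Callable[[Mapping[str, str]], Optional[str]]


def resolve_application_url(routing_setting: Optional[str],
                            scheduled_tasks_base_url: Optional[str],
                            registrar_url: Optional[str],
                            provider: Optional[UrlProvider],
                            request: Mapping[str, str]) -> str:
    """Return the first url found, in the order listed above."""
    for candidate in (routing_setting, scheduled_tasks_base_url, registrar_url):
        if candidate:
            return candidate
    if provider is not None:
        url = provider(request)
        if url:
            return url
    # Last resort: derive the url from the current request.
    return f"{request['scheme']}://{request['host']}/umbraco"


def internal_address_provider(request: Mapping[str, str]) -> Optional[str]:
    # Hypothetical: assumes the load balancer forwards each server's internal
    # address in an "x-internal-host" header.
    host = request.get("x-internal-host")
    return f"http://{host}/umbraco" if host else None
```

For an auto-scaled server with nothing configured, `resolve_application_url(None, None, None, internal_address_provider, request)` returns the provider's internal address when the header is present and otherwise falls back to the public url of the current request.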


Priority: Critical

Type: Task

State: Fixed

Assignee: Stephan

Difficulty: Normal

Backwards Compatible: True

Due in version: 7.3.0
