U4-9041 - Intermittently taking over 100 seconds to save and publish data

Created by Paul D'Ambra 07 Oct 2016, 09:01:04 Updated by Claus Jensen 14 Jun 2017, 13:43:45

We have an Umbraco site (running 7.3.8) which has been in prod for several years and has been upgraded in place several times.

Intermittently we see spikes of the time taken to save and publish to well over 100 seconds.

These seem to cluster together but there are no obvious hardware issues that correlate with the slow save and publish.

Application monitoring shows that the slowdown is inside umbraco/backoffice/UmbracoApi/Content/PostSave

Profiling our database during a period of slow saves showed many, many instances of the same query being run against every version of multiple joined items

exec sp_executesql N'SELECT cmsPropertyData.*
FROM cmsPropertyData
INNER JOIN cmsPropertyType
ON cmsPropertyData.propertytypeid = cmsPropertyType.id
INNER JOIN 
    (SELECT cmsContent.nodeId, cmsContentVersion.VersionId FROM [cmsDocument]
INNER JOIN [cmsContentVersion]
ON [cmsDocument].[versionId] = [cmsContentVersion].[VersionId]
INNER JOIN [cmsContent]
ON [cmsContentVersion].[ContentId] = [cmsContent].[nodeId]
INNER JOIN [umbracoNode]
ON [cmsContent].[nodeId] = [umbracoNode].[id]
LEFT OUTER JOIN [cmsDocument] [cmsDocument2] ON ([cmsDocument2].[nodeId]=[cmsDocument].[nodeId] AND [cmsDocument2].[published]=1)
WHERE ([umbracoNode].[nodeObjectType] = @0)
AND (cmsContentVersion.VersionId = @1)
) as docData
ON cmsPropertyData.versionId = docData.VersionId AND cmsPropertyData.contentNodeId = docData.nodeId
LEFT OUTER JOIN cmsDataTypePreValues
ON cmsPropertyType.dataTypeId = cmsDataTypePreValues.datatypeNodeId',N'@0 nvarchar(40),@1 nvarchar(40)',@0=N'c66ba18e-eaf3-4cff-8a22-41b16d66a972',@1=N'6a245cee-89d9-4588-9a32-dea94f300f91'

Is this behaviour is intended or desirable.

2 Attachments

Download while-publishing.log

Comments

Claus Jensen 10 Oct 2016, 09:59:44

This is definetely not intended and neither desirable. We have done a bunch of fixes to the queries being done by the Core recently, so first of all I would urge you to update to the very latest version.

If you are still seeing issues, we will have to start debugging it from there.

Do you by any chance have a lot of backoffice users in your site? - we have had some issues with the way notification queries are handled on save/update of documents, which could cause a major slowdown in this process if you had a good amount of users configured.


Paul D'Ambra 10 Oct 2016, 10:03:23

Thanks @claus. We may be updating to 7.5.x in short order as the customer wants to start to use Grid.

We're going to delete old versions of propertyData and look into reducing the number of properties on key pages that aren't used anymore.

I'll update this ticket when that's done.

I don't think there are many backend users - i.e. fewer than 10 concurrent at busiest periods but we don't have monitoring on that directly (iirc)


Claus Jensen 10 Oct 2016, 10:39:24

@pauldambra Sounds good!

If reducing number of properties is part of what you would want done anyways, then sure - go ahead. If however you're mainly doing it to fix the slowdown issue I'd really suggest to try the upgrade before doing any further changes.

I would say you would have to have a very large amount of properties for this to actually be a cause for the problem :) .. that being said - it's never bad to do a cleanup/refactor to get rid of outdated stuff so I'm not suggesting you shouldn't do it at all.

In regards to the backoffice users, it's not really ''concurrent'' users but actually the total amount of users. Some of the notification code was doing some lookups that weren't really that thought-through and was doing it once for every single user set up in the site as a part of handling notifications.

But yes - please keep us updated - I will mark the issue as 'awaiting feedback'.


Paul D'Ambra 19 Oct 2016, 09:46:16

@claus just to keep you updated. I'm working through getting 7.3.8 upgrade to 7.5.3 locally. there are a few things related to customisations that I need to resolve but that's chugging along.

we removed a number of tabs and properties from our home node that the users no longer user. and deleted old versions of the data.

in our staging and live environments that appeared to take save and publish from c180 seconds to c40 seconds

We used this sql to do that


DECLARE @Date DATE = '2016-07-20'
DECLARE @Versions INT = 1;

SELECT	PD.*
--DELETE PD
FROM	cmsContent C WITH (NOLOCK)
		INNER JOIN cmsPropertyData PD WITH (NOLOCK) ON C.nodeId = PD.contentNodeId
		INNER JOIN cmsContentVersion CV WITH (NOLOCK) ON CV.VersionId = PD.versionId
WHERE	PD.versionId NOT IN 
		(
			SELECT	TOP(@Versions) VersionId
			FROM	cmsContentVersion nCV
					
			WHERE	nCV.ContentId = C.nodeId
			ORDER BY nCV.VersionDate DESC
		)
AND		CV.VersionDate < @Date
--ORDER BY C.nodeId, CV.VersionDate

SELECT	PX.*
--DELETE PX
FROM	cmsContent C WITH (NOLOCK)
		INNER JOIN cmsContentVersion CV WITH (NOLOCK) ON C.nodeId = CV.ContentId
		INNER JOIN cmsPreviewXml PX WITH (NOLOCK) ON CV.VersionId = PX.versionId
WHERE	PX.versionId NOT IN 
		(
			SELECT	TOP(@Versions) VersionId
			FROM	cmsContentVersion nCV
					
			WHERE	nCV.ContentId = C.nodeId
			ORDER BY nCV.VersionDate DESC
		)
AND		CV.VersionDate < @Date
--ORDER BY C.nodeId, CV.VersionDate


SELECT	CV.*
--DELETE CV
FROM	cmsContent C WITH (NOLOCK)
		INNER JOIN cmsContentVersion CV WITH (NOLOCK) ON C.NodeId = CV.ContentId
WHERE	CV.versionId NOT IN 
		(
			SELECT	TOP(@Versions) VersionId
			FROM	cmsContentVersion nCV
					
			WHERE	nCV.ContentId = C.nodeId
			ORDER BY nCV.VersionDate DESC
		)
AND		CV.VersionDate < @Date
--ORDER BY C.nodeId, CV.VersionDate

I've attached a log taken with logging set to DEBUG while publishing our home node before that change in case anything stands out. As I'm guessing 40 seconds is still too high.


Claus Jensen 19 Oct 2016, 10:07:19

@pauldambra Although 40 seconds is a lot better than 180 seconds - it is still nowhere near acceptable.

I'd be really interested in knowing whether this problem is fixed when you get to v. 7.5.3. If you are still seeing issues when done upgrading, would it be possible for you to provide us with a copy of your site and database so we can debug the issue and have it resolved?


Paul D'Ambra 19 Oct 2016, 10:16:56

Aiming to be on 7.5.3 in pretty short order and publish speed is definitely on our customer's radar :)

Would need to confirm with customers and business that they're happy to share - presumably we could arrange an NDA? If we can't resolve the issue ourselves I'd love to get your input!


Claus Jensen 19 Oct 2016, 10:31:11

@pauldambra Of course. NDA is not a problem if it is necessary, so just let me know how it goes and we can take it from there.


Paul D'Ambra 07 Feb 2017, 18:03:41

@claus just managed to make time to look at this again.

We're now on 7.5.3. And I have access to the CMS server with no users to upgrade to .Net 4.5.2

After upgrade and reboot I published the home page twice.

The first publish took 102 seconds. Immediately after another publish took 99.

Any suggestions on where to look next?


Claus Jensen 08 Feb 2017, 07:15:18

@pauldambra Strange that you are still having problems with the slow publishing. Could you check with the customer, if you would be able to provide us with a copy of the site+database so we can take a look at what's causing the delays?


Chris Roberts 08 Feb 2017, 11:49:12

Just wanted to add that we have been doing some upgrades for some of our client sites recently and we're also experiencing this (or a very similar) issue.

The symptoms are that occasionally when publishing content the sites will grind to a halt for a minute or two. You can publish several documents with no delay, then all of a sudden, the next time you publish everything slows down.

This even affects the public front-end of the site - everything appears to be blocked until it eventually sorts itself out. This is clearly causing some problems for our clients! We appear to be experiencing this in v7.4.3 and v7.5.6.

I'll review the server logs and post back if I find anything that looks useful.


Claus Jensen 08 Feb 2017, 12:07:03

@solutionmania Thanks - anything that can help us replicate it in a fresh site, or a backup of a site/db having the issue would be very helpful in tracking down the problem!


Paul D'Ambra 08 Feb 2017, 12:17:42

If I publish on my machine it takes about 1 second. I'm going to pull the database locally to see if that affects publish speed. If it does are there are any good steps you'd recommend for my investigation.

I'm just talking internally here and with the client to check everyone is happy shooting an NDA over to you so we can start the process of someone there taking a look too


Paul D'Ambra 08 Feb 2017, 16:30:50

@claus can you send me the legal entity name to put on the NDA?


Claus Jensen 09 Feb 2017, 07:49:18

@pauldambra I've sent it to the email you have registered on your profile here :)


Paul D'Ambra 28 Feb 2017, 14:16:12

@claus just had unexpected time to investigate further.

In UAT it currently takes nearly 2 minutes to publish the home page. publishing it locally with our 'dev' database takes 1 second. I pulled the UAT database locally and it took 52 seconds to publish. So that should rule out our code, right?

Any obvious DB things I can do? Force clear cache tables or something?


Alessandro Calzavara 28 Feb 2017, 14:25:59

Maybe try to rebuild/reorganize all the indexes on the DB


Paul D'Ambra 28 Feb 2017, 14:46:23

@Eyescream Grazie! We just checked and IX_cmsPropertyData_2 on cmsPropertyData was 99% fragmented o_O In fact there are 20-30 tables that are 80%+ fragmented.

Rebuilding that index looks to have solved the publish problem locally. I'm just waiting on confirmation I can apply it to our UAT to confirm there.


Paul D'Ambra 28 Feb 2017, 15:17:29

@Eyescream Ah, no! Still takes 2 minutes in UAT even with those indexes rebuilt :(

Must have been an artefact running locally


Paul D'Ambra 28 Feb 2017, 15:33:43

SELECT dbschemas.[name] as 'Schema',
dbtables.[name] as 'Table',
dbindexes.[name] as 'Index',
indexstats.avg_fragmentation_in_percent,
indexstats.page_count
FROM sys.dm_db_index_physical_stats (DB_ID(), NULL, NULL, NULL, NULL) AS indexstats
INNER JOIN sys.tables dbtables on dbtables.[object_id] = indexstats.[object_id]
INNER JOIN sys.schemas dbschemas on dbtables.[schema_id] = dbschemas.[schema_id]
INNER JOIN sys.indexes AS dbindexes ON dbindexes.[object_id] = indexstats.[object_id]
AND indexstats.index_id = dbindexes.index_id
WHERE indexstats.database_id = DB_ID()
ORDER BY indexstats.avg_fragmentation_in_percent desc
alter index IX_cmsPropertyData_2 on cmsPropertyData rebuild;
alter index IX_umbracoNodeUniqueID on umbracoNode rebuild;
alter index IX_umbracoNode_uniqueID on umbracoNode rebuild;
alter index PK_cmsDocument on cmsDocument rebuild;
alter index IX_cmsContentVersion_VersionId on cmsContentVersion rebuild;
alter index IX_cmsPropertyData_3 on cmsPropertyData rebuild;
alter index IX_cmsDocument_published on cmsDocument rebuild;
alter index IX_cmsDocument_newest on cmsDocument rebuild;
alter index IX_umbracoLog on umbracoLog rebuild;
alter index IX_cmsContentVersion_ContentId on cmsContentVersion rebuild;
alter index IX_umbracoNodeParentId on umbracoNode rebuild;
alter index [PK_dbo.MediaFolders] on MediaFolders rebuild;
alter index [PK_dbo.MediaItems] on MediaItems rebuild;
alter index PK_cmsPreviewXml on cmsPreviewXml rebuild;
alter index IX_MediaFolderId on MediaItems rebuild;
alter index IX_cmsDocument on cmsDocument rebuild;
alter index PK_cmsContentXml on cmsContentXml rebuild;
alter index PK_cmsPropertyData on cmsPropertyData rebuild;
alter index IX_umbracoNodeObjectType on umbracoNode rebuild;
alter index PK_cmsPropertyType on cmsPropertyType rebuild;

we ran the above which identified many tables with 66%+ fragmentation. Locally this appeared to resolve the publish, but not in the actual UAT environment. I'll have to recreate locally again to see what the what


Florin Lazau 06 Apr 2017, 18:29:02

Hi guys,

Any updates here?

Local everything works but in azure save and publish is slow.

Thanks, Florin


Paul D'Ambra 07 Apr 2017, 09:28:34

We've had work here that's pushed this off my stack so there's been no progress here. We've prepared an NDA so when I have time to verify the environments aren't the issue I should be in a place to get that signed and let people look to see if it's a code+data issue.


Paul D'Ambra 14 Jun 2017, 13:06:55

@claus \o/ I found some logging code that was committed over 4 years ago and was switched off in some configs that when I accidentally removed it from our publish event handlers took us down to 8 seconds to publish.

Thanks so much for all of your help!


Claus Jensen 14 Jun 2017, 13:43:45

@pauldambra Really happy to hear you got your issues resolved! :)

Considering this was some custom code causing the issue, if anyone else is having similar issues - please create a new issue on the tracker with info, instead of reopening this one.


Priority: Critical

Type: Bug

State: Closed

Assignee:

Difficulty: Normal

Category:

Backwards Compatible: True

Fix Submitted:

Affected versions:

Due in version:

Sprint:

Story Points:

Cycle: