We have moved to GitHub Issues
Created by Paul D'Ambra 07 Oct 2016, 09:01:04 Updated by Claus Jensen 14 Jun 2017, 13:43:45
We have an Umbraco site (running 7.3.8) which has been in prod for several years and has been upgraded in place several times.
Intermittently we see spikes of the time taken to save and publish to well over 100 seconds.
These seem to cluster together but there are no obvious hardware issues that correlate with the slow save and publish.
Application monitoring shows that the slowdown is inside umbraco/backoffice/UmbracoApi/Content/PostSave
Profiling our database during a period of slow saves showed many, many instances of the same query being run against every version of multiple joined items
exec sp_executesql N'SELECT cmsPropertyData.* FROM cmsPropertyData INNER JOIN cmsPropertyType ON cmsPropertyData.propertytypeid = cmsPropertyType.id INNER JOIN (SELECT cmsContent.nodeId, cmsContentVersion.VersionId FROM [cmsDocument] INNER JOIN [cmsContentVersion] ON [cmsDocument].[versionId] = [cmsContentVersion].[VersionId] INNER JOIN [cmsContent] ON [cmsContentVersion].[ContentId] = [cmsContent].[nodeId] INNER JOIN [umbracoNode] ON [cmsContent].[nodeId] = [umbracoNode].[id] LEFT OUTER JOIN [cmsDocument] [cmsDocument2] ON ([cmsDocument2].[nodeId]=[cmsDocument].[nodeId] AND [cmsDocument2].[published]=1) WHERE ([umbracoNode].[nodeObjectType] = @0) AND (cmsContentVersion.VersionId = @1) ) as docData ON cmsPropertyData.versionId = docData.VersionId AND cmsPropertyData.contentNodeId = docData.nodeId LEFT OUTER JOIN cmsDataTypePreValues ON cmsPropertyType.dataTypeId = cmsDataTypePreValues.datatypeNodeId',N'@0 nvarchar(40),@1 nvarchar(40)',@0=N'c66ba18e-eaf3-4cff-8a22-41b16d66a972',@1=N'6a245cee-89d9-4588-9a32-dea94f300f91'
Is this behaviour is intended or desirable.
This is definetely not intended and neither desirable. We have done a bunch of fixes to the queries being done by the Core recently, so first of all I would urge you to update to the very latest version.
If you are still seeing issues, we will have to start debugging it from there.
Do you by any chance have a lot of backoffice users in your site? - we have had some issues with the way notification queries are handled on save/update of documents, which could cause a major slowdown in this process if you had a good amount of users configured.
Thanks @claus. We may be updating to 7.5.x in short order as the customer wants to start to use Grid.
We're going to delete old versions of propertyData and look into reducing the number of properties on key pages that aren't used anymore.
I'll update this ticket when that's done.
I don't think there are many backend users - i.e. fewer than 10 concurrent at busiest periods but we don't have monitoring on that directly (iirc)
@pauldambra Sounds good!
If reducing number of properties is part of what you would want done anyways, then sure - go ahead. If however you're mainly doing it to fix the slowdown issue I'd really suggest to try the upgrade before doing any further changes.
I would say you would have to have a very large amount of properties for this to actually be a cause for the problem :) .. that being said - it's never bad to do a cleanup/refactor to get rid of outdated stuff so I'm not suggesting you shouldn't do it at all.
In regards to the backoffice users, it's not really ''concurrent'' users but actually the total amount of users. Some of the notification code was doing some lookups that weren't really that thought-through and was doing it once for every single user set up in the site as a part of handling notifications.
But yes - please keep us updated - I will mark the issue as 'awaiting feedback'.
@claus just to keep you updated. I'm working through getting 7.3.8 upgrade to 7.5.3 locally. there are a few things related to customisations that I need to resolve but that's chugging along.
we removed a number of tabs and properties from our home node that the users no longer user. and deleted old versions of the data.
in our staging and live environments that appeared to take save and publish from c180 seconds to c40 seconds
We used this sql to do that
DECLARE @Date DATE = '2016-07-20' DECLARE @Versions INT = 1; SELECT PD.* --DELETE PD FROM cmsContent C WITH (NOLOCK) INNER JOIN cmsPropertyData PD WITH (NOLOCK) ON C.nodeId = PD.contentNodeId INNER JOIN cmsContentVersion CV WITH (NOLOCK) ON CV.VersionId = PD.versionId WHERE PD.versionId NOT IN ( SELECT TOP(@Versions) VersionId FROM cmsContentVersion nCV WHERE nCV.ContentId = C.nodeId ORDER BY nCV.VersionDate DESC ) AND CV.VersionDate < @Date --ORDER BY C.nodeId, CV.VersionDate SELECT PX.* --DELETE PX FROM cmsContent C WITH (NOLOCK) INNER JOIN cmsContentVersion CV WITH (NOLOCK) ON C.nodeId = CV.ContentId INNER JOIN cmsPreviewXml PX WITH (NOLOCK) ON CV.VersionId = PX.versionId WHERE PX.versionId NOT IN ( SELECT TOP(@Versions) VersionId FROM cmsContentVersion nCV WHERE nCV.ContentId = C.nodeId ORDER BY nCV.VersionDate DESC ) AND CV.VersionDate < @Date --ORDER BY C.nodeId, CV.VersionDate SELECT CV.* --DELETE CV FROM cmsContent C WITH (NOLOCK) INNER JOIN cmsContentVersion CV WITH (NOLOCK) ON C.NodeId = CV.ContentId WHERE CV.versionId NOT IN ( SELECT TOP(@Versions) VersionId FROM cmsContentVersion nCV WHERE nCV.ContentId = C.nodeId ORDER BY nCV.VersionDate DESC ) AND CV.VersionDate < @Date --ORDER BY C.nodeId, CV.VersionDate
I've attached a log taken with logging set to DEBUG while publishing our home node before that change in case anything stands out. As I'm guessing 40 seconds is still too high.
@pauldambra Although 40 seconds is a lot better than 180 seconds - it is still nowhere near acceptable.
I'd be really interested in knowing whether this problem is fixed when you get to v. 7.5.3. If you are still seeing issues when done upgrading, would it be possible for you to provide us with a copy of your site and database so we can debug the issue and have it resolved?
Aiming to be on 7.5.3 in pretty short order and publish speed is definitely on our customer's radar :)
Would need to confirm with customers and business that they're happy to share - presumably we could arrange an NDA? If we can't resolve the issue ourselves I'd love to get your input!
@pauldambra Of course. NDA is not a problem if it is necessary, so just let me know how it goes and we can take it from there.
@claus just managed to make time to look at this again.
We're now on 7.5.3. And I have access to the CMS server with no users to upgrade to .Net 4.5.2
After upgrade and reboot I published the home page twice.
The first publish took 102 seconds. Immediately after another publish took 99.
Any suggestions on where to look next?
@pauldambra Strange that you are still having problems with the slow publishing. Could you check with the customer, if you would be able to provide us with a copy of the site+database so we can take a look at what's causing the delays?
Just wanted to add that we have been doing some upgrades for some of our client sites recently and we're also experiencing this (or a very similar) issue.
The symptoms are that occasionally when publishing content the sites will grind to a halt for a minute or two. You can publish several documents with no delay, then all of a sudden, the next time you publish everything slows down.
This even affects the public front-end of the site - everything appears to be blocked until it eventually sorts itself out. This is clearly causing some problems for our clients! We appear to be experiencing this in v7.4.3 and v7.5.6.
I'll review the server logs and post back if I find anything that looks useful.
@solutionmania Thanks - anything that can help us replicate it in a fresh site, or a backup of a site/db having the issue would be very helpful in tracking down the problem!
If I publish on my machine it takes about 1 second. I'm going to pull the database locally to see if that affects publish speed. If it does are there are any good steps you'd recommend for my investigation.
I'm just talking internally here and with the client to check everyone is happy shooting an NDA over to you so we can start the process of someone there taking a look too
@claus can you send me the legal entity name to put on the NDA?
@pauldambra I've sent it to the email you have registered on your profile here :)
@claus just had unexpected time to investigate further.
In UAT it currently takes nearly 2 minutes to publish the home page. publishing it locally with our 'dev' database takes 1 second. I pulled the UAT database locally and it took 52 seconds to publish. So that should rule out our code, right?
Any obvious DB things I can do? Force clear cache tables or something?
Maybe try to rebuild/reorganize all the indexes on the DB
@Eyescream Grazie! We just checked and IX_cmsPropertyData_2 on cmsPropertyData was 99% fragmented o_O In fact there are 20-30 tables that are 80%+ fragmented.
Rebuilding that index looks to have solved the publish problem locally. I'm just waiting on confirmation I can apply it to our UAT to confirm there.
@Eyescream Ah, no! Still takes 2 minutes in UAT even with those indexes rebuilt :(
Must have been an artefact running locally
SELECT dbschemas.[name] as 'Schema', dbtables.[name] as 'Table', dbindexes.[name] as 'Index', indexstats.avg_fragmentation_in_percent, indexstats.page_count FROM sys.dm_db_index_physical_stats (DB_ID(), NULL, NULL, NULL, NULL) AS indexstats INNER JOIN sys.tables dbtables on dbtables.[object_id] = indexstats.[object_id] INNER JOIN sys.schemas dbschemas on dbtables.[schema_id] = dbschemas.[schema_id] INNER JOIN sys.indexes AS dbindexes ON dbindexes.[object_id] = indexstats.[object_id] AND indexstats.index_id = dbindexes.index_id WHERE indexstats.database_id = DB_ID() ORDER BY indexstats.avg_fragmentation_in_percent desc
alter index IX_cmsPropertyData_2 on cmsPropertyData rebuild; alter index IX_umbracoNodeUniqueID on umbracoNode rebuild; alter index IX_umbracoNode_uniqueID on umbracoNode rebuild; alter index PK_cmsDocument on cmsDocument rebuild; alter index IX_cmsContentVersion_VersionId on cmsContentVersion rebuild; alter index IX_cmsPropertyData_3 on cmsPropertyData rebuild; alter index IX_cmsDocument_published on cmsDocument rebuild; alter index IX_cmsDocument_newest on cmsDocument rebuild; alter index IX_umbracoLog on umbracoLog rebuild; alter index IX_cmsContentVersion_ContentId on cmsContentVersion rebuild; alter index IX_umbracoNodeParentId on umbracoNode rebuild; alter index [PK_dbo.MediaFolders] on MediaFolders rebuild; alter index [PK_dbo.MediaItems] on MediaItems rebuild; alter index PK_cmsPreviewXml on cmsPreviewXml rebuild; alter index IX_MediaFolderId on MediaItems rebuild; alter index IX_cmsDocument on cmsDocument rebuild; alter index PK_cmsContentXml on cmsContentXml rebuild; alter index PK_cmsPropertyData on cmsPropertyData rebuild; alter index IX_umbracoNodeObjectType on umbracoNode rebuild; alter index PK_cmsPropertyType on cmsPropertyType rebuild;
we ran the above which identified many tables with 66%+ fragmentation. Locally this appeared to resolve the publish, but not in the actual UAT environment. I'll have to recreate locally again to see what the what
Any updates here?
Local everything works but in azure save and publish is slow.
We've had work here that's pushed this off my stack so there's been no progress here. We've prepared an NDA so when I have time to verify the environments aren't the issue I should be in a place to get that signed and let people look to see if it's a code+data issue.
@claus \o/ I found some logging code that was committed over 4 years ago and was switched off in some configs that when I accidentally removed it from our publish event handlers took us down to 8 seconds to publish.
Thanks so much for all of your help!
@pauldambra Really happy to hear you got your issues resolved! :)
Considering this was some custom code causing the issue, if anyone else is having similar issues - please create a new issue on the tracker with info, instead of reopening this one.
Backwards Compatible: True
Due in version: