U4-9513 - Empty recycle bin SQL logic can SQL to timeout due to query inefficiencies

Created by Shannon Deminick 10 Feb 2017, 01:03:11 Updated by Claus Jensen 10 Feb 2017, 10:43:11

Tags: Unscheduled

Subtask of: UAASSCRUM-790

For some reason there is a lot of joins taking place when emptying the recycle bin, however these joins are totally unnecessary.

For example, the query that consumes the most CPU is this one:

DELETE FROM cmsPropertyData WHERE cmsPropertyData.contentNodeId IN 
(SELECT TB1.contentNodeId FROM cmsPropertyData as TB1 INNER JOIN umbracoNode as TB2 ON TB1.contentNodeId = TB2.id WHERE TB2.trashed = '1' AND TB2.nodeObjectType = 'c66ba18e-eaf3-4cff-8a22-41b16d66a972')

The INNER JOIN here is not needed at all, this can be rewritten very simply as:

DELETE FROM cmsPropertyData WHERE cmsPropertyData.contentNodeId IN 
(SELECT id FROM umbracoNode WHERE trashed = '1' AND nodeObjectType = 'c66ba18e-eaf3-4cff-8a22-41b16d66a972')

this will have the exact same outcome and the simply query for id in umbracoNode is 91% more efficient according to SQL mgmt studio.

It turns out that 13 of the 17 queries suffers from this exact same problem. So changing these queries to not use an unneeded INNER JOIN will be WAY faster and not chew up SQL CPU or time causing it to timeout.

Comments

Shannon Deminick 10 Feb 2017, 01:29:40

PR for review here: https://github.com/umbraco/Umbraco-CMS/pull/1747

To test:

  • Make sure unit tests pass
  • Create some content with: relations, permissions, notifications, public access, tags, domains, published and preview data, multiple versions ... then recycle it and then empty the recycle bin


Claus Jensen 10 Feb 2017, 10:42:31

@Shandem Comparing the two querys the INNER JOIN will actually just result in the inner select statement getting everything it needs with multiple results - adding distinct on this gives me 100% the same result as the query without inner join. So from my point of view, the query does seems good. It is also very similar to a sql script I've been using on problematic sites with large recycle bins.

I have one little thing though - in my script I'm also disabling check constraints on cmsContent, cmsDocument, and cmsPropertyData, while doing the deletes and enabling them again afterwards. I've had some sites give issues if I didn't do this when running from SQL MGMT Studio... but not sure if we should include it here. Considering noone has complained about it yet, when the query is run via the Umbraco DB context, I think it may be better to just leave the constraints on - what do you think?

I have made a note on your PR regarding the UPDATE statement and constraints. https://github.com/umbraco/Umbraco-CMS/pull/1747/files#r100506940


Claus Jensen 10 Feb 2017, 10:43:11

Marked as fixed and have merged in.


Priority: Normal

Type: Bug

State: Fixed

Assignee:

Difficulty: Normal

Category:

Backwards Compatible: True

Fix Submitted:

Affected versions:

Due in version: 7.5.10

Sprint: Sprint 52

Story Points:

Cycle: