We have moved to GitHub Issues
Created by Shannon Deminick 01 Aug 2013, 03:24:08 Updated by Richard Soeteman 14 Aug 2013, 08:40:47
When there's a large amount of data in the recycle bin, emptying it takes quite a long time and chews up a lot of resources. I'm sure this can be made WAY better than it currently is.
The recycle bin is currently emptied by getting all items in the bin and then iterating over them, deleting them one by one. This isn't very efficient, but it's done because we also need to check for any associated files (referenced from an upload field/property) that need to be deleted.
I'm thinking that we could make a query for all items in the recycle bin that contain an upload datatype (this is really the only one we can check by convention) and then save an in-memory list of the files found. These files could be deleted from another thread while we run a SQL delete statement for the items in the recycle bin - something like: DELETE FROM umbracoNode WHERE trashed = 'true' AND nodeObjectType = 'C66BA18E-EAF3-4CFF-8A22-41B16D66A972'. BUT a single statement like this would only work if we had referential integrity built into the schema with DELETE CASCADE.
So in order to optimize emptying the recycle bin, I propose that we update the schema with DELETE CASCADE (where it's relevant - mostly around foreign keys to umbracoNode's id) and add indexes on the foreign key constraints, which would also optimize joins (not only for deleting but also for fetching) while we're at it.
Possibly version 6.2 ?
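To illustrate the cascade idea, here's a minimal sketch using Python and SQLite. The tables are simplified stand-ins for the real Umbraco schema (not the actual columns), and SQLite needs foreign keys enabled per connection; the point is just that with ON DELETE CASCADE, a single DELETE against umbracoNode automatically cleans up the dependent rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled

# Simplified stand-ins for umbracoNode and one dependent table.
conn.execute("""
    CREATE TABLE umbracoNode (
        id      INTEGER PRIMARY KEY,
        trashed INTEGER NOT NULL DEFAULT 0
    )""")
conn.execute("""
    CREATE TABLE cmsPropertyData (
        id            INTEGER PRIMARY KEY,
        contentNodeId INTEGER NOT NULL
            REFERENCES umbracoNode (id) ON DELETE CASCADE
    )""")

conn.execute("INSERT INTO umbracoNode VALUES (1, 1), (2, 0)")
conn.execute("INSERT INTO cmsPropertyData VALUES (10, 1), (11, 2)")

# One statement empties the bin; the cascade removes the child rows too.
conn.execute("DELETE FROM umbracoNode WHERE trashed = 1")
conn.commit()

remaining = conn.execute("SELECT contentNodeId FROM cmsPropertyData").fetchall()
print(remaining)  # only the row pointing at the non-trashed node survives: [(2,)]
```

The same single-statement approach applied to the real schema is what the cascade constraints would buy us.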
Even without a cascading delete, we can still run all of the bulk delete statements in a single transaction to clear the data from the tables we need. We also shouldn't go deleting the files until the data is successfully gone from the db; if the transaction fails we'd still want the files. I also don't think that deleting the files is really going to be a big bottleneck. Having a cascade delete would simplify the delete process and of course speed up a bulk delete operation, but even a bulk delete without cascade delete, as opposed to individual deletions, would massively improve performance. I'm also not sure what events fire for each item deleted from the recycle bin? Do we currently execute an event for each one with a strongly typed object passed in? If so, I think we should start looking at bulk gathering of property data so we can look up a lot of content at once with limited SQL calls (based on our conversation on Podio).
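A rough sketch of that ordering (again Python/SQLite with simplified stand-in tables; the names are illustrative, not the real schema): the bulk deletes either all commit or all roll back together, and file deletion only happens after a successful commit.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE umbracoNode (id INTEGER PRIMARY KEY, trashed INTEGER);
    CREATE TABLE cmsPropertyData (id INTEGER PRIMARY KEY, contentNodeId INTEGER);
    INSERT INTO umbracoNode VALUES (1, 1), (2, 1), (3, 0);
    INSERT INTO cmsPropertyData VALUES (10, 1), (11, 2), (12, 3);
""")

committed = False
try:
    with conn:  # one transaction: all bulk deletes commit or roll back as a unit
        conn.execute(
            "DELETE FROM cmsPropertyData WHERE contentNodeId IN "
            "(SELECT id FROM umbracoNode WHERE trashed = 1)")
        conn.execute("DELETE FROM umbracoNode WHERE trashed = 1")
    committed = True
except sqlite3.Error:
    pass  # nothing was deleted, so any associated files are still referenced

if committed:
    # Only now would we go and delete the associated files on disk.
    pass

print(conn.execute("SELECT COUNT(*) FROM umbracoNode").fetchone()[0])  # 1
```

Deferring the file deletion until after the commit means a failed transaction never strands content rows without their files.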
Well, any change is going to be an improvement right ;) When deleting a single content item these SQL delete statements are run:
DELETE FROM umbracoUser2NodeNotify WHERE nodeId = @Id
DELETE FROM umbracoUser2NodePermission WHERE nodeId = @Id
DELETE FROM umbracoRelation WHERE parentId = @Id
DELETE FROM umbracoRelation WHERE childId = @Id
DELETE FROM cmsTagRelationship WHERE nodeId = @Id
DELETE FROM umbracoDomains WHERE domainRootStructureID = @Id
DELETE FROM cmsDocument WHERE NodeId = @Id
DELETE FROM cmsPropertyData WHERE contentNodeId = @Id
DELETE FROM cmsPreviewXml WHERE nodeId = @Id
DELETE FROM cmsContentVersion WHERE ContentId = @Id
DELETE FROM cmsContentXml WHERE nodeID = @Id
DELETE FROM cmsContent WHERE NodeId = @Id
DELETE FROM umbracoNode WHERE id = @Id
Would you simply do a join from each of these tables to umbracoNode (except for the umbracoNode table itself), so you can do WHERE trashed = true AND nodeId = @nodeId? If so you'd ideally also want to add indexes on the joined columns. Check this article: http://www.mssqltips.com/sqlservertip/2743/using-delete-cascade-option-for-foreign-keys/
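Sketching that per-table approach (Python/SQLite, illustrative table and index names only): SQLite's DELETE has no JOIN clause, so the trashed filter is expressed as a subquery, but on SQL Server the same idea can be written as a joined DELETE. The index on the FK column is what lets each per-table delete seek instead of scan.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE umbracoNode (id INTEGER PRIMARY KEY, trashed INTEGER);
    CREATE TABLE cmsTagRelationship (nodeId INTEGER, tagId INTEGER);
    INSERT INTO umbracoNode VALUES (1, 1), (2, 0);
    INSERT INTO cmsTagRelationship VALUES (1, 100), (2, 200);
""")

# Index on the FK column so the delete can seek rather than scan the table.
conn.execute("CREATE INDEX IX_cmsTagRelationship_nodeId "
             "ON cmsTagRelationship (nodeId)")

# One bulk statement per child table, filtered on the trashed flag.
conn.execute(
    "DELETE FROM cmsTagRelationship WHERE nodeId IN "
    "(SELECT id FROM umbracoNode WHERE trashed = 1)")
conn.commit()

print(conn.execute("SELECT nodeId FROM cmsTagRelationship").fetchall())  # [(2,)]
```

Repeating this for each table in the list above (finishing with umbracoNode itself) replaces N-per-node statements with one statement per table.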
Ah yes, I see what you are getting at now. I guess it depends on whether we want to wait until 6.2 or not. In 6.2 we could add the cascade delete options (and even the indexes on the joined columns if they are not there yet), but in the meantime we could just execute the bulk delete statements in 6.1.4. Also, when we delete an item that requires multiple delete statements, are we executing them in a single transaction? (We definitely should be... and it will actually execute faster.)
All the delete statements I listed above are done within one transaction - it's the SQL that's run when you call contentService.Delete(content), which then calls the repository's PersistDelete. The slowness is because emptying the recycle bin is X times .Delete() plus the file check, as far as I remember.
Yup, I understand why it is slow, just wanted to double check that all those queries were done in one transaction :)
I think it'd be great if we can get the bulk delete working in 6.1.4 so that it is at least much faster than one at a time, and then we can make it faster again in 6.2 by adding the cascade deletes to the db?
Yup, that's fine. I thought about adding an EmptyRecycleBin method to the Content- and MediaRepository, as they are the only two entities that have a Recycle Bin, plus it would be nicer to have this "low level" stuff in the repository instead of the services. What do you think?
Sure sounds good... would be nice to start sharing some of the logic between these repos and services too.
@Morten, I've modified some code here in revision: 1e61563f792aa5a2d9c39dcb9548ee64320e452b We need to wrap the deletions in a transaction so that if any of the delete processes fails the whole thing is rolled back. This will also improve performance.
Isn't the transaction started and committed at the service level with the unit of work ?
No, not in the recycle bin case (My other commit is incorrect: ce847e97a8fe8a757af8b005bbbbd9a9365cbf2a which I'm rolling back because the uow does take care of that).
The UOW only ever uses a transaction when using the Commit() method. In the case of the recycle bin, we're just going direct to the db; the only purpose the UOW serves in this case is access to the db object.
There are similar performance issues around updating doctypes (adding properties); it seems to cycle through each node in the content tree with that doctype assigned and carry out certain actions.
This could be vastly improved with a similar approach to what has been identified above.
6.1.4 will be MUCH better for performance regarding actions like those: changing a doc type alias, adding/removing a doc type property, rebuilding xml structures
Shannon, I asked for a copy of a database with a lot of items in the bin to further test the improvements made with this task, and Brendan donated the one above ;) Even though I'll have to upgrade the db schema, it should be a nice final test.
@Brendan I downloaded the database, so you can delete it now. Thanks!
@Shannon, looking forward to the performance improvements, I might try to get some of my clients with performance issues talked into upgrading.
@Morten, thanks for letting me know, will delete it later this evening
I'm still seeing really slow performance emptying the Recycle Bin in 6.1.4 Build 32 on MySQL, bad enough to the point where I'm not sure it is actually doing anything at all.
@Brian, would it be possible for you to update the schema of your db (if you have a dev test version) to have Cascade Delete on the foreign keys that directly or indirectly reference umbracoNode, to see if it makes a difference? I can provide you the details needed if it's possible for you to try out.
@Morten: I can try it out on my dev instance. Please let me know what needs to be changed.
@Morten If you still need a database with a lot of items, maybe CMSImport can help. Import 500 records to a few nodes a few times, then delete them - you should end up with a lot of nodes in the recycle bin.
Assignee: Morten Christensen
Backwards Compatible: True
Due in version: 6.1.4