COU-608 - Update Courier to pre-fetch all media path + Id entries instead of running the SQL query each time

Created by Shannon Deminick 14 Aug 2017, 05:44:38 Updated by Claus Jensen 16 Aug 2017, 11:24:01

Tags: Unscheduled

Subtask of: COU-521

Currently there is no good way to fetch a media item by it's path because the SQL used to do so is very very slow due to the LIKE query that must execute because media paths are now hidden in JSON in the ntext column due the usage of the cropper and if the media still uses an upload control to make matters more difficult that uses the nvarchar column and not the ntext column. It's worth noting that it doesn't matter if any indexing is applied to these columns: You cannot add an index to an ntext column but even if you could executing a LIKE query with a prefix wildcard will never use that index anyways.

There are various suggested options in the past but unfortunately due to the volatility of different umbraco installs, old umbraco data due to upgrades and just not being able to know exactly what data is stored in the RTE or the Grid, we need to resort to the media path lookups. I understand that in the RTE we sometimes put an "data-id" attribute next to where the media path might be but that is only on newer versions and older data won't have that so it's not reliable. Older version used to have a "rel" attribute with the ID too but that is also not reliable since that didn't exist on even older versions and doesn't exist on newer versions. Then we have the new UDI pickers which store a data-udi attribute. ... and here's the real problem: These attributes and their values get transfered as-is by Courier (and potentially deploy) so we don't know where the INT or UDI Id originally came from, it's not reliable to assume that it's correct. Thus ... we must do a path lookup.

To fix this now, we lookup a media item by path in Courier we should (same as Deploy more or less):

  • Lazily build an in memory dictionary per Execution Context (we do this sort of thing a lot in Courier so is pretty easy): ** Create a forward only db reader with as efficient of an SQL query as we can - the least amount of data returned and fetched = a media item with the correct node object type and only for items that are of type upload or image cropper ** For each row returned ** Cache this data in a dictionary where the path is the Key
  • Use this dictionary for all subsequent lookups of media paths during a ExecutionContext

Longer term we'll add another Db table to Umbraco to deal with this correctly , see http://issues.umbraco.org/issue/U4-10286

Comments

Shannon Deminick 16 Aug 2017, 09:07:43

PR: https://github.com/umbraco/UmbracoDeploy/pull/109

Turns out we resolved this 9 months ago (see https://github.com/umbraco/UmbracoDeploy/commit/b8c57b0c409ede371ecd1637d48d24518c49b4aa) for the RTE and the MediaHelper.GetMediaIdByPath does this exact caching lookup as described in this task. So we can extend this helper/lookup to include the GUID and also enhance the lookup so it's faster and uses less memory.

It turns out that the Grid doesn't use this method though and uses the NHibernateProvider.GetUniqueIdFromMediaFile which I didn't know existed. It will be updated to redirect it's logic to the MediaHelper.

Testing:

  • Update your grid to suppport images in the RTE
  • Add an RTE to the grid, add images to it
  • Add images directly in the grid
  • Also have a non-grid RTE on your document and add images in there
  • Transfer - it should all be successful, no errors


Claus Jensen 16 Aug 2017, 11:23:46

Code looks good and have tested that it works - merged in!

  • Also checked that the cache is cleared correctly when deleting items.
  • Tested having an invalid udi on the tag still works by image path lookup
  • Tested having an invalid id on the tag still works by image path lookup


Priority: Normal

Type: Task

State: Fixed

Assignee:

Difficulty:

Category:

Backwards Compatible: True

Fix Submitted:

Affected versions:

Due in version: 3.1.4

Sprint: Sprint 65

Story Points: 3

Cycle: 3