U4-4706 - Non-normalized filenames in media

Created by Peter Josefsson 15 Apr 2014, 17:00:34 Updated by Shannon Deminick 26 Jun 2017, 05:38:46

File names that I think come from Macs don't follow the usual Unicode normalization convention, and they cause problems in some programs (such as standard zip file formats, git, etc.). For example, "ä" (U+00E4) is saved as an "a" followed by a combining diaeresis (U+0308) instead of as the single precomposed character. I haven't worked with this myself, but if I remember correctly, the framework has tools that can normalize text without breaking the cases where a combining sequence is the only option. Given time, I may research this and post a comment soon. I think the MediaService should handle this transparently, renaming files when necessary.
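To make the difference concrete, here is a small C# sketch (top-level statements, .NET 6 or later assumed) showing that the two encodings of "ä" are different strings until both are normalized. It uses only String.Normalize and String.IsNormalized; nothing in it is Umbraco-specific.

```csharp
using System;
using System.Text;

// "ä" as a single precomposed code point (U+00E4).
string composed = "\u00E4";
// "ä" as "a" followed by COMBINING DIAERESIS (U+0308): the decomposed form
// that, per this report, Mac-originated file names use.
string decomposed = "a\u0308";

// They render identically but are different strings under ordinal comparison,
// which is how tools that match names code unit by code unit (zip, git) see them.
Console.WriteLine(composed == decomposed);                        // False
Console.WriteLine(composed.Length + " vs " + decomposed.Length);  // 1 vs 2

// Normalizing both to the same form (here NFC) makes them equal again.
Console.WriteLine(composed.Normalize(NormalizationForm.FormC) ==
                  decomposed.Normalize(NormalizationForm.FormC)); // True
Console.WriteLine(decomposed.IsNormalized(NormalizationForm.FormC)); // False
```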

This is pretty disruptive, especially the git problem. Unfortunately, I haven't recorded which versions I've seen this in, but it is fairly new: I've never seen it on anything older than v6, and possibly not on anything older than v7.

Comments

Peter Josefsson 15 Apr 2014, 19:14:01

Okay, so here is the fix (to be inserted wherever Umbraco has saved a file in its proper location and is about to set the umbracoFile property, create thumbnails, and so on):

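    if (filename != null && !filename.IsNormalized())
    {
        // Arguments are evaluated strictly left to right, so this is safe:
        // the first argument is the old name, and the second assigns
        // the normalized (NFC) name to filename and passes it on.
        System.IO.File.Move(filename, filename = filename.Normalize());
    }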
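One possible refinement, offered as a sketch rather than a tested patch: the snippet above normalizes the whole path, so if a directory name were also decomposed, File.Move would target a folder that does not exist. The hypothetical helper NormalizeFileNameOnDisk below (not an Umbraco API) normalizes only the last path segment and returns the new path. C# top-level statements (.NET 6 or later) are assumed.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper (not part of Umbraco): renames a file whose *name* is not
// in NFC and returns the new full path. Only the last path segment is normalized,
// so directories are left exactly as they exist on disk.
static string NormalizeFileNameOnDisk(string path)
{
    if (path == null) throw new ArgumentNullException(nameof(path));

    string fileName = Path.GetFileName(path);
    if (fileName.IsNormalized(NormalizationForm.FormC))
        return path; // already NFC, nothing to do

    string directory = Path.GetDirectoryName(path) ?? string.Empty;
    string target = Path.Combine(directory, fileName.Normalize(NormalizationForm.FormC));
    File.Move(path, target); // throws IOException if the target already exists
    return target;
}

// Usage: create a file with a decomposed name in a temp folder, then normalize it.
string dir = Path.Combine(Path.GetTempPath(), Guid.NewGuid().ToString("N"));
Directory.CreateDirectory(dir);
string original = Path.Combine(dir, "ba\u0308r.jpg"); // "bär.jpg" spelled a + U+0308
File.WriteAllText(original, "test");

string renamed = NormalizeFileNameOnDisk(original);
Console.WriteLine(Path.GetFileName(renamed) == "b\u00E4r.jpg"); // True
```

Whether a rename is even observable depends on the file system: NTFS and typical Linux file systems treat the two spellings as distinct names, while normalization-insensitive file systems may not.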

It turns out that the default normalization form for String.Normalize() is C (NFC), and IsNormalized() checks against the same form by default. That gives us what we want: all composite characters are first decomposed and put into canonical order, then recomposed into equivalent single characters wherever possible.
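The behaviour described above can be checked directly. This C# sketch (top-level statements, .NET 6 or later assumed) shows that Normalize() with no argument matches NormalizationForm.FormC, that combining marks are put into canonical order before composing, and that a sequence with no precomposed equivalent is left decomposed.

```csharp
using System;
using System.Text;

// String.Normalize() with no argument uses NormalizationForm.FormC (NFC).
string decomposed = "a\u0308"; // "a" + COMBINING DIAERESIS
Console.WriteLine(decomposed.Normalize() == decomposed.Normalize(NormalizationForm.FormC)); // True

// NFC composes wherever a precomposed character exists...
Console.WriteLine(decomposed.Normalize() == "\u00E4"); // True ("ä", one code point)

// ...after putting combining marks into canonical order: dot below (U+0323)
// sorts before circumflex (U+0302), so both input orders give U+1EAD ("ậ").
Console.WriteLine("a\u0302\u0323".Normalize() == "\u1EAD"); // True
Console.WriteLine("a\u0323\u0302".Normalize() == "\u1EAD"); // True

// ...and leaves the sequence decomposed where no single character exists:
// "q" + U+0308 has no precomposed form, so it stays two code units.
Console.WriteLine("q\u0308".Normalize().Length); // 2
```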

Suspicion: the decompression tool used when uploading multiple images uses a low-level API that doesn't do this automagically. (If I remember correctly, at the lowest level NTFS accepts almost any sequence of 16-bit code units as a file name and does not normalize it; how the name is interpreted is up to the subsystem, such as Win32 or POSIX.)
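If the suspicion is right, one mitigation is to normalize entry names while extracting. The sketch below assumes the archive is read with System.IO.Compression.ZipArchive, which may not be what Umbraco actually uses, and ExtractWithNormalizedNames is a hypothetical helper. It keeps only each entry's file name (dropping folders inside the zip), which also sidesteps path traversal. C# top-level statements (.NET 6 or later) are assumed.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

// Hypothetical helper: extracts every file entry into targetDir, normalizing
// the entry's file name to NFC before it touches the disk.
static void ExtractWithNormalizedNames(Stream zipStream, string targetDir)
{
    using var archive = new ZipArchive(zipStream, ZipArchiveMode.Read);
    foreach (ZipArchiveEntry entry in archive.Entries)
    {
        if (string.IsNullOrEmpty(entry.Name)) continue; // directory entry, no file name
        string nfcName = entry.Name.Normalize(NormalizationForm.FormC);
        entry.ExtractToFile(Path.Combine(targetDir, nfcName), overwrite: false);
    }
}

// Usage: build an in-memory zip whose entry name is decomposed, as a Mac might write it.
var buffer = new MemoryStream();
using (var writer = new ZipArchive(buffer, ZipArchiveMode.Create, leaveOpen: true))
{
    ZipArchiveEntry e = writer.CreateEntry("photos/ba\u0308r.jpg"); // a + U+0308
    using var s = new StreamWriter(e.Open());
    s.Write("test");
}
buffer.Position = 0;

string outDir = Path.Combine(Path.GetTempPath(), Guid.NewGuid().ToString("N"));
Directory.CreateDirectory(outDir);
ExtractWithNormalizedNames(buffer, outDir);
Console.WriteLine(File.Exists(Path.Combine(outDir, "b\u00E4r.jpg"))); // True
```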


Shannon Deminick 26 Jun 2017, 05:38:47

Closing issue due to inactivity - see blog post for details https://umbraco.com/blog/issue-tracker-cleanup/


Priority: Normal

Type: Bug

State: Closed

Assignee:

Difficulty: Normal

Category:

Backwards Compatible: True

Fix Submitted:

Affected versions:

Due in version:

Sprint:

Story Points:

Cycle: