U4-3343 - Media files get stored as only the file extension

Created by Sebastiaan Janssen 05 Nov 2013, 08:40:29 Updated by Sebastiaan Janssen 21 Nov 2013, 09:22:13

Is duplicated by: U4-3406

Is duplicated by: U4-3407

Relates to: U4-3407

Relates to: U4-3632

See screenshot

This happens because the file name's leading numbers are stripped off. We should revert to the previous way of renaming the files. It's a delicate issue for a lot of people; they don't like their file names messed with.
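To make the failure mode concrete, here is a minimal Python sketch. Umbraco itself is written in C#, and this is not its code: the function name `strip_leading_digits` is hypothetical, and the sketch assumes only what is stated above, namely that leading digits are stripped from the name before the extension is re-attached.

```python
import os
import re


def strip_leading_digits(filename: str) -> str:
    """Hypothetical cleaner (NOT Umbraco's code): drop leading digits from the base name."""
    base, ext = os.path.splitext(filename)
    return re.sub(r"^\d+", "", base) + ext


for name in ["01.jpg", "02.jpg", "2013-report.pdf", "photo.jpg"]:
    print(f"{name!r} -> {strip_leading_digits(name)!r}")
# '01.jpg' -> '.jpg'
# '02.jpg' -> '.jpg'
# '2013-report.pdf' -> '-report.pdf'
# 'photo.jpg' -> 'photo.jpg'
```

Under this assumption, a name made up only of digits collapses to the bare extension, and two different files end up with the same stored name.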

1 Attachments

Comments

Sebastiaan Janssen 05 Nov 2013, 09:00:10

Might be because my images are named 01.jpg, 02.jpg etc?


Per Ploug 05 Nov 2013, 10:21:04

Why are we stripping out numbers?


Sebastiaan Janssen 06 Nov 2013, 09:21:26

I think something was changed to run media nodes through the safealias method, which also happens to run the filenames through the same method (wrongly). I spent a lot of time last year getting the filename renaming correct (we do need to strip out characters that can't be in a url or filepath), so this change needs to be reverted.


Shannon Deminick 12 Nov 2013, 08:45:01

I'll check and see what's going on.


esunxray 13 Nov 2013, 00:40:56

If the file name contains Chinese characters, like 中国.jpg, the same thing happens.


Shannon Deminick 13 Nov 2013, 01:04:52

I believe this is a change that Stephan made with the new IOHelper.SafeFileName implementation; I will check with him.


esunxray 13 Nov 2013, 01:12:53

The URLs of all content nodes have the same problem: they get cleaned too.


Shannon Deminick 13 Nov 2013, 01:15:20

Yup, will get this fixed. I have asked Stephan to chime in on why the change is necessary.


Stephan 15 Nov 2013, 11:16:00

Looking into it at the moment.


Stephan 15 Nov 2013, 15:17:27

OK, let's discuss it here: we want to have some sort of rational "short strings" manipulation logic. By "short strings" I mean filenames, url segments, and aliases (anything else?). For all those strings we have the input value (what the user enters) and the "clean" value (what we're going to use). It looks like the new cleaning code is way too aggressive. OK. Can we define the rules for each case?

  • filename : there's a set of illegal chars in a filename (slash, star...). we should replace those chars with a dash, and otherwise leave the filename untouched (so unicode chars are ok). that should be enough, until we try to use that filename in a url.

  • alias : an alias can be used as a property name in C# code or JavaScript, and as an element name in XML. So it should be ASCII, without spaces, dashes, dots, or any reserved character. Also, it should not begin with a number. So "66 bröski dang.dang" should become "broskiDangDang" assuming we want them camel-cased.

  • url segment : a url segment is going to be used as part of a url. there's a set of illegal chars in a url fragment (slash, question mark). we should replace those chars with a dash, and otherwise leave the url untouched? OR should we %-encode everything that's not utf8? not ascii? @esunxray: what would you expect node.Url to return? raw Chinese? percent-encoded Chinese? most modern browsers seem quite happy with raw unicode...

I'd be happy if we could refine these rules (and make sure we don't have others... do we have several types of aliases?) so the implementation can be fixed.
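The three proposed rules can be sketched as follows. This is an illustrative Python sketch, not the Umbraco (C#) implementation. All function names are hypothetical, the exact illegal-character sets are assumptions (the rules above only name "slash, star..." and "slash, question mark"), and the `percent_encode` switch merely shows both sides of the question that is still open for url segments.

```python
import re
import unicodedata
from urllib.parse import quote

# Assumed set of characters that are illegal in a filename (Windows-style).
ILLEGAL_FILENAME_CHARS = r'[\\/:*?"<>|]'
# Assumed set for url segments: only the two characters named in the rule.
ILLEGAL_URL_CHARS = r"[/?]"


def clean_filename(name: str) -> str:
    """Replace illegal chars with a dash; leave everything else (digits, unicode) untouched."""
    return re.sub(ILLEGAL_FILENAME_CHARS, "-", name)


def clean_alias(name: str) -> str:
    """ASCII only, no separators, no leading digits, camel-cased."""
    # Decompose accented letters ("ö" -> "o" + combining mark), then drop non-ASCII.
    ascii_only = unicodedata.normalize("NFKD", name).encode("ascii", "ignore").decode("ascii")
    # Spaces, dots, dashes and other punctuation act as word separators.
    words = re.findall(r"[A-Za-z0-9]+", ascii_only)
    joined = "".join(w[:1].upper() + w[1:] for w in words)
    joined = re.sub(r"^\d+", "", joined)  # an alias must not begin with a number
    return joined[:1].lower() + joined[1:]


def clean_url_segment(name: str, percent_encode: bool = False) -> str:
    """Replace illegal chars with a dash; optionally percent-encode the result."""
    segment = re.sub(ILLEGAL_URL_CHARS, "-", name)
    return quote(segment, safe="") if percent_encode else segment


print(clean_filename("01.jpg"))                         # 01.jpg
print(clean_filename("中国.jpg"))                        # 中国.jpg
print(clean_alias("66 bröski dang.dang"))               # broskiDangDang
print(clean_url_segment("中国"))                         # 中国
print(clean_url_segment("中国", percent_encode=True))    # %E4%B8%AD%E5%9B%BD
```

Keeping the three cleaners separate is the point of the proposal: only the alias rule strips leading digits and non-ASCII characters, so filenames such as 01.jpg or 中国.jpg pass through unchanged.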


esunxray 16 Nov 2013, 01:16:01

Thank you. I want to tell you a few things:

  1. There are more than 20,000 Chinese characters.
  2. I just want Umbraco to auto-generate the page url or media name when the node or media name is Chinese. V6 is good enough for me: page urls are raw Chinese, and media item names support Chinese. But if you have a better way to solve this problem for security reasons, you can try the new way.


Shannon Deminick 16 Nov 2013, 07:43:37

@Stephan, it all makes sense. I just think that the logic put into the DefaultShortString for the URL was very aggressive; even the LegacyShortString was stripping off leading numbers (too aggressive). We should review the code that I've put back in there in v7 to see whether it's aggressive enough, as you've got far more experience with this stuff than I do. I do know the logic that is in there now solves this issue, though.


Stephan 16 Nov 2013, 09:49:36

OK I have ideas for the new Default...Helper so that it supports raw unicode where needed... and can be configured to constrain to utf8 or ascii when needed... will try to update it over the weekend.


Stephan 16 Nov 2013, 10:15:10

Actually @esunxray what would be great is if you could provide me with a set of test cases: original filename + target filename, original alias + target alias, original content name + target url.


Sebastiaan Janssen 18 Nov 2013, 17:26:26

Can we please revert this to the old behavior for now? We don't have time to properly test this before the final release in a few days. Same for U4-3407.


Shannon Deminick 18 Nov 2013, 21:53:16

@Sebastiaan - I think you are confused. I have fixed this issue in the v7 codebase; the code that handles this now is the same code (slightly improved) that existed before 6.1, which has worked for Umbraco for ages.

The code in 6.1 is slightly flawed, and the previous code in 7.0 is what caused the lodgement of this issue in the first place.

The reason this is still open is because I wanted Stephan to review the code that is now in 7.0 to ensure that it's not missing some logic that was put into 6.1/7.0 that needed to be there.


Sebastiaan Janssen 18 Nov 2013, 22:15:20

A'ight, looking good, Chinese characters work again and all of my incredibly weirdly named test files seem to be okay!


esunxray 19 Nov 2013, 00:38:26

I suggest that for non-Latin characters you give the user a choice between using the id or the raw characters as the page url. The reason: if only Chinese can be used in the url, someone who doesn't know Chinese can't type the url to go to that page.


Shannon Deminick 19 Nov 2013, 00:48:13

Umbraco will already load any page given its ID. For example, if you have a page with ID 1234, you can go to this URL and it will work:

http://mysite/1234


esunxray 19 Nov 2013, 02:03:44

I know this very well. But if I want to use the id as the url, I must modify all my old code. Another reason: some packages do not use the id as the url. Adding a config setting that lets Umbraco decide how to generate urls is a flexible way to solve this problem. Id or raw characters? Giving the user a choice is not a bad idea.


Shannon Deminick 19 Nov 2013, 04:01:10

You can override this behavior by using: http://our.umbraco.org/wiki/reference/umbraco-best-practices/umbracourlname


Stephan 19 Nov 2013, 08:11:48

@esunxray: the new IUrlSegmentProvider interface allows you to replace the code that creates a url segment, i.e. the url part corresponding to a node. The new IUrlProvider interface allows you to replace the code that creates the whole url (by default, by assembling node segments). Using these two interfaces, it should be quite easy to ensure that Umbraco always returns example.com/1234 for every page, anytime you get the page's url.
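The idea behind the two extension points can be sketched roughly as below. This is a Python illustration only: the real interfaces are C#, their actual signatures are not shown in this thread, and every class and method name here (`Node`, `NameSegmentProvider`, `SegmentAssemblingUrlProvider`, `IdUrlProvider`, `get_url_segment`, `get_url`) is hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Protocol


@dataclass
class Node:
    """Hypothetical stand-in for a content node."""
    id: int
    name: str
    parent: Optional["Node"] = None


class UrlSegmentProvider(Protocol):
    """Hypothetical analogue of a segment provider: one node -> one url part."""
    def get_url_segment(self, node: Node) -> str: ...


class NameSegmentProvider:
    """Derives the segment from the node name, keeping raw unicode."""
    def get_url_segment(self, node: Node) -> str:
        return node.name.lower().replace(" ", "-")


class SegmentAssemblingUrlProvider:
    """Builds the whole url by joining the segments of the node and its ancestors."""
    def __init__(self, segments: UrlSegmentProvider) -> None:
        self.segments = segments

    def get_url(self, node: Node) -> str:
        parts: List[str] = []
        current: Optional[Node] = node
        while current is not None:
            parts.append(self.segments.get_url_segment(current))
            current = current.parent
        return "/" + "/".join(reversed(parts))


class IdUrlProvider:
    """Replacement url provider that ignores names and always uses the node id."""
    def get_url(self, node: Node) -> str:
        return f"/{node.id}"


home = Node(1000, "Home")
page = Node(1234, "中国", parent=home)
print(SegmentAssemblingUrlProvider(NameSegmentProvider()).get_url(page))  # /home/中国
print(IdUrlProvider().get_url(page))                                      # /1234
```

Swapping the whole-url provider is what would give the id-based urls requested above, without touching how individual segments are cleaned.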


esunxray 19 Nov 2013, 08:32:28

Thanks. I'm not a programmer, I'm purely a user, so I don't know how to do it. All my suggestions come from a user's point of view, not a programmer's.


Sebastiaan Janssen 21 Nov 2013, 09:22:13

This issue is now fixed; other improvements are on the way and will be tracked in a separate issue (U4-3632).


Priority: Major

Type: Bug

State: Fixed

Assignee: Shannon Deminick

Difficulty: Normal

Category:

Backwards Compatible: True

Fix Submitted:

Affected versions: 7.0.0

Due in version: 7.0.0
