Created by Sebastiaan Janssen 05 Nov 2013, 08:40:29 Updated by Sebastiaan Janssen 21 Nov 2013, 09:22:13
Is duplicated by: U4-3406
Is duplicated by: U4-3407
Relates to: U4-3407
Relates to: U4-3632
So this happens because the file name's leading numbers are stripped off. We should revert to the previous way of renaming the files. It's a delicate issue for a lot of people; they don't like their file names messed with.
Might be because my images are named 01.jpg, 02.jpg etc?
why are we doing this stripping out numbers?
I think something was changed to run media nodes through the safealias method, which also happens to run the filenames through the same method (wrongly). I spent a lot of time last year getting the filename renaming correct (we do need to strip out characters that can't be in a url or filepath), so this change needs to be reverted.
I'll check and see what's going on
If the file name contains Chinese characters, like 中国.jpg, the same thing happens.
I believe this is a change that Stephen has made with the new IOHelper.SafeFileName implementation, will check with him.
The URLs of all content nodes have the same problem: they get cleaned too.
yup, will get this fixed, have asked Stephen to chime in on why the change is necessary.
Looking into it at the moment.
OK, let's discuss it here: we want to have some sort of rational "short strings" manipulation logic. By "short strings" I mean filenames, url segments, and aliases (anything else?). For all those strings we have the input value (what the user enters) and the "clean" value (what we're going to use). It looks like the new cleaning code is way too aggressive. OK. Can we define the rules for each case?
filename : there's a set of illegal chars in a filename (slash, star...). we should replace those chars with a dash, and otherwise leave the filename untouched (so unicode chars are ok). that should be enough, until we try to use that filename in a url.
url segment : a url segment is going to be used as part of a url. there's a set of illegal chars in a url segment (slash, question mark). we should replace those chars with a dash, and otherwise leave the url untouched? OR should we %-encode everything that's not utf8? not ascii? @esunxray: what would you expect node.Url to return? raw Chinese? percent-encoded Chinese? most modern browsers seem quite happy with raw unicode...
I'd be happy if we could refine these rules (and make sure we don't have others... do we have several type of aliases?) so the implementation can be fixed.
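For illustration, the two rules proposed above could be sketched roughly as follows. This is a minimal Python sketch, not the Umbraco implementation (which is C#). The function names and the exact illegal-character sets are assumptions: the comment only names "slash, star..." for filenames and "slash, question mark" for url segments. Whether to percent-encode is left as a flag because the thread has not settled that question.

```python
import re
from urllib.parse import quote

# Assumption: the Windows-reserved filename characters plus control characters.
ILLEGAL_FILENAME_CHARS = re.compile(r'[\\/:*?"<>|\x00-\x1f]')

# Assumption: path delimiters, query/fragment markers and whitespace.
ILLEGAL_SEGMENT_CHARS = re.compile(r'[/\\?#\s]')


def clean_filename(name: str) -> str:
    """Replace each illegal character with a dash; leave the rest untouched,
    so leading digits and unicode characters survive."""
    return ILLEGAL_FILENAME_CHARS.sub("-", name)


def clean_url_segment(name: str, percent_encode: bool = False) -> str:
    """Replace each illegal character with a dash. Optionally percent-encode
    whatever is not plain ASCII, using UTF-8 bytes."""
    segment = ILLEGAL_SEGMENT_CHARS.sub("-", name)
    if percent_encode:
        return quote(segment, safe="-")
    return segment
```

With these rules, `clean_filename("01.jpg")` and `clean_filename("中国.jpg")` return their input unchanged, while `clean_url_segment("中国", percent_encode=True)` gives the percent-encoded form for cases where raw unicode in a url is not wanted.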
Thank you. I want to tell you some things:
@Stephen, it all makes sense, i just think that the logic put in to the DefaultShortString for the URL was very aggressive - even the LegacyShortString was stripping off leading numbers (too aggressive). We should review the code that I've put back in there in v7 to see if it's not aggressive enough as you've got far more experience with this stuff than i do. I do know the logic that is in there now solves this issue though.
OK I have ideas for the new Default...Helper so that it supports raw unicode where needed... and can be configured to constrain to utf8 or ascii when needed... will try to update it over the weekend.
Actually @esunxray what would be great is if you could provide me with a set of test cases: original filename + target filename, original alias + target alias, original content name + target url.
Can we please revert this to the old behavior for now, we don't have time to properly test this before final release in a few days. Same for U4-3407
@Sebastiaan - I think you are confused. I have fixed this issue in the v7 codebase, the code that handles this now is the same code (slightly improved) that existed in pre 6.1 which has worked for umbraco for ages.
The code in 6.1 is slightly flawed and the previous code in 7.0 is what caused the lodging of this issue in the first place.
The reason this is still open is because I wanted Stephan to review the code that is now in 7.0 to ensure that it's not missing some logic that was put into 6.1/7.0 that needed to be there.
A'ight, looking good, Chinese characters work again and all of my incredibly weirdly named test files seem to be okay!
I suggest that for non-Latin characters, you give the user a choice to use either the ID or the raw characters as the page URL. The reason is: if the URL can only be in Chinese, then you won't be able to type that URL and go to the page, because you don't know Chinese.
Umbraco will already load any page given its ID. For example, if you have a page with ID 1234, you can go to this URL and it will work:
I know this very well. But if I want to use the ID as the URL, I must modify all my old code. Another reason is that some packages do not use the ID as the URL. Adding a config setting to let Umbraco decide how to generate URLs is a flexible way to solve this problem. ID or raw characters? Giving the user a choice is not a bad idea.
You can override this behavior by using: http://our.umbraco.org/wiki/reference/umbraco-best-practices/umbracourlname
@esunxray: the new IUrlSegmentProvider interface allows you to replace the code that creates a url segment, ie the url part corresponding to a node. The new IUrlProvider interface allows you to replace the code that creates the whole url (by default, by assembling nodes segments). Using these two interfaces, it should be quite easy to ensure that Umbraco always returns example.com/1234 for every page, anytime you get the page's url.
Thanks, I'm not a programmer, I'm purely a user, so I don't know how to do it. All my suggestions come from a user's point of view, not a programmer's.
This issue is now fixed, other improvements are on the way and will be tracked in a separate issue (U4-3632).
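To make the idea of a replaceable segment provider concrete, here is a rough Python sketch of the pattern. It is not the Umbraco API: the real IUrlSegmentProvider and IUrlProvider are C# interfaces whose signatures are not shown in this thread, so every name below (`Node`, `UrlSegmentProvider`, `NameSegmentProvider`, `IdSegmentProvider`, `build_url`) is hypothetical. For simplicity it builds a url from a single node rather than assembling the segments of a whole path.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass
class Node:
    """Hypothetical stand-in for a content node."""
    id: int
    name: str


class UrlSegmentProvider(Protocol):
    """Hypothetical contract: turn a node into its url segment."""
    def get_url_segment(self, node: Node) -> str: ...


class NameSegmentProvider:
    """Name-based segment: spaces become dashes, unicode is kept raw."""
    def get_url_segment(self, node: Node) -> str:
        return node.name.strip().replace(" ", "-").lower()


class IdSegmentProvider:
    """Replacement provider: always use the node ID as the segment."""
    def get_url_segment(self, node: Node) -> str:
        return str(node.id)


def build_url(host: str, node: Node, provider: UrlSegmentProvider) -> str:
    """Build a url for one node from whatever segment the provider returns."""
    return f"{host}/{provider.get_url_segment(node)}"
```

Swapping `NameSegmentProvider` for `IdSegmentProvider` changes every generated url from a name-based one to `example.com/1234` style, without touching the code that asks for the url.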
Assignee: Shannon Deminick
Backwards Compatible: True
Affected versions: 7.0.0
Due in version: 7.0.0