Created by Stephan 28 May 2018, 09:27:20 Updated by Shannon Deminick 01 Jun 2018, 01:59:05
Subtask of: U4-11279
If a name contains a character such as 'é', the character is not removed and shows up in the URL.
By default, ToUrlSegment() converts all URL segments to UTF-8. It has been like that for... a few years, I believe. But the requestHandler/urlReplacing section of UmbracoSettings.config has a toAscii attribute that can be 'true', 'false' or 'try'.
It is 'false' by default, but when set to 'true', ToUrlSegment() will convert all URL segments to ASCII, e.g. 'é' becomes 'e'. This is based on a long conversion table built from the one used in Lucene, merged with some others.
The table is, of course, probably not totally complete (some have complained that some obscure Turkish characters are missing). That's why we also have a 'try' option: with 'try', if a character cannot be converted, the whole string is left unchanged.
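To make the difference between the three modes concrete, here is a rough sketch in Python. It is not the Umbraco implementation: the function name `fold_to_ascii` and the tiny `ASCII_TABLE` are hypothetical, and what 'true' does with a character that has no table entry is not stated above, so the sketch simply drops it (an assumption).

```python
# Hypothetical, deliberately tiny conversion table; the real one is far longer.
ASCII_TABLE = {"é": "e", "è": "e", "à": "a", "ü": "u", "ß": "ss"}


def fold_to_ascii(text: str, mode: str = "try") -> str:
    """Illustrate the 'true' / 'false' / 'try' modes of a toAscii-style setting."""
    if mode not in ("true", "false", "try"):
        raise ValueError(f"unknown mode: {mode!r}")
    if mode == "false":
        return text  # keep the non-ASCII characters as they are

    converted = []
    for ch in text:
        if ch.isascii():
            converted.append(ch)
        elif ch in ASCII_TABLE:
            converted.append(ASCII_TABLE[ch])
        elif mode == "try":
            # One unconvertible character: leave the whole string unchanged.
            return text
        else:
            # mode == "true": assumption for this sketch only, drop the character.
            continue
    return "".join(converted)
```

With this sketch, a name like "café" becomes "cafe" under both 'true' and 'try', while a name that also contains Chinese characters is returned untouched under 'try'.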
If people have very exotic needs, they can always replace the IShortStringHelper implementation with their own and totally control what ToUrlSegment() does.
So... I believe this issue is a non-issue, or a documentation issue.
I guess the question is: should toAscii be 'true' by default?
Let's make 'try' the default.
PR https://github.com/umbraco/Umbraco-CMS/pull/2655 ready for review
Test: create content with a name that contains an accented character. The URL should be converted to ASCII, unless one of the characters cannot be converted to ASCII (e.g. a Chinese character).
Confirmed it's working.
Backwards Compatible: False
Due in version: 8.0.0
Sprint: Sprint 86
Story Points: 1