U4-11375 - Url Generation keeps accented chars

Created by Stephan 28 May 2018, 09:27:20 Updated by Shannon Deminick 01 Jun 2018, 01:59:05

Subtask of: U4-11279

If a name contains an accented character, e.g. 'é', it is not removed and shows up in the URL.


Stephan 28 May 2018, 17:26:33

By default, ToUrlSegment() converts all URL segments to UTF8. It has been like that for... a few years I believe. But the requestHandler/urlReplacing section of UmbracoSettings.config has a toAscii attribute that can be: 'true', 'false', 'try'.

It is 'false' by default, but when set to 'true', ToUrlSegment() will convert all URL segments to ASCII, e.g. 'é' becomes 'e'. This is based on a very long conversion table built from one that is used in Lucene, merged with some others.

The table is of course probably not totally complete (some have complained that some obscure Turkish chars are missing). That's why we also have a 'try' option: with 'try', if a character cannot be converted, then the whole string is left unchanged.
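To make the three toAscii modes concrete, here is a minimal Python sketch of the behaviour described above. It is not the Umbraco implementation: the tiny ASCII_TABLE, the function names, and the use of '?' for unmapped characters in 'true' mode are all assumptions made for illustration.

```python
# Hypothetical, tiny stand-in for the real (much larger) conversion table.
ASCII_TABLE = {"é": "e", "è": "e", "à": "a", "ü": "u", "ñ": "n"}


def to_ascii(text, fail="?"):
    """Return (converted, ok). ok is False if any char had no mapping.

    Replacing unmapped chars with `fail` is an assumption of this sketch.
    """
    out = []
    ok = True
    for ch in text:
        if ord(ch) < 128:
            out.append(ch)               # already ASCII, keep as-is
        elif ch in ASCII_TABLE:
            out.append(ASCII_TABLE[ch])  # known mapping, e.g. 'é' -> 'e'
        else:
            out.append(fail)             # no mapping available
            ok = False
    return "".join(out), ok


def convert_segment(text, mode):
    """Apply the 'false' / 'true' / 'try' semantics to one segment."""
    if mode == "false":
        return text                      # leave the UTF8 string untouched
    converted, ok = to_ascii(text)
    if mode == "true":
        return converted                 # always convert
    if mode == "try":
        return converted if ok else text # all or nothing
    raise ValueError("mode must be 'true', 'false' or 'try'")
```

With this sketch, 'café' stays 'café' under 'false' and becomes 'cafe' under 'true' or 'try', while a name that mixes 'é' with a character missing from the table is returned unchanged under 'try'.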

If people have very exotic needs, they can always replace the IShortStringHelper implementation with their own and totally control what ToUrlSegment() does.

So... I believe this issue is a non-issue, or a documentation issue.

Shannon Deminick 30 May 2018, 06:35:11

I guess the question is should toAscii be true by default?

Stephan 30 May 2018, 07:28:38

Let's go with 'try' by default.

Stephan 31 May 2018, 06:23:31

PR https://github.com/umbraco/Umbraco-CMS/pull/2655 ready for review

Test: create content with a name that contains an accent. The URL should be converted to ASCII, unless one char cannot be converted to ASCII (e.g. a Chinese character).

Shannon Deminick 01 Jun 2018, 01:59:01

confirmed it's working

Priority: Normal

Type: Task

State: Fixed


Difficulty: Normal


Backwards Compatible: False

Fix Submitted:

Affected versions:

Due in version: 8.0.0

Sprint: Sprint 86

Story Points: 1