U4-3732 - Still can't auto-generate URL when content node name is Chinese.

Created by esunxray 27 Nov 2013, 09:22:53 Updated by Mikkel Holck Madsen 14 Jun 2017, 06:36:07

Is duplicated by: U4-3757

Relates to: U4-10016

Here is some Chinese text to test with: 中文测试

Comments

Sebastiaan Janssen 27 Nov 2013, 09:34:05

The node name is accepted, but these characters get stripped from the URL.


esunxray 27 Nov 2013, 11:18:07

For now, we can't use v7 because it can't auto-generate the URL. This is a bug. Please support auto-generating Chinese URLs. Most users who do not use Latin characters need it.


Stephan 27 Nov 2013, 12:22:51

I know ;-( and should fix it later this week. I know how to do it, just lack time right now.


esunxray 10 Dec 2013, 08:16:01

Is this bug fixed?


Stephan 11 Dec 2013, 08:06:59

Currently working on that one.

Cause: for URLs, filenames and aliases, we convert everything to ASCII by removing unwanted Unicode chars.

Fix: changing the default config so that we keep the Unicode chars by default. It will still be possible to force Umbraco to remove them if needed.


Stephan 13 Dec 2013, 10:20:17

Please review:

Assuming you enter the following name: "0123 中文测试 中文测试 léger ZÔRG (2) a?? *x", the URL would be: "0123-中文测试-中文测试-léger-zôrg-2-a-x"

I.e. lower-cased, UTF-8, using "-" as a separator, and removing everything that is not a letter or a digit.

For those who want to enforce ASCII, that is still possible (via config).

Would that be OK for you?
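
The rule described above (lower-case, keep Unicode letters and digits, collapse everything else into a single "-") can be sketched as follows. This is only an illustration of the rule, not the code from the actual fix: the class name UrlSegmentSketch and the method ToUnicodeUrlSegment are hypothetical, and the Unicode normalization step is an assumption added so that decomposed accents survive.

```csharp
using System;
using System.Text;

// Hypothetical helper illustrating the rule from this thread.
// It is NOT Umbraco's implementation.
public static class UrlSegmentSketch
{
    public static string ToUnicodeUrlSegment(string name)
    {
        if (string.IsNullOrEmpty(name))
        {
            return string.Empty;
        }

        // Assumption: compose accents first so that "e" + combining acute
        // is treated as the single letter "é" rather than letter + mark.
        string composed = name.Normalize(NormalizationForm.FormC);

        var sb = new StringBuilder(composed.Length);
        bool pendingSeparator = false;

        foreach (char c in composed)
        {
            if (char.IsLetterOrDigit(c))
            {
                // Emit one separator for any run of skipped characters,
                // but never at the very start of the segment.
                if (pendingSeparator && sb.Length > 0)
                {
                    sb.Append('-');
                }

                sb.Append(char.ToLowerInvariant(c));
                pendingSeparator = false;
            }
            else
            {
                // Spaces, punctuation and symbols are dropped.
                // Characters outside the Basic Multilingual Plane arrive as
                // surrogate halves and are dropped too in this sketch.
                pendingSeparator = true;
            }
        }

        // A trailing run of skipped characters never emits a separator.
        return sb.ToString();
    }
}
```

With the sample name from this comment, the sketch yields "0123-中文测试-中文测试-léger-zôrg-2-a-x". Deferring the separator until the next kept character avoids both doubled and trailing dashes without a cleanup pass.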


Stephan 14 Dec 2013, 19:46:45

Pushed 5aec753 in 6.2.0 and merged into 7.0.2, ready for testing.


Eran Meir 18 Dec 2013, 15:20:16

I'm getting the same problem with Hebrew names.


Stephan 18 Dec 2013, 15:23:44

@Eran - makes sense. The fix should apply to you too, but if you want to be sure, just post a text sample here and I'll check what URL it creates.


Eran Meir 18 Dec 2013, 15:25:30

אודות האתר


Stephan 18 Dec 2013, 16:44:00

Assert.AreEqual("אודות-האתר", helper.CleanStringForUrlSegment("אודות האתר"));

Passed.


Stephan 10 Jan 2014, 07:26:46

I think we can consider that one fixed.


Dima Stefantsov 14 Jan 2014, 18:29:54

Not sure if you have already fixed it, but if not, you might consider the solution I used for Cyrillic (Russian) characters. It should be able to transliterate any language. It uses https://bitbucket.org/DimaStefantsov/unidecodesharpfork (NuGet: Install-Package UnidecodeSharpFork) and overrides Umbraco's helper from my own DLL:

using System.Globalization;
using System.Text;
using Umbraco.Core;
using Umbraco.Core.Strings;
using UnidecodeSharpFork;

public class DApplicationEventHandler : ApplicationEventHandler
{
    protected override void ApplicationStarting(UmbracoApplicationBase umbracoApplication, ApplicationContext applicationContext)
    {
        // Replace Umbraco's default short-string helper with the transliterating one below.
        var shortStringHelper = new DShortStringHelper().WithConfig(allowLeadingDigits: true);
        ShortStringHelperResolver.Current.SetHelper(shortStringHelper);

        base.ApplicationStarting(umbracoApplication, applicationContext);
    }
}

public class DShortStringHelper : DefaultShortStringHelper
{
    public override string CleanStringForUrlSegment(string text)
    {
        return text.ToUrlTitle();
    }

    public override string CleanStringForUrlSegment(string text, CultureInfo culture)
    {
        return text.ToUrlTitle();
    }
}

public static class StringExtensions
{
    private const int seoUrlMaxLength = 80;

    /// <summary>
    /// Implementation from stackoverflow, augmented with transliteration
    /// and corrected.
    /// <para />
    /// http://stackoverflow.com/questions/25259/how-does-stackoverflow-generate-its-seo-friendly-urls/25486#25486
    /// <para />
    /// Produces optional, URL-friendly version of a title, "like-this-one". 
    /// Hand-tuned for speed, reflects performance refactoring contributed
    /// by John Gietzen (user otac0n).
    /// </summary>
    public static string ToUrlTitle(this string title)
    {
        string transliterated = title.Unidecode();

        bool prevdash = false;
        int transliteratedLength = transliterated.Length;
        var sb = new StringBuilder(transliteratedLength);

        for (int i = 0; i < transliteratedLength && sb.Length < seoUrlMaxLength; i++)
        {
            char c = transliterated[i];
            if ((c >= 'a' && c <= 'z') || (c >= '0' && c <= '9'))
            {
                sb.Append(c);
                prevdash = false;
            }
            else if (c >= 'A' && c <= 'Z')
            {
                // Tricky way to convert to lowercase.
                sb.Append((char)(c | 32));
                prevdash = false;
            }
            else if (c == ' ' || c == ',' || c == '.' || c == '/' || c == '\\' || c == '-' || c == '_' || c == '=')
            {
                if (!prevdash && sb.Length > 0)
                {
                    sb.Append('-');
                    prevdash = true;
                }
            }
        }


        if (prevdash)
        {
            return sb.ToString(0, sb.Length - 1);
        }
        else
        {
            return sb.ToString();
        }
    }
}

This first transliterates the original characters automatically, and then builds a clean URL from the result.


Stephan 14 Jan 2014, 18:45:20

@Dima: I think I have fixed it, but the code you posted is interesting. I will have a look.


Shannon Deminick 15 Jan 2014, 06:51:10

@Stephan Can we close this? I assume this fix is in 6.2 as well?


Stephan 15 Jan 2014, 08:42:12

Yes, can close, fixed in 6.2 and 7.


Priority: Major

Type: Bug

State: Fixed

Assignee:

Difficulty: Normal

Category:

Backwards Compatible: True

Fix Submitted:

Affected versions: 7.0.0, 6.1.6

Due in version: 6.2.0, 7.0.2

Sprint:

Story Points:

Cycle: