Scraper fails with diacritics and other special characters

Issue #315 open
Psico Lock created an issue

Examples: Astérix (fails because of the é), Bishōjo Senshi Sailor Moon R (also fails when scraping), and others like Alien³, Ranma ½, etc.

It also fails to find short names like F1.

Comments (14)

  1. Max Payne

    Interesting. I have many Japanese ROMs that I play using fan translation patches. The names have special characters, and the scraper does indeed fail to locate them on TheGamesDB even when the game exists in the database.

  2. Jason Carr repo owner

    Thanks, guys. I'll look into it. Unfortunately, this may be an issue with TheGamesDB's API, but I will check.

  3. Brad Cheyne

    We need to make it so that characters like é can also be found as e, but not the other way around.

    For example, Pokemon: Blue Version isn't found (even when searching manually in the edit screen), but Pokémon: Blue Version is.

    We'll need an exhaustive list. It won't be complete right away, but we should try to catch as many letters as possible.

    Letters: ë, é, ō, ç, ñ.

    I don't know if there is a way in code to work per accent rather than per letter, so that every variation is handled automatically. Chances are we will have to map the letters manually, one by one. If so, look at the list on the right here: https://en.wikipedia.org/wiki/English_terms_with_diacritical_marks
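Working per accent rather than per letter is essentially what Unicode canonical decomposition (NFD) provides: each accented letter splits into a base letter plus combining marks, which can then be dropped. Below is a minimal Python sketch of the one-way matching described above, assuming the scraper can compare the typed search term against each candidate title returned by TheGamesDB. The function names `strip_diacritics` and `title_matches` are hypothetical and not part of LaunchBox.

```python
import unicodedata


def strip_diacritics(text: str) -> str:
    """Remove combining marks: é -> e, ō -> o, ç -> c, ñ -> n, ë -> e."""
    # NFD splits each accented letter into base letter + combining mark(s),
    # so no per-letter lookup table is needed.
    decomposed = unicodedata.normalize("NFD", text)
    kept = "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")
    return unicodedata.normalize("NFC", kept)


def title_matches(query: str, database_title: str) -> bool:
    """One-way match: 'Pokemon' finds 'Pokémon', but 'Pokémon' does not find 'Pokemon'."""
    q = unicodedata.normalize("NFC", query).casefold()
    t = unicodedata.normalize("NFC", database_title).casefold()
    # Only the database title is stripped, never the query, so an accented
    # query still requires an accented title.
    return q == t or q == strip_diacritics(t)


print(title_matches("Pokemon: Blue Version", "Pokémon: Blue Version"))
print(title_matches("Pokémon: Blue Version", "Pokemon: Blue Version"))
```

Because only the database side is stripped, typing "Pokemon" finds "Pokémon", while typing "Pokémon" still needs an accented title, which is the asymmetry requested above.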

  4. Brett Sanderson

    I did this a couple of years back for some French reports I was working on:

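    using System.Globalization;
    using System.Linq;
    using System.Text;

    public static class StringExtensions
    {
        // Strips accents: "é" -> "e", "ç" -> "c", "ō" -> "o".
        public static string ReplaceDiacritics(this string source)
        {
            // FormD splits each accented letter into a base letter plus combining marks.
            return new string(
                source.Normalize(NormalizationForm.FormD)
                    // Drop the combining marks and keep every other character.
                    .Where(c => CharUnicodeInfo.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark)
                    .ToArray())
                // Recompose what is left into the canonical composed form.
                .Normalize(NormalizationForm.FormC);
        }
    }
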

    var newString = @"âge français".ReplaceDiacritics();

    produces: age francais

    Let me know if it helps.
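One limitation worth noting for titles like Alien³ and Ranma ½ from the original report: canonical decomposition (FormD / NFD) leaves "³" and "½" untouched, because they only have compatibility decompositions. Compatibility decomposition (FormKD in .NET, NFKD in Python) rewrites them as "3" and "1⁄2". The Python sketch below compares the two; `fold_for_search` is a hypothetical name, and replacing the fraction slash with an ASCII "/" is an assumption about how such titles are stored in the database.

```python
import unicodedata


def strip_marks(text: str, form: str) -> str:
    """Decompose with the given form, then drop combining marks."""
    decomposed = unicodedata.normalize(form, text)
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


def fold_for_search(text: str) -> str:
    """Hypothetical search key: strip accents and fold compatibility characters."""
    # NFKD also maps '³' -> '3' and '½' -> '1' + U+2044 FRACTION SLASH + '2'.
    kept = strip_marks(text, "NFKD")
    # Assumption: titles are stored with an ASCII slash, e.g. "Ranma 1/2".
    return unicodedata.normalize("NFC", kept.replace("\u2044", "/"))


for title in ["âge français", "Alien³", "Ranma ½"]:
    print(f"{title!r}: NFD -> {strip_marks(title, 'NFD')!r}, NFKD -> {fold_for_search(title)!r}")
```

NFKD also folds other compatibility characters (for example "™" becomes "TM"), which is usually acceptable for a search key but not for text shown to the user.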

  5. Former user Account Deleted

    As a related issue, importing doesn't play nicely with some of the No-Intro naming conventions. Dashes ("-") seem to trip up the import tool, as does "~", which No-Intro uses to separate titles when the same ROM file was released in different regions under different names.
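A rough Python sketch of how an importer could turn a No-Intro style file name into search titles. It assumes that "~" separates alternative regional titles (as described above), that region and other tags appear in parentheses or brackets, and that " - " often stands in for a colon, which file systems do not allow. The helper name `search_titles_from_no_intro` is hypothetical.

```python
import re

# Parenthesised or bracketed tags such as "(USA, Europe)" or "[b]".
TAG_PATTERN = re.compile(r"\s*[\(\[][^\)\]]*[\)\]]")


def search_titles_from_no_intro(file_name: str) -> list[str]:
    """Hypothetical helper: derive candidate search titles from a file name."""
    # Drop the extension (assumes the name has one), then every tag.
    stem = file_name.rsplit(".", 1)[0]
    base = TAG_PATTERN.sub("", stem).strip()
    titles: list[str] = []
    # "~" separates alternative names for the same ROM.
    for name in base.split("~"):
        name = name.strip()
        if not name:
            continue
        titles.append(name)
        # Assumption: " - " replaced a colon, so also try "Title: Subtitle".
        if " - " in name:
            titles.append(name.replace(" - ", ": ", 1))
    return titles


print(search_titles_from_no_intro("Pokemon - Blue Version (USA, Europe) (SGB Enhanced).gb"))
```

Each returned title would then be searched in turn, ideally after the same diacritic folding used for typed searches.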

  6. Michael Mullins

    This issue caused LaunchBox to miss several videos in the EmuMovies database. For example, several Pokémon DS games are listed (correctly) in the LB database with the diacritic on the 'e', but it's not there in EmuMovies. For the Okami games, the database has a diacritic on the 'o' (Ōkami), while the EmuMovies site lists "Ookami". This appears to be a workaround similar to the way the German Eszett ('ß') is commonly written as 'ss', which is accepted in the language.

    I had to manually download missing videos and edit the filenames so that LB would detect them.

    I’m guessing it’s not too common, but the LB heuristic could use a touch-up to account for these cases.
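Transliterations like "Ookami" cannot be recovered by stripping accents alone, since stripping "Ō" only yields "Okami". A hypothetical Python sketch of a matching heuristic that generates a few spelling variants per title and treats two titles as the same game when any variant matches. The expansion table is a small, hand-picked assumption (macron vowels written as doubled vowels, ß as ss), not an exhaustive list.

```python
import unicodedata

# Hypothetical transliteration table: long vowels marked with a macron are
# often romanised as doubled vowels, and German ß is usually written "ss".
EXPANSIONS = {"ō": "oo", "ū": "uu", "Ō": "Oo", "Ū": "Uu", "ß": "ss"}


def strip_marks(text: str) -> str:
    """Remove combining accent marks (Pokémon -> Pokemon, Ōkami -> Okami)."""
    decomposed = unicodedata.normalize("NFD", text)
    kept = "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")
    return unicodedata.normalize("NFC", kept)


def spelling_variants(title: str) -> set[str]:
    """Original, accent-stripped and transliterated spellings, case-folded."""
    title = unicodedata.normalize("NFC", title)
    expanded = "".join(EXPANSIONS.get(ch, ch) for ch in title)
    variants = {title, strip_marks(title), expanded, strip_marks(expanded)}
    # casefold() also maps "ß" to "ss".
    return {v.casefold() for v in variants}


def same_game(a: str, b: str) -> bool:
    """Treat two titles as the same game if any spelling variant coincides."""
    return not spelling_variants(a).isdisjoint(spelling_variants(b))


print(same_game("Ōkami", "Ookami"), same_game("Okami", "Ookami"))
```

Unlike the one-way search requested earlier in the thread, this comparison is symmetric, which seems reasonable for pairing a game with a media file name; the trade-off is a somewhat higher chance of false matches.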
