Issues

Issue #8 resolved

Char encoding on Windows 7

Batte HUCHAI created an issue

Hello,

I have a character-encoding problem on Windows 7. When I create a file in Windows Explorer whose name contains a special (non-ASCII) character such as "♫", the EFSW FileWatcher passes my application the same filename but with a "?" in place of the "♫".

For example, when I create the file "test♫.xlsx", I get:

DIR (C:\Users\buchet\Documents\Watched\) FILE (test?.xlsx) has event Added

A "♫" in a filename may seem silly, but it's just an example. The same thing happens with every special character that Windows Explorer accepts but that EFSW alters in the name it reports.

For your information, it seems to work on Mac OS 1.7.5.

Any ideas?

Comments (28)

  1. Martín Lucas Golini

    Hi Batte, yes, the problem is that efsw converts everything to ANSI on Windows, which is not the best solution. I'll change this to convert everything to UTF-8, but you won't see the ♫ unless you change the command line's default code page, since it uses ANSI by default. You can change the code page to UTF-8 with "chcp 65001", but you will also need to change the default font to one that supports these characters (like Lucida Console). You can also compile with UNICODE support: if you're using Visual Studio, go to the project properties -> Configuration Properties -> General -> Character Set -> Use Unicode Character Set. But since I'll make this change so that efsw always works with UTF-8, the result will be exactly the same. Thanks for reporting it!
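
The code page switch can also be made from inside the program rather than with chcp. This is a minimal sketch, assuming the Windows API call SetConsoleOutputCP with CP_UTF8 (65001). The names enableUtf8Console and printExample are hypothetical and not part of efsw. The console font still has to contain the glyphs, as noted above.

```cpp
#include <iostream>
#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical helper (not part of efsw): switch the console output code
// page to UTF-8 so that UTF-8 encoded file names are interpreted correctly.
// On non-Windows systems there is nothing to do, so it reports success.
bool enableUtf8Console()
{
#ifdef _WIN32
    return SetConsoleOutputCP( CP_UTF8 ) != 0; // CP_UTF8 is code page 65001
#else
    return true;
#endif
}

// Print a UTF-8 encoded file name; "\xE2\x99\xAB" is the UTF-8 encoding
// of U+266B, the "♫" character from the report.
void printExample()
{
    enableUtf8Console();
    std::cout << "test\xE2\x99\xAB.xlsx" << std::endl;
}
```

Calling enableUtf8Console() once at startup, before the watcher begins reporting events, should be enough for the lifetime of the process.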

  2. Batte HUCHAI

    Hello,

    I understand, but the default encoding of QtCreator's command-line pane (the IDE I use) does not seem to be ANSI, which is why I'm able to see special characters like ♫.

    About your new encoding choice (UTF-8): are you sure that's the correct encoding for Windows? I thought it was UTF-16... Maybe you're right, I'm actually not sure!

    Thanks,

    B.

  3. Martín Lucas Golini

    Yes, Windows encourages the use of UTF-16 as the default encoding, but it's not a requirement, since it supports any Unicode encoding. I can't use UTF-16 because I'm using std::string to keep it simple, and the other OSes use UTF-8, so the correct approach is to always use the same encoding. Nothing prevents you from converting the strings to any other encoding, and you can use the String class used internally by efsw ( efsw::String::fromUtf8( filename ).toWideString() ). I also use QtCreator; the application output on other OSes is set to UTF-8 and looks fine, but I don't know what it uses on Windows. I tried printing UTF-16 and it doesn't seem to work either. I don't have time to continue testing, and it's not something I care much about, since it's not a problem in efsw. Sadly I can't satisfy every developer, but if you want to suggest another solution, I'm listening!

    Edit: hint: read https://bugreports.qt-project.org/browse/QTCREATORBUG-316 and try calling http://msdn.microsoft.com/en-us/library/windows/desktop/ms686036(v=vs.85).aspx with the correct code page ( 65001 ).

    Regards, Martín
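
For readers who need UTF-16 on the application side, the conversion can also be done by hand. This is a minimal sketch, assuming well-formed UTF-8 input and a C++11 compiler. utf8ToUtf16 is a hypothetical helper, not efsw API, and it does not replace the efsw::String call mentioned above.

```cpp
#include <string>

// Hypothetical helper (not part of efsw): convert a UTF-8 std::string to
// UTF-16 code units. Assumes well-formed input; an unexpected byte is
// replaced with U+FFFD instead of being validated in detail.
std::u16string utf8ToUtf16( const std::string& in )
{
    std::u16string out;
    std::size_t i = 0;
    while ( i < in.size() )
    {
        unsigned char c = static_cast<unsigned char>( in[i] );
        char32_t cp;
        std::size_t extra; // number of continuation bytes expected
        if ( c < 0x80 )                { cp = c;        extra = 0; }
        else if ( ( c >> 5 ) == 0x6 )  { cp = c & 0x1F; extra = 1; }
        else if ( ( c >> 4 ) == 0xE )  { cp = c & 0x0F; extra = 2; }
        else if ( ( c >> 3 ) == 0x1E ) { cp = c & 0x07; extra = 3; }
        else                           { cp = 0xFFFD;   extra = 0; }
        ++i;
        for ( ; extra > 0 && i < in.size(); --extra, ++i )
        {
            unsigned char cc = static_cast<unsigned char>( in[i] );
            if ( ( cc & 0xC0 ) != 0x80 ) { cp = 0xFFFD; break; }
            cp = ( cp << 6 ) | ( cc & 0x3F );
        }
        if ( cp >= 0x10000 )
        {
            // Code points above the BMP become a surrogate pair.
            cp -= 0x10000;
            out.push_back( static_cast<char16_t>( 0xD800 + ( cp >> 10 ) ) );
            out.push_back( static_cast<char16_t>( 0xDC00 + ( cp & 0x3FF ) ) );
        }
        else
        {
            out.push_back( static_cast<char16_t>( cp ) );
        }
    }
    return out;
}
```

The result is a sequence of char16_t units. On Windows, where wchar_t is 16 bits wide, those units can be copied into a std::wstring for the wide-character APIs.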

  4. Batte HUCHAI

    Hello,

    All UTF-8 characters seem to be sent correctly by EFSW.

    But I have problems with other files: files whose names originate from Mac OS X (created directly on a Mac and displayed correctly by both Finder and Windows Explorer).

    One of them : https://mega.co.nz/#!FR1UQBDa!TzRLQ210dwpE2KQGUNz__1fWcOsvo19VFLed-2YujLE

    On Windows, if you edit its filename, copy the name, and paste it into a plain-text editor like Notepad, you'll see some unusual characters! These characters are not reported correctly by EFSW... Do you know why?

    I hope I'm making myself clear...

    Do not hesitate to tell me if I'm not.

    B.

  5. Martín Lucas Golini

    Sorry, but I'm not sure what you are trying to say. Maybe you can explain step by step how to reproduce the problem, and try to be a little clearer about what the problem is. From what I understand, it doesn't sound like an efsw bug. Thanks, Martín

  6. Batte HUCHAI

    OK, sorry for not being clear.

    It's simple. As with "test♫.xlsx", I tried putting the file below into the watched folder : https://mega.co.nz/#!FR1UQBDa!TzRLQ210dwpE2KQGUNz__1fWcOsvo19VFLed-2YujLE

    The result? As with "test?.xlsx", I received a wrong filename from EFSW for this new file.

    Do you know why?

    For information: after some investigation, I understood that the file came from Mac OS X and had some strange characters in its filename (you can see them by copying the filename into Notepad or another plain-text editor).

    B.

  7. Martín Lucas Golini

    This is what I explained in the previous messages: this is not an efsw problem. You need to use a command line that supports UTF-8 character encoding, with a font that also supports it. Or you can change the output to an encoding that the command line interprets correctly. For example, I used the cygwin console to show you that this is already working ( since it uses UTF-8 by default ): example working. There are other options; just search for something like "windows unicode command line support" in Google. Regards, Martín

  8. Batte HUCHAI

    Hello,

    I also have a character-encoding problem on Mac OS. If I put a folder named "lolélalé" into a watched folder, I get "lole'lale'"...

    To help, here is part of my code:

    case efsw::Actions::Add:
        std::cout << "DIR (" << dir << ") FILE (" << filename << ") has event Added" << std::endl;
        _filewatchersignals->emit_addSignal(QString::fromUtf8(dir.c_str()), QString::fromUtf8(filename.c_str()));
        break;
    case efsw::Actions::Delete:
        std::cout << "DIR (" << dir << ") FILE (" << filename << ") has event Delete" << std::endl;
        _filewatchersignals->emit_deleteSignal(QString::fromUtf8(dir.c_str()), QString::fromUtf8(filename.c_str()));
        break;
    case efsw::Actions::Modified:
        std::cout << "DIR (" << dir << ") FILE (" << filename << ") has event Modified" << std::endl;
        _filewatchersignals->emit_modifiedSignal(QString::fromUtf8(dir.c_str()), QString::fromUtf8(filename.c_str()));
        break;
    case efsw::Actions::Moved:
        std::cout << "DIR (" << dir << ") FILE (" << filename << ") has event Moved from (" << oldFilename << ")" << std::endl;
        _filewatchersignals->emit_movedSignal(QString::fromUtf8(dir.c_str()), QString::fromUtf8(filename.c_str()), QString::fromUtf8(oldFilename.c_str()));
        break;
    default:
        std::cout << "Should never happen!" << std::endl;
    

    You can see that I correctly read the output as UTF-8...

    Thanks for your help.

    B.

  9. Martín Lucas Golini

    You have the same problem as on Windows: your locale is not correctly set in the Terminal. I've tested with the default terminal locale ( en_US.UTF-8 ) and everything works just fine. It also works in the application output from QtCreator. Your code looks fine, so I don't think there's anything wrong there. OS X and UTF-8 example

  10. Batte HUCHAI

    Hello,

    I understand what you're saying, but I think my problem is something else.

    My Qt project is a client that talks to a web service. On Windows, when EFSW gives the file "tété.txt" to the Qt client (which sends it), the web service receives the file correctly (with the same file name, "tété.txt"). On Mac OS, when EFSW gives the same file name, the web service receives a wrong filename.

    I looked at the decimal value of each char of the file name sent by EFSW, and they don't seem to be the UTF-8 values of the "t" and "é" characters.

    Do you know what I mean?

  11. Martín Lucas Golini

    Yes, it's clear what you are describing. I'll compare the string hashes produced on OS X and Windows. If something is different, it means that efsw is doing something wrong; otherwise it must be something in your application.

    Let me check and I'll tell you.

    Thanks

  12. Martín Lucas Golini

    OK, I ran the tests and everything looks fine. The string hashes are the same, and the binary data is exactly the same. I still think that this is not an efsw issue. If you can reproduce it with a simple example that I can test here, I'll take a look at it. But please, nothing with Qt or client/server, since that has nothing to do with the library.

    OS X hashes Windows 7 hashes

    Regards

  13. Batte HUCHAI

    Hello,

    I'm sorry, but I still have some problems with EFSW encoding.

    I printed in hexadecimal the string that EFSW gives after an event occurred. The result is:

    DIR (/Users/bb/MCF/bb@gmail.com/Privévè/Coffre-fort/aabbcc/) FILE (pépè.png(0x7065ffffffccffffff817065ffffffccffffff802e706e67)) has event Added
    

    As you can see, all characters are encoded in Unicode UTF-8 ...

    • "p" : 0x70
    • "." : 0x2e
    • "n" : 0x6e
    • "g" : 0x67

    ... EXCEPT "é" and "è" :

    • "é" : 0x65ffffffccffffff81, which seems to be the UTF-8 code of "e" (0x65) followed by something else (0xffffffccffffff81).
    • "è" : 0x65ffffffccffffff80, which seems to be the UTF-8 code of "e" (0x65) followed by something else (0xffffffccffffff80).

    But normally, the UTF-8 code of "é" is 0xc3a9 and the UTF-8 code of "è" is 0xc3a8.

    This difference is the reason why my C++ program (in Qt) doesn't correctly understand the name "pépè.png" ...

    Do you observe the same thing? Do you have an explanation?

    Thanks for your help.

    B.

  14. Martín Lucas Golini

    Sorry, but I tested again and I'm getting the correct UTF-8 codes ( I tested with mingw and VS too ). I'll need a minimal test where I can reproduce your problem, if possible without Qt, since I think that's where your problems are. Have you tested this with the efsw-test that comes with the project?

  15. Batte HUCHAI

    I'm going to test with efsw-test.

    Can you just give me the hexadecimal output for a file "pépè.png" detected by EFSW?

    Something like this:

    void print_hex(const char *s)
    {
        while(*s)
        printf("%02x", (unsigned int) *s++);
    }
    
    [...]
    
    switch (action)
    {
    case efsw::Actions::Add:
        std::cout << "DIR (" << dir << ") FILE (" << filename << "(";
        print_hex(filename.c_str());
        break;
    [...]
    
  16. Martín Lucas Golini

    No, that's not the encoding. I printed the data as you asked, converting every char to unsigned int ( printf("%02x", (unsigned int) *s++); ), and that's why you see those extra ffffff: char is signed here, so bytes above 0x7f are sign-extended by the cast. The first byte of è is c3 and the second byte is a8.
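
The extra ffffff can be avoided by casting each byte through unsigned char before widening it. This is a minimal sketch. toHex is a hypothetical helper, not part of efsw or efsw-test, and it returns the hex string instead of printing it so that outputs are easy to compare.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: return the bytes of a string as lowercase hex.
// Casting through unsigned char first keeps every value in 0x00..0xff.
// Where char is signed, casting it straight to unsigned int sign-extends
// bytes above 0x7f, which is where the extra "ffffff" comes from.
std::string toHex( const std::string& s )
{
    std::string out;
    char buf[3]; // two hex digits plus the terminating '\0'
    for ( std::size_t i = 0; i < s.size(); ++i )
    {
        unsigned int byte = static_cast<unsigned char>( s[i] );
        std::snprintf( buf, sizeof( buf ), "%02x", byte );
        out += buf;
    }
    return out;
}
```

With this helper, a name containing the two-byte sequence c3 a9 is reported as "c3a9" rather than "ffffffc3ffffffa9".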

  17. Martín Lucas Golini

    I think your problem is that you're not correctly converting the UTF-8 std::string to QString: you need to create the string using QString::fromUtf8, and I think you are using QString( str.c_str() ).

  18. Martín Lucas Golini

    Oh no, now I see from your previous post that you used QString::fromUtf8. So I don't know. Still, if you want, make a minimal example of this failing and I'll debug it ( use Qt4 if you want, because I think that's where the problem is ).

  19. Batte HUCHAI

    Hello,

    It's really, really strange.

    As you advised, I changed the "test" sources of the EFSW project:

    src/test/efsw-test.cpp :

    [...]
    void print_hex(const char *s)
    {
          while (*s)
          printf("%02x", (unsigned int) *s++);
    }
    
    void handleFileAction( efsw::WatchID watchid, const std::string& dir, const std::string& filename, efsw::Action action, std::string oldFilename = ""  )
          {
          std::cout << "DIR (" << dir + ") FILE (" + ( oldFilename.empty() ? "" : "from file " + oldFilename + " to " ) + filename + " (";
          print_hex(filename.c_str());
          std::cout << ") " << ") has event " << getActionName( action ) << std::endl;
          }
    [...]
    

    As you can see, I've just added the "print_hex()" function. Qt is not involved here; I use your makefile to compile the test program.

    After compiling and running, I get:

    iMac-de-B:bin bb$ ./efsw-test-release 
    Press ^C to exit demo
    CurPath: /Users/bb/Documents/efsw_test/efsw_project/bin/
    Added WatchID: 1
    Added WatchID: 2
    DIR (/Users/buchet_b/Documents/efsw_test/efsw_project/bin/test/) FILE (from file pépé copie to pépé (7065ffffffccffffff817065ffffffccffffff81) ) has event Moved
    

    So exactly the same ...

    I really need your help. You'll find the EFSW project I use here:

    https://mega.co.nz/#!IQ0EDZZB!BAR8vwK8cnDWo05hpIJ_BhOkXgg0CaFNr0zsEPDMWYU

    With these sources, what result do you get?

    Do you have any other ideas?

    Thanks a lot in advance,

    B.

  20. Martín Lucas Golini

    Wait... your project file is from OS X, and I was testing on Windows... so your problems are now on OS X? Give me a few minutes and I'll test on OS X ( but I tested previously in this same thread and it was working fine ).

  21. Martín Lucas Golini

    I'm getting the correct code:

    DIR (/Users/charly/Downloads/efsw_project/bin/test/) FILE (from file pépé to pèpè (70ffffffc3ffffffa870ffffffc3ffffffa8) ) has event Moved
    

    What I'm thinking is that your OS X file system uses a different encoding for file names. I've read some articles about that, but I'm not sure how to handle it right now. What I need you to do is run python from the terminal and enter: import sys; print(sys.getfilesystemencoding())

    (If you're on Mavericks and python crashes running this, fix it with the instructions from here: http://stackoverflow.com/questions/19569143/python3-segmentation-fault-on-osx-mavericks ). Then tell me what you get; it must be something different from utf-8.

    It must be something similar to these problems: https://bugzilla.mozilla.org/show_bug.cgi?id=703161 http://stackoverflow.com/questions/9757843/unicode-encoding-for-filesystem-in-mac-os-x-not-correct-in-python http://apple.stackexchange.com/questions/10476/how-to-enter-special-characters-so-that-bash-terminal-understands-them

    I'm a little too busy to look for a fix right now, so I'll need you to help me with this, or just wait a bit until I have time to read about it. I don't even own a Mac, so it's not that easy for me to look into this.

    Regards, Martín

  22. Martín Lucas Golini

    I think the problem comes from the file system encoding. Please test converting the filename string from NFD to NFC; here's a function that I got from Stack Overflow:

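    #include <CoreFoundation/CoreFoundation.h>
    #include <string>
    #include <vector>

    // Convert a UTF-8 file name from decomposed (NFD) to precomposed (NFC) form.
    std::string precomposeFilename(const std::string& name)
    {
       CFStringRef cfStringRef = CFStringCreateWithCString(kCFAllocatorDefault, name.c_str(), kCFStringEncodingUTF8);
       if (cfStringRef == NULL)
          return name; // not valid UTF-8: leave the name untouched

       CFMutableStringRef cfMutable = CFStringCreateMutableCopy(kCFAllocatorDefault, 0, cfStringRef);
       CFRelease(cfStringRef);

       CFStringNormalize(cfMutable, kCFStringNormalizationFormC);

       // Size the buffer for the worst case instead of assuming 255 bytes.
       CFIndex size = CFStringGetMaximumSizeForEncoding(CFStringGetLength(cfMutable), kCFStringEncodingUTF8) + 1;
       std::vector<char> buffer(size);
       bool ok = CFStringGetCString(cfMutable, &buffer[0], size, kCFStringEncodingUTF8);
       CFRelease(cfMutable);

       // Fall back to the original name if the conversion failed.
       return ok ? std::string(&buffer[0]) : name;
    }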
    

    It seems to be a very common problem, but I'm not sure whether this is what we are dealing with or something else.

    Regards, Martín
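
Whether a reported name is in decomposed form can be checked without CoreFoundation by looking at its code points. This is a minimal sketch, assuming well-formed UTF-8 input. decodeUtf8 and hasCombiningMarks are hypothetical helpers, not efsw API. The bytes 65 cc 81 decode to U+0065 followed by the combining acute accent U+0301 (the NFD form of "é"), while c3 a9 decodes to the single code point U+00E9 (the NFC form).

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper: decode well-formed UTF-8 into Unicode code points.
// No validation is attempted; it is only meant for inspecting file names.
std::vector<std::uint32_t> decodeUtf8( const std::string& in )
{
    std::vector<std::uint32_t> out;
    for ( std::size_t i = 0; i < in.size(); )
    {
        unsigned char c = static_cast<unsigned char>( in[i++] );
        std::uint32_t cp = c;
        int extra = 0; // number of continuation bytes that follow
        if ( c >= 0xF0 )      { cp = c & 0x07; extra = 3; }
        else if ( c >= 0xE0 ) { cp = c & 0x0F; extra = 2; }
        else if ( c >= 0xC0 ) { cp = c & 0x1F; extra = 1; }
        for ( ; extra > 0 && i < in.size(); --extra )
            cp = ( cp << 6 ) | ( static_cast<unsigned char>( in[i++] ) & 0x3F );
        out.push_back( cp );
    }
    return out;
}

// True if the string contains a code point from the Combining Diacritical
// Marks block (U+0300..U+036F), a hint that the name is in decomposed form.
bool hasCombiningMarks( const std::string& utf8 )
{
    const std::vector<std::uint32_t> cps = decodeUtf8( utf8 );
    for ( std::size_t i = 0; i < cps.size(); ++i )
        if ( cps[i] >= 0x0300 && cps[i] <= 0x036F )
            return true;
    return false;
}
```

A check like this only detects the decomposed form. Converting to NFC still needs a real normalization routine such as the CoreFoundation function above.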
