Problem with locales on Linux + py3k

Issue #24 resolved
Ygor Lemos created an issue

It seems that the __init__ method, which sets the locale to "", is causing problems with Python 3 on Linux.

  File "./c/exports.py", line 639, in sav
    with savReaderWriter.SavWriter(dstfile, varNames, varTypes, varLabels=varLabels, valueLabels=valLabels, formats=varFormats, multRespDefs=multRespDefs, ioUtf8=True, ioLocale='pt_BR.utf8') as writer:
  File "/usr/local/lib/python3.4/dist-packages/savReaderWriter/savWriter.py", line 110, in __init__
    super(Header, self).__init__(savFileName, ioUtf8, ioLocale)
  File "/usr/local/lib/python3.4/dist-packages/savReaderWriter/generic.py", line 28, in __init__
    locale.setlocale(locale.LC_ALL, "")
  File "/usr/lib/python3.4/locale.py", line 592, in setlocale
    return _setlocale(category, locale)
locale.Error: unsupported locale setting

Setting this line manually to pt_BR.utf8 (which is the proper locale name as listed on Ubuntu 14.10) made it work.

I think that this line should be changed to:

class Generic(object):
    """
    Class for methods and data used in reading as well as writing
    IBM SPSS Statistics data files
    """

    def __init__(self, savFileName, ioUtf8=False, ioLocale="en_US.utf8"):
        """Constructor. Note that interface locale and encoding can only
        be set once"""
        locale.setlocale(locale.LC_ALL, ioLocale)

ioLocale should then default to something like en_US.utf8 or, if it stays as None, it should be checked before calling setlocale.
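A minimal sketch of the None check suggested above. The helper name apply_io_locale is hypothetical and not part of savReaderWriter; it only illustrates falling back to the environment default when no locale name is passed.

```python
import locale


def apply_io_locale(io_locale=None):
    """Set LC_ALL to io_locale, or to the environment default if it is None.

    Hypothetical helper for illustration; not part of savReaderWriter.
    """
    # An empty string asks setlocale to use the user's default settings.
    requested = "" if io_locale is None else io_locale
    try:
        return locale.setlocale(locale.LC_ALL, requested)
    except locale.Error as exc:
        # Re-raise with the offending name so the caller can see what failed.
        raise locale.Error("cannot set locale %r: %s" % (requested, exc))
```

With this, apply_io_locale('pt_BR.utf8') would request that locale explicitly, while apply_io_locale() keeps the current behaviour of using the environment default.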

Also worth noting: locale names differ between OS X and Linux. For instance, Brazilian Portuguese UTF-8 is pt_BR.utf8 on Ubuntu and pt_BR.UTF-8 on Mac.
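One possible way to cope with the differing spellings is to try several candidate names and keep the first one the system accepts. This is only a sketch; set_first_available is a hypothetical helper, not something savReaderWriter provides.

```python
import locale


def set_first_available(candidates, category=locale.LC_ALL):
    """Try each locale name in order and return the first one accepted.

    Hypothetical helper for illustration; not part of savReaderWriter.
    """
    for name in candidates:
        try:
            locale.setlocale(category, name)
            return name
        except locale.Error:
            # This spelling is not known on this system; try the next one.
            continue
    raise locale.Error(
        "none of these locales is available: %r" % (list(candidates),)
    )
```

A call such as set_first_available(["pt_BR.utf8", "pt_BR.UTF-8"]) would then cover both the Ubuntu and the Mac spelling, and raise locale.Error only if neither is installed.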

Comments (5)

  1. Albert-Jan Roskam repo owner

    Hi Ygor,

    Thanks for letting me know. locale.setlocale(locale.LC_ALL, '') is the recommended way to get the default locale setting [https://docs.python.org/3.2/library/locale.html]. What do you get when you run the locale command in your terminal? I checked the locale on a number of OSs that I have handy, with both Python 2 and 3: Windows 7 64, Linux Debian 7 64, Linux Ubuntu 12 32, Linux Ubuntu 10 64, Mac OS ML 2. On the latter two, the locale does not seem to be set properly:

    # Linux Ubuntu 10 64 (Python 2 and 3)
    >>> locale.setlocale(locale.LC_ALL, "")
    'en_US'
    # Mac OSX ML2 (Python 2 and 3)
    >>> locale.setlocale(locale.LC_ALL, "")
    'C/UTF-8/C/C/C/C'
    

    It can be fixed by running something like export LC_ALL=en_US.UTF-8 or export LANG=en_US.UTF-8 before you start Python (and perhaps putting it in ~/.bashrc). As you already mention, locale specifications vary by OS, so I would rather not hard-code any locale name. I will mention this in the documentation, though. It would be nicer still if there were some platform-independent way of fixing this.

    Best wishes, Albert-Jan

  2. Ygor Lemos reporter

    Hi AJ,

    On py3k under Linux it says:

    ubuntu@adm-1:~$ python3
    Python 3.4.0 (default, Apr 11 2014, 13:05:11)
    [GCC 4.8.2] on linux
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import locale
    >>> locale.setlocale(locale.LC_ALL, "")
    'en_US.UTF-8'
    

    Linux Version:

    ubuntu@adm-1:~$ cat /etc/lsb-release
    DISTRIB_ID=Ubuntu
    DISTRIB_RELEASE=14.04
    DISTRIB_CODENAME=trusty
    DISTRIB_DESCRIPTION="Ubuntu 14.04.1 LTS"
    

    It works properly when called from the Python CLI but, for some reason, not when called within savReaderWriter's generic.py.

    I have also confirmed that it is using the same python 3 binary as the sav scripts.

    Here are my Linux locale outputs:

    ubuntu@adm-1:~$ locale
    LANG=en_US.UTF-8
    LANGUAGE=en_US.UTF-8
    LC_CTYPE="en_US.UTF-8"
    LC_NUMERIC="en_US.UTF-8"
    LC_TIME="en_US.UTF-8"
    LC_COLLATE="en_US.UTF-8"
    LC_MONETARY="en_US.UTF-8"
    LC_MESSAGES="en_US.UTF-8"
    LC_PAPER="en_US.UTF-8"
    LC_NAME="en_US.UTF-8"
    LC_ADDRESS="en_US.UTF-8"
    LC_TELEPHONE="en_US.UTF-8"
    LC_MEASUREMENT="en_US.UTF-8"
    LC_IDENTIFICATION="en_US.UTF-8"
    LC_ALL=en_US.UTF-8
    
    ubuntu@adm-1:~$ locale -a
    C
    C.UTF-8
    en_US.utf8
    POSIX
    pt_BR
    pt_BR.iso88591
    pt_BR.utf8
    

    Anyway, passing the ioLocale variable to setlocale fixed the script for me. I really don't know why (maybe because I'm calling it from inside CherryPy and some thread sets the locale beforehand?).

  3. Ygor Lemos reporter

    PS: I had the same problem on another OS X machine.

    I just changed the line to locale.setlocale(locale.LC_ALL, ioLocale) and now it works fine.

  4. Albert-Jan Roskam repo owner

    This is most probably also fixed in commit b225071; see also issue #26. generic.Generic.__init__ now starts with locale.setlocale(locale.LC_ALL, "" if ioLocale is None else ioLocale). Also, the ioLocale is now derived from locale.setlocale(locale.LC_ALL) instead of ".".join(locale.getlocale()) (http://bugs.python.org/issue23425).

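The two ways of reading back the current locale mentioned in comment 4 can be compared with a short sketch. Both function names are hypothetical and chosen only for this illustration; the calls themselves are standard-library locale functions.

```python
import locale


def current_locale_name(category=locale.LC_ALL):
    """Return the current locale string without changing it."""
    # Calling setlocale without a second argument only queries the setting.
    return locale.setlocale(category)


def joined_getlocale():
    """Join the (language code, encoding) pair from getlocale() with a dot."""
    # getlocale() may return None for either element, in which case
    # str.join raises TypeError.
    return ".".join(locale.getlocale())
```

The query form returns whatever string the C library reports, so it can be passed back to setlocale later without having to rebuild the name from its parts.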