1. Albert-Jan Roskam
  2. savReaderWriter
  3. Issues
Issue #20 new

Interface with just strings (no bytes)

created an issue

Hello Albert-Jan

I've been very pleased to see SRW is working with Python 3 now. But do you really need these bytes-type strings? As I understand it, one the major advantages of Python 3 is finally to get rid of this unicode vs. byte-strings nightmare.

What about making SRW accepting both? For example in the alignments.setter in header.py we could simply replace

 alignments = {b"left": 0, b"right": 1, b"center": 2}


 alignments = {b"left": 0, b"right": 1, b"center": 2, 
                "left": 0,  "right": 1,  "center": 2}

I would definitely prefer to do some work on SRW instead of uglifying my code with (from my point of view completely superfluous) b"..."s and bytes(..., "ASCII")s.

Am I missing something? Isn"t the "natural" dict-key type str (e.g. like in **kwargs)? And the "natural" string type in general also?

Regards, T.

Comments (11)

  1. Albert-Jan Roskam repo owner


    Yeah, I am also rather partial about the whole 'bytes' thing. At some point, I just wanted to reach consistency between Python 2 (where python2-strings are used by default) and Python 3 (where, therefore, bytes seemed most logical). For metadata, it is indeed very well possible to accept both str and bytes. In fact, your 'alignments' solution is both simple and very elegant.

    For data, it is less efficient to use unicode/str, because it needs to be converted into bytes anyway. Then again, users may need to do this anyway, inside or outside the program. You are very welcome to try and make SRW less picky about strings/bytes (in the Python 3 sense of the word). I will have a look how I can give you access to the repo.

    A slightly related topic is the lack of ctypes.argtypes in most of the metadata setters. Currently if you mistakenly give an argument of the wrong type (a str instead of bytes, perhaps), you may get an ugly segfault. Beter to specify the argtypes in the code. It is on my to-do list, but I haven't gotten to do it.

    regards, Albert-Jan

  2. Albert-Jan Roskam repo owner

    PS: It'd be most desirable/flexible if the functions take both bytes and unicode strings (polymorphic functions). If only unicode strings were acceptable, Python 2.7 users would be forced to use the u' prefix everywhere.

  3. th3l0nius reporter

    Thank you for your confidence - I'm not sure if I really deserve it. (I have access to your repo now.)

    Until now I only contributed to other peoples code by pull requests. Should I create a "thelonius" branch, push it up and then sync it with my "master" while I locallly rename the original master to a different name?

  4. Albert-Jan Roskam repo owner

    Hi, well, if you prefer pull requests, that will also do. Otherwise, a branch will also be a good solution. I wrote a fair amount of unittests. I use a commit hook to run these with each commit. It is my intention to make the tests pass all the time, at least for the master branch. Btw, what OS are you on? Yesterday I found out that some of the tests fail under Windows (related to line separators (\n vs. \r\n and a locale difference, en_US.UTF8 vs nl_NL.cp1252)

  5. th3l0nius reporter

    I'm normally working and developing on Linux. I have only one application that uses SRW and for this the final target is a windows application (based on PyQt4, built with cx_freeze...) which is normally run under a german Windows 7.

    I also had an issue with locales under Windows in an older version (maybe this has been already fixed in the meantime). To solve it I have:

    ioLocale= 'english' if os.name == 'nt' else None

    in the parameters to savReaderWriter.SavWriter(). SRW failed if it tried to use the german default locale.

  6. th3l0nius reporter

    I just saw that you silently dropped my contribution from last year. There is no sense for me in contributing if I anyway had to add the changes manually. Seems to make more sense to write a wrapper.

  7. th3l0nius reporter

    Sorry, I've been busy. And sorry, I thought you dropped my cx_freeze adaptions because they are not in the 3.3.0 release.

    I've now made the proposed changes to the alignments setter and the measureLevels setter and when now writing the respective unit tests I have found a bug in your code which I cannot solve without your help (I could but it would take me certainly ten times more time than you will need to figure it out). In header.py on line 561 you have the following code:

    levels = {b"unknown": 0, b"nominal": 1, b"ordinal": 2, b"scale": 3,  
              b"ratio": 3, b"flag": 4, b"typeless": 5}

    And the reverse mapping is in the getter on line 541:

    levels = {0: b"unknown", 1: b"nominal", 2: b"ordinal", 3: b"scale",
              3: b"ratio", 4: b"flag", 5: b"typeless"}

    Because 3 is the equivalent to b"scale" and b"ratio" as well in both mappings my unit test now results in:

    AssertionError: Lists differ: [b'nominal', b'ratio', b'ratio... != [b'nominal', b'scale', b'scale...
    First differing element 1:
    - [b'nominal', b'ratio', b'ratio', b'ordinal']
    ?                ^ ^^^     ^ ^^^
    + [b'nominal', b'scale', b'scale', b'ordinal']
    ?                ^^ ^^     ^^ ^^

    I'm already done with everything and just need to know: may I solve it this way:

    levels = {0: b"unknown", 1: b"nominal", 2: b"ordinal", 3: b"scale",
              4: b"ratio", 5: b"flag", 6: b"typeless"}

    Or how else?

    At least this made me experience that unit tests can really be helpful. I actually didn't see the error when editing the code...

  8. th3l0nius reporter

    Ok. I found in spssdio.h that ratio is just a synonym for scale. So the bytes to ints mapping makes sense. The ints to bytes doesn't so I hope it is agreeable to you when I remove the 2nd 3:

    levels = {0: b"unknown", 1: b"nominal", 2: b"ordinal", 3: b"scale",
              4: b"flag", 5: b"typeless"}
  9. Log in to comment