Problem with multRespDefs on py3k

Issue #23 open
Ygor Lemos created an issue

When I pass the multRespDefs dict to SavWriter, it always raises the exception shown below.

Looking at the module source code, I saw that before doing anything else the _setMultRespDefs method encodes everything and then tries to access the variables.

The problem is that it looks up the rest["setType"] key, which is in fact rest[b"setType"].

I've tried passing the multRespDefs dict both with all elements encoded to bytes and as UTF-8 strings, but the error is always the same:

  File "/Users/yg/Developer/adm/c/exports.py", line 642, in sav
    with savReaderWriter.SavWriter(dstfile, varNames, varTypes, varLabels=varLabels, valueLabels=valLabels, formats=varFormats, multRespDefs=multRespDefs, ioUtf8=True, ioLocale="pt_BR.UTF-8") as writer:
  File "/usr/local/lib/python3.4/site-packages/savReaderWriter/savWriter.py", line 144, in __init__
    self.multRespDefs = multRespDefs
  File "/usr/local/lib/python3.4/site-packages/savReaderWriter/header.py", line 1017, in multRespDefs
    multRespDefs = self._setMultRespDefs(multRespDefs)
  File "/usr/local/lib/python3.4/site-packages/savReaderWriter/header.py", line 905, in _setMultRespDefs
    if rest["setType"] not in (b"C", b"D"):
KeyError: 'setType'
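
The KeyError above can be reproduced in isolation. This is a minimal sketch, assuming the method encodes every str key and value to bytes before the lookup, as described in this report. The helper `encode_dict` is hypothetical and only stands in for that encoding step; it is not part of savReaderWriter.

```python
def encode_dict(d, encoding="utf-8"):
    """Hypothetical stand-in: encode every str key and str value to bytes."""
    def enc(x):
        return x.encode(encoding) if isinstance(x, str) else x
    return {enc(k): enc(v) for k, v in d.items()}


rest = encode_dict({"setType": "D", "countedValue": 1})

# In Python 3, str and bytes never compare equal, so they are distinct dict keys.
print("setType" in rest)    # False: this lookup is what raises KeyError
print(b"setType" in rest)   # True
```

In Python 2, "setType" and b"setType" are the same object type, which would explain why the lookup only breaks on Python 3.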

Code:

multRespDefs.update({
                        mrn: {
                            b"setType": b"D",
                            b"label": quest.get("q").encode(),
                            b"varNames": msvars, 
                            b"countedValue": 1
                        }
                    })

with savReaderWriter.SavWriter(dstfile, varNames, varTypes, varLabels=varLabels, valueLabels=valLabels, formats=varFormats, multRespDefs=multRespDefs, ioUtf8=True, ioLocale="pt_BR.UTF-8") as writer:

Any hints?

Comments (9)

  1. Albert-Jan Roskam repo owner

    Hi again,

    Thanks for taking the time to report this. I noticed that I do not have a proper unittest for setting multiple response definitions (shame on me). It is not unlikely that I forgot to add a b' prefix somewhere when I made the code ready for use with Python 3. Quite likely the fix is just rest[b"setType"]. Out of curiosity: (1) did you already try this in codepage mode (ioUtf8=False) (2) Does the code work in Python 2.7? Multiple response definitions are not a commonly used feature, and Python 3 is (still) much less commonly used than Python 2, so this would explain why this bug has been lurking all the while. I will work on this later!

    Best wishes, Albert-Jan

  2. Ygor Lemos reporter

    Hi AJ,

    Yes, just adding the byte cast to setType wasn't enough as there were other keys inside rest{} which are also bytes... also it gave many errors on concatenating strings with bytes...
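
The str/bytes errors mentioned above can be sketched in a few lines. This is only an illustration of general Python 3 behaviour, not code taken from the package; the function name `describe_mixing` and the set name `mr1` are made up for the example.

```python
def describe_mixing():
    """Show what Python 3 does when str and bytes are combined."""
    results = {}
    try:
        "$mr1=" + b"D"          # concatenating str with bytes
    except TypeError:
        results["concat"] = "TypeError"
    # %-formatting does not fail, but it embeds the repr of the bytes object.
    results["format"] = "$%s=%s" % (b"mr1", "D")
    # Decoding first gives the intended text.
    results["decoded"] = "$%s=%s" % (b"mr1".decode("utf-8"), "D")
    return results


print(describe_mixing())
```

The second case is the easier one to miss, because it raises nothing and silently produces a string containing b'mr1'.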

    I made a quick and dirty fix to the _setMultRespDefs method that worked fine on py3k and generated the .sav file with all the multiple response sets properly created.

    Worth noting that I only "fixed" it for dichotomies; multiple categories are still broken (that branch still relies on the tail variable).

        def _setMultRespDefs(self, multRespDefs):
            """Set 'normal' multiple response definitions.
            This is a helper function for the multRespDefs setter function. 
            It translates the multiple response definition, specified as a
            dictionary, into a string that the IO module can use"""
            mrespDefs = []
            for setName, rest in multRespDefs.items():
                rest = self.encode(rest)
                if rest[b"setType"] not in (b"C", b"D"):
                    continue
                rest["setName"] = self.encode(setName)
                mrespDef = "$%s=%s" % (rest["setName"], rest[b"setType"].decode())
                mrespDef = mrespDef.encode()
                lblLen = len(rest[b"label"])
                rest[b"lblLen"] = lblLen
                rest[b"varNames"] = b" ".join(rest[b"varNames"])
                tail = b" %(varNames)s" if lblLen == 0 else b"%(label)s %(varNames)s"
                if rest[b"setType"] == b"C":  # multiple category sets
                    template = b" %%(lblLen)s %s " % tail
                else:                       # multiple dichotomy sets
                    # line below added/modified after Issue #4:
                    # Assertion during creating of multRespDefs
                    rest[b"valueLen"] = len(str(rest[b"countedValue"]))
                    template= "%s %s %s %s %s" % (str(rest[b"valueLen"]), str(rest[b"countedValue"]), rest[b"lblLen"], rest[b"label"].decode(), rest[b"varNames"].decode())
                mrespDef += template.encode()
                mrespDefs.append(mrespDef.rstrip())
            mrespDefs = b"\n".join(mrespDefs)
            return mrespDefs
    

    I think that this method can be refactored to use just utf8 strings and encode just at the very end, thoughts?
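
A rough sketch of that refactoring idea: build everything as str and encode once at the end. This is not the package's implementation. The standalone function `build_mult_resp_defs` is hypothetical, the field order is copied from the method above, and the exact spacing the I/O module expects is an assumption that has not been verified.

```python
def build_mult_resp_defs(mult_resp_defs, encoding="utf-8"):
    """Turn a dict of multiple response sets (str keys and values)
    into one bytes blob, encoding only at the very end."""
    lines = []
    for set_name, rest in mult_resp_defs.items():
        set_type = rest["setType"]
        if set_type not in ("C", "D"):
            continue
        label = rest.get("label", "")
        parts = ["$%s=%s" % (set_name, set_type)]
        if set_type == "D":  # multiple dichotomy sets carry a counted value
            counted = str(rest["countedValue"])
            parts += [str(len(counted)), counted]
        # The method above measures the label after encoding, so count bytes,
        # not characters (they differ for non-ASCII labels).
        parts.append(str(len(label.encode(encoding))))
        if label:
            parts.append(label)
        parts.append(" ".join(rest["varNames"]))
        lines.append(" ".join(parts))
    return "\n".join(lines).encode(encoding)


defs = {
    "mr1": {"setType": "D", "label": "Opção",
            "varNames": ["v1", "v2"], "countedValue": 1},
    "mr2": {"setType": "C", "label": "", "varNames": ["v3", "v4"]},
}
print(build_mult_resp_defs(defs))
```

Keeping str throughout means both set types share one code path, and the only place where the encoding matters is the label length and the final encode call.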

    Also, I have noticed that it creates the multiple sets only on Data > Define Multiple Response Sets but not on Analyse > Multiple Response > Define Variable Sets...

    Is there a way to recreate the same variables there too ?

    Thanks.

  3. Ygor Lemos reporter

    Yeah, I just read up a bit and it seems that the one under Analyse is an older feature that will be deprecated in the "future" of SPSS...

    Anyway, could you consider including the _setMultRespDefs method above in the package? It really solves the problem (for now) :)

  4. Ygor Lemos reporter

    Hi,

    I'm still having this issue on Linux with savReaderWriter 3.3.0, although the same code with the same data runs normally on Mac OS X Yosemite.

    The Mac is on Python 3.4.3; the Linux machine uses the Python 3 from the latest stable Ubuntu Server LTS (3.4.0).

    Problem:

    Traceback (most recent call last):
      File "/usr/local/lib/python3.4/dist-packages/cherrypy/_cprequest.py", line 670, in respond
        response.body = self.handler()
      File "/usr/local/lib/python3.4/dist-packages/cherrypy/lib/encoding.py", line 217, in __call__
        self.body = self.oldhandler(*args, **kwargs)
      File "/usr/local/lib/python3.4/dist-packages/cherrypy/_cpdispatch.py", line 61, in __call__
        return self.callable(*self.args, **self.kwargs)
      File "./c/kit.py", line 287, in SDPYD
        return func(request, *args, **kwargs)
      File "./c/exports.py", line 640, in sav
        with savReaderWriter.SavWriter(dstfile, varNames, varTypes, varLabels=varLabels, valueLabels=valLabels, formats=varFormats, multRespDefs=multRespDefs, ioUtf8=True, ioLocale='pt_BR.UTF-8') as writer:
      File "/usr/local/lib/python3.4/dist-packages/savReaderWriter/savWriter.py", line 144, in __init__
        self.multRespDefs = multRespDefs
      File "/usr/local/lib/python3.4/dist-packages/savReaderWriter/header.py", line 1017, in multRespDefs
        multRespDefs = self._setMultRespDefs(multRespDefs)
      File "/usr/local/lib/python3.4/dist-packages/savReaderWriter/header.py", line 905, in _setMultRespDefs
        if rest["setType"] not in (b"C", b"D"):
    KeyError: 'setType'
    
  5. Albert-Jan Roskam repo owner

    Hi Ygor,

    Did you already try this with the HEAD revision, via pip install -U -e git+https://bitbucket.org/fomcl/savreaderwriter.git#egg=savreaderwriter? The Mac/Linux difference is still intriguing. I rarely test my code on MacOS, although I do have a Hackintosh VM. So I will give it a try as soon as I have time.

    Regards, Albert-Jan
