pipeline fails when using python 3

Issue #73 closed
Thomas Gilgenast created an issue

appears to be an error when loading the config file:

configparser.InterpolationSyntaxError: '%' must be followed by '%' or '(', found: '% GC"}],\n"iced": ["MakeIced", {}],\n"kr": ["MakeKR", {}],\n"spline_both": ["MakeSpline", {}],\n"spline_gc": ["MakeSpline", {"bias_factors": ["% GC"], "knots": [0]}],\n"spline_length": ["MakeSpline", {"bias_factors": ["length"], "knots": [20]}],\n"express": ["MakeExpress", {}],\n"jointexpress": ["MakeJointExpress", {}],\n"bin_gmean_16_4": ["MakeBinned", {}],\n"bin_amean_20_8": ["MakeBinned", {"window_function": "amean",\n"bin_width": 8000,\n"window_width": 20000}],\n"bin_gmean_20_8": ["MakeBinned", {"bin_width": 8000,\n"window_width": 20000}],\n"expected_regional": ["MakeExpected", {"global_expected": false,\n"donut": false}],\n"expected_donut": ["MakeExpected", {}],\n"variance": ["MakeVariance", {}],\n"pvalues": ["MakePvalues", {}],\n"threshold": ["MakeThreshold", {}],\n"is": ["MakeInteractionScores", {}]\n}'

this looks like an issue with the %-interpolation that configparser.ConfigParser performs by default in python 3 (BasicInterpolation), which rejects any % that is not followed by % or (

since the config is parsed internally by luigi, this may be difficult to disable

luigi switched from using RawConfigParser to ConfigParser in 2013, see https://github.com/spotify/luigi/issues/101

it may be possible to rename the “% GC” bias factor to just “GC” to avoid this
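a minimal reproduction outside of luigi, using only the python 3 standard library: configparser.ConfigParser (BasicInterpolation by default) rejects a value containing "% GC" like the one in the error above, while the same value with the bias factor renamed to "GC" parses cleanly. the section and option names and the read_spline_gc helper are made up for illustration.

```python
import configparser


def read_spline_gc(bias_factor):
    """Parse a one-option config whose value names bias_factor.

    The section/option names are hypothetical; the value mirrors the
    spline_gc entry from the error message above.
    """
    value = '["MakeSpline", {"bias_factors": ["' + bias_factor + '"], "knots": [0]}]'
    text = '[pipeline]\nspline_gc = ' + value + '\n'
    parser = configparser.ConfigParser()  # default BasicInterpolation
    parser.read_string(text)
    return parser.get('pipeline', 'spline_gc')  # interpolation happens here


try:
    read_spline_gc('% GC')
except configparser.InterpolationSyntaxError as exc:
    print('rejected:', exc)

print(read_spline_gc('GC'))  # ["MakeSpline", {"bias_factors": ["GC"], "knots": [0]}]
```

note that the error is raised by get(), not while reading the file, which is why it only shows up once luigi actually looks up the option.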

Comments (12)

  1. Thomas Gilgenast reporter

    digging into this further, it seems that none of the code in lib5c shows how to “augment” a primermap with a “% GC” column

    for the cell systems paper, we used this script to achieve this: https://bitbucket.org/creminslab/comparison-manuscript/src/master/augment_primerfile.py

    it appears that we would lose nothing by renaming this in the default config and everywhere else we use it

    this might point to a related issue: the lack of documentation on how to add the GC and length columns to a primerfile
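    as a starting point for that documentation, here is a rough sketch of what adding the two columns might involve. it is not the augment_primerfile.py script linked above: it assumes each primerfile row is already a dict with chrom/start/end/name keys and that the fragment's sequence has been fetched separately. the helper names, the column names "GC" and "length", and expressing GC as a percentage are all assumptions.

```python
def gc_percent(sequence):
    """Percentage of G/C bases in sequence (case-insensitive)."""
    if not sequence:
        raise ValueError('empty sequence')
    seq = sequence.upper()
    gc = seq.count('G') + seq.count('C')
    return 100.0 * gc / len(seq)


def augment_row(row, sequence):
    """Return a copy of a primerfile row with hypothetical 'length' and 'GC' columns.

    row is assumed to be a dict with 'chrom', 'start', 'end', 'name' keys;
    sequence is assumed to be the genomic sequence of [start, end).
    """
    augmented = dict(row)
    augmented['length'] = row['end'] - row['start']
    augmented['GC'] = gc_percent(sequence)
    return augmented


row = {'chrom': 'chr1', 'start': 100, 'end': 108, 'name': 'frag1'}
print(augment_row(row, 'GGCCATAT'))  # length 8, GC 50.0
```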

  2. Thomas Gilgenast reporter

    this problem is actually more complicated than initially reported

    besides the “% GC” strings (which we removed in d3c1e99), we use “%s” strings in output file paths specified in the config. we handle the interpolation for these inside our Task classes, and we do not want configparser to do anything with them
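    to illustrate the conflict with the python 3 standard library (section and option names hypothetical): a RawConfigParser, which is what luigi used before the 2013 switch mentioned above, hands back the %s path untouched so a Task can fill it in, whereas the default configparser.ConfigParser refuses the same value.

```python
import configparser

# Hypothetical output path option; the %s is meant for our Task classes,
# not for configparser.
CONFIG_TEXT = """
[counts]
outfile = counts/%s.counts
"""

raw = configparser.RawConfigParser()  # never interpolates
raw.read_string(CONFIG_TEXT)
template = raw.get('counts', 'outfile')
print(template % 'rep1')  # counts/rep1.counts

strict = configparser.ConfigParser()  # default BasicInterpolation
strict.read_string(CONFIG_TEXT)
try:
    strict.get('counts', 'outfile')
except configparser.InterpolationSyntaxError:
    print('ConfigParser rejects the bare %s')
```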

    if we add configparser as a dependency, we might be able to manually escape all the remaining % symbols in the config and expect it to work under both py2 and py3 if the extra % gets consumed during config parsing

  3. Thomas Gilgenast reporter

    even when configparser is installed on py2, it appears that py2 does not interpret %%s as an escaped %, so the strings seen by our luigi Tasks still contain %%s, which prevents string substitution from working as desired. under py3, meanwhile, %%s is interpreted as escaping one % symbol, and the strings seen by our luigi Tasks contain %s as expected.
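    as far as we can tell from the py2 standard library source, ConfigParser.ConfigParser only performs %-substitution on values that contain "%(", so a value like counts/%%s.counts comes back unchanged. the sketch below cannot run py2, so it uses py3's RawConfigParser as an approximate stand-in for that behavior and compares it with py3's interpolating ConfigParser (option names hypothetical).

```python
import configparser

# The same escaped value, read two ways.
CONFIG_TEXT = """
[counts]
outfile = counts/%%s.counts
"""


def read_outfile(parser):
    """Load CONFIG_TEXT into parser and return the outfile option."""
    parser.read_string(CONFIG_TEXT)
    return parser.get('counts', 'outfile')


# py3-style interpolation consumes one %: the Task sees counts/%s.counts
interpolated = read_outfile(configparser.ConfigParser())
# no interpolation (stand-in for py2 here): the Task sees counts/%%s.counts
raw = read_outfile(configparser.RawConfigParser())

print(interpolated % 'rep1')  # counts/rep1.counts
print(raw)                    # counts/%%s.counts
```

with the raw value, raw % 'rep1' raises a TypeError, because %% is a literal % and leaves no conversion for the argument.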

  4. Thomas Gilgenast reporter

    a few different ideas:

    1. we can see what happens if we downgrade luigi to a version from before it supported interpolation (2.7.9). this will help us understand whether the % escaping is related to interpolation or is just a py2 vs py3 difference that has nothing to do with interpolation
    2. we can find a Task class that everyone inherits from (e.g., CmdTask) and override its __getitem__() so that it replaces %% with %. this should be more robust than trying to manually find all the locations in the code where %% may cause trouble, but in terms of understanding the underlying bug it feels like a cop-out
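    idea (2) could look roughly like the sketch below. it uses a plain python class instead of luigi.Task or CmdTask so that it runs on its own, and it hooks attribute access (__getattribute__) rather than __getitem__, on the assumption that parameters are read as attributes like self.outfile. UnescapePercentMixin and CountsTask are hypothetical names.

```python
class UnescapePercentMixin(object):
    """Hypothetical mixin: turn '%%' back into '%' in string attributes.

    If the config parser already consumed the escape (py3), the values
    contain no '%%', so the replacement is a no-op.
    """

    def __getattribute__(self, name):
        value = object.__getattribute__(self, name)
        # Leave dunder attributes and non-strings alone.
        if isinstance(value, str) and not name.startswith('__'):
            return value.replace('%%', '%')
        return value


class CountsTask(UnescapePercentMixin):
    """Stand-in for a real Task with one output path parameter."""

    def __init__(self, outfile):
        self.outfile = outfile

    def output_path(self, rep):
        return self.outfile % rep


# Unconsumed (py2-style) and consumed (py3-style) escapes both work.
print(CountsTask('counts/%%s.counts').output_path('rep1'))  # counts/rep1.counts
print(CountsTask('counts/%s.counts').output_path('rep1'))   # counts/rep1.counts
```

the catch is that any parameter that is supposed to keep a literal %% after parsing would be altered too, which is part of why this feels like a workaround rather than a fix.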

    if we do (1) and find that the true source of the inconsistency we’re seeing between py2 and py3 is due to luigi, then we can choose to either implement (2) or force an upper bound on the luigi version

    if we do (1) and find that it’s not luigi’s fault, then we should keep looking for the true source of this inconsistency

    for review: this inconsistency was initially surprising to us because we specifically install configparser in py2, but looking closer at how luigi implements config parsing, it first attempts to import ConfigParser and only then falls back to configparser. this means that even if configparser is installed in py2, the py2 ConfigParser will be used rather than configparser.
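    the effect of the import order can be seen with a small sketch. this is not luigi's code; it only reproduces the try-one-then-fall-back pattern described above with importlib so both orders can be compared. first_importable is a hypothetical helper.

```python
import importlib


def first_importable(module_names):
    """Return the first module in module_names that can be imported."""
    for name in module_names:
        try:
            return importlib.import_module(name)
        except ImportError:
            continue
    raise ImportError('none of %r could be imported' % (module_names,))


# Order described above: py2 stdlib name first, then the py3 name / py2 backport.
# On py2 this picks ConfigParser even when the configparser backport is installed.
stdlib_first = first_importable(['ConfigParser', 'configparser'])

# Reversed order: prefer configparser (py3 stdlib, or the backport on py2).
backport_first = first_importable(['configparser', 'ConfigParser'])

print(stdlib_first.__name__, backport_first.__name__)
```

on py3 both lines resolve to configparser; the two orders only diverge on py2 with the backport installed.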

  5. Thomas Gilgenast reporter

    (1) does not fix the problem, because the underlying issue seems to be the order in which the imports are attempted

  6. Thomas Gilgenast reporter

    an alternative solution is to use a different placeholder (e.g. “<rep>”) and str.replace() instead of “%s” and standard (%-based) string formatting
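    a minimal sketch of the placeholder idea, using the “<rep>” placeholder from this comment: since the value contains no % at all, even the default interpolating ConfigParser returns it unchanged. the option name and the fill_placeholder helper are hypothetical.

```python
import configparser

# Hypothetical option using "<rep>" instead of "%s" as the placeholder.
CONFIG_TEXT = """
[counts]
outfile = counts/<rep>.counts
"""

# No % anywhere, so the default (interpolating) parser leaves it alone.
parser = configparser.ConfigParser()
parser.read_string(CONFIG_TEXT)
template = parser.get('counts', 'outfile')


def fill_placeholder(path_template, rep, placeholder='<rep>'):
    """Substitute rep for every occurrence of placeholder in path_template."""
    return path_template.replace(placeholder, rep)


print(fill_placeholder(template, 'rep1'))  # counts/rep1.counts
```

unlike %-formatting, str.replace() never treats % specially, so the same config text behaves identically whichever parser luigi ends up importing.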

  7. Thomas Gilgenast reporter

    another alternative is to detect whether ConfigParser is available when dropping the default config, and to escape all the % symbols (as %%) if it is not
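    a sketch of that detection, assuming the default config is dropped from a template string. py2_configparser_available, render_default_config and TEMPLATE are hypothetical names. per the import order described above, the py2 ConfigParser module being absent is taken to mean the config will go through py3-style interpolation. doubling every % also turns any intentional %(name)s reference into literal text, so this assumes the default config uses none.

```python
import configparser


def py2_configparser_available():
    """True if the py2 stdlib ConfigParser module can be imported."""
    try:
        import ConfigParser  # noqa: F401  (only exists on py2)
    except ImportError:
        return False
    return True


def render_default_config(template_text, escape=None):
    """Return config text, doubling every % when escape is True.

    By default, escape only when the py2 ConfigParser module is missing.
    """
    if escape is None:
        escape = not py2_configparser_available()
    return template_text.replace('%', '%%') if escape else template_text


TEMPLATE = "[counts]\noutfile = counts/%s.counts\n"

text = render_default_config(TEMPLATE, escape=True)
parser = configparser.ConfigParser()
parser.read_string(text)
print(parser.get('counts', 'outfile'))  # counts/%s.counts
```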

    configs would not be “shareable” between py2 and py3 without modification, though it would be easy to add something like a “lib5c convertconfig” command
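    the core of such a command could be as small as the sketch below. “lib5c convertconfig” is only a proposal here, so convert_config and its to_py3 flag are hypothetical; it just doubles or collapses % signs in the config text.

```python
def convert_config(text, to_py3=True):
    """Hypothetical core of a 'lib5c convertconfig' command.

    to_py3=True doubles every % (unescaped, py2-style -> escaped, py3-style);
    to_py3=False collapses every %% back to % (py3-style -> py2-style).
    """
    if to_py3:
        return text.replace('%', '%%')
    return text.replace('%%', '%')


py2_text = "[counts]\noutfile = counts/%s.counts\n"
py3_text = convert_config(py2_text, to_py3=True)
print(py3_text)
```

converting py2 -> py3 -> py2 always returns the original text, because str.replace() scans left to right without overlapping matches.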
