Issue #10 resolved

Can't use numbers inside lists.

Anonymous created an issue

Grammars with lists that include int or float as alternatives break with a ValueError exception. For example:

>>> parse("123", [int, word])
123
>>> parse("hello", [int, word])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "pypeg2/__init__.py", line 552, in parse
    t, r = parser.parse(text, thing)
  File "pypeg2/__init__.py", line 657, in parse
    t, r = self._parse(t, thing, pos)
  File "pypeg2/__init__.py", line 864, in _parse
    t, r = self._parse(text, e, pos)
  File "pypeg2/__init__.py", line 963, in _parse
    obj = thing(r)
ValueError: invalid literal for int() with base 10: 'hello'

I think this is because the list handling at __init__.py:862 only checks for SyntaxError and not for ValueError. I got it working by changing the loop to:

            for e in thing:
                try:
                    t, r = self._parse(text, e, pos)
                except ValueError:
                    continue
                if type(r) != SyntaxError:
                    found = True
                    break
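The idea behind that change can be tried without pyPEG. The following is only a minimal sketch of ordered alternatives, not pyPEG's actual code: `parse_alternatives` and this `word` function are hypothetical stand-ins, and failures are raised here instead of being returned as a SyntaxError object the way the snippet above expects.

```python
import re

WORD = re.compile(r"\w+")


def word(text):
    """Hypothetical stand-in for a word rule: accept identifier-like text."""
    if not WORD.fullmatch(text):
        raise SyntaxError("expected a word, got %r" % text)
    return text


def parse_alternatives(text, alternatives):
    """Return the result of the first alternative that accepts the text."""
    for alt in alternatives:
        try:
            return alt(text)
        except (SyntaxError, ValueError):
            # int("hello") raises ValueError rather than SyntaxError, so
            # both must count as "this alternative did not match".
            continue
    raise SyntaxError("no alternative matched %r" % text)


print(parse_alternatives("123", [int, word]))    # 123
print(parse_alternatives("hello", [int, word]))  # hello
```

If only SyntaxError were caught, the second call would stop with the ValueError from int() and never reach the word alternative, which is the behaviour reported here.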

Floats are even worse, as the word regexp doesn't work for them.

Comments (4)

  1. Volker Birk repo owner

    I reviewed that issue.

    In short: using just "float" in a grammar cannot work. The reason: you usually aren't parsing Python source with pyPEG, because Python already ships with its own parser (which you can use from Python programs). You're parsing other languages instead.

    But if the syntax is not Python, what is the correct syntax for a float? This really depends on the concrete language. And pyPEG offers a solution to implement the syntax and semantics of a float in your language; actually, it offers two possibilities. One is using pyPEG's Literal class, the other is deriving from float directly:

    from pypeg2 import *
    
    class MyFloat(Literal):
        grammar = re.compile(r"(?P<sign>[+-])(?P<mantissa>\d*)\*\*(?P<sign2>[+-])(?P<exponent>\d*)")
        def __init__(self, value=.0):
            if isinstance(value, str):
                m = MyFloat.grammar.match(value)
                if m:
                    sign = -1. if m.group("sign") == "-" else 1.
                    mantissa = float(m.group("mantissa"))
                    sign2 = -1. if m.group("sign2") == "-" else 1.
                    exponent = float(m.group("exponent"))
                    self.value = sign * mantissa * 10 ** (sign2 * exponent)
                else:
                    raise ValueError("«%s» is not a valid float" % value)
            elif isinstance(value, float):
                self.value = value
            else:
                self.value = float(value)
    
    
    class MyFloat2(float):
        grammar = re.compile(r"(?P<sign>[+-])(?P<mantissa>\d*)\*\*(?P<sign2>[+-])(?P<exponent>\d*)")
        def __new__(cls, value=.0):
            if isinstance(value, str):
                m = MyFloat2.grammar.match(value)
                if m:
                    sign = -1. if m.group("sign") == "-" else 1.
                    mantissa = float(m.group("mantissa"))
                    sign2 = -1. if m.group("sign2") == "-" else 1.
                    exponent = float(m.group("exponent"))
                    return sign * mantissa * 10 ** (sign2 * exponent)
                else:
                    raise ValueError("«%s» is not a valid float" % value)
            elif isinstance(value, float):
                return value
            else:
                return float(value)
    

    In this example I defined the syntax and semantics of a small decimal float: a sign for the mantissa, the mantissa itself as an integer, two asterisks, the sign of the exponent, and finally the exponent itself as an integer. The value is the mantissa times 10 to the power of the exponent, so "+222**-2" is 222 × 10^-2 = 2.22:

    >>> MyFloat("+222**-2")
    MyFloat(2.22)
    >>> MyFloat("+222**-2").value
    2.22
    >>> MyFloat2("+222**-2")
    2.22
    >>> parse("+222**-2", MyFloat)
    MyFloat(2.22)
    >>> parse("+222**-2", MyFloat2)
    2.22
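The conversion rule can also be checked on its own, without pyPEG installed. This is a sketch only: `custom_float` is a hypothetical helper name, it reuses the regular expression from the classes above, and it uses `fullmatch` instead of `match` (an assumption, so that trailing garbage is rejected).

```python
import re

# Same pattern as the grammar above: sign, mantissa digits, "**",
# exponent sign, exponent digits.
CUSTOM_FLOAT = re.compile(
    r"(?P<sign>[+-])(?P<mantissa>\d*)\*\*(?P<sign2>[+-])(?P<exponent>\d*)"
)


def custom_float(text):
    """Convert text such as "+222**-2" to a float (mantissa * 10**exponent)."""
    m = CUSTOM_FLOAT.fullmatch(text)
    if not m:
        raise ValueError("%r is not a valid float" % text)
    sign = -1.0 if m.group("sign") == "-" else 1.0
    sign2 = -1.0 if m.group("sign2") == "-" else 1.0
    # \d* also matches an empty string; float("") then raises ValueError,
    # so inputs like "+**+" are rejected as well.
    mantissa = float(m.group("mantissa"))
    exponent = float(m.group("exponent"))
    return sign * mantissa * 10 ** (sign2 * exponent)


print("%.2f" % custom_float("+222**-2"))  # 2.22
print(custom_float("-5**+1"))             # -50.0
```

Both signs are mandatory in this syntax, so plain "222" is rejected with a ValueError. That is the kind of language-specific decision the built-in float cannot make for you.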

    Other literal types have comparable behaviour, including int. But because integers are similar in many languages, I'm correcting the list bug for integers – it should work in pyPEG 2.10.
