Invalid digits in binary number silently parsed as implied multiplication

Issue #634 closed
Pol Welter created an issue

Example:

0b104 = 0b10 * 4

We should raise a syntax error in that case.

Comments (22)

  1. Tey'

    Should this only ban other decimal digits or also 'a' to 'f'? I would think so.

    Only digits I believe, because if you ban some letters as well, you won't be able to write things like <NUMBER><IDENTIFIER> without a space between them (like 2a or 0b10a where a is a variable). Also, if it's not a digit, the compiler already fails anyway.

    A similar issue also exists with octal numbers (0o49 => 4*9).

    Do you plan to fix it? I can work on it otherwise.

  2. Pol Welter reporter

    I have a fix ready (only a couple of lines really). I would have said that letters should be illegal as well. I can not think of any situation where one seriously would want 0b10a want to mean 0b10 * a. Really, forcing he user to be more explicit seems like a more prudent solution.

    Yes, this applies to binary and octal. Decimal should not be affected (here implied multiplication does make sense). Hexadecimal is good the way it is.

  3. Pol Welter reporter

    Hmm, my changes do make the expression give a syntax error, but the syntax highlighter is weirded out by it. You can maybe have a look at that one if you want.

  4. Tey'

    I would have said that letters should be illegal as well. I can not think of any situation where one seriously would want 0b10a want to mean 0b10 * a.

    If we allow it for decimal, it does not sound coherent to forbid it for other bases (especially if we still allow other letters like 0b10x). Also, your changes break implicit multiplications with binary and octal numbers, even when there is a space between the number and the identifier (e.g., 0b10 foo).

    And there is a similar issue with radix characters, as 12.12.12 is allowed and understood as 12.12 * .12. Edit: also with 12e12.12.12.

    Hmm, my changes do make the expression give a syntax error, but the syntax highlighter is weirded out by it. You can maybe have a look at that one if you want.

    Should be fixed in your PR now.

  5. Pol Welter reporter

    Thanks. The space must not be a syntactic element. It can also serve as thousand's separator so we can precisely tell nothing from it.

    Since we don't seem to agree, let's ask @heldercorreia, @fk, @thadrien :)

    Which of these expressions should fail? If not, what is the expected result?

    a=5
    x=4
    
    0b104
    
    0b10 4
    
    0b10x
    
    0b10a
    
    0b10 a
    
    5a
    
    0xa
    

    Whoa, this 12.12.12 is bad as well...

  6. Hadrien Theveneau

    On a side note: I suggested some weeks ago to implement a warning system which would allow to evaluate the result but to tell the user that something deserves his attention.

  7. Helder Correia repo owner

    I don't have a strong opinion on all this, but answering the question about which ones should fail IMO: 0b104 and 0b10 4.

  8. Felix Krull

    I seem to recall that originally, multiplication of two literal numbers wasn't allowed? That's what's causing us problems here from what I can see. The only way I can see of fixing 12.12.12 is by disallowing it (well, outside of requiring more explicit separators in some fashion).

    To answer the particular question, I think 0b104 and 0b10 4 should fail; the others aren't great either, but by disallowing them, we're disallowing implicit multiplication for binary (and octal) numbers either entirely or for variables that start with [a-f].


    Also somewhat regarding problems like this, see my comment in issue #589.

  9. Hadrien Theveneau

    I think another way to fix it is to make the lexer recognise the third ".12" not as a number but as an error. if there is no space just before.

    Le 11/05/2016 à 16:25, Felix Krull a écrit :

  10. Pol Welter reporter

    Yes, it should be recognised as an error in the lexer, and not even reach the compile function.

  11. Helder Correia repo owner

    Correct. For 1.0, we should be able to precisely point the user to the expression substring where a very specific (and communicated) error occurs.

  12. Tey'

    Which of these expressions should fail? If not, what is the expected result?

    @polwel can we conclude on this point? Summing up the previous comments, only 0b104 and 0b10 4 should definitely fail. If you still think 0b10a and 0b10 a should fail, then I believe we should also make 0b10x and 0b10 x fail as well for the sake of coherency.

    I seem to recall that originally, multiplication of two literal numbers wasn't allowed?

    It was forbidden in the compiler, but as some tokens where reduced to number tokens in the past, it made some implicit multiplications fail. Tokens reductions result has been reworked since then, so we can just add the NUMBER NUMBER exception for the implicit multiplication rule back in the compiler. It would fix this issue without having to add non-lexer logic in the lexer (what 74bc144 does actually), and I'd rather like it that way.

  13. Pol Welter reporter

    No strong opinion on 0b10a here.

    But disallowing number number yet allowing number abstract is not going to work. Consider 0b104^2.

    IMO the only way to go is to ban this in the tokenizer.

  14. Tey'

    Aw you're right, we can't do that at the compilation stage as the number tokens might have been reduced.

    No strong opinion on 0b10a here.

    Okay, let's just check for digits, radix characters and '#' then. I have a patch ready to be applied to your PR, unless you want to implement it yourself.

  15. Log in to comment