ScannerImpl.getToken() and ScannerImpl.peekToken() not correctly implemented?

Issue #402 resolved
Sylvain Baudoin created an issue

Hello,

The methods getToken() and peekToken() are not correctly implemented:

  • getToken() never calls needMoreTokens(), which causes the method to return null if called a second time withtout peekToken() in between.
  • peekToken() may raise an IndexOutOfBoundsException instead of returning null as it does not check if there are tokens left before calling tokens.get(0).

These issues causes the scanner not to work as expected, especially compared to its Pythonic brother.

Comments (17)

  1. Andrey Somov

    Do you mean: The methods getToken() and peekToken() are not correctly DOCUmented ? The profiler reports that these 2 methods are called a lot. The extra calls are not required by SnakeYAML itself and they would cause nothing but a performance degradation.

    Of course, they should be be properly used outside of SnakeYAML. But this is a matter of proper documentation.

  2. Sylvain Baudoin reporter

    No, it's not a matter of documentation. On the contrary, the doc says that the difference between peekToken() and getToken() relies only on fact that in a method the token remains in the stack, not in the other.

    IMHO, I'd stick with the implementation of PyYAML as follows:

        /**
         * Return the next token, but do not delete it from the queue.
         */
        public Token peekToken() {
            while (needMoreTokens()) {
                fetchMoreTokens();
            }
            if (!this.tokens.isEmpty()) {
                return this.tokens.get(0);
            }
            return null;
        }
    
        /**
         * Return the next token, removing it from the queue.
         */
        public Token getToken() {
            while (needMoreTokens()) {
                fetchMoreTokens();
            }
            if (!this.tokens.isEmpty()) {
                this.tokensTaken++;
                return this.tokens.remove(0);
            }
            return null;
        }
    

    In all cases the IndexOutOfBoundsException that peekToken() may raise is really a problem that must be fixed. Becaus it is not mentioned in the doc (unless you did it) but above all because this is a major behavioral difference between peekToken() and getToken().

  3. Andrey Somov

    Please read what I have already said before.

    1. The Javadoc has been already updated
    2. The while loop in getToken() was removed because it is not required
    3. The if statement in peekToken() is not required. I have created a patch for PyYAML
  4. Sylvain Baudoin reporter

    OK with the doc, it is clearer, but this make the method getToken() to work differently from the reference implementation of PyYAML.

    I can't say about the while loop.

    About the patch for PyYAML, I cannot speak on behalf of the people responsible for the project but I disagree with it because it changes the way peekToken() and getToken() currently works: the if makes that the function will return None if self.tokens is empty. With your patch we will get an IndexError, making the function to behave like snakeyaml that currently raises an IndexOutOfBounsException. I'd prefer snakeyaml to implement the if and return null instead. But yes, if correctly documented we can handle this exception.

  5. Andrey Somov

    The patch for PyYAML has a value:

    1. Remove the code which does not do anything
    2. Make the code work quicker

    If someone is going to use the low level API (does not matther PyYAML or SnakeYAML) he has to call the methods in the proper order.

    I do not see the reason to introduce redundant code in SnakeYAML instead of removing redundant code in PyYAML.

  6. Sylvain Baudoin reporter

    I still have questions, though. I looked at your doc update but I don't see how to use checkToken() is a generic way. I'm working on a project for which I need a generic scanner without knowledge of the next token to come: I just want to get the tokens as they come. How to do that with checkToken()?

  7. Andrey Somov

    You do not have to know the next token to come. Just call checkToken()

    (by the way, it works the same way in PyYAML)

  8. Sylvain Baudoin reporter

    Indeed, I did not pay enough attention to the code of checkToken(). Thanks.

    By the way, again, I would mention in the Javadoc of Scanner.peekToken() that a IndexOutOfBoundsException will be raised if there is no more token to be returned (and so use checkToken() beforehand).

  9. Sylvain Baudoin reporter

    Sorry to bother again but why not also remove the while loop in peekToken()? From a functional point of view, as the only difference between peekToken() and getToken() is a "stack-like operation" there is no justification for this while loop in peekToken() as well. I don't really understood why we should call checkToken() before getToken() and not before peekToken(). If we decide to externalize the check because it is already done in checkToken() it must be externalized for both peek and get methods. WDYT?

  10. Andrey Somov

    This is implemented in the release 1.21: Scanner throws IndexOutOfBoundsException if no more token left but getToken() or peekToken()

  11. Sylvain Baudoin reporter

    !!! How did you come to that conclusion? I managed to have my code compatible with both version (1.20 and 1.21) by the way.

  12. Log in to comment