Issue #624 resolved
Martin Holst Swende
created an issue

I have noticed a couple of bugs in the VbNetLexer. Function names, class names and namespaces are not detected as such, because the keywords that introduce them are already matched as Keyword. Variable names are not detected either, despite being pretty easy to spot.

I have a patch proposal in which I have removed the keywords for 'funcname' (Function|Sub|Property), 'classname' (Class|Structure|Enum) and 'namespace' (Namespace|Imports) from the generic Keyword regexps.

Also, I added a similar bygroups construction for detecting variable names. I am unsure whether I did it correctly, since I don't know exactly how Pygments works internally. {{{

!python

         (r'(?<!\.)(ByVal|ByRef|Dim|ReDim)(\s+)',
         bygroups(Keyword, Text), 'variablename'),

}}}
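
To make the intent of that rule easier to check, here is a minimal standalone sketch using only Python's `re` module, not Pygments itself. The declaration pattern is copied from the rule above; the `IGNORECASE` flag, the identifier pattern and the helper name `declared_names` are assumptions made for illustration. Unlike a lexer, `finditer` scans the whole line rather than matching at the current position, so this only approximates what the rule would do.

```python
import re

# Pattern copied from the proposed rule. The IGNORECASE flag is an assumption
# (VB.NET is case-insensitive); the real lexer may set its flags differently.
DECL = re.compile(r'(?<!\.)(ByVal|ByRef|Dim|ReDim)(\s+)', re.IGNORECASE)

# Hypothetical identifier pattern standing in for the 'variablename' state.
IDENT = re.compile(r'[a-z_][a-z0-9_]*', re.IGNORECASE)


def declared_names(line):
    """Return the identifiers that directly follow a declaration keyword."""
    names = []
    for match in DECL.finditer(line):
        # Group 1 is the keyword, group 2 the whitespace; the name starts
        # right where the whitespace ends.
        ident = IDENT.match(line, match.end())
        if ident:
            names.append(ident.group(0))
    return names


print(declared_names("Dim variableA As String"))
print(declared_names("obj.Dim x"))  # the lookbehind rejects member access
```

The negative lookbehind `(?<!\.)` is what keeps something like `obj.Dim` from being treated as a declaration keyword.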

Attaching a file which contains some sample code. It monkey-patches the lexer with my patch. Sample code: {{{

!vb.net

Dim variableA As String
Protected Sub Page_Init(ByVal variableB As Object, ByRef variableC As System.EventArgs)
    variableA = "test"
End Sub

}}}

Result from standard pygments: {{{

!

(Token.Text, u' ') (Token.Keyword, u'Dim') (Token.Text, u' ') (Token.Name, u'variableA') (Token.Text, u' ') (Token.Operator.Word, u'As') (Token.Text, u' ') (Token.Keyword.Type, u'String') (Token.Text, u'\n ') (Token.Keyword, u'Protected') (Token.Text, u' ') (Token.Keyword, u'Sub') (Token.Text, u' ') (Token.Name, u'Page_Init') (Token.Punctuation, u'(') (Token.Keyword, u'ByVal') (Token.Text, u' ') (Token.Name, u'variableB') (Token.Text, u' ') (Token.Operator.Word, u'As') (Token.Text, u' ') (Token.Keyword.Type, u'Object') (Token.Punctuation, u',') (Token.Text, u' ') (Token.Keyword, u'ByRef') (Token.Text, u' ') (Token.Name, u'variableC') (Token.Text, u' ') (Token.Operator.Word, u'As') (Token.Text, u' ') (Token.Name, u'System') (Token.Punctuation, u'.') (Token.Name, u'EventArgs') (Token.Punctuation, u')') (Token.Text, u'\n ') (Token.Name, u'variableA') (Token.Text, u' ') (Token.Operator, u'=') (Token.Text, u' ') (Token.Literal.String, u'"') (Token.Literal.String, u'test') (Token.Literal.String, u'"') (Token.Text, u'\n ') (Token.Keyword, u'End') (Token.Text, u' ') (Token.Keyword, u'Sub') (Token.Text, u'\n')

}}}

Result from patched version: {{{

!

(Token.Text, u' ') (Token.Keyword, u'Dim') (Token.Text, u' ') (Token.Name.Variable, u'variableA') (Token.Text, u' ') (Token.Operator.Word, u'As') (Token.Text, u' ') (Token.Keyword.Type, u'String') (Token.Text, u'\n ') (Token.Keyword, u'Protected') (Token.Text, u' ') (Token.Keyword, u'Sub') (Token.Text, u' ') (Token.Name.Function, u'Page_Init') (Token.Punctuation, u'(') (Token.Keyword, u'ByVal') (Token.Text, u' ') (Token.Name.Variable, u'variableB') (Token.Text, u' ') (Token.Operator.Word, u'As') (Token.Text, u' ') (Token.Keyword.Type, u'Object') (Token.Punctuation, u',') (Token.Text, u' ') (Token.Keyword, u'ByRef') (Token.Text, u' ') (Token.Name.Variable, u'variableC') (Token.Text, u' ') (Token.Operator.Word, u'As') (Token.Text, u' ') (Token.Name, u'System') (Token.Punctuation, u'.') (Token.Name, u'EventArgs') (Token.Punctuation, u')') (Token.Text, u'\n ') (Token.Name, u'variableA') (Token.Text, u' ') (Token.Operator, u'=') (Token.Text, u' ') (Token.Literal.String, u'"') (Token.Literal.String, u'test') (Token.Literal.String, u'"') (Token.Text, u'\n ') (Token.Keyword, u'End') (Token.Text, u' ') (Token.Keyword, u'Sub') (Token.Text, u'\n')

}}}

Comments (4)

  1. Tim Hatch

    I see what you're talking about with Sub (and others) being keywords. They actually need to be keywords for uses like "End Sub" or "On Error Exit Sub", but you're right that the funcname rule can never match. Could you explain your rationale for highlighting the variable *declarations* differently than their future uses? They currently match as Token.Name...

  2. Martin Holst Swende reporter

    Off the top of my head, I have the impression that Token.Name could be many things, e.g. the name of a function call:

    foo() <- Name, Operator, Operator

    or function definition (at least prior to my fix):

    Sub Page_Init <- (Token.Name, u'Page_Init')

    In the documentation, it said that Token.Name.Variable was a built-in token, and I assumed it existed for this reason but was just not implemented for VB.

    My rationale: I am trying to use Pygments to build a very simple taint-machine(?) or framework to detect function calls with untrusted input (e.g. for detection of SQL injection). If variable declarations are tokenized separately, it is simple for the next filter to pick up all variable declarations and determine their types (e.g. in the case of SQL injection: strings are vulnerable, whereas integers are not).

    So, for my particular use case it makes sense to distinguish between:

    • Variable references vs variable declarations
    • Function calls vs function declarations
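
As a rough illustration of the "next filter" described above, the sketch below walks a token stream and records the declared type of every Token.Name.Variable. It is a sketch under stated assumptions, not part of the patch: tokens are plain `(type_name, value)` tuples with the type given as a string, mirroring the dump in the issue, and the helper names `declared_types` and `string_variables` are hypothetical.

```python
# Token types that may form a type name after "As" (assumption based on the
# token dump in this issue: Keyword.Type for built-ins, Name for dotted types).
TYPE_TOKENS = {"Token.Keyword.Type", "Token.Name"}


def _skip_text(tokens, i):
    """Advance past whitespace tokens and return the new index."""
    while i < len(tokens) and tokens[i][0] == "Token.Text":
        i += 1
    return i


def declared_types(tokens):
    """Map each declared variable name to its declared type, or None."""
    types = {}
    for i, (ttype, value) in enumerate(tokens):
        if ttype != "Token.Name.Variable":
            continue
        j = _skip_text(tokens, i + 1)
        parts = []
        if j < len(tokens) and tokens[j] == ("Token.Operator.Word", "As"):
            j = _skip_text(tokens, j + 1)
            # Collect a possibly dotted type name such as System.EventArgs.
            while j < len(tokens) and (
                tokens[j][0] in TYPE_TOKENS
                or tokens[j] == ("Token.Punctuation", ".")
            ):
                parts.append(tokens[j][1])
                j += 1
        types[value] = "".join(parts) or None
    return types


def string_variables(types):
    """Return the names declared as String, the candidates for taint tracking."""
    return sorted(
        name for name, t in types.items()
        if t is not None and t.lower() == "string"
    )
```

This only works if declarations carry their own token type; with plain Token.Name the filter could not tell a declaration from a later reference.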
  3. Martin Holst Swende reporter

    They actually need to be keywords for uses like "End Sub" or "On Error Exit Sub"

    It sounds correct, but when you look at the test case above (and I reran it just to be sure), the "Sub" in "End Sub" is actually matched as a keyword anyway. It appears that the function rule takes care of that, but I am not quite sure how it works, since funcname requires at least one character [a-z_] in order to match...?
