Bring 'switch (state->charsize)' out of loops

Issue #99 new
Former user created an issue

Why do you want this feature? What is your use-case?

state->charsize is constant during matching, but every "switch" on it is inside a loop (backtrack, advance), and these functions contain heavily duplicated code.

It should be brought out of the loops, not only for performance but also for maintainability.
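A minimal sketch of the request, under stated assumptions: `DemoState` and both functions are hypothetical stand-ins, not code from _regex.c. The first function tests charsize for every character; the second tests it once and then runs a loop for a single width.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the matcher state; not the real struct. */
typedef struct {
    const void *text;   /* subject string */
    size_t length;      /* length in characters, not bytes */
    int charsize;       /* 1, 2 or 4 bytes per character; fixed for a match */
} DemoState;

/* Before: charsize is re-tested for every character visited. */
static size_t count_switch_inside(const DemoState *state, uint32_t ch)
{
    size_t n = 0;
    for (size_t i = 0; i < state->length; i++) {
        uint32_t c;
        switch (state->charsize) {
        case 1:  c = ((const uint8_t *)state->text)[i];  break;
        case 2:  c = ((const uint16_t *)state->text)[i]; break;
        default: c = ((const uint32_t *)state->text)[i]; break;
        }
        if (c == ch)
            n++;
    }
    return n;
}

/* After: charsize is tested once; each case runs a loop for one width. */
static size_t count_switch_outside(const DemoState *state, uint32_t ch)
{
    size_t n = 0;
    size_t i;
    switch (state->charsize) {
    case 1: {
        const uint8_t *p = (const uint8_t *)state->text;
        for (i = 0; i < state->length; i++)
            n += (p[i] == ch);
        break;
    }
    case 2: {
        const uint16_t *p = (const uint16_t *)state->text;
        for (i = 0; i < state->length; i++)
            n += (p[i] == ch);
        break;
    }
    default: {
        const uint32_t *p = (const uint32_t *)state->text;
        for (i = 0; i < state->length; i++)
            n += (p[i] == ch);
        break;
    }
    }
    return n;
}
```

Written out by hand like this, hoisting the switch triples the loop body, so the performance gain comes at the cost of more duplication unless the per-width loops are generated in some way.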

What should the syntax or call look like?


Do any other regex implementations have something like this?


Please provide any additional information below.


Comments (3)

  1. Former user Account Deleted

    I'm not sure what you mean. I try not to duplicate code unless there's a measurable benefit in terms of speed, which only really occurs in a tight loop, e.g. in 'match_many_ANY'.

  2. Former user Account Deleted

    I am reading _regex.c.

    The duplicated code for "switch (state->charsize)" and "_REV/_IGN" is really ugly ... A modern compiler can optimize a function with a constant parameter correctly, so I think it's OK to merge them.
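A sketch of what this comment seems to suggest, with hypothetical names throughout (`char_at`, `match_many_generic` and the wrappers are not functions from _regex.c). One shared loop takes the width as a parameter, and thin wrappers pass a literal width so that, once the calls are inlined, the compiler has the chance to fold the switch away. Whether it actually does depends on the compiler and optimization level; `inline` is only a hint.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: read character i from a buffer of the given width. */
static inline uint32_t char_at(const void *text, size_t i, int charsize)
{
    switch (charsize) {
    case 1:  return ((const uint8_t *)text)[i];
    case 2:  return ((const uint16_t *)text)[i];
    default: return ((const uint32_t *)text)[i];
    }
}

/* One shared loop: advance while the character at pos equals ch.
 * The switch in char_at() is still written inside the loop here. */
static inline size_t match_many_generic(const void *text, size_t pos,
                                        size_t limit, uint32_t ch,
                                        int charsize)
{
    while (pos < limit && char_at(text, pos, charsize) == ch)
        pos++;
    return pos;
}

/* Thin wrappers pass a literal width, so after inlining the switch
 * has a compile-time constant operand and can be removed. */
static size_t match_many_1(const void *t, size_t pos, size_t limit, uint32_t ch)
{
    return match_many_generic(t, pos, limit, ch, 1);
}

static size_t match_many_2(const void *t, size_t pos, size_t limit, uint32_t ch)
{
    return match_many_generic(t, pos, limit, ch, 2);
}

static size_t match_many_4(const void *t, size_t pos, size_t limit, uint32_t ch)
{
    return match_many_generic(t, pos, limit, ch, 4);
}

/* Single dispatch point: the only switch evaluated at run time. */
static size_t match_many(const void *text, size_t pos, size_t limit,
                         uint32_t ch, int charsize)
{
    switch (charsize) {
    case 1:  return match_many_1(text, pos, limit, ch);
    case 2:  return match_many_2(text, pos, limit, ch);
    default: return match_many_4(text, pos, limit, ch);
    }
}
```

The source keeps a single loop body, which addresses the maintenance concern; the speed benefit is not guaranteed and would need to be measured.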

  3. Serhiy Storchaka

    See how this is implemented in the stdlib re module. The repeated code is parametrized by macros and moved into a separate file, which is included multiple times with different definitions.

    /* generate 8-bit version */
    #define SRE_CHAR Py_UCS1
    #define SIZEOF_SRE_CHAR 1
    #define SRE(F) sre_ucs1_##F
    #include "sre_lib.h"
    /* generate 16-bit unicode version */
    #define SRE_CHAR Py_UCS2
    #define SIZEOF_SRE_CHAR 2
    #define SRE(F) sre_ucs2_##F
    #include "sre_lib.h"
    /* generate 32-bit unicode version */
    #define SRE_CHAR Py_UCS4
    #define SIZEOF_SRE_CHAR 4
    #define SRE(F) sre_ucs4_##F
    #include "sre_lib.h"

    This made it possible to get rid of the code duplication (except at the highest level) and of the switches on character size in tight loops.
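The same idea can be sketched in a single file. This is not the stdlib layout: instead of including a template header several times, the sketch below swaps in a function-generating macro so that it stays self-contained. All names (`DEFINE_COUNT_PREFIX`, `demo_ucs1_count`, `demo_count`) are hypothetical, and `uint8_t`/`uint16_t`/`uint32_t` stand in for `Py_UCS1`/`Py_UCS2`/`Py_UCS4`.

```c
#include <stddef.h>
#include <stdint.h>

/* Template: count how many leading characters of text equal ch.
 * Each expansion produces one function specialised for one width. */
#define DEFINE_COUNT_PREFIX(NAME, CHAR_T)                            \
    static size_t NAME(const CHAR_T *text, size_t len, CHAR_T ch)    \
    {                                                                 \
        size_t n = 0;                                                 \
        while (n < len && text[n] == ch)                              \
            n++;                                                      \
        return n;                                                     \
    }

DEFINE_COUNT_PREFIX(demo_ucs1_count, uint8_t)   /* 8-bit version  */
DEFINE_COUNT_PREFIX(demo_ucs2_count, uint16_t)  /* 16-bit version */
DEFINE_COUNT_PREFIX(demo_ucs4_count, uint32_t)  /* 32-bit version */

/* Top level: the only place that switches on the character width.
 * A character too wide for the buffer cannot occur in it, so those
 * cases return 0 rather than truncating ch. */
static size_t demo_count(const void *text, size_t len, uint32_t ch,
                         int charsize)
{
    switch (charsize) {
    case 1:
        return ch > 0xFF ? 0
             : demo_ucs1_count((const uint8_t *)text, len, (uint8_t)ch);
    case 2:
        return ch > 0xFFFF ? 0
             : demo_ucs2_count((const uint16_t *)text, len, (uint16_t)ch);
    default:
        return demo_ucs4_count((const uint32_t *)text, len, ch);
    }
}
```

A macro body like this is harder to debug and to step through than an included file, which may be one reason to prefer the multiple-include approach for anything larger than a few lines.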
