Support full PCRE syntax using Python's parser

Issue #2 new
Devin Jeanpierre
repo owner created an issue

This list may be incomplete, but it can be worked on later.

Remaining features:

  • context-sensitive assertions (^, $, \b, \B)
    • NFA might be extendable with state registers for these; otherwise we can keep state during differentiation.
  • categories (e.g. \d, \w)
    • Do this by extending antiset.Antiset to the notion of any boolean combination of unknown sets. For example. \d is 0b01, \w is 0b10, and the whole universe is 0b111 -- except obviously there are more sets than just two. We can represent relations using sets of these bitvectors (or maybe BDDs?) and the finite sets of elements we subtract and add to it.
    • Definitely BDDs. The whole point of this is to display nicely, and BDDs actually do display nicely. We only show decisions in the BDD that lead towards the true state, and the BDD itself is embedded in the DFA graph as an edge from the source to the destination (which is the true state). We have few enough variables we can probably even use the minimal BDD.
    • Examine relation to FSUBA. Considering the relationship with prolog going on here, it's quite probable that FSUBA are equivalent. But then, I don't know anything about FSUBA other than their intended purpose and name.
  • flags (in particular unicode, ignorecase, locale)
    • ignorecase can be done via using string homomorphism of c.lower() or c.casefold() for ASCII/UNICODE. For locale, who knows.
      • Simple/Full case folding needs to be implemented...
  • conditionals

Features that will not be supported for the forseeable future:

  • assertions
  • backreferences