hachoir / hachoir-regex / README

haypo 525cb43 

haypo 1076e6a 

haypo 525cb43 
ha...@marge. 174552b 
haypo 525cb43 
haypo 6f089af 

Victor Stinner 71f397c 
Victor Stinner b014e34 

Victor Stinner 71f397c 
Victor Stinner b014e34 
Victor Stinner a38d22f 
haypo 6e6c137 

Victor Stinner 6e64584 
haypo 6e6c137 

haypo 512cd20 

haypo 6f089af 
haypo fd6ecda 

haypo 525cb43 

Hachoir regex

hachoir-regex is a Python library for regular expression (regex or regexp)
manupulation. You can use a|b (or) and a+b (and) operators. Expressions are
optimized during the construction: merge ranges, simplify repetitions, etc.
It also contains a class for pattern matching allowing to search multiple
strings and regex at the same time.



Version 1.0.5 (2010-01-28)

 * Create a to include extra files: regex.rst,, etc.
 * Create an INSTALL file

Version 1.0.4 (2010-01-13)

 * Support \b (match a word)
 * Fix parser: support backslash in a range, eg. parse(r"[a\]x]")

Version 1.0.3 (2008-04-01)

 * Raise SyntaxError on unsupported escape character
 * Two dot atoms are always equals

Version 1.0.2 (2007-07-12)

 * Refix PatternMatching without any pattern

Version 1.0.1 (2007-06-28)

 * Fix PatternMatching without any pattern

Version 1.0 (2007-06-28)

 * First public version

Regex examples

Regex are optimized during their creation:

   >>> from hachoir_regex import parse, createRange, createString
   >>> createString("bike") + createString("motor")
   <RegexString 'bikemotor'>
   >>> parse('(foo|fooo|foot|football)')
   <RegexAnd 'foo(|[ot]|tball)'>

Create character range:

   >>> regex = createString("1") | createString("3")
   >>> regex
   <RegexRange '[13]'>
   >>> regex |= createRange("2", "4")
   >>> regex
   <RegexRange '[1-4]'>

As you can see, you can use classic "a|b" (or) and "a+b" (and)
Python operators. Example of regular expressions using repetition:

   >>> parse("(a{2,}){3,4}")
   <RegexRepeat 'a{6,}'>
   >>> parse("(a*|b)*")
   <RegexRepeat '[ab]*'>
   >>> parse("(a*|b|){4,5}")
   <RegexRepeat '(a+|b){0,5}'>

Compute minimum/maximum matched pattern:

   >>> r=parse('(cat|horse)')
   >>> r.minLength(), r.maxLength()
   (3, 5)
   >>> r=parse('(a{2,}|b+)')
   >>> r.minLength(), r.maxLength()
   (1, None)

Pattern maching

Use PatternMaching if you would like to find many strings or regex in a string.
Use addString() and addRegex() to add your patterns.

    >>> from hachoir_regex import PatternMatching
    >>> p = PatternMatching()
    >>> p.addString("a")
    >>> p.addString("b")
    >>> p.addRegex("[cd]")

And then use search() to find all patterns:

    >>> for start, end, item in"a b c d"):
    ...    print "%s..%s: %s" % (start, end, item)
    0..1: a
    2..3: b
    4..5: [cd]
    6..7: [cd]

You can also attach an objet to a pattern with 'user' (user data) argument:

    >>> p = PatternMatching()
    >>> p.addString("un", 1)
    >>> p.addString("deux", 2)
    >>> for start, end, item in"un deux"):
    ...    print "%r at %s: user=%r" % (item, start, item.user)
    <StringPattern 'un'> at 0: user=1
    <StringPattern 'deux'> at 3: user=2


With distutils:

   sudo ./ install

Or using setuptools:

   sudo ./ --setuptools install