CL-STRING-MATCH offers a simple regular expression engine that is a portable version of the RE engine created by Jeffrey Massung. It uses the simple non-recursive backtracking implementation from Russ Cox's paper "Regular Expression Matching: the Virtual Machine Approach" as a blueprint.
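To give a feel for the non-recursive backtracking approach, here is a minimal sketch in plain Common Lisp. It is not the library's source: the instruction set, the function name run-backtracking-vm, and the hand-compiled program *a+b* are all assumptions made for illustration. Pending alternatives are kept on an explicit stack instead of the call stack.

```lisp
;; Illustrative sketch only, not the library's implementation.
;; A program is a vector of instructions:
;;   (:char c)     consume character C, or fail this thread
;;   (:jmp x)      continue at instruction X
;;   (:split x y)  try X first and remember Y as a fallback
;;   (:match)      report success
(defun run-backtracking-vm (program input)
  "Run PROGRAM against INPUT starting at index 0.
Return the index just past the match, or NIL if nothing matches."
  (let ((stack (list (cons 0 0))))      ; pending threads as (pc . sp)
    (loop while stack
          do (destructuring-bind (pc . sp) (pop stack)
               (loop
                 (let ((inst (aref program pc)))
                   (ecase (first inst)
                     (:char
                      (if (and (< sp (length input))
                               (char= (char input sp) (second inst)))
                          (setf pc (1+ pc) sp (1+ sp))
                          (return)))    ; thread fails, backtrack
                     (:jmp (setf pc (second inst)))
                     (:split
                      (push (cons (third inst) sp) stack)
                      (setf pc (second inst)))
                     (:match (return-from run-backtracking-vm sp)))))))
    nil))

;; Hand-compiled program for the pattern a+b (hypothetical example).
(defparameter *a+b*
  (vector '(:char #\a) '(:split 0 2) '(:char #\b) '(:match)))
```

Calling (run-backtracking-vm *a+b* "aab") walks the two leading a characters, fails to read a third, and then falls back to the saved alternative that expects b.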
Documentation of the API below is taken from the README.md file from the RE library.
To create a re object, you can either use the compile-re function or the #/ dispatch macro.

    CL-USER > (compile-re "%d+")
    #<RE "%d+">
    CL-USER > #/%d+/
    #<RE "%d+">
Both work equally well, but the dispatch macro will compile the pattern at read-time. The re class has a load form and so can be saved to a FASL file.

HINT: when using the read macro, use a backslash to escape the / and other characters that might mess with syntax coloring.
The with-re macro lets you use either strings or re objects in a body of code. If a string is passed as the pattern, then it will be compiled before the body is evaluated.

    CL-USER > (with-re (re "%d+") re)
    #<RE "%d+">
NOTE: All pattern matching functions use the with-re macro, and so the pattern argument can be either a string or a pre-compiled re object.
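As a hedged illustration of why with-re is handy in your own functions, the sketch below compiles the pattern once and reuses it across many strings. The helper name count-matching-lines is hypothetical, the code assumes the RE symbols are accessible in the current package, and it relies on find-re, which is described further down in this document.

```lisp
;; Assumes the RE symbols (with-re, find-re) are accessible in the
;; current package. COUNT-MATCHING-LINES is a hypothetical helper.
(defun count-matching-lines (pattern lines)
  "Count the strings in LINES that contain a match for PATTERN.
PATTERN may be a string or a pre-compiled re object."
  (with-re (re pattern)                 ; compiled once if it is a string
    (count-if (lambda (line) (find-re re line)) lines)))
```

Given the behaviour documented here, (count-matching-lines "%d+" '("abc 123" "no digits" "42")) should count the first and last strings.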
Basic Pattern Matching
The heart of all pattern matching is the match-re function.
(match-re pattern string &key start end exact)
It will match string against pattern and return a re-match object on success or nil on failure. The start and end arguments limit the scope of the match and default to the entire string. If exact is t then the pattern has to consume the entire string (from start to end).

    CL-USER > (match-re "%d+" "abc 123")
    NIL
    CL-USER > (match-re "%a+" "abc 123")
    #<RE-MATCH "abc">
Once you have successfully matched and have a re-match object, you can use the following reader functions to inspect it:
- match-string returns the entire match
- match-groups returns a list of groups
- match-pos-start returns the index where the match began
- match-pos-end returns the index where the match ended
Try peeking into a match...

    CL-USER > (inspect (match-re "(a(b(c)))" "abc 123"))
    MATCH          "abc"
    GROUPS         ("abc" "bc" "c")
    START-POS      0
    END-POS        3
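The reader functions listed above can also be combined programmatically. The following is a small sketch, assuming the RE symbols are accessible in the current package; describe-match is a hypothetical helper name, not part of the library.

```lisp
;; Assumes match-re and the re-match readers are accessible in the
;; current package. DESCRIBE-MATCH is a hypothetical helper.
(defun describe-match (pattern string)
  "Return (text start end groups) for a match at the start of STRING,
or NIL when PATTERN does not match there."
  (let ((m (match-re pattern string)))
    (when m
      (list (match-string m)
            (match-pos-start m)
            (match-pos-end m)
            (match-groups m)))))
```

Going by the inspect output shown above, (describe-match "(a(b(c)))" "abc 123") should yield the text "abc", the positions 0 and 3, and the three nested groups.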
To find a pattern match anywhere in a string use the find-re function.
(find-re pattern string &key start end all)
It will scan string looking for matches to pattern. If all is non-nil then a list of all matches found is returned, otherwise it will simply be the first match.

    CL-USER > (find-re "%d+" "abc 123")
    #<RE-MATCH "123">
    CL-USER > (find-re "[^%s]+" "abc 123" :all t)
    (#<RE-MATCH "abc"> #<RE-MATCH "123">)
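Since find-re with :all returns re-match objects, a common follow-up step is to pull out the matched text. A minimal sketch, assuming the RE symbols are accessible in the current package (all-match-strings is a hypothetical helper name):

```lisp
;; Assumes find-re and match-string are accessible in the current
;; package. ALL-MATCH-STRINGS is a hypothetical helper.
(defun all-match-strings (pattern string)
  "Return the text of every match of PATTERN in STRING as a list."
  (mapcar #'match-string (find-re pattern string :all t)))
```

With the pattern and input from the transcript above, (all-match-strings "[^%s]+" "abc 123") should return the two strings "abc" and "123".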
Splitting by Pattern
Once patterns have been matched, splitting a string from the matches is trivial.
(split-re pattern string &key start end all coalesce-seps)
If all is true, then a list of all sub-sequences in string (delimited by pattern) is returned, otherwise just the first and the rest of the string. If coalesce-seps is true the sub-sequences that are empty will be excluded from the results. This argument is ignored if all is nil.

    CL-USER > (split-re "," "1,2,3")
    "1"
    "2,3"
    CL-USER > (split-re "," "1,2,,,abc,3,," :all t :coalesce-seps t)
    ("1" "2" "abc" "3")
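Because the non-:all form returns the first piece and the remainder, it fits naturally with multiple-value-bind. The sketch below assumes the two results come back as multiple values, as the transcript above suggests, and that the RE symbols are accessible in the current package; both helper names are hypothetical.

```lisp
;; Assumes split-re is accessible in the current package and that the
;; non-:all form returns two values. Helper names are hypothetical.
(defun split-key-value (line)
  "Split LINE at the first comma into a (key . rest) pair."
  (multiple-value-bind (key rest) (split-re "," line)
    (cons key rest)))

(defun non-empty-fields (line)
  "Return the comma-separated fields of LINE, dropping empty ones."
  (split-re "," line :all t :coalesce-seps t))
```

For example, (non-empty-fields "1,2,,,abc,3,,") mirrors the second call in the transcript above.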
Replacing by Pattern
The replace-re function scans the string looking for matching sub-sequences that will be replaced with another string.
(replace-re pattern with string &key start end all)
If with is a function, then the function is called with the re-match object, replacing the pattern with the return value. Otherwise the value is used as-is. As with find-re and split-re, if all is true, then the pattern is globally replaced.

    CL-USER > (replace-re "%d+" #\* "1 2 3")
    "* 2 3"
    CL-USER > (replace-re "%a+" #'(lambda (m) (length (match-string m))) "a bc def" :all t)
    "1 2 3"
NOTE: The string returned by replace-re is a completely new string. This is true even if pattern isn't found in the string.
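A replacement function can return any value derived from the match, not just its length. The following sketch upcases every run of letters. It assumes the RE symbols are accessible in the current package, and upcase-words is a hypothetical helper name.

```lisp
;; Assumes replace-re and match-string are accessible in the current
;; package. UPCASE-WORDS is a hypothetical helper.
(defun upcase-words (text)
  "Return a new string in which every run of letters in TEXT is upcased."
  (replace-re "%a+"
              (lambda (m) (string-upcase (match-string m)))
              text
              :all t))
```

Under the semantics described above, (upcase-words "a bc def") should produce "A BC DEF" and leave the original string untouched.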
Using parentheses in a pattern will cause the matching text to be captured as groups in the returned re-match object. The match-groups function will return a list of all the captured strings in the match.

    CL-USER > (match-groups (match-re #/(%d+)(%a+)/ "123abc"))
    ("123" "abc")
Captures can be nested, but are always returned in the order they are opened.
    CL-USER > (match-groups (match-re #/(a(b(c)))(d)/ "abcd"))
    ("abc" "bc" "c" "d")
HINT: you can always use the match-string function to get at the full text that was matched and there's no need to capture the entire pattern.
The with-re-match macro can be used to assist in extracting the matched patterns and groups.
(with-re-match (var match-expr &key no-match) &body body)
If the result of match-expr is nil, then no-match is returned and body is not executed.
While in the body of the macro, $$ will be bound to the match-string and the groups will be bound to $1, $2, ..., $9. Any groups beyond the first 9 are collected in a list bound to a separate variable.

    CL-USER > (with-re-match (m (match-re "(%a+)(%s+)(%d+)" "abc 123"))
                (string-append $3 $2 $1))
    "123 abc"
    CL-USER > (flet ((initial (m)
                       (with-re-match (v m)
                         (format nil "~a." $1))))
                (replace-re #/(%a)%a+%s*/ #'initial "Lisp In Small Pieces" :all t))
    "L.I.S.P."
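The no-match argument is not exercised in the examples above, so here is a hedged sketch of how it might be used to fall back to the original text. It assumes the keyword is written as :no-match after the match expression, as the lambda list suggests, and that the RE symbols are accessible in the current package. swap-word-and-number is a hypothetical helper, and the standard concatenate is used in place of string-append to stay within portable Common Lisp.

```lisp
;; Assumes with-re-match and match-re are accessible in the current
;; package and that :no-match is passed as shown in the lambda list.
;; SWAP-WORD-AND-NUMBER is a hypothetical helper.
(defun swap-word-and-number (text)
  "Turn \"word 123\" into \"123 word\"; return TEXT unchanged otherwise."
  (with-re-match (m (match-re "(%a+)(%s+)(%d+)" text) :no-match text)
    (concatenate 'string $3 $2 $1)))
```

When the pattern matches, the body runs with $1 to $3 bound to the groups. When it does not, the body is skipped and the no-match value is returned instead.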
bench/benchmark-re.lisp runs a simple benchmark derived from the patmatch:1t and patmatch:2t benchmarks of the Programming Languages Benchmark. Reported results:
- pre complete in: 574 seconds
- ppcre complete in: 24 seconds
- regex complete in: 43 seconds
- Documentation on Lua patterns, includes documentation on the syntax of patterns
- Regular expressions in Common Lisp