Commits

Anonymous committed dd279ed

packages: Add perl regexp support to re-builder

-------------------- ChangeLog entries follow: --------------------

xemacs-packages/re-builder/ChangeLog addition:

2005-08-29 Adrian Aichner <adrian@xemacs.org>

* re-builder.el: Add support for perl regexp via wrapping
functions reb-re-search-forward and reb-re-search-backward.
Define faces for all nine supported subexpressions. Spread out
face colors for light background with roughly 36 degree hue
spread.
* re-builder.el (reb-re-syntax): Add perl suport.
* re-builder.el (reb-match-4): New.
* re-builder.el (reb-match-5): New.
* re-builder.el (reb-match-6): New.
* re-builder.el (reb-match-7): New.
* re-builder.el (reb-match-8): New.
* re-builder.el (reb-match-9): New.
* re-builder.el (reb-perl-match-vector): New.
* re-builder.el (reb-perl-match-index): New.
* re-builder.el (reb-perl-syntax-p): New.
* re-builder.el (reb-next-match): Use reb-re-search-forward.
* re-builder.el (reb-prev-match): Use reb-re-search-backward.
* re-builder.el (reb-enter-subexp-mode): Add comma to message.
* re-builder.el (reb-change-syntax): Add perl support.
* re-builder.el (reb-delete-overlays): Reset reb-perl-match-vector
and reb-perl-match-index.
* re-builder.el (reb-read-regexp): Add perl support.
* re-builder.el (reb-insert-regexp): Ditto.
* re-builder.el (reb-count-subexps): Avoid false subexp matches,
more important for perl, but applicable to lisp regexp too.
* re-builder.el (reb-update-overlays): Use reb-re-search-forward.
* re-builder.el (reb-perl-program): New.
* re-builder.el (reb-re-search-forward): New.
* re-builder.el (reb-re-search-backward): New.
* re-builder.texi: Update for perl support.

  • Participants
  • Parent commits fb0d11c

Comments (0)

Files changed (3)

+2005-08-29  Adrian Aichner  <adrian@xemacs.org>
+
+	* re-builder.el: Add support for perl regexp via wrapping
+	functions reb-re-search-forward and reb-re-search-backward.
+	Define faces for all nine supported subexpressions.  Spread out
+	face colors for light background with roughly 36 degree hue
+	spread.
+	* re-builder.el (reb-re-syntax): Add perl suport.
+	* re-builder.el (reb-match-4): New.
+	* re-builder.el (reb-match-5): New.
+	* re-builder.el (reb-match-6): New.
+	* re-builder.el (reb-match-7): New.
+	* re-builder.el (reb-match-8): New.
+	* re-builder.el (reb-match-9): New.
+	* re-builder.el (reb-perl-match-vector): New.
+	* re-builder.el (reb-perl-match-index): New.
+	* re-builder.el (reb-perl-syntax-p): New.
+	* re-builder.el (reb-next-match): Use reb-re-search-forward.
+	* re-builder.el (reb-prev-match): Use reb-re-search-backward.
+	* re-builder.el (reb-enter-subexp-mode): Add comma to message.
+	* re-builder.el (reb-change-syntax): Add perl support.
+	* re-builder.el (reb-delete-overlays): Reset reb-perl-match-vector
+	and reb-perl-match-index.
+	* re-builder.el (reb-read-regexp): Add perl support.
+	* re-builder.el (reb-insert-regexp): Ditto.
+	* re-builder.el (reb-count-subexps): Avoid false subexp matches,
+	more important for perl, but applicable to lisp regexp too.
+	* re-builder.el (reb-update-overlays): Use reb-re-search-forward.
+	* re-builder.el (reb-perl-program): New.
+	* re-builder.el (reb-re-search-forward): New.
+	* re-builder.el (reb-re-search-backward): New.
+	* re-builder.texi: Update for perl support.
+
 2005-07-27  Norbert Koch  <viteno@xemacs.org>
 
 	* Makefile (VERSION): XEmacs package 1.03 released.

File re-builder.el

 
 ;; As the automatic updates can take some time on large buffers, they
 ;; can be limited by `reb-auto-match-limit' so that they should not
-;; have a negative impact on the editing.  Setting it to nil makes
-;; even the auto updates go all the way.  Forcing an update overrides
-;; this limit allowing an easy way to see all matches.
+;; have a negative impact on the editing.  Setting it to nil disables
+;; the auto update limit.  Setting it to 0 disables auto updates.
+;; Forcing an update overrides this limit allowing an easy way to see
+;; all matches.
 
-;; Currently `re-builder' understands five different forms of input,
-;; namely `read', `string', `rx', `sregex' and `lisp-re' syntax.  Read
-;; syntax and string syntax are both delimited by `"'s and behave
+;; Currently `re-builder' understands six different forms of input,
+;; namely `lisp-re', `perl', `read', `rx', `sregex', and `string'
+;; syntax.
+;; Read syntax and string syntax are both delimited by `"'s and behave
 ;; according to their name.  With the `string' syntax there's no need
 ;; to escape the backslashes and double quotes simplifying the editing
 ;; somewhat.  The other three allow editing of symbolic regular
 ;;    out.
 
 ;; Q: But how can I then make out the sub-expressions?
-;; A: Thats where the `sub-expression mode' comes in.  In it only the
+;; A: That's where the `sub-expression mode' comes in.  In it only the
 ;;    digit keys are assigned to perform an update that will flash the
 ;;    corresponding subexp only.
 
 
 (defcustom reb-re-syntax 'read
   "*Syntax for the REs in the RE Builder.
-Can either be `read', `string', `sregex' or `lisp-re'."
+
+Can either be `lisp-re', `perl', `read', `rx', `sregex', or `string'."
   :group 're-builder
   :type '(choice (const :tag "Read syntax" read)
 		 (const :tag "String syntax" string)
 		 (const :tag "`sregex' syntax" sregex)
 		 (const :tag "`lisp-re' syntax" lisp-re)
 		 (const :tag "`rx' syntax" rx)
+		 (const :tag "`perl' syntax" perl)
 		 (value: string)))
 
 (defcustom reb-auto-match-limit 200
   :type '(restricted-sexp :match-alternatives
 			  (integerp 'nil)))
 
-
+; hsv 0 0.882353 0.843137, rgb (55255 6500 6500) close to color
+; "firebrick3" ((52685 9766 9766))
 (defface reb-match-0
   '((((class color) (background light))
-     (:background "lightblue"))
+     (:background "firebrick3"))
     (((class color) (background dark))
      (:background "steelblue4"))
     (t
   "Used for displaying the whole match."
   :group 're-builder)
 
+; hsv 36 0.921569 0.843137, rgb (55255 34886 4333) close to color
+; "goldenrod3" ((52685 39835 7453))
 (defface reb-match-1
   '((((class color) (background light))
-     (:background "aquamarine"))
+     (:background "goldenrod3"))
     (((class color) (background dark))
      (:background "blue3"))
     (t
   "Used for displaying the first matching subexpression."
   :group 're-builder)
 
+
+; hsv 72 0.843137 0.803922, rgb (43800 52685 8264) close to color
+; "YellowGreen" ((39578 52685 12850))
 (defface reb-match-2
   '((((class color) (background light))
-     (:background "springgreen"))
+     (:background "YellowGreen"))
     (((class color) (background dark))
      (:background "chartreuse4"))
     (t
   "Used for displaying the second matching subexpression."
   :group 're-builder)
 
+; hsv 108 0.686275 0.686275, rgb (20282 44975 14109) close to color
+; "SeaGreen" ((11822 35723 22359))
+; hsv 108 0.843137 0.803922, rgb (17148 52685 8264) close to color
+; "LimeGreen" ((12850 52685 12850))
+; hsv 108 0.725490 0.686275, rgb (18871 44975 12346) close to color "PaleGreen4" ((21588 35723 21588))
+; hsv 108 0.843137 0.960784, rgb (20494 62965 9876) close to color
+; "chartreuse2" ((30326 61166 0))
 (defface reb-match-3
   '((((class color) (background light))
-     (:background "yellow"))
+     (:background "chartreuse2"))
+    (((class color) (background dark))
+     (:background "sienna4"))
+    (t
+     :inverse-video t))
+  "Used for displaying the third matching subexpression."
+  :group 're-builder)
+
+; hsv 144 0.882353 0.843137, rgb (6500 55255 26002) close to color
+; "MediumSeaGreen" ((15420 46003 29041))
+; hsv 144 1.000000 0.843137, rgb (0 55255 22101) close to color
+; "SpringGreen3" ((0 52685 26214))
+(defface reb-match-4
+  '((((class color) (background light))
+     (:background "MediumSeaGreen"))
+    (((class color) (background dark))
+     (:background "sienna4"))
+    (t
+     :inverse-video t))
+  "Used for displaying the third matching subexpression."
+  :group 're-builder)
+
+; hsv 180 1.000000 0.843137, rgb (0 55255 55255) close to color
+; "cyan3" ((0 52685 52685))
+(defface reb-match-5
+  '((((class color) (background light))
+     (:background "cyan3"))
+    (((class color) (background dark))
+     (:background "sienna4"))
+    (t
+     :inverse-video t))
+  "Used for displaying the third matching subexpression."
+  :group 're-builder)
+
+; hsv 216 0.803922 0.843137, rgb (10834 28602 55255) close to color
+; "RoyalBlue3" ((14906 24415 52685))
+(defface reb-match-6
+  '((((class color) (background light))
+     (:background "RoyalBlue3"))
+    (((class color) (background dark))
+     (:background "sienna4"))
+    (t
+     :inverse-video t))
+  "Used for displaying the third matching subexpression."
+  :group 're-builder)
+
+; hsv 288 0.843137 0.803922, rgb (43800 8264 52685) close to color
+; "DarkOrchid3" ((39578 12850 52685))
+(defface reb-match-7
+  '((((class color) (background light))
+     (:background "DarkOrchid3"))
+    (((class color) (background dark))
+     (:background "sienna4"))
+    (t
+     :inverse-video t))
+  "Used for displaying the third matching subexpression."
+  :group 're-builder)
+
+; hsv 324 1.000000 0.843137, rgb (55255 0 33152) close to color
+; "DeepPink3" ((52685 4112 30326))
+(defface reb-match-8
+  '((((class color) (background light))
+     (:background "DeepPink3"))
+    (((class color) (background dark))
+     (:background "sienna4"))
+    (t
+     :inverse-video t))
+  "Used for displaying the third matching subexpression."
+  :group 're-builder)
+
+; hsv 252 1.000000 0.725490, rgb (9509 0 47545) close to color "blue3"
+; ((0 0 52685))
+(defface reb-match-9
+  '((((class color) (background light))
+     (:background "blue3"))
     (((class color) (background dark))
      (:background "sienna4"))
     (t
 (defvar reb-valid-string ""
   "String in mode line showing validity of RE.")
 
+(defvar reb-perl-match-vector nil
+  "Last match data for perl syntax used by RE Builder.")
+
+(defvar reb-perl-match-index nil
+  "Last match data index for perl syntax used by RE Builder.")
+
 (make-variable-buffer-local 'reb-overlays)
 (make-variable-buffer-local 'reb-regexp)
 (make-variable-buffer-local 'reb-regexp-src)
+(make-variable-buffer-local 'reb-perl-match-vector)
+(make-variable-buffer-local 'reb-perl-match-index)
 
 (defconst reb-buffer "*RE-Builder*"
   "Buffer to use for the RE Builder.")
   "Return non-nil if RE Builder uses a Lisp syntax."
   (memq reb-re-syntax '(lisp-re sregex rx)))
 
+(defsubst reb-perl-syntax-p ()
+  "Return non-nil if RE Builder uses a Perl syntax."
+  (equal reb-re-syntax 'perl))
+
 (defmacro reb-target-binding (symbol)
   "Return binding for SYMBOL in the RE Builder target buffer."
   `(with-current-buffer reb-target-buffer ,symbol))
   (reb-assert-buffer-in-window)
   (reb-with-current-window
     reb-target-window
-    (if (not (re-search-forward reb-regexp (point-max) t))
+    (if (not (reb-re-search-forward reb-regexp (point-max) t))
 	(message "No more matches.")
       (reb-show-subexp
        (or (and reb-subexp-mode reb-subexp-displayed) 0)
   (reb-assert-buffer-in-window)
   (reb-with-current-window reb-target-window
     (goto-char (1- (point)))
-    (if (not (re-search-backward reb-regexp (point-min) t))
+    (if (not (reb-re-search-backward reb-regexp (point-min) t))
 	(message "No more matches.")
       (reb-show-subexp
        (or (and reb-subexp-mode reb-subexp-displayed) 0)
   (setq reb-subexp-mode t)
   (reb-update-modestring)
   (use-local-map reb-subexp-mode-map)
-  (message "`0'-`9' to display subexpressions  `q' to quit subexp mode."))
+  (message "`0'-`9' to display subexpressions, `q' to quit subexp mode."))
 
 (defun reb-show-subexp (subexp &optional pause)
   "Visually show limit of subexpression SUBEXP of recent search.
    (list (intern
 	  (completing-read "Select syntax: "
 			   (mapcar (lambda (el) (cons (symbol-name el) 1))
-				   '(read string lisp-re sregex rx))
+				   ;; APA TODO: Gotta be a way to
+				   ;; avoid duplication of choices
+				   ;; from `reb-re-syntax' here.
+				   '(read string lisp-re sregex rx perl))
 			   nil t (symbol-name reb-re-syntax)))))
 
-  (if (memq syntax '(read string lisp-re sregex rx))
+  (if (memq syntax '(read string lisp-re sregex rx perl))
       (let ((buffer (get-buffer reb-buffer)))
 	(setq reb-re-syntax syntax)
 	(if buffer
 (defun reb-delete-overlays ()
   "Delete all RE Builder overlays in the `reb-target-buffer' buffer."
   (if (buffer-live-p reb-target-buffer)
-    (with-current-buffer reb-target-buffer
-      (mapcar 'reb-delete-overlay reb-overlays)
-      (setq reb-overlays nil))))
+      (with-current-buffer reb-target-buffer
+	(mapcar 'reb-delete-overlay reb-overlays)
+	(setq reb-overlays nil
+	      reb-perl-match-vector nil
+	      reb-perl-match-index nil))))
 
 (defun reb-assert-buffer-in-window ()
   "Assert that `reb-target-buffer' is displayed in `reb-target-window'."
     (cond ((eq reb-re-syntax 'read)
 	   (goto-char (point-min))
 	   (read (current-buffer)))
-	  ((eq reb-re-syntax 'string)
+	  ((member reb-re-syntax '(perl string))
 	   (goto-char (point-min))
 	   (re-search-forward "\"")
 	   (let ((beg (point)))
 
   (let ((re (or (reb-target-binding reb-regexp)
 		(reb-empty-regexp))))
-  (cond ((eq reb-re-syntax 'read)
-	 (print re (current-buffer)))
-	((eq reb-re-syntax 'string)
-	 (insert "\n\"" re "\""))
-	;; For the Lisp syntax we need the "source" of the regexp
-	((reb-lisp-syntax-p)
-	 (insert (or (reb-target-binding reb-regexp-src)
-		     (reb-empty-regexp)))))))
+    (cond ((eq reb-re-syntax 'read)
+	   (print re (current-buffer)))
+	  ((member reb-re-syntax '(perl string))
+	   (insert "\n\"" re "\""))
+	  ;; For the Lisp syntax we need the "source" of the regexp
+	  ((reb-lisp-syntax-p)
+	   (insert (or (reb-target-binding reb-regexp-src)
+		       (reb-empty-regexp)))))))
 
 (defun reb-cook-regexp (re)
   "Return RE after processing it according to `reb-re-syntax'."
   "Return number of sub-expressions in the regexp RE."
 
   (let ((i 0) (beg 0))
-    (while (string-match "\\\\(" re beg)
-      (setq i (1+ i)
-	    beg (match-end 0)))
-    i))
+    ;; Remove character classes first, to avoid false subexp matches,
+    ;; then count beginnings of subexp grouping operators.
+    ;; Even before that remove escaped character class operators to
+    ;; simplify character class removal.
+    (let ((subexp-start-regexp (if (reb-perl-syntax-p) "(" "\\\\("))
+	  ;; escaped-char-class-regexp an char-class-regexp do not
+	  ;; differ between perl and lisp regexps.
+	  (escaped-char-class-regexp "\\\\[][]")
+	  (char-class-regexp "\\[^?.[^]]*\\]"))
+      (setq re (replace-regexp-in-string escaped-char-class-regexp "" re))
+      (setq re (replace-regexp-in-string char-class-regexp "" re))
+      (while (string-match subexp-start-regexp re beg)
+	(setq i (1+ i)
+	      beg (match-end 0)))
+      i)))
 
 
 (defun reb-update-overlays (&optional subexp)
       (set-buffer reb-target-buffer)
       (reb-delete-overlays)
       (goto-char (point-min))
-      (while (and (re-search-forward re (point-max) t)
+      (while (and (reb-re-search-forward re (point-max) t)
 		  (or (not reb-auto-match-limit)
 		      (< matches reb-auto-match-limit)))
 	(if (= 0 (length (match-string 0)))
 						 (match-end i)))
 		      (face-name (format "reb-match-%d" i)))
 		  (if (not firstmatch)
-		      (setq firstmatch (match-data)))
+		      ;; Use INTEGERS argument for reb-perl-syntax-p.
+		      (setq firstmatch (match-data (reb-perl-syntax-p))))
 		  (setq reb-overlays (cons overlay reb-overlays)
 			submatches (1+ submatches))
 		  (reb-overlay-put
 	(progn (store-match-data firstmatch)
 	       (reb-show-subexp (or subexp 0))))))
 
+(defconst reb-perl-program
+  "
+use strict;
+
+# Adrian Aichner <adrian@xemacs.org>, The XEmacs Project, 2005-08-23.
+# Perl script returning match data to be used by re-builder.el
+
+my $re = qr{%s}%s;
+my @matches;
+
+# Read in whole data to count positions relative to begin of data
+# stream.
+# Shouldn't be a problem for typical use of re-builder.el.
+# re-builder.el should emit warning on huge target buffers.
+undef $/;
+
+sub main {
+    while (<DATA>) {
+        my $re_comment = $re;
+        $re_comment =~ s/\\n/\\n ;; /g;
+        printf \"[\\n ;; perl (v%%vd) one-based match-data for qr{$re_comment}\", $^V;
+        while (m{$re}g) {
+            print \"\\n [\";
+            for (my $i = 0; $i < scalar (@-); $i++) {
+                printf(\"(%%d %%d)\", pos() - $+[0] + $-[$i] + 1, pos() - $+[0] + $+[$i] + 1);
+            }
+            print \"]\";
+        }
+        print \"]\\n\";
+    }
+}
+
+main();
+
+__DATA__
+%s"
+  "Perl program control string to pass to `format'.
+
+The control-string contains three %s sequences to receive the regular
+expression string, the regular expression flags to pass on the perl's
+qr{RE}FLAGS; opeartor, and finally the buffer content to match
+against.
+")
+
+(defun reb-re-search-forward (regexp &optional limit noerror count buffer)
+  (cond
+   ((reb-perl-syntax-p)
+    (if (not (reb-target-binding reb-perl-match-vector))
+	(let ((program-buffer (get-buffer-create " *reb-perl-program*"))
+	      (match-buffer (get-buffer-create " *reb-perl-match-data*")))
+	  (unwind-protect
+	      (progn
+		(with-current-buffer
+		    program-buffer
+		  (erase-buffer)
+		  (insert
+		   (format reb-perl-program
+			   (reb-target-binding reb-regexp)
+			   (concat
+			    (if (reb-target-binding case-fold-search) "i" "")
+			    ;; use extended perl regexp syntax?
+			    "x")
+			   (buffer-string reb-target-buffer)))
+		  (call-process "perl"
+				(list (get-buffer " *reb-perl-program*"))
+				match-buffer))
+		(with-current-buffer reb-target-buffer
+		  (setq reb-perl-match-vector
+			(read (buffer-string match-buffer)))))
+	    (kill-buffer program-buffer)
+	    (kill-buffer match-buffer))))
+    (let* ((vector (reb-target-binding reb-perl-match-vector))
+	   (index (reb-target-binding reb-perl-match-index))
+	   (max-index (1- (length vector))))
+      ;; Make sure to preserve previous match data on failing search!
+      (if (not index)
+	  (setq index 0)
+	(incf index))
+      (when (<= index max-index)
+	(with-current-buffer reb-target-buffer
+	  (setq reb-perl-match-index index))
+	(store-match-data
+	 (apply 'append
+		(append
+		 (elt
+		  vector
+		  (reb-target-binding index))
+		 nil)))
+	(with-current-buffer reb-target-buffer
+	  (goto-char
+	   (elt
+	    (apply 'append
+		   (append
+		    (elt
+		     vector
+		     ;; match data for current match index:
+		     (reb-target-binding index))
+		    nil))
+	    ;; index to end of subexp 0 (end of whole match):
+	    1))))))
+   (t (re-search-forward regexp limit noerror count buffer))))
+
+(defun reb-re-search-backward (regexp &optional limit noerror count buffer)
+  (cond
+   ((reb-perl-syntax-p)
+    (let* ((vector (reb-target-binding reb-perl-match-vector))
+	   (index (reb-target-binding reb-perl-match-index))
+	   (max-index (1- (length vector))))
+      ;; Make sure to preserve previous match data on failing search!
+      (if (> index 0)
+	(decf index))
+      (when (>= index 0)
+	(with-current-buffer reb-target-buffer
+	  (setq reb-perl-match-index index))
+	(store-match-data
+	 (apply 'append
+		(append
+		 (elt
+		  vector
+		  (reb-target-binding index))
+		 nil)))
+	(with-current-buffer reb-target-buffer
+	  (goto-char
+	   (elt
+	    (apply 'append
+		   (append
+		    (elt
+		     vector
+		     ;; match data for current match index:
+		     (reb-target-binding index))
+		    nil))
+	    ;; index to end of subexp 0 (end of whole match):
+	    1))))))
+   (t (re-search-backward regexp limit noerror count buffer))))
+
 (provide 're-builder)
 
 ;;; arch-tag: 5c5515ac-4085-4524-a421-033f44f032e7

File re-builder.texi

 @node Regex Types, Questions, Usage, Top
 @chapter Regex Types
 
-Currently @pkgname{} understands five different forms of input, namely
-@code{read}, @code{string}, @code{rx}, @code{sregex} and @code{lisp-re}
-syntax.  Read syntax and string syntax are both delimited by @code{"}s
-and behave according to their name.  With the @code{string} syntax
-there's no need to escape the backslashes and double quotes simplifying
-the editing somewhat.  The other three allow editing of symbolic regular
-expressions supported by the packages of the same name.  (@code{lisp-re}
-is a package by me and its support may go away as it is nearly the same
-as the @code{sregex} package in Emacs)
+Currently @pkgname{} understands six different forms of input, namely
+@code{lisp-re}, @code{perl}, @code{read}, @code{rx}, @code{sregex},
+and @code{string} syntax.  Read syntax and string syntax are both
+delimited by @code{"}s and behave according to their name.  With the
+@code{perl} and @code{string} syntax there's no need to escape the
+backslashes and double quotes simplifying the editing somewhat.  The
+other four allow editing of symbolic regular expressions supported by
+the packages of the same name.  (@code{lisp-re} is a package by me and
+its support may go away as it is nearly the same as the @code{sregex}
+package in Emacs)
 
 Note that the @code{sregex,} @code{rx} and @code{lisp-re} syntaxes will
 only be available in XEmacs if you've installed them yourself.