Commits

Søren Løvborg  committed 2879f74

Rework trailing punctuation handling to allow URLs in single quotes and parentheses.
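
To illustrate what the reworked lookahead changes, here is a minimal sketch. It assumes a deliberately simplified URL pattern (`$simpleUrl`, hypothetical, far looser than the real `$rexUrl`) and a hypothetical helper `findUrls()`. Only the trailing-punctuation and non-URL character classes are copied from this commit. The URL body is matched lazily, so the lookahead decides where the URL ends.

```php
<?php
// Hypothetical, much-simplified URL pattern (an assumption for this sketch):
// a scheme followed by a lazy run of URL-legal characters.
// The real $rexUrl in UrlLinker.php is considerably stricter.
$simpleUrl = "https?://[-_\$+.!*'(),;/?:@=&a-zA-Z0-9]+?";

// Lookahead before this commit: at most one trailing mark, then whitespace or end.
$oldLinker = "{\\b$simpleUrl(?=[?.!,;:\"]?(\\s|\$))}";

// Lookahead after this commit: any run of trailing marks (now including
// ")" and "'"), then a character that cannot appear in a URL, or the end.
$trailPunct = "[)'?.!,;:]";
$nonUrl     = "[^-_\$+.!*'(),;/?:@=&a-zA-Z0-9]";
$newLinker  = "{\\b$simpleUrl(?=$trailPunct*($nonUrl|\$))}";

// Hypothetical helper: return every URL the given pattern finds in $text.
function findUrls(string $pattern, string $text): array
{
    preg_match_all($pattern, $text, $matches);
    return $matches[0];
}

$text = "See (http://example.com/path).";
print_r(findUrls($oldLinker, $text)); // URL keeps the closing parenthesis
print_r(findUrls($newLinker, $text)); // URL stops before ")."
```

Under these assumptions the old lookahead yields `http://example.com/path)` while the new one yields `http://example.com/path`, which is the behaviour the updated example file exercises with quoted and parenthesized URLs.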

  • Parent commits a20a4a8

Files changed (2)

File UrlLinker-example.php

 Here are some URLs:
 stackoverflow.com/questions/1188129/pregreplace-to-detect-html-php
 Here's the answer: http://www.google.com/search?rls=en&q=42&ie=utf-8&oe=utf-8&hl=en. What was the question?
-A quick look at http://en.wikipedia.org/wiki/URI_scheme#Generic_syntax is helpful.
+A quick look at 'http://en.wikipedia.org/wiki/URI_scheme#Generic_syntax' is helpful.
 There is no place like 127.0.0.1! Except maybe http://news.bbc.co.uk/1/hi/england/surrey/8168892.stm?
 Ports: 192.168.0.1:8080, https://example.net:1234/.
-Beware of Greeks bringing internationalized top-level domains: xn--hxajbheg2az3al.xn--jxalpdlp.
+Beware of Greeks bringing internationalized top-level domains (xn--hxajbheg2az3al.xn--jxalpdlp).
 10.000.000.000 is not an IP-address. Nor is this.a.domain.
 
 <script>alert('Remember kids: Say no to XSS-attacks! Always HTML escape untrusted input!');</script>

File UrlLinker.php

 $rexUsername  = '[^]\\\\\x00-\x20\"(),:-<>[\x7f-\xff]{1,64}';
 $rexPassword  = $rexUsername; // allow the same characters as in the username
 $rexUrl       = "$rexProtocol(?:($rexUsername)(:$rexPassword)?@)?($rexDomain|$rexIp)($rexPort$rexPath$rexQuery$rexFragment)";
-$rexTrailPunct= '[?.!,;:"]';
-$rexUrlLinker = "{\\b$rexUrl(?=$rexTrailPunct?(\s|$))}";
+$rexTrailPunct= "[)'?.!,;:]"; // valid URL characters which are not part of the URL if they appear at the very end
+$rexNonUrl    = "[^-_$+.!*'(),;/?:@=&a-zA-Z0-9]"; // characters that should never appear in a URL
+$rexUrlLinker = "{\\b$rexUrl(?=$rexTrailPunct*($rexNonUrl|$))}";
 
 /**
  *  $validTlds is an associative array mapping valid TLDs to the value true.