1. Brodie Rao
  2. linkifier



Python Library Documentation: module linkifier

linkifier - Tools for turning links in plain text into HTML links
linkify(text, substitutions=None, wordregex=r"([w@$#~](?:[w!*'();:@&=+$,/?#[]_.~%-]*[w@&=+$/#_~])?)")

Turn URLs in text into links and perform custom URL substitutions.

For example:

>>> linkify('wikipedia.org is pretty neat')
'<a href="http://wikipedia.org">wikipedia.org</a> is pretty neat'

If substitutions are specified, words that don't look like URLs can be turned into links:

>>> linkify('see #123 for more info', [(r'#(\d+)', r'issues.com/\1')])
'see <a href="http://issues.com/123">#123</a> for more info'

If multiple substitutions are specified, they are performed in order until one matches:

>>> linkify('fixes PROJ-123',
...         [(r'#(\d+)', r'/issues/\1'),
...          (r'([A-Z]{4})-(\d+)', r'http://issues.net/\1/\2')])
'fixes <a href="http://issues.net/PROJ/123">PROJ-123</a>'

If a substitution would return text that isn't a URL, it's ignored and the remaining substitutions are processed:

>>> linkify('fixes PROJ-123',
...         [(r'.+', r'javascript:alert("boo!")'),
...          (r'([A-Z]{4})-(\d+)', r'http://issues.net/\1/\2')])
'fixes <a href="http://issues.net/PROJ/123">PROJ-123</a>'

Note that substitution patterns will only match words in their entirety. In other words, the pattern implicitly begins with '^' and implicitly ends with '$':

>>> linkify('X Y XX YY',
...         [(r'^X', r'x.com'),
...          (r'Y$', r'y.com')])
'<a href="http://x.com">X</a> <a href="http://y.com">Y</a> XX YY'

A replacement function can be provided instead of a string. If the function raises ValueError, the substitution is ignored and the remaining substitutions are processed:

>>> def replace(match):
...     number = match.group(0)
...     if int(number) % 3 != 0:
...         raise ValueError
...     return 'http://d3.us/' + number
>>> linkify('5 6 7 8 (d3.us)', [(r'\d+', replace)])
'5 <a href="http://d3.us/6">6</a> 7 8 (<a href="http://d3.us">d3.us</a>)'

For the daring programmer, wordregex can be customized to allow different word matching semantics:

>>> linkify('google.com #123', [('#(\d+)', r'/issues/\1')], r'(#\d+)')
'google.com <a href="/issues/123">#123</a>'
linkifyword(word, text=None, requiredomain=True)

Turn a word into an HTML anchor if it looks like a URL.

For example:

>>> linkifyword('google.com/search')
'<a href="http://google.com/search">google.com/search</a>'
>>> linkifyword('http://google.com/reader')
'<a href="http://google.com/reader">http://google.com/reader</a>'
>>> linkifyword('/about.html', requiredomain=False)
'<a href="/about.html">/about.html</a>'

If text is specified, it's used as the anchor text:

>>> linkifyword('http://bar.io/baz', 'baz')
'<a href="http://bar.io/baz">baz</a>'

If it doesn't look like a URL, ValueError is raised:

>>> linkifyword('localhost/blah')
Traceback (most recent call last):
ValueError: not a URL
urlizeword(word, requiredomain=True)

Turn word into a URL if it looks valid.

Example usage:

>>> urlizeword('http://foo!')
>>> urlizeword('https://bar!')
>>> urlizeword('baz.io/quux')

If it doesn't look like a URL, ValueError is raised:

>>> urlizeword('localhost/blah.blah.blah')
Traceback (most recent call last):
ValueError: not a URL

If requiredomain is False, URLs starting with a slash are allowed:

>>> urlizeword('/foo/bar', requiredomain=False)
__all__ = ['urlizeword', 'linkifyword', 'linkify']