1. Arthur McBain
  2. BB-Code Parser


BB-Code Parser is a bulletin-board code parsing library released under the 3-clause
BSD license and available in both PHP and JavaScript. It aims to ensure feature and
output compatibility between both implementations.

Sample bb-code:
  [b]bold text![/b]

The library is also configurable so it can handle non-standard (non-bb-code)
syntax, such as :smile: or :coffee: which is more like the emoticon syntax used by
many internet sites. In fact, two differently configured instances of the parser
could actually handle both syntaxes in the same text over two passes.

  [b]I :heart: bold text![/b]

By default the parser supports conversion of the following bb-code tags to X/HTML:
b, i, u, s, font, size, color, left, center, right, quote, code, codebox, url, img,
ul, ol, li, list, * (equivalent to li)

The parser also defaults to a mode called "all or nothing" which returns the input
if it finds invalid or unmatched codes. (Codes which have a valid implementation
but are not allowed are not considered invalid.) If this parser is used for posts
on a site that does not allow editing of those posts, it is now recommended to turn
the "all or nothing" mode off. The parser after version 1.0.1 is better at handling
ambiguous user input so the intended output results. Even if not, such as for user
mistakes like "[b]text[/u]", it may be preferable to users to try to render as much
correctly as possible rather than nothing at all. A future version of this parser
might default "all or nothing" to off.

This parser does not do anything special by default for line breaks. As such, a
simple string find-and-replace function can be run over the output to convert line
breaks to appropriate markup. Replacement done before sending input to the parser
may result in the replacement markup being escaped, depending on the parser
settings. Please see issue #22 for additional reasons why the library doesn't
handle line breaks by default, and how to add a custom code implementation for
explicit declaration of line breaks in bb-code.

In order to keep this README short, I haven't include any usage details. Please see
the top of either the PHP or JS bb-code parser file for how to use this library.

Please note starting in 1.0 the way options are passed to the constructor and
format method has changed. The last version with the old API is available as tag
v.9 under the downloads section.

(Please do not use the 1.0 tag as the parser may be unexpectedly broken.)

Python port notes

Currently only Py3k+ is explicitly supported and tested. If you require Python 2.7
support and the library as it stands does not work with 2.7, please file a proposal
in the issue tracker for consideration.

There are some differences in the Python port from the other two versions. They are

  The filename has no dashes because dashes are not a valid Python identifier which
  is required to be able to import it. It also includes the python version

  The library mostly conforms to PEP8. This means underscores_are_used_everywhere.
  Check pylint.rc for the suppressed messages and where formatting might differ
  from PEP8. Notably there's too many lines, some functions have too many
  arguments, statements, or variable declarations, and lines are not a max of 80

  BBCodeParser_MultiTokenizer is known just as MultiTokenizer as in Python the file
  acts as a suitable namespace without the clutter of actually having to "assign"
  things to one, like in JS, or the ugly syntax of PHP's namespaces. Well, and PEP8
  doesn't like underscores in class names.

Any other significant differences not listed here are related to internal members,
methods, functions, or classes intended to be used by the parser only.