OverZealous Creations Remark
Remark is a library for taking (X)HTML input and outputting clean Markdown, Markdown Extra, or MultiMarkdown compatible text. The purpose of this conversion is mainly to allow the use of client-side HTML GUI editors while retaining safe, mobile-device-editable Markdown text behind the scenes. It is recommended that the Markdown text, rather than the HTML, be stored, to reduce the risk of XSS attacks via code injection.
Example Usage Scenario
- The user logs in from their desktop.
- The user enters some text into a full-featured GUI editor, such as Dojo's rich text editor or a similar WYSIWYG editor.
- The webserver takes the generated HTML, which may contain a lot of malformed markup depending on the browser, and passes it to Remark.
- Remark passes the HTML to jsoup to clean up the input; this strips unsupported HTML tags, although their text remains.
- Remark walks the generated DOM tree and outputs clean, structured Markdown text.
- The markdown text is returned.
- The webserver stores this markdown text for future display.
- The user chooses to re-edit the HTML text from their desktop.
- The webserver converts the Markdown back to HTML, and sends it to the client.
- Repeat the steps above to save it.
- The user later logs in from their mobile device.
- Mobile devices often do not support rich text editing through the web browser.
- So, instead, the webserver renders a plain text field containing the raw Markdown text.
- Because markdown is relatively easy to read and edit, the user can make simple changes without struggling with hundreds of messy HTML tags.
- If necessary, Remark can accept this input as well, and strip any HTML tags for security.
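To make the "walk the DOM tree and output Markdown" step more concrete, here is a minimal Java sketch. It is not Remark's implementation or API: the ToyHtmlToMarkdown class, its Node type, and the three-tag whitelist are all hypothetical, and the tree is built by hand instead of being parsed by jsoup, so the example needs no third-party libraries. As in the cleanup step above, unsupported elements are dropped while their text is kept.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;

/**
 * Toy sketch (NOT Remark's actual API) of walking an already-parsed DOM tree
 * and emitting Markdown. Elements outside a small whitelist are dropped, but
 * their text content is kept, mirroring how unsupported tags are stripped.
 */
final class ToyHtmlToMarkdown {

    /** Hypothetical minimal DOM node: a text node (tag == null) or an element. */
    static final class Node {
        final String tag;
        final String text;
        final List<Node> children;

        private Node(String tag, String text, List<Node> children) {
            this.tag = tag;
            this.text = text;
            this.children = children;
        }

        static Node text(String text) {
            return new Node(null, text, List.of());
        }

        static Node el(String tag, Node... children) {
            return new Node(tag, null, Arrays.asList(children));
        }
    }

    // Hypothetical whitelist: the only tags this toy converter translates.
    private static final Set<String> SUPPORTED = Set.of("p", "strong", "em");

    static String convert(Node root) {
        StringBuilder out = new StringBuilder();
        walk(root, out);
        return out.toString().trim();
    }

    private static void walk(Node node, StringBuilder out) {
        if (node.tag == null) {
            // Text node. A real converter would also escape Markdown syntax here.
            out.append(node.text);
            return;
        }
        if (!SUPPORTED.contains(node.tag)) {
            // Unsupported element: drop the tag itself but keep its children.
            walkChildren(node, out);
            return;
        }
        switch (node.tag) {
            case "strong":
                out.append("**");
                walkChildren(node, out);
                out.append("**");
                break;
            case "em":
                out.append('*');
                walkChildren(node, out);
                out.append('*');
                break;
            default: // "p": a paragraph is followed by a blank line
                walkChildren(node, out);
                out.append("\n\n");
                break;
        }
    }

    private static void walkChildren(Node node, StringBuilder out) {
        for (Node child : node.children) {
            walk(child, out);
        }
    }
}
```

In the real workflow, parsing and tag stripping are handled by jsoup; this sketch starts from a hand-built tree and also skips escaping Markdown special characters in text, which a production converter needs.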
Remark can be configured to output features beyond standard Markdown:
- Markdown Extra tables or MultiMarkdown tables (which add column-spanning support), including a best-guess attempt at alignment (based on style or align attributes)
- Reversal of various "smart" HTML entities or Unicode characters, such as curly quotes and other typographic punctuation, back to their plain-text equivalents
- Simplified hardwraps: a <br/> is converted to just a single line break instead of (space)(space)(newline), a convention common in most third-party Markdown renderers
- Autolinks: a link whose URL is the same as its label (and starts with http or https) is simply rendered as is, as a bare URL
- Markdown Extra definition lists
- Markdown Extra abbreviations
- Fenced code blocks, using either Markdown Extra's format (~~~) or GitHub's format (```)
- Customization of allowed HTML tags (not really recommended)
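The following hypothetical helpers sketch what a few of these options change in the output. None of the names are Remark's; each boolean parameter stands in for an on/off setting. The exact strings are assumptions: autolinks are written as the bare URL per the description above, the smart-punctuation mapping follows the common SmartyPants-style convention (em dash to ---, en dash to --), and the alignment guess prefers a CSS text-align value over the align attribute, which may not match Remark's own precedence.

```java
import java.util.Locale;

/**
 * Hypothetical helpers (not Remark's API) sketching a few optional output
 * features. Each boolean parameter stands in for an on/off setting.
 */
final class MarkdownFeatureSketch {

    /** A <br/>: standard Markdown needs two trailing spaces; simplified hardwraps use a bare newline. */
    static String lineBreak(boolean simplifiedHardwraps) {
        return simplifiedHardwraps ? "\n" : "  \n";
    }

    /** With autolinks on, a link whose label equals its http(s) URL is written as the bare URL. */
    static String link(String href, String label, boolean autolinks) {
        boolean isWebUrl = href.startsWith("http://") || href.startsWith("https://");
        if (autolinks && isWebUrl && href.equals(label)) {
            return href;
        }
        return "[" + label + "](" + href + ")";
    }

    /** Fenced code: Markdown Extra style uses tildes, GitHub style uses backticks. */
    static String fencedCode(String code, boolean githubStyle) {
        String fence = githubStyle ? "```" : "~~~";
        return fence + "\n" + code + "\n" + fence;
    }

    /** Reverse curly quotes, ellipsis, and em/en dashes to plain ASCII (assumed mapping). */
    static String reverseSmartPunctuation(String s) {
        return s.replace('\u201C', '"').replace('\u201D', '"')
                .replace('\u2018', '\'').replace('\u2019', '\'')
                .replace("\u2026", "...")
                .replace("\u2014", "---")
                .replace("\u2013", "--");
    }

    /** Best-guess table column alignment marker from a style or align attribute. */
    static String alignmentMarker(String alignAttr, String style) {
        String align = alignAttr == null ? "" : alignAttr.trim().toLowerCase(Locale.ROOT);
        if (style != null) {
            // A CSS text-align value, if present, overrides the align attribute here.
            String css = style.replace(" ", "").toLowerCase(Locale.ROOT);
            int i = css.indexOf("text-align:");
            if (i >= 0) {
                String rest = css.substring(i + "text-align:".length());
                int end = rest.indexOf(';');
                align = end >= 0 ? rest.substring(0, end) : rest;
            }
        }
        switch (align) {
            case "left":
                return ":---";
            case "center":
                return ":---:";
            case "right":
                return "---:";
            default:
                return "---";
        }
    }
}
```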
The basic idea is to enable only the extensions supported by the Markdown conversion library you use to render the stored text back to HTML.
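As a sketch of that idea, the hypothetical ExtensionMatching class below picks the set of output extensions from the renderer that will later display the stored text. The enum names are illustrative only, Remark's own configuration API is not shown here, and only the Markdown Extra features named in the list above are mapped.

```java
import java.util.EnumSet;

/**
 * Hypothetical sketch (not Remark's configuration API): choose output
 * extensions from the renderer that will turn the stored Markdown back
 * into HTML, so the output only uses syntax that renderer understands.
 */
final class ExtensionMatching {

    enum Feature { TABLES, DEFINITION_LISTS, ABBREVIATIONS, TILDE_FENCED_CODE }

    enum Renderer { PLAIN_MARKDOWN, MARKDOWN_EXTRA }

    static EnumSet<Feature> featuresFor(Renderer renderer) {
        switch (renderer) {
            case MARKDOWN_EXTRA:
                // The Markdown Extra features listed in this README.
                return EnumSet.of(Feature.TABLES, Feature.DEFINITION_LISTS,
                        Feature.ABBREVIATIONS, Feature.TILDE_FENCED_CODE);
            case PLAIN_MARKDOWN:
            default:
                // Plain Markdown: stick to the core syntax, no extensions.
                return EnumSet.noneOf(Feature.class);
        }
    }
}
```

Enabling an extension the renderer does not understand would typically leave its syntax (for example, ~~~ fences) visible as literal text, which is why the feature set follows the renderer rather than the other way around.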
A Note on Forking:
Want to fork this project? Great! However, please note that I use hgflow to manage the develop-release cycle. If you are uncomfortable with that, that's fine, too! Just switch to the develop branch before working, or I won't be able to easily merge your changes back in.
The source code is built with Gradle.