Diff Text 1.x

Diff Text is a free, web-based utility that finds the differences between two plain texts pasted into its two text boxes. Because it has a web interface, it can be used from any operating system.

The code was ported from a Windows desktop application, which the authors intend to release in 2013. The Diff Text algorithm is currently used by Selection Diff Tool, an app for Microsoft Excel 2013 and Word 2013 that is available from the Microsoft Office Store.

Virtually all software tools that compare text use the longest common subsequence (LCS) algorithm, which finds the longest sequence of elements that appear in the same order in both texts. If an element is moved up or down in the document, out of the flow of the surrounding text, the LCS algorithm cannot detect the move, and applications report it as an unrelated deletion and addition. This confuses users: a paragraph or a line of source code that was simply moved is reported as deleted in one place and added in another.
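
To illustrate the problem, here is a minimal textbook LCS diff in Python. It is not Diff Text's code, and the function names `lcs_table` and `lcs_diff` are made up for this sketch. Moving one line to the end of a list makes it show up as a deletion plus an addition:

```python
def lcs_table(a, b):
    """Build the classic O(m*n) dynamic-programming table of LCS lengths.

    table[i][j] holds the LCS length of the suffixes a[i:] and b[j:].
    """
    m, n = len(a), len(b)
    table = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m - 1, -1, -1):
        for j in range(n - 1, -1, -1):
            if a[i] == b[j]:
                table[i][j] = table[i + 1][j + 1] + 1
            else:
                table[i][j] = max(table[i + 1][j], table[i][j + 1])
    return table


def lcs_diff(a, b):
    """Walk the table and emit ('=', x), ('-', x) or ('+', x) operations."""
    table = lcs_table(a, b)
    i = j = 0
    ops = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            ops.append(("=", a[i]))  # element kept in place
            i += 1
            j += 1
        elif table[i + 1][j] >= table[i][j + 1]:
            ops.append(("-", a[i]))  # dropping a[i] keeps the LCS longest
            i += 1
        else:
            ops.append(("+", b[j]))  # inserting b[j] keeps the LCS longest
            j += 1
    ops.extend(("-", x) for x in a[i:])  # leftovers in the original
    ops.extend(("+", x) for x in b[j:])  # leftovers in the modified text
    return ops


original = ["alpha", "beta", "gamma", "delta"]
modified = ["beta", "gamma", "delta", "alpha"]  # "alpha" moved to the end

for op, line in lcs_diff(original, modified):
    print(op, line)
```

Run as written, it prints `- alpha` first and `+ alpha` last, with nothing linking the two operations, which is exactly the unlinked deletion and addition described above.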

Some more recent utilities do attempt to detect moves, but they do so haphazardly: either only some moves are detected, or heuristics are used that work only some of the time.

Diff Text uses a new type of difference algorithm that can compare paragraphs and lines that are only partially identical and that may also have been moved.

The difficulty is that one line or phrase of the original text may have several possible destinations in the modified document. The algorithm must choose the correct one so that the edit distance between the two files is kept to a minimum.
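
The page does not describe how Diff Text makes this choice. As a rough illustration of the problem only, the sketch below scores every (deleted, added) pair with Python's `difflib.SequenceMatcher.ratio()` and pairs lines greedily, best score first. The helper `pair_similar_lines` and its `threshold=0.6` default are hypothetical, and a greedy choice like this is NOT guaranteed to give the minimum overall edit distance, which is precisely why a more careful algorithm is needed:

```python
from difflib import SequenceMatcher


def pair_similar_lines(deleted, added, threshold=0.6):
    """Greedily pair each deleted line with its most similar added line.

    Hypothetical helper: every candidate pair is scored with
    SequenceMatcher.ratio(); pairs are accepted in descending score order
    and each line is used at most once. Greedy choice does not guarantee
    the minimum total edit distance.
    """
    candidates = []
    for i, old in enumerate(deleted):
        for j, new in enumerate(added):
            score = SequenceMatcher(None, old, new).ratio()
            if score >= threshold:
                candidates.append((score, i, j))
    candidates.sort(reverse=True)  # highest similarity first

    used_old, used_new, pairs = set(), set(), []
    for score, i, j in candidates:
        if i not in used_old and j not in used_new:
            used_old.add(i)
            used_new.add(j)
            pairs.append((deleted[i], added[j], round(score, 2)))
    return pairs


print(pair_similar_lines(["the quick brown fox"],
                         ["a quick brown fox", "the lazy dog"]))
```

Here the deleted line has two possible destinations; it is paired with `a quick brown fox` (ratio about 0.89) rather than with `the lazy dog`.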

When the user specifies that moved text should not be detected, the algorithm runs in O(m log n) time, where m and n are the sizes of the original and modified texts. This improves on the quadratic O(mn) time often seen in off-the-shelf implementations of the longest common subsequence algorithm.
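
The page does not say which technique Diff Text uses to reach this bound. One well-known way to beat the quadratic table is the Hunt-Szymanski reduction of LCS to a longest strictly increasing subsequence, maintained with binary search. The sketch below computes only the LCS length, and the function name is made up for this example:

```python
from bisect import bisect_left
from collections import defaultdict


def lcs_length_hunt_szymanski(a, b):
    """LCS length via the Hunt-Szymanski reduction (illustrative sketch)."""
    # Index every position of each element in b.
    positions = defaultdict(list)
    for j, item in enumerate(b):
        positions[item].append(j)

    # thresholds[k] = smallest index in b at which a common
    # subsequence of length k + 1 can end; the list stays sorted.
    thresholds = []
    for item in a:
        # Visit matches right to left so one element of a cannot
        # extend a subsequence it has itself just extended.
        for j in reversed(positions.get(item, [])):
            k = bisect_left(thresholds, j)  # first threshold >= j
            if k == len(thresholds):
                thresholds.append(j)  # longest subsequence grows by one
            else:
                thresholds[k] = j     # same length, earlier end point
    return len(thresholds)


print(lcs_length_hunt_szymanski("ABCBDAB", "BDCABA"))
```

Each matching pair costs one binary search, so the running time is roughly O((r + m) log n), where r is the number of matching pairs. When most lines are unique, r is close to m and the cost approaches m log n; inputs with many repeated lines (blank lines, for example) push it back toward quadratic behaviour.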

Diff Text works with Mac, Windows and Unix line terminators.
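
How Diff Text normalizes line endings is not described here. A minimal Python sketch, assuming the three conventions meant are Windows CRLF, classic Mac OS CR and Unix LF, is a single regular expression that tries CRLF first so it counts as one terminator:

```python
import re

# CRLF must come before lone CR so "\r\n" is one break, not two.
_LINE_BREAK = re.compile(r"\r\n|\r|\n")


def split_lines(text):
    """Split on Windows (CRLF), classic Mac OS (CR) and Unix (LF) endings."""
    return _LINE_BREAK.split(text)


print(split_lines("one\r\ntwo\rthree\nfour"))
```

Python's built-in `str.splitlines()` would also work, but it additionally splits on characters such as form feed and U+2028, which may not be wanted when comparing documents. Note that a trailing terminator produces a final empty string.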

Features

Diff Text uses a new kind of algorithm which offers the following:

  1. Detection of moved text
  2. The ability to detect re-ordered phrases or sentences within a line or paragraph. Re-ordered text is highlighted with a background that alternates between light blue and yellow.
  3. The ability to properly compare lines or paragraphs that are not identical but are similar
  4. Navigation from one difference to another
  5. Choice of comparing at the level of whole lines, words or characters
  6. Choice of displaying the original and modified text combined in one pane or in separate panes, arranged horizontally or vertically
  7. Ability to omit identical text from the report, or to keep only the context on either side of each difference
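
Feature 5 amounts to choosing the unit that the comparison works on. The sketch below shows only that idea: the hypothetical helpers `tokenize` and `changed_units` split text into lines, words or characters and hand the tokens to Python's `difflib.SequenceMatcher`. That matcher uses a Ratcliff/Obershelp-style method, so it is a stand-in for illustration and not the Diff Text algorithm:

```python
import re
from difflib import SequenceMatcher


def tokenize(text, level):
    """Split text into comparison units: 'line', 'word' or 'char' (hypothetical)."""
    if level == "line":
        return re.split(r"\r\n|\r|\n", text)
    if level == "word":
        # Keep whitespace runs as tokens so the text can be rebuilt exactly.
        return re.findall(r"\s+|\S+", text)
    if level == "char":
        return list(text)
    raise ValueError(f"unknown level: {level!r}")


def changed_units(original, modified, level):
    """Return (tag, old_tokens, new_tokens) for every non-equal opcode."""
    a, b = tokenize(original, level), tokenize(modified, level)
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    return [
        (tag, a[i1:i2], b[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


print(changed_units("the cat sat", "the dog sat", "word"))
print(changed_units("cat", "cut", "char"))
```

At word level the change is reported as `cat` replaced by `dog`; at character level the same kind of edit shrinks to a single letter, which is the trade-off the granularity option exposes.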
