Major tidy of CodePoint and CodePointSet, and creation of a UnicodePoint class. This has been developed and tested on 12 papers. Missing characters were added as they were discovered, and the system now resolves all codepoints in the set.

The Unicode points are becoming formalized. They are currently split into 32-127 and "high" code points; these will be unified. For every Unicode codepoint we need:

  • value (e.g. U+0394)
  • unicodeName (e.g. GREEK CAPITAL LETTER DELTA), taken from the fileformat pages. I hope this is the standard Unicode name (apart from case).
  • trivial name. This is unclear: there are Adobe glyph names, HTML entities, etc.
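The three fields above could be held in a small immutable record. This is a minimal Python sketch, not the project's UnicodePoint class: the name `UnicodePointSketch`, its methods, and the use of Python's `unicodedata` module as a cross-check for the standard name are all assumptions for illustration. Names are compared case-insensitively because sources differ in case.

```python
import unicodedata
from dataclasses import dataclass


@dataclass(frozen=True)
class UnicodePointSketch:
    """Hypothetical record for one Unicode codepoint."""
    value: int                 # code point, e.g. 0x0394
    unicode_name: str          # standard name, e.g. "GREEK CAPITAL LETTER DELTA"
    trivial_names: tuple = ()  # e.g. HTML entities, Adobe glyph names

    @classmethod
    def from_value(cls, value: int, trivial_names: tuple = ()) -> "UnicodePointSketch":
        # Take the name from Python's Unicode database; unnamed points get ""
        return cls(value, unicodedata.name(chr(value), ""), trivial_names)

    def u_plus(self) -> str:
        # Conventional U+XXXX notation with at least four hex digits
        return f"U+{self.value:04X}"

    def matches_name(self, name: str) -> bool:
        # Case-insensitive comparison, since sources disagree on case
        return self.unicode_name.upper() == name.upper()


delta = UnicodePointSketch.from_value(0x0394, trivial_names=("&Delta;",))
print(delta.u_plus(), delta.unicode_name)  # U+0394 GREEK CAPITAL LETTER DELTA
```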

There may also be replacement values (e.g. for ligatures or visually identical glyphs). These are NOT replaced in pdf2svg but are available for other software (e.g. svgplus).
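Since replacements are only offered, not applied, one way to expose them is a lookup table that downstream software can choose to use. This Python sketch assumes a hypothetical table `REPLACEMENTS`; the specific entries (two ligatures and a Greek letter that looks like Latin "A") are illustrative choices, not the project's data. It relies on `str.translate`, which accepts a dict from code points to replacement strings and returns a new string, leaving the original text untouched.

```python
# Hypothetical replacement table: code point -> suggested replacement text.
REPLACEMENTS = {
    0xFB01: "fi",  # LATIN SMALL LIGATURE FI
    0xFB02: "fl",  # LATIN SMALL LIGATURE FL
    0x0391: "A",   # GREEK CAPITAL LETTER ALPHA, visually identical to Latin A
}


def apply_replacements(text: str, table: dict) -> str:
    """Return a copy of text with replacements applied; the input is unchanged."""
    return text.translate(table)


original = "\ufb01eld"
print(apply_replacements(original, REPLACEMENTS))  # field
```

Keeping the table separate from the extracted text mirrors the split described above: the extractor records the glyph as found, and a consumer decides whether to substitute.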

For non-Unicode points the following are required:

  • unicode point. If unknown, there is a default UNKNOWN (U+274E at present).
  • name OR decimal value. The decimal value is NOT necessarily the Unicode decimal; it may be a fixed point in a non-Unicode set or just a local index.

Lookup can be by name or by value; there may have to be precedence rules.
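The lookup described above, with a default UNKNOWN point and lookup by either name or decimal value, might be sketched as follows. Everything here is hypothetical: the classes `NonUnicodePoint` and `NonUnicodeSet`, the sample entries, and especially the precedence rule (name first, then decimal, then UNKNOWN), which is only one possible choice. The decimal 97 in the example is a position in the made-up set, not the Unicode value of "a".

```python
from dataclasses import dataclass
from typing import Optional

UNKNOWN = 0x274E  # default UNKNOWN code point (U+274E at present)


@dataclass(frozen=True)
class NonUnicodePoint:
    unicode: int = UNKNOWN         # resolved Unicode point, or UNKNOWN
    name: Optional[str] = None     # glyph name within the non-Unicode set
    decimal: Optional[int] = None  # fixed point in that set, or a local index


class NonUnicodeSet:
    def __init__(self, points):
        self.by_name = {}
        self.by_decimal = {}
        for p in points:
            # Each point must carry at least one key to be found by
            if p.name is None and p.decimal is None:
                raise ValueError("each point needs a name or a decimal value")
            if p.name is not None:
                self.by_name[p.name] = p
            if p.decimal is not None:
                self.by_decimal[p.decimal] = p

    def lookup(self, name: Optional[str] = None, decimal: Optional[int] = None) -> int:
        # Assumed precedence: name first, then decimal, otherwise UNKNOWN
        if name is not None and name in self.by_name:
            return self.by_name[name].unicode
        if decimal is not None and decimal in self.by_decimal:
            return self.by_decimal[decimal].unicode
        return UNKNOWN


charset = NonUnicodeSet([
    NonUnicodePoint(0x03B1, name="alpha", decimal=97),  # hypothetical entry
    NonUnicodePoint(name="mystery", decimal=200),       # Unicode not yet known
])
print(hex(charset.lookup(decimal=97)))  # 0x3b1
```

When name and decimal disagree, this sketch silently prefers the name; a stricter variant could raise an error instead, which may be safer while the sets are still being cleaned up.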

Non-Unicode character sets are a mess!

