Overview

Sane OCaml String API

This library is a set of APIs defined with module types, and a set of modules and functors implementing one or more of those interfaces.

The APIs define what a character and a string of characters should be.

Module Types (APIs)

We have:

  • BASIC_CHARACTER: characters of any length.
  • NATIVE_CONVERSIONS: functions to transform from/to native OCaml strings.
  • BASIC_STRING: immutable strings of (potentially abstract) characters:
    • includes NATIVE_CONVERSIONS,
    • contains a functor to provide a thread-agnostic output function: Make_output: OUTPUT_MODEL -> sig val output: ... end.
  • UNSAFELY_MUTABLE: mutability of some string implementations (“unsafe” meaning that they break immutability invariants/assumptions).
  • MINIMALISTIC_MUTABLE_STRING: abstract mutable string used as argument of the Of_mutable functor.
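To give a feel for how interfaces like these fit together, here is a minimal sketch. It does not reproduce the library's signatures: CHARACTER_SKETCH, STRING_SKETCH, Char_sketch, String_sketch and every function name in them are hypothetical and heavily simplified, chosen only for illustration.

```ocaml
(* Hypothetical, simplified signatures: NOT the library's real module types. *)
module type CHARACTER_SKETCH = sig
  type t
  val to_native_string : t -> string
  val is_whitespace : t -> bool
end

module type STRING_SKETCH = sig
  type character
  type t
  val of_character_list : character list -> t
  val length : t -> int
  (* Total accessor: returns None instead of raising on a bad index. *)
  val get : t -> index:int -> character option
  val to_native_string : t -> string
end

(* One possible implementation of the character sketch with OCaml's char. *)
module Char_sketch : CHARACTER_SKETCH with type t = char = struct
  type t = char
  let to_native_string c = String.make 1 c
  let is_whitespace = function
    | ' ' | '\t' | '\n' | '\r' -> true
    | _ -> false
end

(* One possible implementation of the string sketch with OCaml's string. *)
module String_sketch :
  STRING_SKETCH with type character = char and type t = string = struct
  type character = char
  type t = string
  let of_character_list cs =
    String.concat "" (List.map Char_sketch.to_native_string cs)
  let length = String.length
  let get s ~index =
    if index >= 0 && index < String.length s then Some s.[index] else None
  let to_native_string s = s
end

let example = String_sketch.of_character_list ['a'; 'b']
```

Code written against STRING_SKETCH alone would work unchanged with any other implementation of that signature, which is the point of defining the API as module types.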

Implementations

Native OCaml Characters

The Native_character module implements BASIC_CHARACTER with OCaml's char type.

Native OCaml Strings

The Native_string module implements BASIC_STRING and UNSAFELY_MUTABLE with OCaml's string type (and hence Native_character).

Lists Of Arbitrary Characters

List_of is a functor: BASIC_CHARACTER -> BASIC_STRING, i.e., it creates a string data structure made of a list of characters.
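The idea of a functor from characters to list-backed strings can be sketched as follows. This is an illustration only, not the code of List_of: CHARACTER_SKETCH, List_of_sketch, Native_char_sketch and all function names are hypothetical, and the real signatures are richer.

```ocaml
(* Hypothetical minimal character interface (not the library's BASIC_CHARACTER). *)
module type CHARACTER_SKETCH = sig
  type t
  val to_native_string : t -> string
end

(* A functor building a list-backed string type from any character module. *)
module List_of_sketch (C : CHARACTER_SKETCH) = struct
  type character = C.t
  type t = character list

  let of_character_list (cs : character list) : t = cs
  let length (s : t) = List.length s

  (* List.nth_opt raises on negative indices, so guard against them. *)
  let get (s : t) ~index =
    if index < 0 then None else List.nth_opt s index

  let concat (parts : t list) : t = List.concat parts

  let to_native_string (s : t) =
    String.concat "" (List.map C.to_native_string s)
end

(* Instantiating the sketch with native OCaml characters. *)
module Native_char_sketch = struct
  type t = char
  let to_native_string c = String.make 1 c
end

module Char_list = List_of_sketch (Native_char_sketch)

let hi = Char_list.of_character_list ['h'; 'i']
let hihi = Char_list.concat [hi; hi]
```

A list representation makes concatenation and prepending simple, at the cost of linear-time indexing.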

Build From Basic Mutable Data-structures

The functor Of_mutable uses an implementation of MINIMALISTIC_MUTABLE_STRING to build a BASIC_STRING.
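As a rough sketch of this approach, the functor below wraps a small mutable buffer interface and exposes only non-mutating operations. The names MUTABLE_SKETCH, Of_mutable_sketch and Int_array_backend, and the choice of operations, are assumptions made for the example; the actual MINIMALISTIC_MUTABLE_STRING signature may differ.

```ocaml
(* Hypothetical minimal mutable interface (not the library's real signature). *)
module type MUTABLE_SKETCH = sig
  type character
  type t
  val make : int -> character -> t
  val length : t -> int
  val get : t -> int -> character
  val set : t -> int -> character -> unit
end

(* The result type is abstract, so callers cannot reach the mutable buffer. *)
module Of_mutable_sketch (M : MUTABLE_SKETCH) : sig
  type character = M.character
  type t
  val of_character_list : character list -> t
  val length : t -> int
  val get : t -> index:int -> character option
  val to_character_list : t -> character list
end = struct
  type character = M.character

  (* None stands for the empty string, because M.make needs a filler value. *)
  type t = M.t option

  let of_character_list = function
    | [] -> None
    | first :: _ as cs ->
      let buf = M.make (List.length cs) first in
      List.iteri (fun i c -> M.set buf i c) cs;
      Some buf

  let length = function
    | None -> 0
    | Some buf -> M.length buf

  let get s ~index =
    match s with
    | Some buf when index >= 0 && index < M.length buf -> Some (M.get buf index)
    | _ -> None

  let to_character_list = function
    | None -> []
    | Some buf -> List.init (M.length buf) (M.get buf)
end

(* Example backend: arrays of integers, e.g. one code point per cell. *)
module Int_array_backend = struct
  type character = int
  type t = int array
  let make = Array.make
  let length = Array.length
  let get = Array.get
  let set = Array.set
end

module Int_array_string = Of_mutable_sketch (Int_array_backend)

let lambda_a = Int_array_string.of_character_list [0x3BB; 0x61]
```

Mutation happens only while a value is being built; once returned, the abstract type prevents further writes.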

Integer UTF-8 Characters

The Int_utf8_character module implements BASIC_CHARACTER with OCaml integers (int) representing UTF-8 characters. We force the handling of no more than 31 bits, even though RFC 3629 restricts code points to end at U+10FFFF (cf. also Wikipedia). Note that the function is_whitespace considers only ASCII whitespace (useful when writing parsers, for example).
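The 31-bit limit corresponds to the original UTF-8 scheme of one to six bytes per character. The sketch below encodes an integer under that scheme; it is not the code of Int_utf8_character, and the function name utf8_encode is hypothetical.

```ocaml
(* Encode a non-negative integer of at most 31 bits with the original
   1-to-6-byte UTF-8 scheme. Returns None for values outside that range.
   Hypothetical helper, not part of the library. *)
let utf8_encode (cp : int) : string option =
  (* [lead] is the marker of the first byte; each of the [continuation]
     following bytes carries 6 bits, prefixed with the bits 10. *)
  let build ~lead ~continuation =
    let b = Bytes.create (continuation + 1) in
    Bytes.set b 0 (Char.chr (lead lor (cp lsr (6 * continuation))));
    for i = 1 to continuation do
      let shift = 6 * (continuation - i) in
      Bytes.set b i (Char.chr (0x80 lor ((cp lsr shift) land 0x3F)))
    done;
    Some (Bytes.to_string b)
  in
  if cp < 0 then None
  else if cp < 0x80 then build ~lead:0x00 ~continuation:0
  else if cp < 0x800 then build ~lead:0xC0 ~continuation:1
  else if cp < 0x10000 then build ~lead:0xE0 ~continuation:2
  else if cp < 0x200000 then build ~lead:0xF0 ~continuation:3
  else if cp < 0x4000000 then build ~lead:0xF8 ~continuation:4
  else if cp lsr 31 = 0 then build ~lead:0xFC ~continuation:5
  else None

let e_acute = utf8_encode 0xE9      (* U+00E9 *)
let euro = utf8_encode 0x20AC       (* U+20AC *)
let grinning = utf8_encode 0x1F600  (* U+1F600 *)
```

Values above U+10FFFF produce five- or six-byte sequences, which RFC 3629 no longer allows; an encoder that must follow the RFC strictly would reject them instead.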

Examples, Tests, and Benchmarks

See the file sosa_test.ml for usage examples. The library is tested with:

  • native strings and characters,
  • lists of native characters (List_of(Native_character)),
  • lists of integers representing UTF-8 characters (List_of(Int_utf8_character)),
  • arrays of integers representing UTF-8 characters (Of_mutable(utf8-int array)),
  • bigarrays of 8-bit integers (Of_mutable(int8 Bigarray1.t)).

The tests are a self-compiling “Shell-then-OCaml-script” that depends on the Nonstd and OCaml Bigarray libraries:

./test/sosa_test.ml

and you may add the basic benchmarks to the process with:

./test/sosa_test.ml bench