(* OASIS_START *)
(* DO NOT EDIT (digest: c6966efed524d07847a43202f41b43d4) *)

cadastr - OCaml data structures: generic interfaces, implementations
====================================================================

See the file [INSTALL.txt](INSTALL.txt) for building and installation instructions.

Copyright and license
---------------------

cadastr is distributed under the terms of the GNU Lesser General Public License version 3.0 with OCaml linking exception.

(* OASIS_STOP *)

What is it?
-----------

Cadastr (= "OCaml Data Structures") is a library for working with OCaml data structures in a uniform manner. It also offers some simple ways to work around string mutability and Unicode characters, but it is completely non-restrictive, so you can still mutate any string you want. See the section "Strings" below.

What does Cadastr provide?
--------------------------

Interfaces and implementations for common data structures: containers (todo), maps from keys to values, and possibly other data structures later. An interface or implementation is added on demand, when the author or a contributor needs it in their work.

How is it made?
---------------

Data structures are wrapped in classes and objects, and each method call is dispatched to the correct data structure's implementation using classical OO method dispatch. So functions that work with Cadastr values ignore their representation.

As for creating values, you can define your own classes. For example, you can use either

    class c_seg_map = Simp.map_rw_assoc [seg, disp_handler] ~keq:String.eq;

to use assoc lists for values created with "new c_seg_map", or

    module Tr = Simp.Tree(String);
    class c_seg_map = Tr.map_rw_tree [disp_handler];

to use trees (the OCaml stdlib's Map module) for "new c_seg_map" and all subsequent work with these values.

Drawbacks (read carefully!)
---------------------------

With Cadastr abstractions you can easily plug a different underlying data structure into your algorithm without modifying the algorithm's code. But an abstraction usually has a price.
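The class-based dispatch and the swap between assoc lists and trees described above can be sketched as follows. This is a minimal illustration in standard OCaml syntax (the snippets in this README appear to use the camlp4 revised syntax), and it does not reproduce Cadastr's actual definitions: the interface `map_rw`, its methods `get_opt` and `replace`, and the simplified type parameters are assumptions made for the example. Only the names `map_rw_assoc`, `Tree` and `map_rw_tree` echo the ones mentioned in the text.

```ocaml
(* Hypothetical interface: a read-write map from keys to values. *)
class virtual ['k, 'v] map_rw = object
  method virtual get_opt : 'k -> 'v option
  method virtual replace : 'k -> 'v -> unit
end

(* Association-list implementation; keq is the key equality. *)
class ['k, 'v] map_rw_assoc ~(keq : 'k -> 'k -> bool) = object
  inherit ['k, 'v] map_rw
  val mutable items : ('k * 'v) list = []
  method get_opt k =
    match List.find_opt (fun (k', _) -> keq k k') items with
    | Some (_, v) -> Some v
    | None -> None
  method replace k v =
    items <- (k, v) :: List.filter (fun (k', _) -> not (keq k k')) items
end

(* Tree implementation backed by the stdlib Map functor. *)
module Tree (K : Map.OrderedType) = struct
  module M = Map.Make (K)
  class ['v] map_rw_tree = object
    inherit [K.t, 'v] map_rw
    val mutable tree : 'v M.t = M.empty
    method get_opt k = M.find_opt k tree
    method replace k v = tree <- M.add k v tree
  end
end

module Tr = Tree (String)

(* Client code sees only the interface, never the representation. *)
let count_word (m : (string, int) map_rw) w =
  let n = match m#get_opt w with Some n -> n | None -> 0 in
  m#replace w (n + 1)

let () =
  let a = new map_rw_assoc ~keq:String.equal in
  let t = new Tr.map_rw_tree in
  List.iter (fun w -> count_word a w; count_word t w) ["x"; "y"; "x"];
  assert (a#get_opt "x" = Some 2);
  assert (t#get_opt "x" = Some 2)
```

Here `count_word` is written once and works with either object. Switching the underlying structure only changes the `new ...` expression, and every access goes through a method call, which is where the cost comes from.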
For Cadastr, the price is performance.

- Every value is wrapped in an object:
  - Each value uses more memory (the overhead is constant per value and depends on the type of the value, i.e. the size of the object).
  - Each value creation uses more CPU (the overhead is constant and depends on the implementation of the value's constructor; usually it is small).
- Every operation on a value is a method call, which is slower than a direct function call. The OCaml compiler optimizes it in some cases, but it is still slower.

Do not use Cadastr in very resource-bound code! In most cases, however, you can use Cadastr and ignore the performance overhead, because it is small enough for most applications.

Strings
-------

Strings/Unicode
---------------

OCaml is a language with a long history. Many programs rely on the fact that strings are very fast: they are represented as character arrays in which each character has a fixed width (8 bits); they are null-terminated but may contain \0 characters; and they use a clever encoding that lets the length be calculated in two memory reads.

But time goes on, and Unicode is now the standard way to represent text. There is a library that gives you full Unicode power: the excellent "Camomile" [1] library. There is also convenient Unicode support in the "OCaml Batteries Included" [2] programming environment. But such support is too heavy for most programs. In the author's experience, only 20% of the software written required any knowledge of character encoding (ASCII/Latin-1/one-byte or UTF-8/Unicode), and less than 5% required full Unicode support from libraries (Camomile was used in those cases). Of course, this is only one man's experience.

Also, UTF-8 has very good properties that let you use it just like ordinary one-byte strings: you can input and output it, you can concatenate UTF-8 strings to produce valid UTF-8 strings, and you can concatenate them with any 7-bit ASCII strings, and the result will be perfectly valid.
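The concatenation property can be checked with a small sketch in plain OCaml, without Cadastr, Camomile or Batteries. The helpers `seq_len`, `looks_like_utf8` and `count_chars` are hypothetical names introduced for this example. The check is structural only (lead bytes followed by the right number of continuation bytes) and is not a full UTF-8 validator: it does not reject overlong forms or surrogates.

```ocaml
(* Length of the UTF-8 sequence introduced by lead byte c,
   or 0 if c cannot start a sequence. *)
let seq_len c =
  let b = Char.code c in
  if b < 0x80 then 1
  else if b land 0xE0 = 0xC0 then 2
  else if b land 0xF0 = 0xE0 then 3
  else if b land 0xF8 = 0xF0 then 4
  else 0

(* Continuation bytes have the bit pattern 10xxxxxx. *)
let is_continuation c = Char.code c land 0xC0 = 0x80

(* Structural check: each lead byte is followed by the expected
   number of continuation bytes. *)
let looks_like_utf8 s =
  let n = String.length s in
  let rec conts i k =
    k = 0 || (i < n && is_continuation s.[i] && conts (i + 1) (k - 1))
  in
  let rec go i =
    if i >= n then true
    else
      let len = seq_len s.[i] in
      len > 0 && conts (i + 1) (len - 1) && go (i + len)
  in
  go 0

(* Counting characters requires walking the whole string. *)
let count_chars s =
  let k = ref 0 in
  String.iter (fun c -> if not (is_continuation c) then incr k) s;
  !k

(* The Russian word for hello, written as escaped UTF-8 bytes:
   6 characters, 12 bytes. *)
let privet = "\xD0\x9F\xD1\x80\xD0\xB8\xD0\xB2\xD0\xB5\xD1\x82"

let () =
  (* Plain byte-wise concatenation with an ASCII string. *)
  let joined = privet ^ ", world" in
  assert (looks_like_utf8 joined);
  assert (String.length joined = 19);
  assert (count_chars joined = 13)
```

The ordinary `^` operator works on bytes and knows nothing about encodings, yet the result still passes the check, because a UTF-8 sequence never spans the boundary between two well-formed strings.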
UTF-8 never uses the \0 byte except to encode the NUL character itself, so it is compatible with NUL-terminated strings. However, you can't get a string's character by its character offset in constant time (only in O(offset) time), and you can't modify a character in place by its character offset, since UTF-8 represents different Unicode characters with byte sequences of different lengths (1..4 bytes).

So Cadastr uses a partial solution: if you work with UTF-8 encoded strings, you can "open Cd_All;; open Strings.Utf8;;" and use only a restricted set of functions (see the file "test/test.ml", which contains examples of commented-out code that does not compile when uncommented). The type "char" is Chars.Unicode.Char.t = private int. Some functions like "length" or "sub" will be added later, on demand. But there won't be any support for mutating UTF-8 strings; use "ropes" instead (for example, from Batteries [2]).

[1] -- http://camomile.sourceforge.net/
[2] -- http://batteries.forge.ocamlcore.org/

Strings/Mutability
------------------

Apart from the Unicode issue, OCaml strings are not strings in the common sense but "byte arrays". This approach gives great performance, but it has drawbacks in safety: other code can mutate YOUR string, and you can't prevent it. Of course, OCaml coders are very gentle and probably won't write code that mutates YOUR strings.

Cadastr won't give you a total solution to this problem, but by using "open Cd_All;; open Strings.Latin1;;" or "... open Strings.Utf8" you guarantee that you will not mutate THEIR strings: there are simply no Strings.Latin1.String.{set,fill,blit} operations (the operations that would be named "String.{set,fill,blit}" after "open Strings.Latin1"). The type Strings.Yourencoding.String.t = private string, so you can still mutate it after coercion to the usual string type, but any such coercion is a visible operation that you won't write mindlessly (just as you'd never use the Obj module without a very good reason).
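The private-type idea can be sketched as follows. This is an illustration under stated assumptions, not Cadastr's code: the module name `Ro_string` is hypothetical, and `Bytes` here is the OCaml standard library module, not Cadastr's own `Bytes`. Because the `string` type is immutable in current OCaml versions, the sketch uses the mutable `bytes` type as the underlying representation to reproduce the situation the text describes.

```ocaml
(* Hypothetical module: a string-like type without set, fill or blit. *)
module Ro_string : sig
  type t = private bytes
  val of_bytes : bytes -> t      (* copies the buffer *)
  val length : t -> int
  val get : t -> int -> char
  val ( ^ ) : t -> t -> t
end = struct
  type t = bytes
  let of_bytes b = Bytes.copy b
  let length = Bytes.length
  let get = Bytes.get
  let ( ^ ) a b = Bytes.cat a b
end

let () =
  let buf = Bytes.of_string "hello" in
  let s = Ro_string.of_bytes buf in
  (* Mutating the original buffer does not affect the copy. *)
  Bytes.set buf 0 'J';
  assert (Ro_string.get s 0 = 'h');
  (* Passing s directly to Bytes.set is a type error, because
     Ro_string.t is private. Mutation needs an explicit coercion,
     which is easy to spot in a code review. *)
  let raw = (s :> bytes) in
  Bytes.set raw 0 'J';
  assert (Ro_string.get s 0 = 'J')
```

The design choice is that nothing is truly forbidden: the coercion `(s :> bytes)` always succeeds, but it has to be written out explicitly.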
Also, this open redefines the type "char" and the operator "^" for the kind of strings that you have opened for use.

Of course there is a great need for byte arrays and bytes, so there are the modules Byte (just "type t = char") and Bytes (equal to the stdlib's String module) for any kind of byte array manipulation. Any string value can be created by copying a byte buffer (or a piece of it, in future versions) with the function String.of_bytes (and with String.of_bytes_sub in future versions). Bytes.t = string, so there is no overhead, just a naming issue (it could have been named "Strings.MutableOneByte", but "Bytes" looks better).

Authors
-------

- Dmitry Grebeniuk < gdsfh1 at gmail dot com >
  (commits are sponsored by Amatei)