# Introduction

This is a tutorial and reference for tab, a kind of programming language/shell calculator.

Why another programming language? Because `tab` is a special programming language unlike any other:

- It's statically-typed and type-inferred.
- It also infers memory consumption and guarantees O(n) memory use.
- It is designed for concise one-liner computations right in the shell prompt.
- It features both a mathematics library and a set of data slicing and aggregation primitives.
- It is faster than all other interpreted languages with a similar scope. (Perl, Python, awk, ...)
- It is not Turing-complete. (But can compute virtually anything nonetheless.)
- It is self-contained: distributed as a single statically linked binary and nothing else.
- It has no platform dependencies.

You can think of `tab` as a kind of general-purpose query language for text files.

(Also see 'Comparison' below, and a cookbook of examples.)

Skip to:

- Tutorial
- Reference
- Advanced features:

## Compiling and installing

Type `make`. Requires a modern C++ compiler. Recent versions of gcc (4.9 and up) and clang will work.

Copy the resulting binary of `tab` somewhere in your path.

If you want to use a compiler other than gcc, e.g., clang, then type this:

```
$ CXX=clang++ make
```

## Usage

The default is to read from standard input:

```
$ cat mydata | tab <expression>...
```

The result will be written to standard output.

You can also use the `-i` flag to read from a named file:

```
$ tab -i mydata <expression>...
```

If your `<expression>` is too long, you can pass it in via a file, with the `-f` flag:

```
$ tab -f mycode <expression>...
```

(In this case, the contents of `mycode` will be appended to `<expression>`, separated with a comma.)

Run `tab -h` to see the rest of the supported command-line parameters. The binary comes with built-in documentation; use `-h` to read a complete language reference right in your shell prompt.

## Language tutorial

### Basic types

`tab` is a statically-typed language. However, you will not need to declare any types: the appropriate type information will be deduced automatically, and any errors will be reported before execution.

There are four basic atomic types:

- **Int**, a signed integer. (Equivalent to a `long` in C.)
- **UInt**, an unsigned integer. (Equivalent to an `unsigned long` in C.)
- **Real**, a floating-point number. (Equivalent to a `double` in C.)
- **String**, a string, stored as a byte array.

There are also four structured types:

- **Tuple**, a sequence of several values of (possibly) different types. The number of values and their types cannot change at runtime.
- **Array**, an array of values. Elements can be added and removed at runtime, but the type of all of the values is the same and cannot change.
- **Map**, a hash map (associative array) from values to values. Like with the array, elements can be added and removed, but the type of keys and values cannot change.
- **Sequence**, a.k.a. "lazy list" or "generator". A sequence doesn't store any values, but will generate a new element in the sequence each time it is asked to. As with arrays, all generated elements are of the same type.

Structures can be composed together in complex ways. So, for example, you cannot mix integers and strings in an array, but you can store pairs of strings and integers. (A pair is a tuple of two elements.)

When outputting, each element of an array, map or sequence is printed on its own line, even when nested inside some other structure. The elements of a tuple are printed separated by a tab character, `\t`.

(So, for example, a printed sequence of arrays of strings looks exactly the same as a sequence of strings.)
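The printing rule can be mimicked in a few lines of Python. This is only an illustrative sketch, not how `tab` is implemented: Python lists stand in for arrays and sequences, Python tuples stand in for `tab` tuples, and the helper name `render` is made up for this example. It only covers tuples of atomic values.

```python
def render(value):
    """Return the output lines for a nested value, following the rule above."""
    if isinstance(value, list):
        # Arrays and sequences: every element goes on its own line,
        # no matter how deeply the container is nested.
        lines = []
        for element in value:
            lines.extend(render(element))
        return lines
    if isinstance(value, tuple):
        # Tuples: fields are joined with a tab character on one line.
        return ["\t".join(str(field) for field in value)]
    return [str(value)]


seq_of_arrays = [["a", "b"], ["c"]]
seq_of_strings = ["a", "b", "c"]
print("\n".join(render(seq_of_arrays)))
print("\n".join(render([("x", 1), ("y", 2)])))
```

Both `seq_of_arrays` and `seq_of_strings` render to the same three lines, which is the point made in the parenthetical above.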

Maps, by default, store values in an unspecified order. Use the `-s` command-line parameter to force a strict ordering on map keys.

### Atomic types

The default number type in `tab` is the unsigned integer. A plain sequence of digits will be interpreted as a `UInt`. When you need an explicitly signed `Int`, put an `s`, `i` or `l` suffix onto the digits; for example, `1996l`. All three suffixes are equivalent; they are syntactic sugar.

Floating-point number literals can be entered using a `.` or using scientific notation; for example, `3.` or `3e0`.

String literals are delimited with single or double quotes. Both are equivalent. (Again, syntactic sugar.) A limited set of escape characters are supported within strings: `\t`, `\n`, `\r`, `\e`, `\\`, `\'`, `\"`.

### Control structures

`tab` has no loops or conditional "if" statements; the input expression is evaluated, and the resulting value is printed on standard output.

Instead of loops you'd use sequences and comprehensions.

The input is a file stream, usually the standard input. A file stream in `tab` is represented as a sequence of strings, each string being a line from the file. (Lines are assumed to be separated by `\n`.)

Built-in functions in `tab` are polymorphic, meaning that a function with the same name will act differently with input arguments of different types.

You can enable a verbose debug mode to output the precise derivations of types in the input expression:

- `-v` will output the resulting type of the whole input expression.
- `-vv` will output the resulting type along with the generated virtual machine instruction codes and their types.
- `-vvv` will output the parse tree along with the generated code and resulting type.

### Examples

An introduction to `tab` in 10 easy steps.

###### 1.

```
$ ./tab '@'
```

This command is equivalent to `cat`. `@` is a variable holding the top-level input, which is the stdin as a sequence of strings. Printing a sequence means printing each element in the sequence; thus, the effect of this whole expression is to read stdin line-by-line and output each line on stdout.

###### 2.

```
$ ./tab 'sin(pi()/2)'
1
$ ./tab 'cos(1)**2+sin(1)**2'
1
```

`tab` can also be used as a desktop calculator. `pi()` is a function that returns the value of *pi*, `cos()` and `sin()` are the familiar trigonometric functions. The usual mathematical infix operators are supported; `**` is the exponentiation operator.

###### 3.

```
$ ./tab 'count(@)'
```

This command is equivalent to `wc -l`. `count()` is a function that will count the number of elements in a sequence, array or map. Each element in `@` (the stdin) is a line, thus counting elements in `@` means counting lines in stdin.

###### 4.

```
$ ./tab '[ grep(@,"[a-zA-Z]+") ]'
```

This command is equivalent to `egrep -o "[a-zA-Z]+"`. `grep()` is a function that takes two strings, where the second argument is a regular expression, and outputs an array of strings -- the array of any found matches.

`[...]` is the syntax for *sequence comprehensions* -- transformers that apply an expression to all elements of a sequence; the result of a sequence comprehension is also a sequence.

The general syntax for sequence comprehensions is this: `[ <element> : <input> ]`. Here `<input>` is evaluated (once), converted to a sequence, and each element of that sequence becomes the input to the expression `<element>`. The result is a sequence of `<element>`. (Or, in other words, a sequence of transformed elements from `<input>`.)

If the `: <input>` part is omitted, then `: @` is automatically implied instead.

Each time `<element>` is evaluated, its argument (an individual element in `<input>`) is passed via a variable that is also called `@`.

Thus: the expressions `@`, `[@]` and `[@ : @]` are all equivalent; they all return the input sequence of lines from stdin unchanged.

The variables defined in `<element>` (on the left side of `:`) are *scoped*: you can read from variables defined in a higher-level scope, but any variable writes will not be visible outside of the `[ ... ]` brackets.
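For readers who know Python, a sequence comprehension behaves much like a generator expression. The sketch below is an analogy only, with `re.findall` standing in for `grep()` and a hypothetical helper name `words_per_line`; it is not how `tab` evaluates the expression.

```python
import re


def words_per_line(lines):
    # Rough analogue of [ grep(@,"[a-zA-Z]+") ]: for every input line,
    # lazily produce the list of regex matches found in that line.
    return (re.findall(r"[a-zA-Z]+", line) for line in lines)


result = list(words_per_line(["hello world", "42", "tab rocks!"]))
print(result)
```

Like a `tab` sequence, the generator produces one element per input line on demand and stores nothing until `list()` consumes it.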

###### 5.

```
$ ./tab 'zip(count(), @)'
```

This command is equivalent to `nl -ba -w1`; that is, it outputs stdin with a line number prefixed to each line.

`zip()` is a function that accepts two or more sequences and returns one sequence of tuples of elements from each input sequence. (The returned sequence stops when any of the input sequences stop.)

`count()` when called without arguments will return an infinite sequence of successive numbers, starting with `1`.
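The same pairing of an infinite counter with a finite input can be sketched in Python. This is an analogy, assuming `itertools.count(1)` as a stand-in for the argument-less `count()`; the helper name `number_lines` is made up for the example.

```python
import itertools


def number_lines(lines):
    # Analogue of zip(count(), @): pair each line with 1, 2, 3, ...
    # zip stops as soon as the shorter input (the lines) runs out.
    return zip(itertools.count(1), lines)


for number, line in number_lines(["alpha", "beta"]):
    print(f"{number}\t{line}")
```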

###### 6.

```
$ ./tab 'count(:[ grep(@,"\\S+") ])'
```

This command is equivalent to `wc -w`: it prints the number of words in stdin. `[ grep(@,"\\S+") ]` is an expression we have seen earlier -- it returns a sequence of arrays of regex matches.

`:` here is *not* part of a comprehension, it is a special `flatten` operator: given a sequence of sequences, it will return a "flattened" sequence of elements in all the interior sequences.

If given a sequence of arrays, maps or atomic values then this operator will automatically convert the interior structures into equivalent sequences.

Thus, the result of `:[ grep(@,"\\S+") ]` is a sequence of strings, regex matches from stdin, ignoring line breaks. Counting elements in this sequence will count the number of matches of `\S+` in stdin.

**Note:** the unary prefix `:` operator is just straightforward syntactic sugar for the `flatten()` builtin function.
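A Python analogy for the flatten step, assuming `itertools.chain.from_iterable` as a stand-in for `flatten()` and `re.findall` for `grep()`. The function name `count_words` is hypothetical.

```python
import itertools
import re


def count_words(lines):
    # [ grep(@,"\\S+") ] : one list of matches per line
    matches_per_line = (re.findall(r"\S+", line) for line in lines)
    # the prefix ':' : merge the per-line lists into one flat stream
    flat = itertools.chain.from_iterable(matches_per_line)
    # count(...) : consume the stream and count its elements
    return sum(1 for _ in flat)


print(count_words(["a b c", "", "d e"]))
```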

###### 7.

```
$ ./tab '{ @ : :[ grep(@,"\\S+") ] }'
```

This command will output an unsorted list of unique words in stdin.

The `{ @ : ... }` is the syntax for *map comprehensions*. The full form of map comprehensions looks like this: `{ <key> -> <value> : <input> }`. Like with sequence comprehensions, `<input>` will be evaluated, each element will be used to construct `<key>` and `<value>`, and the key-value pairs will be stored in the resulting map.

If `-> <value>` is omitted, then `-> 1` will be automatically implied. If `: <input>` is omitted, then `: @` will be automatically implied.

The result of this command will be a map where each word in stdin is mapped to an integer value of one.

(Note: you can use whitespace creatively to make this command prettier: `{ @ :: [ grep(@,"\\S+") ] }`.)

You can also wrap the expression in `count(...)` if you just want the number of unique words in stdin.

###### 8.

```
$ ./tab '?[ grepif(@,"this"), @ ]'
```

This command is equivalent to `grep`; it will output all lines from stdin having the string `"this"`.

`grepif()` is a lighter version of `grep()`: given a string and a regular expression it will return an integer: `1` if the regex is found in the string and `0` if it is not. (You could use `count(grep(@,"this"))` instead, but `grepif` is obviously shorter and quicker.)

`grepif(@,"this"), @` is a tuple of two elements: the first element is `1` or `0` depending on if the line has `"this"` as a substring, and the second element is the whole line itself.

**Note**: tuples in `tab` are *not* surrounded by parentheses. There is no syntax for creating nested tuples literally. (Though they can exist as a result of a function call, and there is a built-in function called `tuple` for doing just that.)

To write a tuple, simply list its elements separated by commas.

`?` is the *filter* operator: it accepts a sequence of tuples, where the first element of each tuple must be an integer. The output is also a sequence: if a tuple of the input sequence has `0` as the first element, then it is skipped in the output sequence; if the first element of the input tuple is any other value, then it is removed, and the rest of the input tuple is output.

(So, for example: `?[1,@ : x]` is equivalent to the original sequence `x`.)

**Note**: the `?` operator is straightforward syntactic sugar for the `filter()` function.

**Note**: the `?[ grepif(@,b), @ : a ]` expression has a shortcut convenience function, written simply as `grepif(a, b)`. Thus, one could have simply run `./tab 'grepif(@,"this")'` instead.
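The tag-then-filter pattern may be easier to see in a Python sketch. This is an analogy, not `tab` internals: `tab_filter` and `grep_lines` are made-up names, and `re.search` stands in for `grepif()`.

```python
import re


def tab_filter(tuples):
    """Sketch of '?': skip tuples whose first element is 0, and strip
    the first element from the tuples that are kept."""
    for first, *rest in tuples:
        if first != 0:
            yield tuple(rest)


def grep_lines(lines, pattern):
    # ?[ grepif(@, pattern), @ ] : tag every line with 1 or 0, then filter.
    tagged = ((1 if re.search(pattern, line) else 0, line) for line in lines)
    return [line for (line,) in tab_filter(tagged)]


print(grep_lines(["this one", "that one", "not this"], "this"))
```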

###### 9.

```
$ ./tab '{ @[0] % 2 -> sum(count(@[1])) : zip(count(), @) }'
```

This command will output the number of bytes on even lines versus the number of bytes on odd lines in stdin.

`{ ... : zip(count(), @) }` is, as before, a map comprehension, with a sequence of pairs (line number, line) as the input.

`@[0] % 2` is the key in the map: we use the indexing operator `[]` to select the first element from the input pair, which is the line number. `%` is the mathematical modulo operator (like in C); line number modulo 2 gives us `0` for even line numbers and `1` for odd line numbers.

`sum(count(@[1]))` is the mapped value in the map. As before, indexing the input pair with `1` gives us the second element, which is the contents of the line from stdin; `count()`, when applied to a string, gives us the length of the string in bytes.

`sum()` is a little trickier: when applied to a number, it returns the input argument, but marks it with a special tag that causes the map comprehension to add together values marked with `sum()` when grouped together as part of the map's value.

(So, for example, using `sum(1)` on the right side of `->` in a map comprehension will count the number of occurrences of whatever is on the left side of `->`.)
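In Python terms, a map comprehension with a `sum()` value behaves like accumulating into a dictionary. The sketch below is an analogy for example 9 with a made-up function name, `bytes_by_parity`; it assumes UTF-8 when measuring a line's length in bytes.

```python
def bytes_by_parity(lines):
    totals = {}
    # zip(count(), @) : line numbers start at 1
    for number, line in enumerate(lines, start=1):
        key = number % 2                    # @[0] % 2
        size = len(line.encode("utf-8"))    # count(@[1]) : length in bytes
        # sum(...) : values that land on the same key are added together
        totals[key] = totals.get(key, 0) + size
    return totals


print(bytes_by_parity(["ab", "cde", "f"]))
```

Lines 1 and 3 contribute 2 + 1 bytes to key `1`, and line 2 contributes 3 bytes to key `0`.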

###### 10.

```
$ ./tab 'z={ tolower(@) -> sum(1) :: [grep(@,"[a-zA-Z]+")] }, sort([ @~1, @~0 : z ])[-5,-1]'
```

This command will tally a count for each word (first lowercased) in a file, sort by word frequency, and output the top 5 most frequent words.

The `z=` here is an example of *variable assignment*. Here the variable `z` will be assigned a map of unique words with their frequencies. (See example 7; `z` here is the same, except that each word is lowercased and a word count is tallied.)

Variable assignments do not produce a type and do not evaluate to a value; whatever is between the `=` and the `,` (the map comprehension in this case) will not be output.

Moving on: `sort()` is a function that accepts an array, map or sequence and returns its elements in an array, sorted lexicographically. Here we reverse the keys and values in the map `z` by wrapping it in a sequence, so that the resulting array is sorted by word frequency, not by word.

`@~0` is syntactic sugar that is completely equivalent to `@[0]`.

`[-5,-1]` is the *indexing* operator, which accesses elements in a tuple, array or map. The logic and arguments of this operator differ depending on what type is being indexed:

- Tuples can only be indexed with literal integer values. (Not variables or results of a computation.)
- Maps can be indexed by the key, returning the corresponding value; if the key is not in the map, an error will be signalled. (There is a corresponding `get` function that returns a default value instead of signalling an error.)
- Array indexes are more complex; arrays can be indexed by:
    - 0-based integers. (0 being the first element in an array.)
    - Negative indexes, where -1 is the last element in the array, -2 is second-to-last, etc.
    - Real-valued indexes; in this case 0.0 is interpreted as the first element in the array and 1.0 as the last. (So 0.5 would be the middle element in the array.)
    - Splices, which are two comma-separated indexes. In this case a sub-array will be returned, beginning with the element referenced by the first index and ending with the element referenced by the last. (The last element is also part of the range, unlike in Python and C++.)
- Strings can be spliced as if they were byte arrays; substrings will be returned.

In this case a sub-array of five elements is returned -- the last five elements in the array returned by `sort()`.

**Note**: the `[...]` indexing operator is straightforward syntactic sugar for the `index()` function.

**Note**: the `~` indexing operator is equivalent to `[...]`. It's syntactic sugar to make chained indexes more palatable: `a~0~1` is equivalent to `a[0][1]`. (The `~` will only work for single-element indexes, not splices.)
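The array indexing rules can be approximated in Python as follows. This is a sketch under assumptions: the names `resolve` and `tab_index` are made up, and the exact rounding that `tab` applies to real-valued indexes is not stated in this document, so the sketch simply scales by `length - 1` and truncates. Bounds checking is omitted.

```python
def resolve(index, length):
    """Turn a tab-style index into a 0-based Python position."""
    if isinstance(index, float):
        # Assumption: 0.0 maps to the first element, 1.0 to the last,
        # values in between are scaled linearly and truncated.
        return int(index * (length - 1))
    if index < 0:
        # Negative indexes count back from the end.
        return length + index
    return index


def tab_index(array, first, last=None):
    start = resolve(first, len(array))
    if last is None:
        return array[start]
    stop = resolve(last, len(array))
    # Splices include the last element, unlike Python slices.
    return array[start:stop + 1]


values = [10, 20, 30, 40, 50, 60, 70]
print(tab_index(values, 0), tab_index(values, -1), tab_index(values, 0.5))
print(tab_index(values, -5, -1))
```

The inclusive upper bound is why `[-5,-1]` yields five elements rather than four.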

###### Bonus track

```
$ ./tab -i req.log '
 def stats tuple(avg.@, stdev.@, max.@, min.@, sort.@),
 def uniq { 1 -> stats(@) }[1],
 x=[ uint.cut(@,"|",3) ],
 x=uniq(x),
 avg=x[0], stdev=x[1], max=x[2], min=x[3], q=x[4],
 tabulate(tuple("mean/median", avg, q[0.5]),
          tuple("68-percentile", avg + stdev, q[0.68]),
          tuple("95-percentile", avg + 2*stdev, q[0.95]),
          tuple("99-percentile", avg + 3*stdev, q[0.99]),
          tuple("min and max", real(min), max))'
mean/median     1764.54  1728
68-percentile   1933.15  1840
95-percentile   2101.75  1992
99-percentile   2270.35  2419
min and max     0        2508
```

Here we run a crude test for the normal distribution in the response lengths (in bytes) in a webserver log. (The distribution of lengths doesn't look to be normal.)

**Note**: The `f.x` notation is an alternative syntax for calling functions with only one argument; `f.x` is completely equivalent to `f(x)`. (Likewise, `g.f.x` is equivalent to `g(f(x))`.)

**Note**: The `def` keyword is for defining user-defined functions. User-defined functions in `tab` are polymorphic and bound at call time; they act like templates that are inlined when called. The names of user-defined functions have lexical scope, like variables. (However, they are stored in a separate namespace; you cannot assign a function to a variable.)

You can use parentheses to delimit code blocks in function definitions. For example:

```
def square_of_square (
  def square @*@;
  square(@)*square(@)
);
square_of_square(4)
```

**Note**: The semicolon is an equivalent way of writing the comma, because multi-line code looks better with semicolons.

Let's check the distribution visually, with a histogram: (The first column is a size in bytes, the second column is the number of log lines; for example, there were 227 log lines with a response size between 1504.8 and 1755.6 bytes.)

```
$ ./tab -i req.log 'hist([. uint.cut(@,"|",3) .], 10)'
250.8   23
501.6   0
752.4   1
1003.2  0
1254    0
1504.8  227
1755.6  28027
2006.4  19986
2257.2  490
2508    1792
```

## Comparison

A short, hands-on comparison of `tab` with equivalent shell and Python scripts.

The input file is around 100000 lines of web server logs, and we want to find out the number of requests for each URL path.

Here is a solution using standard shell utilities:

```
$ cat req.log | cut -d' ' -f3 | cut -d'?' -f1 | sort | uniq -c
```

Running time: around 2.7 seconds on my particular (slow) laptop.

Here is an equivalent Python script:

```
import sys

d = {}
for l in sys.stdin:
    x = l.split(' ')[2].split('?')[0]
    d[x] = d.get(x,0) + 1

for k,v in d.iteritems():
    print k,v
```

Running time: around 3.1 seconds.

Perl:

```
my %counts;
for my $line (<>) {
    my $path = (split /\?/, (split / /, $line)[2])[0];
    $counts{$path}++
}
for my $path (keys %counts) {
    my $count = $counts{$path};
    print("$count $path\n");
}
```

Running time: around 4.1 seconds.

A reasonably simple solution using `awk`:

```
$ awk -F" " '{ split($3,x,"?"); paths[x[1]]++; } END { for (path in paths) { print paths[path], path }}'
```

Running time: around 2.1 seconds.

Here is the solution using `tab`:

```
$ ./tab -i req.log '{cut(cut(@," ",2),"?",0) -> sum(1)}'
```

Running time: around 0.9 seconds.

Not only is `tab` faster in this case, it is also (in my opinion) more concise and idiomatic.

## Reference

### Grammar

```
expr := atomic_or_assignment (("," | ";") atomic_or_assignment)*
atomic_or_assignment := assignment | define | atomic
assignment := var "=" atomic
define := def_fun | def_struct
def_fun := "def" var (atomic | "(" expr ")")
def_struct := "def" "[" var atomic? ("," var atomic?)+ "]"
atomic := e_andor
e_andor := e_eq | e_eq "&&" e_eq | e_eq "||" e_eq
e_eq := e_bit | e_bit "==" e_bit | e_bit "!=" e_bit | e_bit "<" e_bit | e_bit ">" e_bit | e_bit "<=" e_bit | e_bit ">=" e_bit
e_bit := e_add | e_add "&" e_add | e_add "|" e_add | e_add "^" e_add
e_add := e_mul | e_mul "+" e_mul | e_mul "-" e_mul
e_mul := e_exp | e_exp "*" e_exp | e_exp "/" e_exp | e_exp "%" e_exp
e_exp := e_not | e_not "**" e_not
e_not := e_flat | "!" e_not
e_flat := e_idx | ":" e_flat | "?" e_flat
e_idx := e | e ("[" expr "]")* | e ("~" e)*
e := literal | funcall | var | array | map | seq | paren
literal := real | int | uint | string
funcall := funcall_paren | funcall_dot
funcall_paren := var "(" expr ")"
funcall_dot := var "." atomic
array := "[." "try"? expr (":" expr)? ".]"
map := "{" "try"? expr ("->" expr)? (":" expr)? "}"
seq := "[" "try"? expr (":" expr)? "]"
paren := "(" atomic ")"
var := "@" | [a-zA-Z][a-zA-Z0-9_]*
digits := [0-9]+
int := "-" digits+ | digits ("i" | "s" | "l")
uint := digits "u"? | ("0x" | "0X") [0-9a-fA-F]+
real := [-+]? digits ("." [0-9]*)? ([eE] [-+]? digits)?
string := '"' chars '"' | "'" chars "'"
chars := ("\t" | "\n" | "\r" | "\e" | "\\" | any)*
```

### Semantics

##### Expressions

An expression is either an atomic value, an assignment or definition. Assignments and definitions do not produce a value and return nothing.

Expressions separated by `,` or `;` are a tuple. A tuple is itself an expression and a value.

Note: tuples cannot be surrounded by parentheses; if you need to nest tuples, use the builtin function named `tuple`.

This expression produces the tuple `(0, 1)`:

```
0, a = 1, def b @; b(a)
```

##### Variables

Variables are single-assignment: you cannot change the value of an existing variable.

Assigning to a variable with a name that already exists will create a new variable; the old variable will become unreachable.

This is a legal expression that returns `2`:

```
a = 1, a = a + 1, a
```

This is also a legal expression, and will return a sequence of ten numbers `2`:

```
a = 1, [ a = a + 1, a : count.10 ]
```

##### Defining functions

Functions can be defined with the `def` keyword. All function calls are always inlined, and recursive function calls are impossible.

There are three forms for `def`:

- `def f expr`: defines the function `f`, and `expr` is an atomic value.
- `def f (expr)`: same, but `expr` can be a tuple, including nested definitions and assignments.
- `def [f expr, g expr, ...]`: defines two or more functions, an equivalent shortcut for `def f (@=@[0], expr), def g (@=@[1], expr), ...`. This form is intended to make it easy to give human-readable names to tuple elements. The `expr` is an atomic value and can be omitted -- the simplest form is `def [f,g,...]`.

##### Calling functions

There are two function call syntaxes: `f(a, b, ...)` and `f.a`. Both are equivalent, except that the first form allows calling a function with a tuple argument.

Note, however, that the `.` has the lowest precedence! Thus, this code `f.a == 1` is equivalent to `f(a == 1)`!

##### Operators

In order of precedence, from highest to lowest:

Operator | Meaning |
---|---|
`a~b` `a[b]` | Indexing arrays, maps and tuples. See the `index` function. Use `~` with atomic values, while `[]` can accept tuples. |
`:a` `?a` | Syntactic sugar for the functions `flatten` and `filter`, respectively. |
`a**b` | Exponentiation. |
`a*b` `a/b` `a%b` | Multiplication, division, modulo. |
`a+b` `a-b` | Addition and subtraction. |
`a&b` `a\|b` `a^b` | Binary AND, OR and XOR. |
`a==b` `a!=b` `a<b` `a>b` `a<=b` `a>=b` | Comparison. |
`a&&b` `a\|\|b` | Equivalent to `&` and `\|` except with a different precedence. |

Note that arithmetic operators will silently promote the type of the result as needed. (Subtracting integers always results in a signed integer, adding a real results in a real, etc.)

Also note that function calls will *not* promote numeric types as needed! If a function requires a signed integer, then passing in an unsigned is an error.

The `&&` and `||` operators are there because otherwise an expression like `a == b & c == d` is parsed as `a == (b & c) == d` and results in a syntax error.

##### Literals

Syntax for literal number and string values:

Type | Syntax |
---|---|
`UInt` | `1234` or `1234u` or `0x4D2`. Numbers are unsigned by default. Hexadecimal notation is supported for unsigned numbers. |
`Int` | `-1234` or `1234i` or `1234s` or `1234l`. Numbers must be explicitly marked as signed; `i`, `s` and `l` are all equivalent syntactic sugar. |
`Real` | `+10.50` or `1.` or `4.4e-10`. Scientific notation and trailing dot are supported. |
`String` | `'chars'` or `"chars"`. Supported escape sequences: `\t` `\n` `\r` `\e` `\\`. |

##### Magic variables

The magic variable `@` is used by the language to denote the input value in generator expressions and function definitions.

Note that in all other respects this variable acts like a normal variable.

##### Generator expressions

Type | Syntax |
---|---|
`Seq` | `[ elt : input ]` |
`Arr` | `[. elt : input .]` |
`Map` | `{ key -> value : input }` |

The `: input` part can be omitted, in which case `: @` will be silently assumed. For maps the `-> value` can also be omitted, in which case `-> 1` will be assumed.

The right-hand argument `input`

will be converted to a sequence of values automatically. If it is a single value, then a sequence of one element will be assumed.

The keyword `try` can be inserted after the opening bracket; fatal errors while generating elements will then be silently swallowed. (See error handling.)

See also recursion for a generator expression for complex single values.

The left- and right-hand sides can include assignment and definition statements. Anything defined or assigned in a generator expression is limited in scope only to this generator expression.

Thus, this code

```
[ a=@, @ ], a
```

will result in an 'undefined variable' error.

### Builtin functions

Listed alphabetically.

`abs`

- Computes absolute value.

Usage:

`abs Int -> Int`

`abs Real -> Real`

`and`

- Returns 1 if all the arguments are not 0, returns 0 otherwise. Equivalent to `a & b & c ...`. See also `or`.

Usage:

`and (Integer, Integer...) -> UInt`

`array`

- Stores a sequence or map or atomic value into an array. See also `sort` for a version of this function with sorting. See also: `iarray`.

Usage:

`array Map[a,b] -> Arr[(a,b)]`

`array Seq[a] -> Arr[a]`

`array Number|String|Tuple -> Arr[Number|String|Tuple]` -- returns an array with one element.

**Note:** when arrays are used as values in a map, they will concatenate. (See aggregators below for details.)

`avg`

- Synonym for `mean`.

`box`

- Remembers a value. Returns a 'box', which is a tuple of one remembered value. Stores the second argument in the box if the box is empty. If the box is not empty and the first argument is not zero, then replaces the value in the box with the second argument.

Usage:

`box UInt, a -> (a,)`

`bucket`

- Return a bucket key. `bucket(x, a, b, n)` will split the interval `[a, b]` into `n` equal sub-intervals and return `x` rounded down to the nearest sub-interval lower bound. Useful for making histograms. See also: `hist`.

Usage:

`bucket Number, Number, Number, UInt -> Number` -- the first three arguments must be the same numeric type.

`bytes`

- Accepts a string and returns an array of integers representing the bytes in the string. *Warning*: this function is not Unicode-aware and assumes the string is an ASCII bytestream.

Usage:

`bytes String -> Arr[UInt]`

`case`

- A switch/case function. The first argument is compared to every argument at position `n+1`, and if they compare equal, the argument at position `n+2` is returned. If none compare equal, then the last argument is returned. See also: `if`.

Example: `[ case(int.@; 1,'a'; 2,'b'; 'c') : count(4) ]` returns `a b c c`.

Usage:

`case a,a,b,...,b -> b`
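The lookup rule can be sketched in Python. This is an illustration of the semantics described above, not the `tab` implementation; the variadic signature is an assumption made so that the example from the text can be reproduced.

```python
def case(value, *args):
    """Sketch of case(): args are candidate/result pairs followed by a default."""
    *pairs, default = args
    # Walk the (candidate, result) pairs in order; first match wins.
    for candidate, result in zip(pairs[0::2], pairs[1::2]):
        if value == candidate:
            return result
    return default


# Mirrors [ case(int.@; 1,'a'; 2,'b'; 'c') : count(4) ]
print(" ".join(case(n, 1, "a", 2, "b", "c") for n in range(1, 5)))
```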

`cat`

- Concatenates strings.

Usage:

`cat String,... -> String`. At least one string argument is required.

`ceil`

- Rounds a floating-point number to the smallest integer that is greater than or equal to the input value.

Usage:

`ceil Real -> Real`

`cos`

- The cosine function.

Usage:

`cos Number -> Real`

`count`

- Counts the number of elements.

Usage:

`count None -> Seq[UInt]` -- returns an infinite sequence that counts from 1 to infinity.

`count UInt -> Seq[UInt]` -- returns a sequence that counts from 1 to the supplied argument.

`count Number, Number, Number` -- returns a sequence of numbers from `a` to `b` with increment `c`. All three arguments must be the same numeric type.

`count String -> UInt` -- returns the number of bytes in the string.

`count Seq[a] -> UInt` -- returns the number of elements in the sequence. (*Warning*: counting the number of elements will consume the sequence!)

`count Map[a] -> UInt` -- returns the number of keys in the map.

`count Arr[a] -> UInt` -- returns the number of elements in the array.

`cut`

- Splits a string using a delimiter. See also `recut` for splitting with a regular expression.

Usage:

`cut String, String -> Arr[String]` -- returns an array of strings, such that the first argument is split using the second argument as a delimiter.

`cut String, String, Integer -> String` -- calling `cut(a,b,n)` is equivalent to `cut(a,b)[n]`, except much faster.

`cut Seq[String], String -> Seq[Arr[String]]` -- equivalent to `[ cut(@,delim) : seq ]`.

`date`

- Converts a UNIX timestamp to a textual representation of a UTC date.

Usage:

`date Int -> String` -- returns a UTC date in the `"YYYY-MM-DD"` format.

`datetime`

- Converts a UNIX timestamp to a textual representation of a UTC date and time.

Usage:

`datetime Int -> String` -- returns a UTC date and time in the `"YYYY-MM-DD HH:MM:SS"` format.

`e`

- Returns the number *e*.

Usage:

`e None -> Real`

`eq`

- Checks values for equality. If the first argument is equal to any of the other arguments, returns 1. Otherwise returns 0.

Usage:

`eq a,a,... -> UInt`

`exp`

- The exponentiation function. Calling `exp(a)` is equivalent to `e()**a`.

Usage:

`exp Number -> Real`

`explode`

- Makes a sequence of sequences from a plain sequence: given an input sequence, returns that sequence for every element in it. Equivalent to `x=@, [ glue(@, x) ]`.

Usage:

`explode Seq[a] -> Seq[Seq[a]]`

`file`

- Opens a file and returns the lines in the file as a sequence of strings. (This allows a `tab` expression to process several files instead of just one.)

Usage:

`file String -> Seq[String]`

`filter`

- Filters a sequence by returning an equivalent sequence but with certain elements removed. The input must be a sequence of tuples where the first element of each tuple is an integer; elements where this first element is equal to 0 will be removed from the output sequence. See also: `while`.

Usage:

`filter Seq[(Integer,a...)] -> Seq[(a...)]`

`find`

- Finds a substring match in a string. The first argument is the string to search in, the second argument is the substring. Returns an array of one element containing the substring if found, and an empty array otherwise. See also: `grep`, `grepif`, `findif` for the rationale.

Usage:

`find String, String -> Arr[String]`

`findif`

- Filter strings that contain a substring. See also: `grep`, `grepif`, `find`.

Usage:

`findif String, String -> UInt` -- returns 1 if the first argument contains the second argument as a substring, 0 otherwise. Equivalent to `count(find(a,b)) != 0u`, except much faster.

`findif Seq[String], String -> Seq[String]` -- returns a sequence of only those strings that have a substring match. Equivalent to `?[ findif(@,b), @ : a ]`.

`first`

- Return the first element in a pair, map or sequence of pairs. See also: `second`.

Usage:

`first a,b -> a`

`first Map[a,b] -> Seq[a]`

`first Seq[(a,b)] -> Seq[a]`

`flatten`

- Flattens a sequence of sequences, a sequence of arrays or a sequence of maps into a sequence of values.

Usage:

`flatten Seq[ Seq[a] ] -> Seq[a]`

`flatten Seq[ Arr[a] ] -> Seq[a]`

`flatten Seq[ Map[a,b] ] -> Seq[(a,b)]`

`flatten Seq[a] -> Seq[a]` -- sequences that are already flat will be returned unchanged. (Though at a performance cost.)

`flip`

- Given a sequence of pairs or a map, returns a sequence where the pair elements are swapped.

Usage:

`flip Seq[(a,b)] -> Seq[(b,a)]`

`flip Map[a,b] -> Seq[(b,a)]`

`floor`

- Rounds a floating-point number down to the greatest integer that is less than or equal to the input value.

Usage:

`floor Real -> Real`

`get`

- Accesses map or array elements (like `index`), but returns a default value if the key is not found in the map or if the index is out of bounds. (Unlike `index`, which throws an exception.)

Usage:

`get Map[a,b], a, b -> b`

-- returns the element stored in the map with the given key, or the third argument if the key is not found.

`get Arr[a], UInt, a -> a`

-- returns the element at the given index, or the third argument if the index is out of bounds.

`glue`

- Adds an element to the head of a sequence. `glue(1, seq(2, 3))` is equivalent to `seq(1, 2, 3)`. See also: `take`, `peek`.

Usage:

`glue a, Seq[a] -> Seq[a]`

`gmtime`

- Converts a UNIX timestamp to a UTC date and time.

Usage:

`gmtime Int -> Int, Int, Int, Int, Int, Int`

-- returns year, month, day, hour, minute, second.

`grep`

- Finds regular expression matches in a string. The first argument is the string to match in, the second argument is the regular expression. Matches are returned in an array of strings. Regular expressions use ECMAScript syntax. See also: `grepif`, `find`, `findif`.

Usage:

`grep String, String -> Arr[String]`

`grepif`

- Filters strings according to a regular expression. See also: `grep`, `find`, `findif`.

Usage:

`grepif String, String -> UInt`

-- returns 1 if a regular expression has matches in a string, 0 otherwise. Equivalent to `count(grep(a,b)) != 0u`, except much faster.

`grepif Seq[String], String -> Seq[String]`

-- returns a sequence of only those strings that have regular expression matches. Equivalent to `?[ grepif(@,b), @ : a ]`.

`has`

- Checks for existence in a map or array.

Usage:

`has Map[a,b], a -> UInt`

-- returns 1 if a key exists in the map, 0 otherwise. The first argument is the map, the second argument is the key to check.

`has Arr[a], a -> UInt`

-- returns 1 if a value is in the array, 0 otherwise. The first argument is the array, the second argument is the value. Equivalent to `has(map.zip(seq.a, count()), b)`.

`hash`

- Hashes a value to an unsigned integer. The FNV hash function (32 or 64 bit depending on CPU architecture) is used.

Usage:

`hash a -> UInt`
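For readers unfamiliar with FNV, here is a minimal Python sketch of the 64-bit FNV-1a variant. The text above does not say which FNV variant `tab` uses or how non-string values are turned into bytes before hashing, so the choice of FNV-1a and the name `fnv1a_64` are assumptions made for illustration only.

```python
FNV64_OFFSET_BASIS = 0xCBF29CE484222325
FNV64_PRIME = 0x100000001B3

def fnv1a_64(data: bytes) -> int:
    """64-bit FNV-1a: XOR each byte into the state, then multiply by the prime."""
    h = FNV64_OFFSET_BASIS
    for byte in data:
        h ^= byte
        h = (h * FNV64_PRIME) & 0xFFFFFFFFFFFFFFFF  # keep the state at 64 bits
    return h

print(hex(fnv1a_64(b"")))  # 0xcbf29ce484222325 (the offset basis itself)
```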

`head`

- Accepts a sequence or array and returns an equivalent sequence that is truncated to be no longer than N elements. See also: `skip`, `stripe`.

Usage:

`head Seq[a], UInt -> Seq[a]`

`head Arr[a], UInt -> Seq[a]`

`hex`

- Marks the given unsigned integer such that it is output in hexadecimal.

Usage:

`hex UInt -> UInt`

`hist`

- Accepts an array of numbers and a bucket count and returns an array of tuples representing a histogram of the values in the array. (The interval between the maximum and minimum value is split into N equal sub-intervals, and the number of values that fall into each sub-interval is tallied.) The return value is an array of pairs: (sub-interval lower bound, number of elements). See also: `bucket`.

Usage:

`hist Arr[Number], UInt -> Arr[(Real,UInt)]`
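The bucketing rule described above can be sketched in Python as follows. This is a model of the description, not `tab`'s implementation: the name `hist_model` is hypothetical, and placing the maximum value in the last bucket is an assumption, since the text does not say how the upper edge is handled.

```python
def hist_model(values, n):
    """Split [min, max] into n equal sub-intervals and tally values per sub-interval."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n
    counts = [0] * n
    for v in values:
        if width == 0:
            i = 0  # all values identical: everything lands in the first bucket
        else:
            # assumption: the maximum value is counted in the last bucket
            i = min(int((v - lo) / width), n - 1)
        counts[i] += 1
    # pairs of (sub-interval lower bound, number of elements)
    return [(lo + i * width, c) for i, c in enumerate(counts)]

print(hist_model([1, 2, 2, 3, 9], 4))
# [(1.0, 3), (3.0, 1), (5.0, 0), (7.0, 1)]
```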

`iarray`

- Exactly equivalent to `array`, except that when printing, the elements will be separated with a `;` instead of a newline.

Usage:

`iarray Map[a,b] -> Arr[(a,b)]`

`iarray Seq[a] -> Arr[a]`

`iarray Number|String|Tuple -> Arr[Number|String|Tuple]`

`iarray Arr[a] -> Arr[a]`

`if`

- Choose between alternatives. If the first integer argument is not 0, then the second argument is returned; otherwise, the third argument is returned. The second and third arguments must have the same type.
*Note*: this is not a true conditional control structure, since all three arguments are always evaluated.

Usage:

`if Integer, a, a -> a`
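The note about all three arguments being evaluated is the same behaviour as an ordinary function call in an eager language. A small Python sketch of that point, where `tab_if` and `traced` are hypothetical helper names:

```python
def tab_if(cond, a, b):
    """Model of `if`: a plain function, so both alternatives are already evaluated."""
    return a if cond != 0 else b

calls = []

def traced(label, value):
    # record that this argument expression was evaluated
    calls.append(label)
    return value

result = tab_if(1, traced("then", 10), traced("else", 20))
print(result, calls)  # 10 ['then', 'else']
```

Both branches appear in `calls` even though only the first one is returned.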

`index`

- Select elements from arrays, maps or tuples. Indexing a non-existent element will cause an error.

Usage:

`index Arr[a], UInt -> a`

-- returns element from the array, using a 0-based index.

`index Arr[a], Int -> a`

-- negative indexes select elements from the end of the array, such that -1 is the last element, -2 is second-to-last, etc.

`index Arr[a], Real -> a`

-- returns an element such that 0.0 is the first element of the array and 1.0 is the last.

`index Map[a,b], a -> b`

-- returns the element stored in the map with the given key. It is an error if the key is not found; see `get` for a version that returns a default value instead.

`index (a,b,...), UInt`

-- returns an element from a tuple.

`index Arr[a], Number, Number -> Arr[a]`

-- returns a sub-array from an array, *including* the end element.

`index String, Integer, Integer -> String`

-- returns a substring from a string, as with the array slicing above. *Note:* string indexes refer to *bytes*; `tab` is not Unicode-aware.

`int`

- Converts an unsigned integer, floating-point value or string into a signed integer.

Usage:

`int UInt -> Int`

`int Real -> Int`

`int String -> Int`

`int String, Integer -> Int`

-- tries to convert the string to an integer; if the conversion fails, returns the second argument instead.

`join`

- Concatenates the elements in a string array or sequence using a delimiter.

Usage:

`join Arr[String], String -> String`

`join Seq[String], String -> String`

`join String, Arr[String], String, String -> String`

-- adds a prefix and suffix as well. Equivalent to `cat(p, join(a, d), s)`.

`join String, Seq[String], String, String -> String`
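As a rough Python analogue of the two forms above (a sketch of the described behaviour only; `join2` and `join4` are hypothetical names, with `join4` following the prefix, items, delimiter, suffix argument order of the signature):

```python
def join2(items, delim):
    """Two-argument form: concatenate the strings with a delimiter."""
    return delim.join(items)

def join4(prefix, items, delim, suffix):
    """Four-argument form: the joined string wrapped in a prefix and a suffix."""
    return prefix + join2(items, delim) + suffix

print(join2(["a", "b", "c"], ","))            # a,b,c
print(join4("[", ["a", "b", "c"], ",", "]"))  # [a,b,c]
```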

`lines`

- Returns its arguments as a tuple, except that each element will be printed on its own line. See also: `tuple`.

Usage:

`lines (a,b,...) -> (a,b,...)`

`log`

- The natural logarithm function.

Usage:

`log Number -> Real`

`lsh`

- Bit shift left; like the C `<<` operator. (See also `rsh`.)

Usage:

`lsh Int, Integer -> Int`

`lsh UInt, Integer -> UInt`

`map`

- Stores a sequence of pairs or a single pair into a map.

Usage:

`map Seq[(a,b)] -> Map[a,b]`

`map (a,b) -> Map[a,b]`

-- returns a map with one element.

**Note:** when maps are used as values in other maps, they will merge. (See aggregators below for details.)

`max`

- Finds the maximum element in a sequence or array. See also: `min`.

Usage:

`max Arr[a] -> a`

`max Seq[a] -> a`

`max Number -> Number`

-- **Note:** this version of this function will mark the return value to calculate the max when stored as a value into an existing key of a map.

`mean`

- Calculates the mean (arithmetic average) of a sequence or array of numbers. See also: `var` and `stdev`.

Usage:

`mean Arr[Number] -> Real`

`mean Seq[Number] -> Real`

`mean Number -> Real`

-- **Note:** this version of this function will mark the returned value to calculate the mean when stored as a value into an existing key of a map.

`merge`

- Aggregates a sequence of values. `merge(a)` is equivalent to `{ 1 -> @ : a }~1`, except faster. See also aggregators.

Usage:

`merge Seq[a] -> a`

`min`

- Finds the minimum element in a sequence or array. See also: `max`.

Usage:

`min Arr[a] -> a`

`min Seq[a] -> a`

`min Number -> Number`

-- **Note:** this version of this function will mark the return value to calculate the min when stored as a value into an existing key of a map.

`ngrams`

- Similar to `pairs` and `triplets`, except that it returns a sequence of arrays of length N instead of tuples.

Usage:

`ngrams Seq[a], UInt -> Seq[Arr[a]]`

`normal`

- Returns random numbers from the normal (Gaussian) distribution. (See also: `rand`, `sample`.)

Usage:

`normal None -> Real`

-- returns a random number with mean `0` and standard deviation `1`.

`normal Real, Real -> Real`

-- same, but with mean and standard deviation of `a` and `b`.

`now`

- Returns the current UNIX timestamp.

Usage:

`now None -> Int`

`open`

- Same as `file`.

`or`

- Returns 0 if all the arguments are 0, returns 1 otherwise. Equivalent to `a | b | c ...`. See also `and`.

Usage:

`or (Integer, Integer...) -> UInt`

`pairs`

- Given a sequence, returns a sequence of pairs of the previous sequence element and the current sequence element. Example: given `[ 1, 2, 3, 4 ]` will return `[ (1, 2), (2, 3), (3, 4) ]`. (See also: `triplets` and `ngrams`.)

Usage:

`pairs Seq[a] -> Seq[(a,a)]`
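The sliding-window behaviour of `pairs`, and of the related `triplets` and `ngrams`, can be modelled with a bounded deque. This Python sketch illustrates the semantics only; the function names mirror the `tab` names for readability, and the model yields tuples in all three cases, whereas the reference says `ngrams` returns arrays.

```python
from collections import deque

def ngrams(seq, n):
    """Yield every run of n consecutive elements as a tuple."""
    window = deque(maxlen=n)  # the oldest element falls out automatically
    for x in seq:
        window.append(x)
        if len(window) == n:
            yield tuple(window)

def pairs(seq):
    return ngrams(seq, 2)

def triplets(seq):
    return ngrams(seq, 3)

print(list(pairs([1, 2, 3, 4])))     # [(1, 2), (2, 3), (3, 4)]
print(list(triplets([1, 2, 3, 4])))  # [(1, 2, 3), (2, 3, 4)]
```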

`peek`

- Given a sequence, returns a pair of its first element and the sequence itself with the first element reattached. Equivalent to `h=take.@, h, glue(h, @)`. See also: `take`, `glue`.

Usage:

`peek Seq[a] -> (a, Seq[a])`
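Because sequences are consumed as they are read, looking at the first element requires putting it back. The following Python sketch models `take`, `glue` and `peek` on iterators; it is an illustration under the assumption that a `tab` sequence behaves like a one-pass iterator, and the `_MISSING` sentinel is a hypothetical helper.

```python
from itertools import chain

_MISSING = object()

def take(seq, default=_MISSING):
    """First element of a sequence; error on empty unless a default is given."""
    for first in iter(seq):
        return first
    if default is _MISSING:
        raise ValueError("take: empty sequence")
    return default

def glue(head, seq):
    """Put an element back in front of a sequence."""
    return chain([head], seq)

def peek(seq):
    """Return (first element, sequence with the first element reattached)."""
    it = iter(seq)
    h = take(it)
    return h, glue(h, it)

h, rest = peek(iter([1, 2, 3]))
print(h, list(rest))  # 1 [1, 2, 3]
```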

`pi`

- Returns the number *pi*.

Usage:

`pi None -> Real`

`rand`

- Returns random numbers from the uniform distribution. (See also: `normal`, `sample`.)

Usage:

`rand None -> Real`

-- returns a random real number from the range `[0, 1)`.

`rand Real, Real -> Real`

-- same, but with the range `[a, b)`.

`rand UInt, UInt -> UInt`

`rand Int, Int -> Int`

-- returns a random number from the integer range `[a, b]`.

`real`

- Converts an unsigned integer, signed integer or string into a floating-point value.

Usage:

`real UInt -> Real`

`real Int -> Real`

`real String -> Real`

`real String, Real -> Real`

-- tries to convert the string to a floating-point value; if the conversion fails, returns the second argument instead.

`recut`

- Splits a string using a regular expression. See also `cut` for splitting with a byte string.

Usage:

`recut String, String -> Arr[String]`

-- returns an array of strings, such that the first argument is split using the second argument as a regular expression delimiter.

`recut String, String, UInt -> String`

-- calling `recut(a,b,n)` is equivalent to `recut(a,b)[n]`, except faster.

`recut Seq[String], String -> Seq[Arr[String]]`

-- equivalent to `[ recut(@,delim) : seq ]`.

`replace`

- Search-and-replace in a string with regexes. The first argument is the string to search, the second argument is the regex, and the third argument is the replacement string. Regex and replacement string use ECMAScript syntax.

Usage:

`replace String, String, String -> String`

`resplit`

- A synonym for `recut`.

`reverse`

- Reverses the elements in an array.

Usage:

`reverse Arr[a] -> Arr[a]`

`round`

- Rounds a floating-point number to the nearest integer.

Usage:

`round Real -> Real`

`rsh`

- Bit shift right; like the C `>>` operator. (See also `lsh`.)

Usage:

`rsh Int, Integer -> Int`

`rsh UInt, Integer -> UInt`

`sample`

- Samples from a sequence of atomic values, without replacement. (See also: `rand`, `normal`.)

Usage:

`sample UInt, Seq[Int] -> Arr[Int]`

`sample UInt, Seq[UInt] -> Arr[UInt]`

`sample UInt, Seq[Real] -> Arr[Real]`

`sample UInt, Seq[String] -> Arr[String]`

-- the first argument is the sample size.

`second`

- Returns the second element in a pair, map or sequence of pairs. See also: `first`.

Usage:

`second a,b -> b`

`second Map[a,b] -> Seq[b]`

`second Seq[(a,b)] -> Seq[b]`

`seq`

- Accepts two or more values of the same type and returns a sequence of those values. (A synonym for `tabulate`.)

If one argument is passed, then it is equivalent to `[@ : arg]`.

Usage:

`seq (a,...),... -> Seq[a]`

`seq Arr[a] -> Seq[a]`

`seq Map[a,b] -> Seq[(a,b)]`

`seq a -> Seq[a]`

`sin`

- The sine function.

Usage:

`sin Number -> Real`

`skip`

- Accepts a sequence or array and returns an equivalent sequence where the first N elements are ignored. See also: `head`, `stripe`.

Usage:

`skip Seq[a], UInt -> Seq[a]`

`skip Arr[a], UInt -> Seq[a]`
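`head` and `skip` are complementary slices of a lazy sequence. A rough Python analogue using `itertools.islice`, offered only as a model of the described behaviour:

```python
from itertools import islice

def head(seq, n):
    """At most the first n elements."""
    return islice(seq, n)

def skip(seq, n):
    """Everything after the first n elements."""
    return islice(seq, n, None)

print(list(head(range(10), 3)))           # [0, 1, 2]
print(list(skip(range(10), 7)))           # [7, 8, 9]
print(list(head(skip(range(10), 2), 3)))  # [2, 3, 4]
```

Both wrappers are lazy, so chaining them does not materialize the input.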

`sort`

- Sorts a sequence, array or map lexicographically. The result is stored into an array if the input is a map or a sequence. See also `array` for a version of this function without sorting.

Usage:

`sort Arr[a] -> Arr[a]`

`sort Map[a,b] -> Arr[(a,b)]`

`sort Seq[a] -> Arr[a]`

`sort Number|String|Tuple -> Arr[Number|String|Tuple]`

-- **Note:** this version of this function will return an array with one element, marked so that storing it as a value in an existing key of a map will produce a sorted array of all such values.

`split`

- A synonym for `cut`.

`sqrt`

- The square root function.

Usage:

`sqrt Number -> Real`

`stddev`

- Synonym for `stdev`.

`stdev`

- Calculates the sample standard deviation, defined as the square root of the variance. This function is completely analogous to `var`, with the difference that the square root of the result is taken. See also: `mean`.

Usage:

`stdev Arr[Number] -> Real`

`stdev Seq[Number] -> Real`

`stdev Number -> Real`

-- **Note:** this version of this function will mark the returned value to calculate the standard deviation when stored as a value into an existing key of a map.

`string`

- Converts an unsigned integer, signed integer, floating-point number or a byte array to a string.

Usage:

`string UInt -> String`

`string Int -> String`

`string Real -> String`

`string Arr[UInt] -> String`

-- **Note:** here it is assumed that the array will hold byte (0-255) values. Passing in something else is an error. This function is not Unicode-aware.

`stripe`

- Accepts a sequence or array and returns an equivalent sequence except with only every Nth element. See also: `head`, `skip`.

Usage:

`stripe Seq[a], UInt -> Seq[a]`

`stripe Arr[a], UInt -> Seq[a]`

`sum`

- Computes a sum of the elements of a sequence or array.

Usage:

`sum Arr[Number] -> Number`

`sum Seq[Number] -> Number`

`sum Number -> Number`

-- **Note:** this version of this function will mark the value to be aggregated as a sum when stored as a value into an existing key of a map.

`take`

- Returns the first element in a sequence. Equivalent to `array(head(@, 1))[0]`. See also: `peek`, `glue`.

Usage:

`take Seq[a] -> a`

-- gives an error on empty sequence.

`take Seq[a], a -> a`

-- returns the second argument on empty sequence.

`tan`

- The tangent function.

Usage:

`tan Number -> Real`

`tabulate`

- A synonym for `seq`.

`time`

- Converts a UNIX timestamp to a textual representation of a UTC time.

Usage:

`time Int -> String`

-- returns a UTC time in the `"HH:MM:SS"` format.

`tolower`

- Converts the bytes of a string to lowercase.
*Note:* only works on ASCII data, Unicode is not supported.

Usage:

`tolower String -> String`

`toupper`

- Converts the bytes of a string to uppercase.
*Note:* only works on ASCII data, Unicode is not supported.

Usage:

`toupper String -> String`

`triplets`

- Similar to `pairs`, except that it returns triplets of before-previous, previous and current elements. (See also: `pairs` and `ngrams`.)

Usage:

`triplets Seq[a] -> Seq[(a,a,a)]`

`tuple`

- Returns its arguments as a tuple. Meant for grouping when defining tuples within tuples. See also: `lines`.

Usage:

`tuple (a,b,...) -> (a,b,...)`

`uint`

- Converts a signed integer, floating-point number or string to an unsigned integer.

Usage:

`uint Int -> UInt`

`uint Real -> UInt`

`uint String -> UInt`

`uint String, Integer -> UInt`

-- tries to convert the string to an unsigned integer; if the conversion fails, returns the second argument instead.

`uniques`

- Returns an aggregator for counting the number of unique values. Hashes of all values are stored, so the result is exact as long as there are no hash collisions. Memory usage is proportional to the count of unique items. See also `uniques_estimate`.

Usage:

`uniques a -> UInt`

`uniques_estimate`

- Returns an aggregator for estimating the number of unique values. A statistical estimator is used instead of exact counts; memory usage is constant. Note: the estimator works better with larger counts of unique values. See also `uniques`.

Usage:

`uniques_estimate a -> UInt`

`url_getparam`

- Splits a string with URL query-string parameters into keys and values. Values will be automatically percent-decoded.

Usage:

`url_getparam String, String -> String`

-- calling `url_getparam(url, key)` will return the first value in `url` for `key`. Example: `url_getparam("http://www.google.com?q=Hello%20World", "q")` will return `"Hello World"`.

`url_getparam String -> Seq[(String,String)]`

-- returns a sequence of all key/value pairs in the url. Example: `url_getparam."&one=1&two=2"` will return a value equivalent to `seq(tuple("one","1"), tuple("two","2"))`.

`var`

- Calculates the sample variance of a sequence of numbers. (Defined as the mean of squares minus the square of the mean.) See also: `mean` and `stdev`.

Usage:

`var Arr[Number] -> Real`

`var Seq[Number] -> Real`

`var Number -> Real`

-- **Note:** this version of this function will mark the returned value to calculate the variance when stored as a value into an existing key of a map.

`variance`

- Synonym for `var`.

`while`

- Similar to `filter`, but stops the output sequence once the first filtered element is reached. See: `filter`.

Usage:

`while Seq[(Integer,a...)] -> Seq[(a...)]`

`zip`

- Accepts two or more sequences (or arrays) and returns a sequence that returns a tuple of elements from each of the input sequences. The output sequence ends when any of the input sequences end.

Usage:

`zip Seq[a], Seq[b],... -> Seq[(a,b,...)]`

`zip Arr[a], Arr[b],... -> Seq[(a,b,...)]`
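The rule that the output ends when any input ends matches the behaviour of Python's built-in `zip`, which makes for a convenient mental model (an analogy only, not a statement about `tab`'s internals):

```python
letters = ["a", "b", "c"]
numbers = iter([1, 2])  # the shorter input decides the output length

zipped = list(zip(letters, numbers))
print(zipped)  # [('a', 1), ('b', 2)]
```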

### Aggregators

Aggregators are functions like any other; they accept a value and return a value, though usually the result is not useful as such. What's important is that aggregators have a side effect: the returned value is (invisibly) marked such that it will combine in special ways when it ends up keyed in a map that already stores another element at this key.

Aggregation is performed efficiently: no unnecessary temporary data structures are created and no unnecessary bookkeeping calculations are performed.

Here is a list of aggregators and their effects, sorted alphabetically:

`array`, `[. .]`

- Arrays are implicit aggregators. When combined together under one key of a map, arrays will concatenate, with the resulting elements appearing according to insertion order. (Last inserted elements coming last in the array.) See also: `sort`.

`avg`

- Accepts a numeric value, returns a floating-point number. When combined together, the arithmetic mean of the numbers will be computed.

`iarray`

- Like `array` except all elements are printed on one line.

`map`, `{ }`

- Maps are implicit aggregators. When a value of a map is another map, those maps will merge when aggregated under one key. (See below for an example.)

`max`

- Accepts a numeric value, returns a value of the same type. When combined together, the maximum value is computed.

`mean`

- Synonymous with `avg`.

`min`

- Accepts a numeric value, returns a value of the same type. When combined together, the minimum value is computed.

`sort`

- Like `array`, except that the resulting elements will be sorted in ascending order.

`stddev`

- Synonymous with `stdev`.

`stdev`

- Accepts a numeric value, returns a floating-point number. When combined together, the sample standard deviation is computed, defined as the square root of the variance. See also: `var`.

`sum`

- Accepts a numeric value, returns a value of the same type. When combined together, the sum of the values is computed.

`uniques`

- Accepts any value and returns a `UInt`-valued aggregator that counts the number of unique values when combined. *Note:* hashes of values are stored, so the result is exact as long as there are no hash collisions. Memory usage is proportional to the count of unique values.

`uniques_estimate`

- Like `uniques`, except that a statistical estimator is used instead. The result is not exact but the estimator uses constant memory. *Note:* the estimator works better with larger counts of unique values.

`var`

- Accepts a numeric value, returns a floating-point number. When combined together, the sample variance is computed, defined as the mean of squares minus the square of the mean.

`variance`

- Synonymous with `var`.

An explanation of how arrays and maps are aggregated implicitly:

{ @~0 -> map(@~1, sum.1) : pairs(@) }

This program will produce the intuitively obvious result -- a map of maps where the leaf values are frequency counts. This works as expected because maps-inside-maps will automatically aggregate.

Similarly for arrays:

{ month(@) -> array(day_values(@)) : data }

Arrays under a map key will concatenate, and such a program will produce the expected result -- an array of all day values for each month.
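To make the implicit aggregation concrete, here is a Python sketch of what the two programs above are described as computing. It is a model under stated assumptions, not `tab`'s mechanism: `bigram_counts` and `group_arrays` are hypothetical names, the input is taken to be an in-memory list, and the month keys and day values are made-up sample data.

```python
from collections import defaultdict

def bigram_counts(items):
    """Map of maps: for each element, how often each successor follows it."""
    outer = defaultdict(lambda: defaultdict(int))
    for prev, cur in zip(items, items[1:]):
        outer[prev][cur] += 1  # inner maps merge; the `sum.1` values add up
    return {k: dict(v) for k, v in outer.items()}

def group_arrays(records):
    """Arrays stored under the same key concatenate in insertion order."""
    out = defaultdict(list)
    for key, values in records:
        out[key].extend(values)
    return dict(out)

print(bigram_counts(["a", "b", "a", "b", "c"]))
# {'a': {'b': 2}, 'b': {'a': 1, 'c': 1}}
print(group_arrays([("jan", [1, 2]), ("feb", [3]), ("jan", [4])]))
# {'jan': [1, 2, 4], 'feb': [3]}
```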

### Error handling

Sequence, map and array comprehensions allow a special syntax for handling exceptions thrown while evaluating generator expressions.

Simply put the special token `try` after the opening `[`, `{` or `[.` bracket to silently ignore errors instead of aborting evaluation.

For example:

[ try uint.@ ]

will ignore any lines on the standard input that can't be parsed as a number.

first.{ try cut(@, " ", 1) }

will output the second word from each line, and ignore all lines that don't contain a space character.
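In a conventional language the same effect is achieved by catching the exception per element and moving on. A Python sketch of the first example, where `parse_uints` is a hypothetical name and Python's `int` parsing rules stand in for those of `uint`, which may differ in detail:

```python
def parse_uints(lines):
    """Yield each line parsed as an unsigned integer; skip lines that fail."""
    for line in lines:
        try:
            n = int(line)
            if n < 0:
                raise ValueError("negative value")
        except ValueError:
            continue  # the `try` token: drop this element, keep going
        yield n

print(list(parse_uints(["12", "oops", "7", "-3"])))  # [12, 7]
```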

### Recursion

`tab` supports a limited kind of tail recursion for special cases when a simple step-by-step application of operations will not work.

Consider the example of computing the factorial: given a sequence of integers, compute its product.

In `tab` the factorial function looks like this:

def fac << @~0 * @~1 : 1, count.@ >>

The `<< ... : ... >>` syntax takes an expression on the left-hand side and a pair of value and sequence on the right-hand side.

An expression that looks like `<< f(@~0, @~1) : a, seq(b, c, d) >>` will be unrolled to be equivalent to this:

f(f(f(a, b), c), d)

The left-hand side will be evaluated repeatedly, with an argument that is a pair of values. The first element of the pair is the previous evaluation result, and the second element is the next element in the input sequence. The right-hand side is also a pair, with the first element a starting value and the second element the input sequence.

For example: calling `fac.3` from the above example results in evaluating `((1 * 1) * 2) * 3`.
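The unrolling rule is what functional languages call a left fold. A minimal Python sketch using `functools.reduce`, where `fold` is a hypothetical helper and `fac` assumes that `count.n` yields the numbers 1 through n:

```python
from functools import reduce

def fold(step, start, seq):
    """Left fold: step(step(step(start, s0), s1), s2) and so on."""
    return reduce(step, seq, start)

# Build the unrolled expression symbolically to show the nesting order.
symbolic = fold(lambda acc, x: f"f({acc}, {x})", "a", ["b", "c", "d"])
print(symbolic)  # f(f(f(a, b), c), d)

def fac(n):
    # start value 1, input sequence 1..n (assumed behaviour of count.n)
    return fold(lambda acc, x: acc * x, 1, range(1, n + 1))

print(fac(3))  # 6
```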

Note that the type of the result and the type of the sequence elements can be different. This will calculate the 11th Fibonacci number:

<< a=@~0~0, b=@~0~1, tuple(b, a + b) : tuple(0, 1), count.10 >>~1

### Multi-core

`tab` can take advantage of multi-core systems by evaluating expressions using multiple threads.

Use the `-t` command-line option to enable multithreaded evaluation.

Parallel evaluation is not quite automatic: `tab` uses a simple scatter/gather evaluation model. N parallel threads will evaluate a 'scatter' expression, generating N independent sequences. A separate 'gather' thread will then read sequentially from all N sequences and aggregate them into a single result stream.

The syntax for parallel evaluation looks like this:

$ tab -tN scatter --> gather

The `-->` is a special token that separates 'scatter' and 'gather' expressions.

Examples:

###### 1.

:[ grep(@, '[0-9]{4}') ]

A simple expression that will search for all four-digit numbers.

**Note:** if there is no `-->` token in the expression, then a default `--> @` will be automatically appended.

In this case no result aggregation is done, all parallel threads will simply print what they found to standard output.

###### 2.

count.flatten.[ grep(@, '[0-9]{4}') ] --> sum.@

Same as the previous example, except that we want to count the numbers we found, instead of outputting them. The aggregating 'gather' expression will compute the sum of the counts found by all of the 'scatter' counting threads.

**Note:** the 'scatter' threads will read from the input stream atomically; there is no danger of an input line being read twice.

(A reminder that the `:` operator is equivalent to the `flatten()` function.)
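The scatter/gather model of the counting example can be sketched in Python as follows. This is a conceptual model only: `scatter_gather` is a hypothetical name, a lock around a shared iterator stands in for the atomic reads from the input stream, and Python's `re` module stands in for `grep`.

```python
import re
import threading
from concurrent.futures import ThreadPoolExecutor

def scatter_gather(lines, n_threads):
    """Count four-digit matches: N scatter workers, one summing gather step."""
    source = iter(lines)
    lock = threading.Lock()

    def next_line():
        # each line is handed to exactly one worker
        with lock:
            return next(source, None)

    def scatter():
        count = 0
        while (line := next_line()) is not None:
            count += len(re.findall(r"[0-9]{4}", line))
        return count

    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        partials = [pool.submit(scatter) for _ in range(n_threads)]
        # gather: combine the per-worker counts into a single result
        return sum(f.result() for f in partials)

print(scatter_gather(["ab 1234 5678", "no digits", "x2020y"], 3))  # 3
```

The total does not depend on how lines happen to be distributed among workers, which is what makes a sum a safe 'gather' step.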

###### 3.

{ @ ::[ grep(@, '[0-9]{4}') ] } --> count.map.@

Here we count the unique numbers found. The 'scatter' threads will aggregate a subset of the input into a map with a four-digit number as the key. The 'gather' thread will aggregate each of the 'scattered' maps into one final map, and output the count of its keys.

**Note:** the output of each 'scatter' thread will be a *sequence*. When a map or array is the result, it will be turned into a sequence by an implicit application of `seq()`. (Same as with the right-hand side expression in a `[ ... : ... ]` or `{ ... : ... }` generator.)

The input type of the 'gather' thread is `Seq[(String, Int)]`.