1. SLE@UiB
  2. inf225
  3. inf225


Clone wiki

inf225 / glossary / Glossary

Glossary of Software Language Engineering Terms

[Alphabetical Index | Tag Index] – Important stuff is marked in bold, stuff with asterisk (*) will not be on the exam.

Abstract data type*

A type which is defined only through an interface of operations used to manipulate its value, and where the data representation is hidden.


Abstract syntax tree

A tree representation of the syntactic structure of a sentence; similar to a parse tree, but usually ignoring literal nonterminals and nodes corresponding to productions that don't directly contribute to the structure of the language (e.g., parenteses, productions used to encode operator priority and so on). May represent a slightly simpler language that the parse tree, for example, with operator calls desugared to function calls, and variations of a construct folded into one. Can be represented using as trees or terms, and described by an algebraic data type or a regular tree grammar.


Abstract value*

A value (of an abstract data type) known only through the operations used to create it. The user of an abstract data type operates on abstract values; only the implementation of the abstract data type sees the concrete value. A single abstract value may have many different concrete representations (either because there are several implementations of the data type, or because several representations mean the same – for example, 1/2 = 2/4)


Focusing on relevant details while leaving irrelevant details unspecified or hidden away. Data abstraction (e.g., classes and interfaces) hides the details of how a data structure is represented, instead giving an interface to manipulate the data. Control abstraction (e.g., functions, methods) hides how something is done (the algorithm), focusing instead on what is done.



When the same memory location or object can be accessed through different names in a program. Can make programs very difficult to understand and analyse. Typically a problem in imperative programming and object-oriented programming, where data is placed in an updatable store rather than everything being immutable values.


Algebraic data type

A composite data type defined inductively from other types. Typically, each type has a number of cases or alternatives, which each case having a constructor with zero or more arguments. For example, data Expr = Int(int i) | Plus(Expr a, Expr b). The data type can be seen as an algebraic signature, with the expressions written using the constructors being terms. For our purposes, values may be interpreted as trees, with the constructor name being node labels, arguments being children, and atomic values and nullary constructors being leaves.


Ambiguous grammar

A grammar for which there is a string which has more than one leftmost derivation.


Requires a generalised parser, produces a parse forest.


Analytic grammar*

A grammar which corresponds directly to the structure and semantics of a parser. For example, parser combinators and parsing expression grammars (PEGs).


Anonymous function

A function occuring as a value, without being bound (directly) to a name. C.f. closure.

Application binary interface*

Specifies how software modules or components interact with each other at the machine code level. Typically includes such things function calling conventions (whether arguments are passed on the stack or in registers, and so on), the binary layout of data structures and how system calls are done.


Application programming interface*

Specifies how software modules or components interact with each other.



A property of binary operators in parsing, indicating whether expressions such as a + b + c should be interpreted as (a + b) + b (left associative), a + (b + c) (right associative) or as illegal (non-associative).


Operations are grouped on the left, giving a tree which is “heavy” on the left side; typically used for arithmetic operators. Without associativity rules, grammar productions usually look like PlusExpr ::= PlusExpr "+" MultExpr | ..., forcing any expression containing the operator to be on the left side.


Operations are grouped on the right, giving a tree which is “heavy” on the right side; typically used for assignment and exponentiation operators.


Associativity rule

A disambiguation rule stating that an operator is either left-, right- or non-associative. E.g., in Rascal: syntax Expr = left (Expr "*" Expr | Expr "/" Expr );

Attribute grammar

A grammar where each production rule has attached attributes that are evaluated whenever the production rule is used in parsing. Can be used, for example, to build an intermediate representation directly from the parser, or to do typechecking while parsing.



The final stage of a compiler or language processor, often tasked with low-level optimisation and code generation targeted at a particular machine architecture.

Backus-Naur form

A formal notation for grammars, where productions are written <symbol> ::= <symbol1> "literal" .... Often extended with support for repetition (*, +), optionality (? or []) and alternatives (|).


Bottom-up parser

A parser that works by identifying the lowest-level details first, rather than working top-down from the start symbol. For example, an LR parser.


Chomsky normal form*

A simplified form of grammars where all the production rules are of the form A → BC or A → a or S → ε, where A, B and C are nonterminals (with neither B nor C being the start symbol), a is a terminal, ε is the empty string, and S is the start symbol. The third rule is only applicable if the empty string is in the language. Any context-free grammar can be converted to this form, and any grammar in this form is context free.



A function (or other operation) packaged together with all the variables it can access from the surrounding scope in which it was defined. A related term is anonymous function, which does not necessarily imply access to variables from the surrounding scope. See also environment, stack frame.


Composite data type

A data type constructed from other types (or itself, in the case of a recursive data type), e.g., a structure or an algebraic data type.

Context-free grammar

A formal grammar in which every production rule has a form of A → w, where A is a single nonterminal symbol and w is a sequence of terminals and nonterminals.



An abstract representation of the control state of a program. Often used in the sense of first-class continuations which let the execution state of a program be saved, passed around and then later restored.


Cross-cutting concern*

A programming concern, such as logging or security, which impacts many parts of a program and is difficult to decompose (cleanly separate into a library) from the rest of the system.


Dangling else problem

A common ambiguity in programming languages (particularly those with C-like syntax) in which an optional else clause may be interpreted as belonging to more than one if sentence. Usually resolved in favour of the closest if, often by an implicit disambiguation rule (at least in non-generalised parsing).


Declarative programming*

A programming paradigm where programs are built by expressing the logic of a computation rather than the control flow; i.e., what should be done, rather than how. Includes, e.g., functional programming and logic programming.


Definite clause grammar*

A way of expressing grammars in logic programming languages such as Prolog.



A sequence of production rule applications that rewrites the start symbol into the input string (i.e., by replacing a nonterminal symbol by its expansion at each step). This can be seen as a trace of a parser's actions or as a proof that the string belongs to the language.


A derivation where the leftmost nonterminal symbol is selected at every rewrite step.


A derivation where the rightmost nonterminal symbol is selected at every rewrite step.


Removal of syntactic sugar. Sometimes used in a frontend to translate convenient language constructs used by the programmer into more fundamental constructs. For example, translating Java's enhanced for into a more basic iterator use.


Deterministic context-free grammar*

A context-free grammar that can be derived from a deterministic pushdown automaton (DPDA). Always unambiguous.


Disambiguation rule

Used to resolve ambiguities in a grammar, so that the parser yields a single unambiguous parse tree. Includes techniques such as follow restrictions, precede restrictions, priority rules, associativity rules, keyword reservation and implicit rules.

Domain-specific language

A language (i.e., not just a library) with abstractions targeted at a specific problem domain.


Easier programming, more efficient or secure, possibly better error reports


Lots of implementation work, language fragmentation, learning/training issues, less tooling, troublesome interoperability, possibly worse error reports

External DSL

A DSL defined as a separate programming language.

Internal or embedded DSL

A DSL defined as language-like interface to library.


Duck typing

A typing style where the exact type of an object is not important, rather, any object is usable in any situation as long as it supports whatever methods are called on it. Used in many dynamic languages, such as Python, and in C++ templates. C.f. structural typing. Named after the duck test (attributed to James Whitcomb Riley): “When I see a bird that walks like a duck and swims like a duck and quacks like a duck, I call that bird a duck.”


Dynamic dispatch

The process of selecting, at runtime, which implementation of a method to call at runtime; typically based on the the actual class of the object on which the method is called (as opposed to the static type of the variable). With multiple dispatch, the selection is done based on some or all arguments, making it a kind of runtime overload resolution.


Dynamic language

A language where most or all of the language semantics is processed at runtime, including aspects such as name binding and typing. May have features such as duck typing, dynamic typing, runtime reflection and introspection, and often allows code to be replaced and objects to be extended at runtime.


Dynamic scoping

When names are resolved by finding the closest binding in the runtime environment (i.e., the execution stack), rather than in the local lexical environment (i.e., the containing scopes at the use site). C.f. lexical scoping.


Dynamic semantics

Gives the meaning of a program at execution time; either in terms of values being computed, actions being performed and so on.


Dynamic typing

When type safety is enforced at runtime. Values are associated with type information, which can also be used for other purposes, such as runtime reflection. Used in languages such as Python, Ruby, Lisp, Perl, etc.


Compiler may run faster; easy to load code dynamically at runtime; allows some things that are type safe but are still excluded by a static type system; easy to use duck typing to get naturally generic code with little overhead for the programmer; reflection, introspection and metaprogramming becomes easier.


Type errors cannot be detected at compile time; rigorous testing is needed to avoid type errors; some optimisations may be difficult to perform (less of a problem with just-in-time compilation). Dynamic typing does not imply weak typing. C.f. static typing, duck typing.



A mapping of names to values or types. Used in evaluation and typechecking to carry name bindings. The environment is passed around (propagated) according to the scoping rules of the language, and may use a more complicated data structure to accomodate nested and/or named scopes. In a language with store semantics, the ``value'' will typically be a store location.



In a grammar, the empty string.


A program that executes another program.


Extended Backus-Naur form*

A syntax notation introduced by Niklaus Wirth, which includes notation for optionality, repetition, alternation, grouping and so on.



A data member of a data structure.

Follow restriction

A disambiguation technique where a symbol is forbidden from or forced to be immediately followed by a certain terminal.

Formation rule*

A grammar; rules for describing which strings are valid in a language. This term is used mainly in logic.


The first stage of a compiler or language processor, typically including a parser (possibly with a tokeniser), and a typechecker (semantic analyser). Sometimes also includes desugaring. Is typically responsible for giving the programmer feedback on errors, and translating to the internal AST or representation used by the rest of the system.


An abstraction over expressions (or more generally, over expressions, statements and algorithms).

Function type

The representation of a function in the type system. Typically includes parameter types and return type, written t_1, ..., t_k → t.

Function value

The representation of a function in an evaluator or in a dynamic semantics specification. Usually includes the parameter names and the function body. Forms a closure together with an environment giving the function's declaration scope.

Functional programming

A programming paradigm which is value-oriented, based on mathematical functions, usually without state and mutable variables. Pure functional languages have referential transparency, and typically allows Higher-order functions.


Generalised parser

A parser that can handle the full range of context-free grammars, including nondeterministic and ambiguous grammars. For example, a GLL parser or a GLR parser.

Generative grammar*

An approach to specifying the syntax of the language, using a set of rules that produce all strings in the language. Formalised by Noam Chomsky in the late 1950s, but the idea goes back to Pāṇini's Sanskrit grammar, 4th century BCE.


Generative programming*

Programming aided by automatic generation of code, including techniques such as generic programming, templates, aspects, code generation, etc.


Generic programming*

A programming style which allows the same piece of code to deal with many different types, for example through polymorphism.


GLL parser*

An LL parsing algorithm extended to handle nondeterministic and ambiguous grammars, making it capable of parsing any context-free grammar. Unlike normal LL or recursive descent parsers, it can also handle left recursion.


GLR parser*

An LR parsing algorithm extended to handle nondeterministic and ambiguous grammars, making it capable of parsing any context-free grammar.



A formal set of rules defining the syntax of a language. Formally, a tuple 〈N,T,P,S〉 of nonterminal symbols N, terminal symbols T, production rules P, and a start symbol S ∈ N. In software languages, the most frequently used kinds are context-free and regular grammars.


Grammar in a broad sense*

A structural description in software systems, and a description of structures used in software systems: a parser specification is an enriched grammar, a type definition is a grammar, an attribute grammar comprises a grammar, a class diagram can be considered a grammar, a metamodel must contain a grammar, an algebraic signature is a grammar, an algebraic data type is a grammar, a generalized algebraic data type is a very powerful grammar, a graph grammar and a tree grammar are grammars for visual concrete notation, an object grammar contains two grammars and a mapping between them.

[Paper: Toward an Engineering Discipline for Grammarware]

Higher-order function

A function which takes takes functions as arguments or returns function values.


Immutable value

A value or object that cannot be changed after it is created. The default in functional programming.


Imperative programming

A programming paradigm based on statements that change program state (update a store); as opposed to declarative programming. May be combined with object-oriented programming.


Implicit disambiguation rule

A disambiguation rule built into the parser, such as longest match for regular expressions, or resolving the dangling else problem by preferring shift over reduce in an LR parser.


A technique in object-oriented programming which combines automatic code reuse with subtyping.



A technique in language processing where a call to a function or procedure is replaced by the code being called. Often used as part of code optimisation; removes abstraction introduced by the programmer.


Island grammar*

A grammar which describes only small parts of a language, skipping over the rest. Used, for example, to recover documentation from a program.

[Program Transformation Wiki]

Just-in-time compilation*

A technique used in interpreted or bytecode-compiled languages where program code is compiled at runtime, during evaluation. This gives the speed advantages of compilation, while retaining the dynamic flexibility and architecture-independence of a interpreted or bytecode-based language. Can sometimes give even better performance than static compilation, since more information may be available at runtime, leading to better optimisation. Heavily used in modern implementations of JVM and .NET.


Kleene closure

A metasyntactic sugar for repetition: x* means that x can be repeated zero or more times. The language that the Kleene star generates, is a monoid with concatenation as the binary operation and epsilon as the identity element.



A system of communication, with structure (syntax) and meaning (semantics), and abstractions that allow you to communicate usefully at different levels (i.e., more than just pointing at concrete things or showing a picture of something).

Left factoring*

A technique used to avoid backtracking in top-down parsing, by ensuring that the productions for a nonterminal don't have alternatives that start with the same terminals.. Used to massage a grammar to LL form.


Left recursion*

When production rules have the form A → Aa | b, with the Nonterminal occurring directly or indirectly to the left of all terminals on the right-hand side. Must be eliminated in order to use an LL parser.



A string of characters that is significant as a group; a word or token.

Lexical analysis

Converting a sequence of characters (letters) to a sequence of tokens (lexemes or words).

Lexical scoping

When names are resolved (possibly statically) by finding the closest binding in the lexical environment (i.e., by looking at the scopes that lexically contains the name). C.f. dynamic scoping.


Lexical syntax

Describes (often using a Regular grammar) the syntax of tokens; e.g., what constitutes an identifier, a number, different operators and the whitespace that separates them.


Literate programming*

A programming style where programs can be read as documents that explain the implementation, with explanations in a natural language. Tools allow programs to be compiled as either code or documents.


LL parser

A table-driven top-down parser, similar to a recursive descent parser. Has trouble dealing with left recursion in production rules, so the grammar must typically be left factored prior to use. The LL parser reads its input in one direction (left-to-right) and produces a leftmost derivation, hence the name LL. Often referred to as LL(k), where the k indicates the number of tokens of lookahead the parser uses to avoid backtracking.


LL grammar

A grammar that can be parsed by a LL parser.


Logic programming

A declarative programming paradigm based on formal logic, inference and reasoning. Useful for many purposes, including formal specification of language semantics. Prolog is the most well-known logic language.


LR parser

A bottom-up parser that can handle deterministic context-free languages in linear time. Common variantes are LALR parsers and SLR parsers. It reads its input in one direction (left-to-right) and produces a rightmost derivation, hence the name LR.


LR grammar

A grammar that can be parsed by a LR parser.



The act of modifying a grammar to make it fit a particular technology or purpose.


A result of megamodelling — a model which elements are software languages, models, metamodels, transformations, etc

[Paper: On the Need for Megamodels]


An element of a structure or class; a field, method or inner class/type.


A function which is a member of a class. Typically receives a self-reference to an object as an implicit argument.



A partial class (data fields and methods) that can be used to plug functionality into another class using inheritance.


Multi-paradigm programming*

Programming which combines several paradigms, such as functional, imperative, object-oriented or logic programming. Languages that support multiple paradigms include, for example, C++, Scala, Oz, Lisp, and many others.


Name binding

A part of language processing where names are associated with their declarations, according to scoping and namespace rules. A name's binding is typically determined by checking the environment at the use site.


When done statically (or early), name binding is often combined with typechecking.


Names are bound at runtime; also applies to dynamic dispatch (where it is sometimes called late or virtual binding), where certain properties (such as types) may be known statically, but the exact operation called is determined at runtime.


Named tuple

A tuple where the elements are named, like in a structure. Often exhibits structural type equivalence, even in languages that normally use nominative type equivalence


Some kind name grouping that makes it possible to distinguish different uses of the same name. For example, having variable names be distinct from type names; or treating names in one module as distinct from the same names in another module (In this sense, namespaces are related to scope).


Nominative type equivalence*

A system where type equivalence or compatibility is determined based on the type names (or, more strictly, which declaration the names refer to) and not the structure of the type. C.f. structural type equivalence.


Nonterminal footprint*

A non-recursive measure of nonterminal symbol usage in a grammatical expression: a multiset of presence indicators (1 for the nonterminal itself, ? for its optional use, * for its Kleene closure, etc). A usefulness of a footprint for grammar matching depends on how rich the metalanguage is.

Nonterminal symbol

A symbol in a grammar which is defined by a production. Can be replaced by terminal symbols by applying the production rules of the grammar. In a context-free grammar, the left-hand side of a production rule consists of a single nonterminal symbol.

Object-oriented programming*

A programming paradigm based on modelling interactions between objects. Objects have fields and methods and encapsulate state. Provides data abstraction, and usually supports inheritance, dynamic dispatch and subtype subtype polymorphism.



The process of transforming program code to make it more efficient, in terms of time or space or both.



When the same name is used for multiple things (of the same kind). For example, several functions with the same name, distinguished based on the parameter types. C.f. namespace, where the same name can have different meanings in different context (e.g., type names are distinct from variable names).


Overload resolution

A compilation step, usually combined with typechecking, where the name of an overloaded function is resolved based on the types of the actual arguments. C.f. dynamic dispatch, which does something similar at runtime.


Parse forest

The parse trees that are the result of parsing an ambiguous grammar using a generalised parser.

Parse tree

A tree that shows the structure of a string according to a grammar. The tree contains both the tokens of the original string, and a trace of the derivation steps of the parse, thus showing how the string is a valid parse according to the grammar. Typically, each leaf corresponds to a token, each interior node corresponds to a production rule, and the root node to a production rule of the start symbol.


A program that recognises input according to some grammar, checking that it conforms to the syntax and builds a structured representation of the input.

Parser combinator*

A way of expression a grammar and a parser using higher-order functions. Each combinator accepts parsers as arguments and returns a parser.


Parsing expression grammar*

A form of analytic grammar, giving rules that can be directly applied to parse a string top-down. Similar to a context-free grammar, but the rules are unambiguously interpreted; for example, alternatives are tried in order. Related to parse combinators. PEGs are a useful and straight-forward technique for parsing software languages. It is suspected that there are context-free languages that cannot be parsed by a PEG, but this has not been proved.



Recovering the grammatical structure of a string. The task done by a parser.

Pattern matching

A technique for comparing (typically algebraic) data structures, where one or both structures may contain variables (sometimes refered to as meta-variables). Upon successful match, variables are bound to the corresponding substructure from the other side. Related to unification in Prolog, but often more restricted.



Ad hoc

Another name for function or operator overloading.


When a function or data type is generic and handle values of different types in the same; for instance, List<T> – a list with elements of an arbitrary type. The specific type in question is often given as a parameter, hence the name.


When an object belonging to a subtype can be used in a place where the supertype is expected (as with classes and inheritance in object-oriented programming).


Precede restriction

A disambiguation technique where a symbol is forbidden from or forced to be immediately preceded by a certain terminal.

Predictive parser

A recursive descent parser which does not require backtracking. Instead, it looks ahead a finite number of tokens and decides which parsing function should be called next. The grammar must be LL(k) for this to work, where k is the maximum lookahead.

Priority rule

A disambiguation rule declaring an operator's priority/precedence. E.g., in Rascal: syntax Expr = Expr "*" Expr > Expr "+" Expr;

Procedural programming

A programming paradigm based around procedure calls. Sometimes considered the same as imperative programming and typically based on structured programming.


Production rule

A rule describing which symbols may replace other symbols in a grammar. In a context-free grammar, the left-hand side consists of a single nonterminal symbol, while the right-hand side may be any sequence of terminals and nonterminals. For example, Expr ::= Expr "+" Expr says that anywhere you may have an expression, you can have an expression plus another expression. The rules may be used to generate syntactically correct strings, by applying them as rewrite rules starting with the start symbol, or be used to parse strings, e.g., in a top-down parser or bottom-up parser.

Product type

A composite data type that describes the type of a tuple. Similar to, and typically used in the form of a structure type where the fields are labeled. If a type is seen as the set of all values belonging to the type, the product type corresponds to the Cartesian product on sets.


Program slicing

A program transformation technique where all code that is irrelevant to a certain set of inputs or outputs is removed. Applied forwards, any code not directly or indirectly using a selection of inputs is discarded; applied backwards, all code that does not contribute to the computation of the selected outputs is discarded. Mainly used in debugging (e.g., to find the code that might have contributed to an error), but sometimes also as an optimisation technique. Originally formalised by Mark Weiser in the early 1980s.



A program that recognises input according to some grammar, giving an error if it does not conform to the grammar, but does not build a data structure.

Recursive data type*

A composite data type, such as an algebraic data type which may contain itself. Used, for example, to define data structures such as lists and trees.

Recursive descent parser

A top-down parser built from mutually recursive functions, where each function typically implements one production rule of the grammar.


Referential transparency

When an expression can be replaced by its value without changing the meaning of the program; i.e., it will evaluate to the same value every time and not cause side effects. Usually a property of functional programming languages.


Regular expression

A formalism for describing a regular grammar, using the normal alphabet mixed with special metasyntactic symbols, such as the Kleene star. Commonly used to specify the lexical syntax of a language, and also for searching and string matching in many different applications.


Regular grammar

A formal grammar where every production rule has the form A → aB (for a right regular grammar), or A → a or A → ε, where A and B are nonterminal symbols and a is a terminal symbol, and ε is the empty string. Alternatively, the first production form may be A → Ba, for a left regular grammar.


Can't express arbitrary nesting, such as nested parentheses or block structure in a language.


Reserve rule

A disambiguation rule which states that a grammar symbol cannot match some constraint. For example, identifiers could be defined as any word matching [a-zA-Z]+ except if, while, ...

Scannerful parsing

Parsing is divided into two parts; a tokeniser that deals with the lexical syntax and a parser that deals with the sentence syntax.


Faster than scannerless parsing, because the lexical syntax is specified with a regular grammar which can be parsed very efficiently.


Cannot deal with arbitrary composition of languages.

Scannerless parsing

When scanning and parsing is unified into one process that deals with with the input characters directly.


Can parse combinations of languages that have different lexical syntax. Lexical syntax can be context-free, not just regular.


Slower than scannerful parsing. Can lead to hard to find lexical ambiguities.


A collection of identifier bindings – i.e., what is captured by the environment at some point in the code or in time.


With nested scopes, variables in inner scopes may shadow those in outer scopes, and variables are removed as control flows out of the scope. Variable shadowing may be forbidden in some languages.


With named scopes, we can refer to names in scopes that are not ancestors of the current scope. For example, with C++ classes and namespaces and Java packages and (static) classes.


Semantic analysis

A phase of language processing that enforces the static semantics of a language. Includes typechecking, name binding, overload resolution and checking other static constraints.

Software Language

An artificial language used in software development.For example, Java (programming language), HTML (markup language), XML (data language), CSS (domain-specific language).

Stack frame

A data structure that contains the local state of a function. Typically includes argument values, local variables and a return address (for transferring control back to the caller when the function returns). In a language with nested functions and updatable state (such as Scheme), stack frames are not necessarily actually placed on a stack. For example, if an inner function g is returned from an outer function f and also accesses variables in f, then f's activation record must remain intact as long as g lives.


Start symbol

The nonterminal symbol in a grammar that generates all valid strings in a language.

Static semantics

The part of language semantics which is processed at compile time (statically). Often includes constraints that might be part of the syntax, but which is done separately in order to keep the grammar context-free. Includes concepts like name binding and typechecking, and is used to eliminate a large class of invalid or erroneous programs. See static typing.

Static typing

When type safety is enforced at compile type (though some tests, such as for typecasting, may be done at runtime).


Detects a large an important class of errors (type errors) at compile time; enables advanced optimisations and efficient memory use.


Type system may become either overly complicated or overly restrictive; doesn't help with non-type errors; makes dynamic loading of code somewhat more complicated; type declarations may be cumbersome if the language lacks type inference.


An abstraction of the computer's memory, where data (e.g. objects) is addressed by locations, and can be updated. Used in defining the semantics of imperative programming, where variables typically refer to storage locations rather than being directly bound to values. See also environment. In a running system, the store is typically realised as a memory heap and program stack.

Strong typing

When a language (to some degree) enforces type safety.


Structural type equivalence*

A system where two types are equal or compatible if they have the same structure; e.g., have the same fields with the same types in the same order. C.f. nominative type equivalence.


Structure, Structure type

A composite data type with named fields members; such as struct in C or record in Pascal. Similar to (or same as, with structural type equivalence) a named tuple, or a product type with named components.


Structured programming*

A programming paradigm where the clarity of programs is improved by nestable language constructs like if, while, as opposed to conditional jumps.


Sum type*

A composite data type that describes the a choice of multiple possible types. An algebraic data type with multiple constructors is an example of a sum union type (each constructor corresponds to a product type). In a set-theoretic interpretation of types, this corresponds to disjoint union on sets.


Syntactic sugar

See Desugaring.


Terminal symbol

An elementary symbol in the language defined by a grammar, which cannot be changed/matched by the production rules in the grammar (i.e., the symbol doesn't occur (alone) on the left-hand side of a production). Corresponds to a token or an element of the alphabet of a language.


A lexeme or group of characters that forms a basic unit of parsing, categorised according to type, e.g., identifier, number, addition operator, etc. Forms the alphabet of the parser in scannerful parsing.



A program that performs lexical analysis, grouping and classifying input into tokens.

Top-down parser

A parser using a strategy where the top-level constructs are recognised first, starting with the start symbol. The parser starts at the root of the parse tree, and builds it top-down, according to the rules of the grammar. Includes LL parsing and Recursive descent parsing.


A program that detects type errors, ensuring type safety in a statically typed language. Often combined with other static semantic checks and processing, such as overload resolution, name binding, and checking access restrictions on names. Can be seen as a form of abstract interpretation, where the abstract values are the types, and all operations are defined according to their static semantics (i.e., type signature) rather than their dynamic semantics. C.f. semantic analysis.


Type inference

Automatic deduction of the types in a language. Used with static typing to avoid having to declare types for variables and functions. Particularly useful in generic programming and type polymorphism where type expressions can become quite complicated. Used, e.g., in Haskell and Standard ML.


Type safety

Whether a language protects against type errors, such as when a value of one data type is interpreted as another type (e.g., an int as a float or as a pointer to a string).



Forced conversion from one type to another. In a languages with weaker type systems, may lead to data being misinterpreted.



A pattern matching-like operation, where the goal is to find an assignment of variables so that two terms become equal. Used heavily in Prolog.


Weak typing

When a language (to some degree) does not enforces type safety.



The yield of a parsetree is the unparsed input text.

[Alphabetical Index | Tag Index]