OptionalTypeSystem

Type System

I have stopped working on this project. In its current state, I believe it is usable for medium sized projects (<800 files), though it has not been tested much. I am very interested in hearing feedback from anyone who tries this, and would be glad to help troubleshoot any problems that are encountered.

Motivation

Though Lua was originally designed for creating small scripts, it increasingly has been used for writing entire games. The scale of these games can range anywhere from small prototypes to complete commercial games, often written using one of the several lua based game development SDKs, such as Corona or Moai, that have recently emerged.

I believe that the source code of an entire game exceeds the scope that Lua was originally intended for. Does this mean that programming a game entirely in Lua is a bad idea? Probably not; Lua is a very simple and elegant language with unique features which make it suitable for game development. Furthermore, Lua's meta table feature gives programmers control over the policies used to structure their code, so programmers writing large programs can use custom class systems intended for such.

Even with specialized class systems, Lua still faces certain issues when scaling up. First, when the interface for a piece of code is changed, any clients of this interface may break, and tracking down all points at which this happens can possibly be non-trivial. Second, it can be difficult for programmers to query for information about the external modules that a piece of code is interfacing with.

For example, a certain function in a program might involve calls to five different other functions spread across three different files. To understand the function at hand, the programmer must understand the interface of each function that it calls out to. It is sometimes possible to do this by figuring out the interfaces based on callee function names and arguments, but this method is ultimately just a guess. A more reliable method would be to track down the implementations of the callees and read the comments above each of them; doing this could be tedious, though. (At minimum, it would require opening three different files.) In my opinion, one of the largest benefits of static type analysis is that it can perform the task of collecting relevant function and object interfaces automatically for the programmer. The information is presented to the programmer in the form of call tips and auto completion lists.

Overview

I have implemented a type checker for Lua 5.1 in which annotations can usually be omitted, and in such cases either the dynamic type, or type computation is used as a fallback.

The philosophy behind this type system is that its primary purpose is to enable IDE features rather than to achieve type safety (i.e. rule out runtime errors via static analysis). However, it does generate errors for obvious problems, such as the misspelling of the names of methods and globals, which tend to be the most common.

All typing annotations are embedded into comments in lua 5.1 source code. Enforcing good programming style is prioritized over maintaining compatibility with lua; arbitrary unannotated lua programs will not necessarily pass the type checker.

Annotations

A function with type annotations looks like this:

-- this is a description
-- @param(a : number) some parameter 
-- @ret(number) first return value
-- @ret(string) second return value
function Thing:Method(a)
	return 3,"blarg"
end

parameters are annotated using the format "param(paramName : type) description" return values are annotated using the format "ret(type) description"

Return value annotations must correspond to the order of the actual return values. So in the above example, the first return value would be a number and the second a string.

When parameters are not given type annotations, they default to type unknown. The type system is unable to determine statically the types of expressions having type unknown. Hence, it delegates this task to the programmer; it allows the programmer to treat expressions of type unknown in whichever way she wants, assuming that she knows how to soundly deal with any such expression in the program. Unknown is treated as a subtype and supertype of every type, so expressions having type unknown can be used in any context.

In addition to functions, assignments can also be annotated. An assignment annotation has the form "var(type) description". Multiple annotations can be given for multiple assignments and, as for return annotations, the annotations must be given in order.

--@var(number) The x coordinate
--@var(number) The y coordinate
local x,y = 0,0 

In the following example, the description section is empty. It is always permissible to omit descriptions. We can use var annotations for table entries as well as local assignments, as shown in the following example.

local tbl = {
  --@var(string)
  a = "aaaaaah!"
}

The type checker will determine the type of many expressions even when no annotation is given. For example, the following code would generate an error:

local x = "hello"
x = 3

The type checker would complain that x's actual type, string, does not have expected type number. (The expressions on the right-hand side do not have a one-to-one correspondence with the actual values involved, so the type checker checks the expressions on the left-hand side instead.) In the above example, the variable x is initially assigned to a value of type string. The type checker does not allow variables to change type; this is good, because code that does this sort of thing can be difficult read.

Nillable Types

Similarly, the following code would generate an error:

local x = "hello"
x = nil

Now, this might worry you—assigning variables of various types to nil is a fairly common technique, so you might think that preventing the programmer from doing so would be too constraining. I have added nillable types to deal with this. Prefixing any type with the '?' character marks that type as nillable. You could annotate the above declaration as nillable to alleviate the problem.

--@var(?string)
local x = "hello"
x = nil

A nillable type's underlying type and the nil type are both treated as its subtypes. In addition, values having nillable types can have their fields, methods, and operators accessed as if they belonged to the underlying type. The policy of making types non-nillable by default is in contrast to most mainstream languages. I feel that it is the correct choice, though, because nil values are an infamous source of errors in programs.

If Statements

If statements require their conditions to have type boolean. Using an expression e with a nillable type as an if condition will generate a type error. Instead, use the expression e ~= nil.

and

The and operator can be interpreted in two different ways by the static type checker.

1.) As a logical operator. In this case, the left and right operators must both have type boolean. 2.) As a ternary operator. In this case, it appears in an expression of the form /a and b or c/, where a must have type boolean, and b and c must have the same type.

or

The or operator, in addition to being part of a ternary operator, may play two additional roles.

1.) As a logical operator. Both operands must be boolean. 2.) As a default operator. In this case, the left operand must have type ?T (where ? denotes nillability) and the right operand must have type T for some arbitrary type T.

Constants

We can prevent an identifier from being reassigned to a different value by using a const annotation rather than a var annotation. const annotations can be used for table entries as well as local and global identifiers.

--@const(boolean)
local debugModeEnabled = true

We can leave out the type if we want it to be inferred.

--@const()
local doWeNeedToAnnotateTheType = false

Unknown Type

When the typing of a term in a program is beyond the capabilities of the type checker, it is given type unknown. unknown is treated as a both a subtype and a supertype of every type in the program. It is unsound from the perspective of type safety, but it allows the programmer to work around the limitations of the type system.

Empty table constructors, as well as unannotated function parameters and return values are given type unknown. Requiring modules which cannot be processed by any type collector plugin results in unknown also. In addition, it is always valid to ascribe a variable or constant to type unknown. To give an example of unsound unknown is, here is an example of a buggy program which would not result in typechecking errors.

local a = {}
a = a + "hello"
a = a .. 3

Tables

A table constructor which includes only string keys is given a record type which contains one field named after each key in the constructor. Such record types are annotated as { ... li : Ti ... } for i ∈ 1..n, n ≥0. Record type annotations may contain constant field annotations; such field annotations use :~ instead of :. The following record type annotation represents a person with a constant name and variable age: "{ name :~ string, age : number }". Nominal table types are not currently treated as subtypes of compatible structural types, though this might be a useful feature to add in the future.

A table constructor which includes only number keys is given an array type, which allows the user to access and assign values with arbitrary number keys. An array type is annotated as "{elemTy}", where elemTy is an annotation representing the type of the array's elements.

In addition, class system plugins (see Basic Class System) allow nominal table types to be defined; lua modules are interpreted as object type definitions, and these object types are named after the modules which they are defined by.

Function Types

The type of a function with formal types A1,...,An and return types R1,...Rm for some n,m ≥ 0 is written as [A1,...,An]->[R1,...,Rm].

There are two families of function types: latent and non-latent. Latent functions may call coroutine.yield and other latent functions, whereas non-latent functions may not.

A latent function can be defined by prepending a line containing the text "--@latent" in front of its definition.

A latent function type is written as [A1,...An]-L>[R1,...,Rm].

Function types may have variable trailing arguments as well as variable trailing return values. A function type annotation with variable trailing arguments has ",..." after An. A function type annotation with variable trailing return values has ",..." after Rn. A function with an arbitrary number of arguments and return values is therefore annotated as [...]->[...]. Think of "..." as shorthand for the infinite sequence "unknown,unknown,unknown,...". This shorthand makes things a lot easier :)

Let's suppose you define a function which takes a variable number of untyped trailing arguments. A description can be attached to these arguments by prepending a varargs annotation in front of the function definition. A varargs annotation is formatted as "@varargs description" where description is a description of what the trailing arguments are. For variable trailing untyped return values, a varrets annotation can be used; similar to varargs annotations, varrets annotations are formatted as "@varrets description".

Accumulated Types

When a variable is declared, it is bound to a type. However, the variable's type may temporarily be promoted to some subtype of the bound type at different points in the program.

For example, if a data member x with type ?string appears in the condition of an if statement such as this one:

if self.x ~= nil then

end

x will accumulate type "string" inside the body of the if statement. Intellisense reports accumulated types as "bound(accumulated)" where bound is the name of the type the variable was bound to, and accumulated is the name of the type which it has accumulated. It should always be the case that accumulated is a subtype of bound.

A variable which has type "bound(accumulated)" can be used as if it has type "accumulated".

Note that the type accumulation described above is unsound: a method could easily be called inside the if statement's body which sets x to nil. No attempt is made to detect such invalidations, so it is up to the programmer to make sure that no such problematic scenarios occur; this is in accordance with Love Studio's philosophy of valuing programmer freedom over type safety.

Globals

All globals must be assigned in a file called globals.lua. Assigning a global anywhere else will generate a "variable not in context" error. Of course, it is permissible to reassign globals which are initially assigned in globals.lua, as well as globals defined by external APIs.

Type Collection

Since many lua programs are written using custom class systems, our type system would be useless without some way of typing objects. My solution to this was to set up an interface for type collector plugins. A type collector is a small program, written in F#, that extracts types from class defintions written for a specific lua class system.

I have implemented one such type collector for a class system called Basic Class System.

Updated

Tip: Filter by directory path e.g. /media app.js to search for public/media/app.js.
Tip: Use camelCasing e.g. ProjME to search for ProjectModifiedEvent.java.
Tip: Filter by extension type e.g. /repo .js to search for all .js files in the /repo directory.
Tip: Separate your search with spaces e.g. /ssh pom.xml to search for src/ssh/pom.xml.
Tip: Use ↑ and ↓ arrow keys to navigate and return to view the file.
Tip: You can also navigate files with Ctrl+j (next) and Ctrl+k (previous) and view the file with Ctrl+o.
Tip: You can also navigate files with Alt+j (next) and Alt+k (previous) and view the file with Alt+o.