Wiki

Clone wiki

docs / Structs

A struct aggregates data of different type. Structs must be declared so that the compiler can check that its components are used at the correct types. For example,

struct person {
  int id;
  string name;
  int age;
};

declares struct person to a type whose values are tuples consisting of an id (of type int), a name (represented as a string), and an age (again, an int). id, name, and age are called the fields of the struct.

The C0 programming language does not provide any explicit notation for values of type struct s. Such values are large and cannot be held directly in a variable. Instead, they must be in allocated memory. For example, trying to declare a variable of struct type will fail:

% coin struct.c0
Coin 0.2.7 "Penny" (r2209M, Thu Aug 18 14:07:47 EDT 2011)
Type `#help' for help or `#quit' to exit.
--> struct person p;
<stdio>:1.1-1.16:error:type struct person not small
cannot pass or store structs in variables directly
use struct person*
--> 

Instead, the variable must be a pointer to a struct, which at some point should be allocated in memory before we access it.

--> struct person* p;
--> p = alloc(struct person);
p is 0xFFAFFF80 (struct person*)
--> 

Note that struct person* should be read as (struct person)*, although the latter is not legal syntax. Once we have a pointer to a struct, we can read its component values value using the p -> f notation, where p is a pointer to a struct s and f is a valid field for a struct s.

--> p->id;
0 (int)
--> p->name;
"" (string)
--> p->age;
0 (int)
-->

We see that each field has been initialized to the default value of the given type. We can explicitly write the fields of a struct by using this notation on the left-hand side of an assignment.

--> p->id = 15122;
p->id is 15122 (int)
--> p->name = "Cap'n Crunch";
p->name is "Cap\'n Crunch" (string)
--> p->age = 48;
p->age is 48 (int)
--> 

Type Definitions

Because structs cannot be held directly in variables (must be in memory), they occur only at memory addresses of array elements, pointers, or possibly as parts of other structs. By far the most common case is the use of pointers to structs, like struct person * in the example above. Because this is an awkward phrase, it is common to define a new type which is a synonym for a pointer to a struct. We do this with a typedef t a; definition, where t is an arbitrary type, and a is its new name. Rephrasing the example above:

/* file struct.c0 */
struct person {
  int id;
  string name;
  int age;
};
typedef struct person* person;

Now we can use person as a shorthand for struct person*.

% coin struct.c0 
Coin 0.2.7 "Penny" (r2209M, Thu Aug 18 14:07:47 EDT 2011)
Type `#help' for help or `#quit' to exit.
--> person p;
--> p = alloc(struct person);
p is 0xFFAFFF80 (struct person*)
--> p->id = 15122;
p->id is 15122 (int)
--> 

C0 does not get confused, because a type name person is distinct from a struct name person. A struct name can only appear right after struct and forms a type, while a type name by itself stands for a type.

Data Structure Invariants

Not all values of a given type actually make sense as an instance of a data structure. The purpose of data structure invariants is to capture some requirements on valid instances of the data structure, which often comes down to properties of the structs used to implement them. In the example above, we might require both the id and the age to be positive, and also put some sanity check on it. First, informally as a comment:

/* file struct.c0 */
struct person {
  int id;       /* id > 0 */
  string name;
  int age;      /* 0 <= age && age < 200 */
};

As a second step, we write a function to check the validity of the data structure that incorporates the invariants. Such a function should return a boolean, and it is intended to be used in pre- and post-conditions for functions and loop invariants.

bool is_person(person p) {
  if (p == NULL) return false;        /* test this first! */
  if (!(p->id > 0)) return false;     /* so that this dereference is safe */
  if (!(0 <= p->age && p->age < 200)) return false;
  return true;
}

Recall that the type person stands for struct person*, so the argument to is_person could be NULL. This, however, does not describe a valid person so we return false. It is absolutely crucial to test this before the second test, because the expression p->id dereferences the pointer p and then retrieves the contents of the field id. If p is NULL, this would yield an error instead of returning false from the is_person function.

Remember: We must be able to show from the contracts that every dereference of a pointer (with *p or p->f) in a correct program is safe in the sense that p is not NULL.

Recursive Structs

A very common case in the implementation of data structures are recursive structs, which are a particular form of recursive types in programming languages in general. The following states that a list has a data and a next field, where the next field points to another list. This kind of structure is usually called a linked list.

struct list {
  int data;
  struct list* next;
};
typedef struct list* list;

This is a description of the structure of a data structure, it is not code that actually runs. But the question arises is how we can ever "get of the ground" to start a list. Here is where the default element of type t*, the NULL pointer comes in. We exploit this to indicate the end of the list (or, if you will, an empty list).

The first function below, nil(), constructs the empty list, while the second function cons(x,l) adds an additional item x to the front of the list l.

/* file list.c0 */
struct list {
  int data;
  struct list* next;
};

typedef struct list* list;

list nil() {
  return NULL;
}

list cons(int x, list l) {
  list p = alloc(struct list);
  p->data = x;  /* safe, because alloc never returns NULL */
  p->next = l;
  return p;
}

For example, we can build a list of two elements using the functions above and then examine the elements using the arrow (->) notation.

% coin list.c0
Coin 0.2.7 "Penny" (r2209M, Thu Aug 18 14:07:47 EDT 2011)
Type `#help' for help or `#quit' to exit.
--> list p = cons(1,cons(2,nil()));
p is 0xFFAFFF50 (struct list*)
--> p->data;
1 (int)
--> p->next;
0xFFAFFF60 (struct list*)
--> p->next->data;
2 (int)
--> p->next->next;
NULL (struct list*)
--> 

Note that p itself is printed as an address, as is p->next.

A deceptively simple problem is that computing the length of a linked list. The problem is that a list could by cyclic, with the next pointer addressing some earlier element in the list. It is possible to check for this, but here we just make it an informally stated requirement. The code below will abort if given a cyclic list, because eventually the assert statement will fail and an error will be signaled.

/* length(l) = length of l */
/* l must be acyclic and shorter than 2^31 elements */
int length(list l)
//@ensures \result >= 0;
{
  int result = 0;
  while (l != NULL)
    //@loop_invariant result >= 0;
    {
      result = result+1;
      assert(result > 0);
      l = l->next;
    }
  return result;
}

Updated