Ints

The type int is the only numeric type in C0. We would like to think of it as the type of integers ..., -3, -2, -1, 0, 1, 2, 3, ... but this is not quite correct because its values only cover a fixed range, namely from -2^31 to 2^31 - 1, inclusive. Understanding the reasons requires a brief detour to talk about the underlying machine model; understanding what this means for the usual arithmetic operations requires a brief detour into modular arithmetic and two's complement representation. After that we discuss the arithmetic operators, bit-level operators, and shift operators.

Machine Model

C0 is a relatively low-level, imperative language that was designed to have a close relationship to the code executed by the processor of your computer. This processor's instructions directly operate on registers that have a fixed size. This native word size is the number of bits (0 or 1) that can be stored and manipulated by single instructions. Throughout history the word size of processors has increased, as design and manufacturing techniques have improved. Modern processors have a word size of 64 bits, although they are often used as if the word size was just 32 bits. In order for the basic operators to map directly to machine instructions, the size of the basic representation of integers in C0 was chosen to be 32 bits.

How many different values of type int are there? One bit can have 2 values (0 or 1), two bits can have 2 * 2 = 4 values, three bits can have 2 * 2 * 2 = 8 values, etc., since we can choose the bits independently. In general, a word of k bits can have 2^k different values. So the type int has 2^32 = 4,294,967,296 (about 4 billion) different values. If you know something about the binary number system, you might expect these values to be 0, 1, 2, 3, ..., 4,294,967,295, where 0 is represented as 00000000000000000000000000000000 and 4,294,967,295 as 11111111111111111111111111111111. However, this would mean we would have no negative numbers at all (and it would seem pretty convenient to have, for example, -1 as a value). Furthermore, we have to consider what happens when operations such as addition overflow. To understand these two issues, we need a basic understanding of modular arithmetic and two's complement representation.

Modular Arithmetic

The basic idea behind modular arithmetic is to fix a modulus m > 1 and then restrict numbers to the range 0, 1, 2, ..., m-1. We count as follows: 0, 1, 2, ..., m-1, 0, 1, 2, ..., m-1, 0, ..., so the successor of m-1 is 0. Another way to think about it is that every arithmetic operation is performed modulo m, that is, we always take the remainder after dividing by m.

For example, if we perform arithmetic modulo 16, then 15+3 = 2 (mod 16) because 15+3 = 18, and 18 divided by 16 leaves a remainder of 2. It is remarkable that almost all the usual arithmetic laws you have learned in high school still hold when we do modular arithmetic. For example, multiplication distributes over addition: (a + b) * c = a * c + b * c (mod m) for any modulus m. But we have to be careful with comparisons. For example, it is not necessarily the case that a+1 > a, because we wrap around back to 0 (for example, 15+1 < 15 (mod 16)).
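
For non-negative numbers, the modulus operator % described later on this page computes exactly this remainder, so we can already check the example above in the coin interpreter used throughout this page. The session below is illustrative; the result follows from the rules just described.

--> (15 + 3) % 16;
2 (int)
--> 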

The elegant answer to the question of how the processor performs arithmetic given a fixed word size w is: arithmetic modulo 2^w. We just need to figure out how to get negative numbers into the picture...

Two's Complement Representation

Assume we have the natural numbers 0, 1, 2, .... Then -a is the number b such that a + b = 0. What happens when we apply this idea to modular arithmetic? Let's consider the case of m = 16. What would be -1? It would be the number b such that 1 + b = 0 (mod 16). Perhaps surprisingly, such a number exists, namely 15. So -1 = 15 (mod 16). That makes sense, since we can always add or subtract 16 from any number and get the same number modulo 16. What's -2? Following the same reasoning, we have -2 = 14 (mod 16) since 2 + 14 = 0 (mod 16). So we don't have to invent the negative numbers; they already exist in modular arithmetic!

Taking things a little further, what is -15? Since 15 + 1 = 0 (mod 16) we see that -15 = 1 (mod 16). We are looking at a table like this, where the numbers in each row are equal modulo 16.

... -16   0  16 ...
... -15   1  17 ...
... -14   2  18 ...
... -13   3  19 ...
... -12   4  20 ...
... -11   5  21 ...
... -10   6  22 ...
...  -9   7  23 ...
...  -8   8  24 ...
...  -7   9  25 ...
...  -6  10  26 ...
...  -5  11  27 ...
...  -4  12  28 ...
...  -3  13  29 ...
...  -2  14  30 ...
...  -1  15  31 ...
Arithmetic modulo 16 corresponds to a fictional machine with word size 4, because 2^4 = 16. On such a machine there are 16 different bit patterns, from 0000 to 1111. For each row in the table we have to decide which number we want it to represent. The obvious choice would be to take the middle column, which gives us the integers 0, 1, ..., 15. This is the so-called unsigned representation, which is available in C but not in C0. Another choice is to have an equal number of positive (including 0) and negative integers. In that case we would have -8, -7, ..., -1, 0, 1, ..., 7: for the first eight rows of the table we pick the middle column (0 through 7), and for the remaining rows we pick the left column (-8 through -1). This is called the two's complement representation.

Two's complement representation has many remarkable properties. Several important ones to remember:

  • The underlying arithmetic operations, such as addition and multiplication, are simply performed in modular arithmetic. No sign bit or other information needs to be taken into account.
  • The range of numbers is from -2^(k-1) to 2^(k-1) - 1 for a word size of k (and therefore a modulus of 2^k).
  • The most significant bit of the underlying binary representation is 1 for negative numbers, 0 for positive numbers (including 0).
  • We can compute -a by complementing each bit in the representation of a and then adding 1.
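
For example, on a 32-bit word the first two properties mean that adding 1 to the largest value 2^31 - 1 = 2,147,483,647 wraps around to the smallest value -2^31. The coin session below is illustrative, with the results worked out from these rules.

--> 2147483647 + 1;
-2147483648 (int)
--> 2147483647 + 2;
-2147483647 (int)
--> 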

Arithmetic Operators

The arithmetic operators of C0 are:

  • -e unary minus
  • e1 * e2 multiplication
  • e1 / e2 integer division
  • e1 % e2 integer modulus
  • e1 + e2 addition
  • e1 - e2 subtraction

The operators grouped together (first *, /, % and second +, -) have the same precedence and associate from left to right, with the first group binding more tightly than the second. For example, a+b/c*d is parsed as a+((b/c)*d). If the precedence of the operators is not obvious, it is good style to write explicit parentheses to make code more readable and prevent misinterpretations. A good way to write the expression above would be a + (b/c)*d.
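
For instance, instantiating a, b, c, d with 2, 8, 4, 2 shows the difference between the default parse and an explicitly parenthesized alternative (an illustrative coin session):

--> 2 + 8/4*2;
6 (int)
--> 2 + 8/(4*2);
3 (int)
--> 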

Addition, subtraction, multiplication, and negation are the standard operations from modular arithmetic; integer division (/) and modulus (%) are a little trickier and are explained next.

Since C0 does not have rational numbers or real numbers in floating point format, division e1 / e2 is integer division. In order to return an integer it rounds its result toward 0. For example:

% coin
Coin 0.2.6 "Penny" (r2087, Fri May 27 12:10:21 EDT 2011)
Type `#help' for help or `#quit' to exit.
--> 3/2;
1 (int)
--> 5/2;
2 (int)
--> 7/8;
0 (int)
--> 

For negative numbers, we follow the same rule rounding toward 0, which means that (-a)/b = -(a/b).

--> -1/2;
0 (int)
--> -7/8;
0 (int)
--> -7/3;
-2 (int)
--> 

Now the modulus e1 % e2 is defined by the property

a = (a/b)*b + a%b

Rounding negative results towards 0 actually increases their value, so a % b must be negative to compensate for that in some cases. Consider that 1/7 = -1/7 = 1/-7 = -1/-7 = 0, so for the equation above we need:

--> 1%7;
1 (int)
--> -1%7;
-1 (int)
--> 1%-7;
1 (int)
--> -1%-7;
-1 (int)
--> 
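
We can confirm the defining equation directly for, say, a = -7 and b = 3 (an illustrative session):

--> -7/3;
-2 (int)
--> -7%3;
-1 (int)
--> (-7/3)*3 + -7%3;
-7 (int)
--> 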

Another special consideration introduced by division and modulus is the arithmetic safety violation, a form of runtime error. Because we are performing modular arithmetic, addition, subtraction, multiplication, and negation do not introduce any possibility of runtime errors. However, dividing any integer by zero, or dividing the minimal integer (-2^31) by -1, results in a runtime error (even though it might be reasonable to just return -1 * (-2^31) = -2^31 (mod 2^32) in this last case). An arithmetic safety violation is also signaled for -2^31 % -1, for consistency.

The effect of an arithmetic safety violation (only the two special cases mentioned in the preceding paragraph!) is to terminate the execution of the current program immediately and print a short message indicating that an error occurred (for more exceptions that might occur, see debugging-C0-Programs).
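
If a program cannot rule out these two cases in advance, it can test for them explicitly before dividing. The function below is only a sketch of one possible guard; the name safe_div and the choice of fallback results are ours and not part of C0 or its libraries.

int safe_div(int a, int b) {
  if (b == 0) return 0;              /* dividing by zero would be a safety violation */
  if (a == 0x80000000 && b == -1)    /* so would dividing the minimal int by -1 */
    return a;                        /* return the modular result, which is again -2^31 */
  return a / b;                      /* no arithmetic safety violation is possible here */
}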

Comparison Operators

Besides the arithmetic operators that take integers to integers, we also have comparison operators that take integers to booleans. Some of these are overloaded in that they also apply at other types. They are summarized below.

  • e1 < e2 less than
  • e1 <= e2 less or equal
  • e1 >= e2 greater or equal
  • e1 > e2 greater than
  • e1 == e2 equal
  • e1 != e2 not equal

The comparisons are based on the two's complement representation of 32-bit integers, so some "natural" laws regarding comparisons may not be satisfied. For example, it is not always the case that a+1 > a, because adding 1 to the maximal integer yields the minimal integer.
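
For instance, we would expect the following illustrative session, with coin reporting the boolean results in the same style as the integer results above:

--> 1 + 1 > 1;
true (bool)
--> 2147483647 + 1 > 2147483647;
false (bool)
--> 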

Bit-Level Operators

There are several operators on values of type int that are better understood by looking at the bit patterns of the underlying 32-bit words rather than thinking of them as representing integers in two's complement representation. We call these bit-level operators. They are based on one-bit operations that are applied simultaneously to all 32 bits of a value of type int. These one-bit operations are bit-wise negation (~), and the binary operators bit-wise and (&), bit-wise exclusive-or (^), and bit-wise or (|). Negation simply replaces 0 by 1 and vice versa. The binary operators are defined by the following tables.

  and           xor           or

&  0  1       ^  0  1       |  0  1
-------       -------       -------       
0  0  0       0  0  1       0  0  1
1  0  1       1  1  0       1  1  1
Negation (~) has the highest precedence, followed by and (&), then exclusive-or (^), then or (|). However, it is good style not to rely on operator precedence for the binary bit-level operators; explicit parentheses make the code easier to read and understand.
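
For example, 12 is 1100 in binary and 10 is 1010, so applying the tables bit by bit gives 12 & 10 = 1000 = 8, 12 | 10 = 1110 = 14, and 12 ^ 10 = 0110 = 6. The session below is illustrative; the last line revisits the two's complement property that -a can be computed as ~a + 1.

--> 12 & 10;
8 (int)
--> 12 | 10;
14 (int)
--> 12 ^ 10;
6 (int)
--> ~0;
-1 (int)
--> ~5 + 1;
-5 (int)
--> 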

Hexadecimal Notation

When integers are considered as 32-bit words (as for the bit-level operators and the shift operators below), decimal integer notation is awkward. We might consider binary notation, but sequences of 32 0's and 1's are difficult to type and to parse. A useful solution is to group the bits into segments of four. A sequence of 4 bits can represent 2^4 = 16 different numbers, so the usual decimal digits are not sufficient. Instead, we use the 16 digits 0123456789ABCDEF, representing the numbers 0 through 15. So A represents 10, B 11, C 12, D 13, E 14, and F 15.

Some numbers would be ambiguous. For example, 11 could either be the (decimal) number 11, or it could be the hexadecimal number whose value would be 16+1 = 17 in decimal notation. In order to disambiguate, we precede hexadecimal numbers with 0x, which cannot otherwise legally appear in a program except inside a string. A 32-bit number is represented by 8 hexadecimal digits. For example, the largest positive integer (one 0 followed by all 1's) would be written as 0x7FFFFFFF; the minimal negative integer (one 1 followed by all 0's) would be written as 0x80000000.
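
Since coin echoes values of type int in decimal, hexadecimal constants give a quick way to explore the representation (an illustrative session, with the results following from the two's complement interpretation above):

--> 0xA;
10 (int)
--> 0x7FFFFFFF;
2147483647 (int)
--> 0x80000000;
-2147483648 (int)
--> 0xFFFFFFFF;
-1 (int)
--> 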

Shift Operators

Shift operators are operators on values of type int that have some characteristics of arithmetic operators and some of bit-level operators. The left shift (w << k) shifts each bit in the representation of w to the left by k positions and fills in zeroes on the right. To see what this means, we need to know that the bits of an integer are displayed with the most significant on the left and the least significant on the right.

b_31 b_30 ... b_0

Here, bit 31 is also referred to as the sign bit, because the number is negative if b_31 = 1. Modulo 2^32, the value of this word is b_31*2^31 + b_30*2^30 + ... + b_0*2^0. We then have to subtract 2^32 if the sign bit b_31 is 1 in order to get the correct negative number.

Shifting to the left by one position multiplies the value of each bit by 2, so it corresponds to a multiplication by 2, modulo 2^32. Shifting to the left by k therefore multiplies k times by 2 (which is the same as multiplying by 2^k). This provides a convenient shorthand for writing powers of 2. For example, 1 << 20 is 2^20.
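
For example, since 2^20 = 1,048,576 we would expect (illustrative session):

--> 1 << 20;
1048576 (int)
--> 13 << 2;
52 (int)
--> 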

The right shift (w >> k) shifts each bit in the representation of w to the right by k positions. Shifting to the right by one position halves the value of each bit, so it corresponds to a division by 2, modulo 2^32. Since we drop the rightmost (= least significant) bit, this truncates the result. If a is positive, then a >> 1 = a / 2, since both round towards zero.

But what do we do with the most significant bit after the right shift? We could fill in zeroes, which would mean the result is always positive. If we fill in ones, then the result is always negative. If we want to preserve the sign of the number, we can keep the most significant bit whatever it was before the shift. This is called an arithmetic right shift because it corresponds to a division by 2 for both positive and negative numbers. However, unlike integer division it always rounds down (towards negative infinity), as can be seen from the case of -1: -1 >> 1 is -1, whereas -1 / 2 is 0. So even the arithmetic right shift is not quite like division by 2.

The following experiments confirm this.

% coin
Coin 0.2.6 "Penny" (r2087, Fri May 27 12:10:21 EDT 2011)
Type `#help' for help or `#quit' to exit.
--> 13 << 1;
26 (int)
--> -3 << 1;
-6 (int)
--> 25 >> 1;
12 (int)
--> -25 >> 1;
-13 (int)
--> -25 / 2;
-12 (int)
--> 

From the definition, we can see that shifting w << k or w >> k is meaningful when 0 ≤ k < 32. One could simply continue to shift even for larger k so that, for example, w << 33 would always be equal to 0. Most processors, however, will interpret the shift quantity modulo the word size, which is 32 in the case of C0. They do this by masking out all bits but the lowest 5, since 2^5 = 32. We can write this masking expression as k & 31, since 31 has a 1 in the lowest five bits and 0 everywhere else. So when we write w << k or w >> k, w is really shifted by k & 31 bits. If k is positive, this is the same as k % 32.
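
Following this masking rule, we would expect, for example (illustrative session):

--> 33 & 31;
1 (int)
--> 1 << 33;
2 (int)
--> 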

Even though the result is well-defined, one should never shift by any quantity not between 0 and 31, because it is most likely a mistake.
