docs / From_C0_to_C_-_Basics

C0 acts more or less like a "safe subset" of C. Transitioning from C0 to C can be a rocky road unless you are careful and plan ahead. But we have to warn you: no matter how much you prepare, there is no way to completely avoid C's nuances, which include surprising interpretations, implementation-dependent behavior, and undefined behavior. This tutorial covers some of the basic ideas of transitioning from C0 to C programming, and assumes you are already familiar with C0. (A good resource for learning C is the book "The C Programming Language" by Brian Kernighan and Dennis Ritchie; Ritchie originally designed and implemented the C language. Both were also involved in the development of the Unix operating system, and the C language and Unix are closely intertwined.)

This document has two main parts. In the first part, we show how a number of C0 types translate into C:

  • Arrays
  • Integers
  • Booleans
  • Pointers
  • Structs
  • Strings

In the second part, we cover some topics that are new in C.

  • Header files
  • C macros
  • Function pointers
  • Generic data types
  • C standard IO
  • Freeing memory

We are still developing a more detailed guide at From C0 to C that covers extra concepts, like file input/output, that you will need in later courses.

C0 types in C

Let's take a look at some of the familiar C0 constructs and what they look like in C.

Arrays

An array is an aggregate data structure that contains data elements of the same type (we call arrays homogeneous data structures). In C0, we define arrays of type t as t[]. So we can have integer arrays as int[] or boolean arrays as bool[]. We allocated arrays in C0 as follows.

int[] A = alloc_array(int, 10);

In C, arrays that are explicitly allocated on the heap are indistinguishable from pointers. C does not record how long your array is, and the memory block itself carries no information about what type of data it holds. In other words, it is the programmer's responsibility rather than the compiler's to track this information. So the C0 code above can be converted to C as follows.

int* A = calloc(10, sizeof(int));  <--- include <stdlib.h> to access calloc 
or
int* A = xcalloc(10, sizeof(int)); <--- a safe version of calloc is xcalloc (validates A != NULL)
                                   <--- include "lib/xalloc.h" or include "xalloc.h"

calloc is a function that is included in the C standard library <stdlib.h> with the prototype

void* calloc (size_t num, size_t size);  <--- size_t is an unsigned integer type defined in <stddef.h>; its width is implementation dependent

calloc takes the number of elements as the first argument and the size of each element (in bytes) as the second argument, and returns a memory block of total size (num×size) bytes, initialized to 0. The sizeof operator returns the number of bytes necessary to hold a data type. We have to write sizeof(int) because the size of an integer is implementation dependent - it could be 4 (4 bytes, or 32 bits), 8 (8 bytes, or 64 bits), or anything else.

The function calloc returns a void* (pointer to data of arbitrary type), an address that holds the starting location of a memory block of (num×size) bytes. calloc returns a NULL pointer if it is not able to allocate the requested memory. We can avoid testing for NULL pointers by using xcalloc instead (included in the CMU-based xalloc library). The following code allocates an array of 10 integers and initializes all elements to zero.

int* A = xcalloc(10, sizeof(int));

A here is just a pointer to a block of memory. Why is its type int* important then? For one, the compiler can produce a warning or error if you use it inconsistently. Second, when we access an element of the array as A[i], the compiler needs to calculate the address of element i based on the size of each element.
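As a minimal sketch of the points above, the following uses the standard calloc (rather than xcalloc) and passes the array length explicitly, since C does not store it. The helper names make_zeroed_array and sum_array are hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: allocate n ints on the heap, all initialized to 0.
   Returns NULL if the allocation fails, so the caller must check. */
int *make_zeroed_array(size_t n) {
    assert(n > 0);
    int *A = calloc(n, sizeof(int));   /* calloc zeroes the whole block */
    return A;
}

/* Hypothetical helper: sum the first n elements of A.
   The length n is passed separately because A is only a pointer. */
int sum_array(const int *A, size_t n) {
    assert(A != NULL);
    int total = 0;
    for (size_t i = 0; i < n; i++) {
        total += A[i];   /* element i lives i * sizeof(int) bytes past A */
    }
    return total;
}
```

A caller would check the result of make_zeroed_array(10) against NULL, use the array, and eventually call free on it.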

In C0, arrays are always heap-allocated. But in C, we can also allocate arrays on the stack. A stack-allocated array A of type t has the form

t A[exp];

The expression exp must be an integer expression, and it is undefined what happens if it evaluates to a negative number. The type t can be any standard type such as int or char, a struct, or even another array.

Stack-allocated arrays are fixed in size and are discarded when variable A goes out of scope. You can read more about stack-allocated and heap-allocated arrays in the section on Declaring, Initializing and Accessing C arrays.

Integers

In C0, we used only 32-bit signed integers, and all C0 integers obey the rules of modular arithmetic. In C, there are both signed and unsigned integers, and there are many different sizes of integers, like short int, int, or long int, that indicate allocation of different memory sizes. This is implementation dependent, however, so while short int is frequently 16 bits (2 bytes), int is frequently 32 bits (4 bytes), and long int is frequently 64 bits (8 bytes), you cannot depend on this behavior! Furthermore, characters (type char) are treated in C as small integers: sizeof(char) is always 1 and a char usually occupies 8 bits, but whether a plain char is signed or unsigned is implementation-specific. The constant 'A' is a character constant equal to 65 (in ASCII).

Signed versus unsigned integers: Any integer declared as an int is by default a signed integer. Signed integers in C do not necessarily obey the laws of modular arithmetic, because overflow in signed arithmetic is undefined behavior. Unsigned integers, however, are always positive or zero and obey the laws of modular arithmetic. If ints are represented with 32 bits (4 bytes), then unsigned ints range from 0 to 2^32 - 1.

Casting: Integers of different sizes can be converted to each other by casting, as can signed and unsigned integers. This provides a great deal of flexibility in managing data types in your code, but it may also introduce some strange behavior.

For example, a long int can be cast to an int as

long int x = 8348572718389549;   /* initialize x to some large value */
int y = (int)x;  /* cast long int x to an int y */

If int and long int are defined by the implementation to be the same size, then y will contain the same value x does after this code. On the other hand, if an int is 4 bytes (32 bits) and a long int is 8 bytes (64 bits), then y will typically contain a truncated version of the bits that were in x, which as a signed integer corresponds to -1891444435!

We can do the following.

int x = 321;
char ch = (char)x;

In this case, assuming a char occupies 8 bits, ch will contain only the lowest-order byte of x, which is 65 (the constant 'A').
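The truncation described above can be checked with unsigned fixed-width types, for which narrowing conversions are fully defined (the result is the value modulo 2^32 or 2^8). This is only a sketch: the function names are hypothetical, and it assumes the platform provides the optional types uint8_t, uint32_t and uint64_t from <stdint.h>.

```c
#include <assert.h>
#include <stdint.h>

/* Keep only the low 32 bits of a 64-bit value (value modulo 2^32). */
uint32_t low_32_bits(uint64_t x) {
    return (uint32_t)x;
}

/* Keep only the low 8 bits of a 32-bit value (value modulo 256). */
uint8_t low_8_bits(uint32_t x) {
    return (uint8_t)x;
}
```

The low 32 bits of 8348572718389549 are 2403522861; read as a 32-bit signed integer, that bit pattern is 2403522861 - 2^32 = -1891444435. Likewise, 321 = 256 + 65, so its low byte is 65.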

We can convert between signed and unsigned integers. For example,

int x = -10;
unsigned int y = (unsigned int)x;

In this case, -10 will be converted to an unsigned integer by adding UINT_MAX + 1, where UINT_MAX is defined in the header file <limits.h>. It is the maximum value that can be stored in an unsigned int.

We can also mix signed and unsigned ints in one expression. For example,

unsigned int x = 45;
int y = -56;
int z = x + y;
unsigned int uz = x + y;

In the expression x + y, the signed integer y is converted to an unsigned integer before x and y are added in unsigned mode. The unsigned result is then converted to the type of the variable being initialized on the left-hand side (int z or unsigned int uz).
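The two conversion rules above can be illustrated with a small sketch. The function names to_unsigned and mixed_sum are hypothetical; the expected values follow from arithmetic modulo UINT_MAX + 1.

```c
#include <assert.h>
#include <limits.h>

/* Converting a negative int to unsigned adds UINT_MAX + 1 to it. */
unsigned int to_unsigned(int x) {
    assert(UINT_MAX >= 65535u);   /* guaranteed by the C standard */
    return (unsigned int)x;
}

/* In a mixed expression the int operand is converted to unsigned first,
   so the addition is carried out modulo UINT_MAX + 1. */
unsigned int mixed_sum(unsigned int x, int y) {
    return x + y;
}
```

For instance, to_unsigned(-10) is UINT_MAX - 9, and mixed_sum(45, -56) is UINT_MAX - 10, which is the unsigned value that uz receives in the example above.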

Booleans

In C0, we used the type bool. C has no built-in type with that name (C99 provides _Bool instead). Instead, we can include the header <stdbool.h> to have access to the type bool and the constants true and false. Internally, these are just integers, where true is 1 and false is 0. Conditions interpret all non-zero values as true and zero as false. For example, even if x and y are integers, we can still do the following.

x&y    <---  bitwise operator & is applied to x and y and returns 0 (false) or some non-zero value
x&&y   <---  logical operator && is applied to x and y and returns 0 or 1. 

Therefore using constructs such as

while (x&y) {...}    <--- this loop can run forever unless x&y=0 
while (x&&y) {...}   <--- this loop can run forever unless x=0 or y=0
while (1) {...}      <--- this is an infinite loop unless you break
while (0) {...}      <--- this loop never runs

are all legal, but may not be what you want.
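The difference between & and && matters in practice: two non-zero values can have a bitwise AND of zero. A short sketch, with hypothetical function names:

```c
#include <assert.h>
#include <stdbool.h>

/* Bitwise AND: combines x and y bit by bit; the result can be any int. */
int bitwise_and(int x, int y) {
    return x & y;
}

/* Logical AND: true exactly when both x and y are non-zero. */
bool logical_and(int x, int y) {
    return x && y;
}
```

With x = 1 and y = 2, both values count as true, yet x & y is 0 (binary 01 and 10 share no bits) while x && y is 1.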

Pointers

In C0 and C, pointers are defined the same way. A pointer is an address in memory. A pointer to a data type t is defined as t*. For example, in C and C0 we can define a pointer to an integer as

int* ptr;   <--- define pointer in C and C0

One difference you find in C is that you can use the unary address-of operator (&) to find the address of any data location. For example, consider the following C code.

int x = 10;
int* ptr = &x;
printf("The value of x is %d and the address of x is %p\n", x, (void*)ptr);

The output shows the value of x and the address of x. The unary & operator when applied to the operand x (&x) gives the address of where the operand x is stored in memory. Even pointer variables have addresses. Consider the following code where we print the address of the pointer variable ptr.

int x = 10;
int* ptr = &x;
printf("The address of the ptr is &ptr = %p\n", (void*)&ptr);

Validating pointers

Pointers in C must be validated by testing for NULL, as in C0. When calling calloc or malloc (from <stdlib.h>), we always need to perform NULL checks as follows.

int* ptr = calloc(10,sizeof(int));
if (ptr != NULL) {...}

But you can avoid NULL checks by using the CMU-based (not standard C) library xalloc.h that contains the xcalloc and xmalloc functions.
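To give an idea of how such a wrapper can work, here is a minimal sketch of a checked allocator in the spirit of xcalloc. The name checked_calloc is hypothetical, and this is not the actual xalloc implementation; it simply aborts the program instead of returning NULL.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for xcalloc: behaves like calloc, but aborts the
   program instead of returning NULL when the allocation fails. */
void *checked_calloc(size_t num, size_t size) {
    assert(num > 0 && size > 0);
    void *p = calloc(num, size);
    if (p == NULL) {
        fprintf(stderr, "allocation failed\n");
        abort();
    }
    return p;   /* never NULL at this point */
}
```

Because the check happens once inside the wrapper, callers can use the returned pointer without testing it themselves.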

A pointer can be dereferenced only if it is not NULL. Even then, you may access memory that does not belong to the object it points to, which causes undefined behavior. For example, consider this code.

char ch = 'a';
int* ptr = (int*)&ch;  
*ptr = 1000;

What is the problem here? The char ch is allocated only sizeof(char) bytes. But when we assign through ptr, we write sizeof(int) bytes, an illegal memory access that the compiler may not warn us about.

Structs

A struct is just an aggregate type, consisting of several data elements of potentially different types stored together. Compare this to an array, which is an aggregate of elements of the same type. Structs must be explicitly declared. We use very similar syntax in C and C0 to declare structs. For example, a struct representing a point in two-dimensional space (such as the location of a pixel in an image) could be declared in C and C0 as

struct point {
 int x;
 int y;
};

The expression sizeof(struct point) returns the number of bytes necessary to hold a struct point data type. As we work with pointers and structs in C, we must be very careful. For example, sizeof(struct point) returns the total number of bytes necessary to hold the x and y fields in the struct. But sizeof(struct point*) only returns the bytes necessary to hold an address. It is also important to note that

sizeof(t*)

typically returns the same value regardless of the type t. That is, on most platforms sizeof(int*), sizeof(char*), and sizeof(bool*) are all the same value, although the C standard does not guarantee it.
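The distinction between sizeof(struct point) and sizeof(struct point*) matters most when allocating. The sketch below allocates a point on the heap with the standard malloc; the helper names point_new and point_translate are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

struct point {
    int x;
    int y;
};

/* Allocate a point on the heap. Returns NULL if the allocation fails.
   Note sizeof(struct point), not sizeof(struct point*): we need room
   for both fields, not just for an address. */
struct point *point_new(int x, int y) {
    struct point *p = malloc(sizeof(struct point));
    if (p != NULL) {
        p->x = x;
        p->y = y;
    }
    return p;
}

/* Move the point by (dx, dy), using -> to reach fields through a pointer. */
void point_translate(struct point *p, int dx, int dy) {
    assert(p != NULL);
    p->x += dx;
    p->y += dy;
}
```

The caller owns the returned block and is expected to free it when done.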

Strings

In C and C0, a string is a sequence of characters. In C0, strings are immutable, which means we cannot modify a string once it has been constructed. In C, however, strings are arrays of characters and can be modified, unless they are string literals. Also, C strings are NUL ('\0') terminated. Here is an example of C0 and C strings.

string s = "me";                   <--- C0 code
char* s = "me";                    <--- C code
   or
char* s = malloc(strlen("me")+1);  <--- C code
strcpy(s, "me");

Memory was implicitly allocated in C0 to hold the sequence of characters "me" (plus any overhead) and the string s was initialized to "me". In C, there are two ways to do the same. In the first approach

char* s = "me";    <--- C code
free(s);           <--- ERROR freeing of read-only memory
strcpy(s,"to");    <--- ERROR attempting to write to read-only memory

s is pointing to some "read only" block of memory holding "me". No data can be written to this block, and the block cannot be freed.

In the second approach

char* s = malloc(strlen("me")+1);  <--- C code
strcpy(s, "me");   <--- copy data to s
free(s);           <--- freeing heap-allocated memory is ok

we explicitly allocated room for the length of "me" (which is 2) plus 1 extra character for the NUL character, which is written '\0' and corresponds to the integer value 0. The behavior of the program is undefined if less memory is allocated or the NUL character is missing. We also note that the strcpy function declared in the <string.h> header copies the terminating NUL character to the end of the string "me". You can read more about C Strings later in this tutorial.
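The second approach can be wrapped in a small helper. This is a sketch using the standard malloc with an explicit NULL check; the name copy_string is hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return a heap-allocated, writable copy of s,
   or NULL if the allocation fails. The caller must free the result. */
char *copy_string(const char *s) {
    assert(s != NULL);
    char *copy = malloc(strlen(s) + 1);   /* +1 for the terminating NUL */
    if (copy != NULL) {
        strcpy(copy, s);                  /* copies the NUL as well */
    }
    return copy;
}
```

Unlike a string literal, the copy lives on the heap, so it may be modified (for example, by assigning to copy[0]) and must eventually be freed.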

New concepts for C

The first necessary concept is using the GCC compiler, which we always use with a standard set of flags: -Wall and -Wextra ask GCC to enable a broad set of warnings, and -Werror directs the compiler to treat warnings as errors, so that code with warnings does not compile. -std=c99 directs GCC to apply the C99 standard, and -pedantic asks it to issue the diagnostics that strict conformance to that standard requires. Finally, -g keeps debugging information in the compiled program. All told, this means that we would compile and run a file like test.c by writing

$ gcc -Wall -Wextra -Werror -std=c99 -pedantic -g test.c
$ ./a.out

Adding the flag -DDEBUG to GCC is the equivalent of running with -d in C0.

$ gcc -Wall -Wextra -Werror -std=c99 -pedantic -g -DDEBUG test.c
$ ./a.out

Header files for interfaces

In C0, we included a library like string by writing #use <string>, and we included a file containing C0 code by writing #use "thefilename.c0". This looks a bit like what happens in C with #include, but there is one critical difference: in C, we always separate out interfaces and implementations in two different files, and we only #include the header files containing the interface, never the files containing the implementation.

A C header file (.h) is where the interface goes. Interfaces may contain variables, function prototypes and other identifiers. For example, the stacks.h file may define the interface to our stack library. Our stacks.h header file may look like this:

/* stacks.h */

#ifndef STACKS_H
#define STACKS_H

#include <stdbool.h>

typedef struct stack_header* stack;
bool stack_empty(stack S);      /* O(1) */
stack stack_new(void);          /* O(1) */
void push(stack S, void* e);    /* O(1) */
void* pop(stack S);             /* O(1) */
void stack_free(stack S);       /* O(1), S must be empty! */

#endif

If we want to use the stack library in a different file like stacks-test.c, we would write #include "stacks.h" in stacks-test.c just as we wrote #use "stacks.c0" in C0. Again, the difference between C0 and C is that, in C, we don't include the implementation directly. The actual implementation goes in the file stacks.c, so if we write #include "stacks.h" in stacks-test.c, we have to compile the two files together by listing both on the command line:

$ gcc -Wall -Wextra -Werror -std=c99 -pedantic -g -DDEBUG stacks.c stacks-test.c
$ ./a.out
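To make the split concrete, here is a sketch of what an implementation behind this interface might look like, using a singly linked list. It is shown as a single file for brevity: in a real project the first part would live in stacks.h and the second in stacks.c. The struct list_node type and the choice to abort on allocation failure are assumptions made for illustration, not the course's actual implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* --- interface, as it would appear in stacks.h --- */
typedef struct stack_header* stack;
bool stack_empty(stack S);
stack stack_new(void);
void push(stack S, void* e);
void* pop(stack S);
void stack_free(stack S);

/* --- implementation, as it might appear in stacks.c --- */
struct list_node {              /* hypothetical helper type */
    void* data;
    struct list_node* next;
};

struct stack_header {
    struct list_node* top;      /* NULL when the stack is empty */
};

bool stack_empty(stack S) {
    assert(S != NULL);
    return S->top == NULL;
}

stack stack_new(void) {
    stack S = malloc(sizeof(struct stack_header));
    if (S == NULL) abort();
    S->top = NULL;
    return S;
}

void push(stack S, void* e) {
    assert(S != NULL);
    struct list_node* node = malloc(sizeof(struct list_node));
    if (node == NULL) abort();
    node->data = e;
    node->next = S->top;        /* new node goes on top */
    S->top = node;
}

void* pop(stack S) {
    assert(S != NULL && !stack_empty(S));
    struct list_node* node = S->top;
    void* e = node->data;
    S->top = node->next;
    free(node);                 /* the node is ours; the data is the client's */
    return e;
}

void stack_free(stack S) {
    assert(S != NULL && stack_empty(S));
    free(S);
}
```

Client code only ever sees the type name stack and the five prototypes, so the linked-list representation could later be replaced without touching any file that includes the header.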

C macros

A macro allows us to define a name that is replaced by a piece of text during the preprocessing stage, before compilation. The generic form of a macro substitution is

#define NAME replacement_text

Here is an example.

#define SIZE 10

allows us to use SIZE throughout our program. But SIZE is actually replaced by 10 during the preprocessing stage - there is never a variable named SIZE. This is very powerful, but it can be very confusing because it is just a text replacement, made without any checking. One way that we deal with this is the convention that the names of things that we #define, like SIZE, are written in ALL CAPS.

We can do other interesting things with macro substitutions. For example, we can define a C macro for the C0 alloc function as follows.

#define alloc(t) (xcalloc(1,sizeof(t)))

This defies the convention we just mentioned, which would demand that we write ALLOC(t) instead of alloc(t). But that is just a convention. This macro allows us to use the C0 construct alloc in our C code without having to convert every alloc to a call to xcalloc by hand, since the preprocessor takes care of that during the preprocessing stage. Similarly, we can use a macro substitution for the C0 function alloc_array as

#define alloc_array(t,n) (xcalloc(n,sizeof(t)))

In both cases, during the preprocessing stage, any use of the macro is replaced by the corresponding macro definition. For example, here is how alloc looks in your C0 code and after conversion to xcalloc in C code.

int* ptr = alloc(int);                <--- C0 code

int* ptr = (xcalloc(1,sizeof(int)));  <--- C code after macro substitution

C macros are a great way to improve the readability and manageability of your code. Also we use C macros to write C0 style @requires, @ensures and @assert statements. You can read more at "contracts" in C.

Function pointers

A function itself is not a variable, but we can define pointers to functions, which we call function pointers. Function pointers can be stored in arrays, passed to other functions, and returned from other functions. The notation for declaring the type of a function pointer is somewhat opaque. For example, a declaration such as

int (*fn)(int x, int y);

actually declares a function pointer fn. The main confusion here is where the name of the function pointer is placed in the declaration. Unlike declarations of other pointers (e.g., int* ptr), where the pointer variable ptr is placed at the end, in the above declaration the pointer variable is placed in the middle.

Getting function pointers

We don't allocate space for function pointers like we do for pointers to integers or structs; instead, we use the address-of operation to get the address of an existing function. To use the function pointer declared above, we need an actual function with the same signature. For example, we can consider a simple sum function as

int sum(int x, int y) {return x+y;}

and make the function pointer declared above point to it with

fn = &sum;

Since fn is a function pointer, we call the function it points to by dereferencing it, as in (*fn)(2, 3); C also accepts the shorthand fn(2, 3).

Using function pointers

Having access to function pointers in C makes it possible to do elegant things such as passing a function as an argument to another function. A good example of passing a function pointer to another function is the qsort function. The qsort function, declared in <stdlib.h> with the prototype

void qsort(void *base, size_t nmemb, size_t size, int(*compar)(const void *a, const void *b)); 

takes a pointer to a compare function as its fourth argument. Here is an example of a compare function that can be used with qsort above.

int intcompare(const void* a, const void* b){
    int x = *((const int*)a);   <--- cast void* to int* before dereference
    int y = *((const int*)b);   <--- cast void* to int* before dereference
    return (x > y) - (x < y);   <--- avoids the overflow that x - y can cause
}

The intcompare function returns a positive value, zero, or a negative value depending on whether the int stored at a is greater than, equal to, or less than the int stored at b. The qsort function can then be used as follows.

int A[3] = {4,1,3};   <--- define a stack-allocated array of size 3 and initialize its elements
qsort(A, 3, sizeof(int), &intcompare);  <--- calls the qsort function on array A
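Because qsort works on void*, the same function can sort any element type as long as a matching compare function is supplied. As a sketch, here it sorts an array of structs by one field; struct point mirrors the struct used earlier in this tutorial, while compare_points_by_x and sort_points are hypothetical names.

```c
#include <assert.h>
#include <stdlib.h>

struct point {
    int x;
    int y;
};

/* Order points by their x coordinate. */
int compare_points_by_x(const void *a, const void *b) {
    const struct point *p = a;   /* void* converts implicitly in C */
    const struct point *q = b;
    return (p->x > q->x) - (p->x < q->x);
}

/* Sort the n points in P in increasing order of x. */
void sort_points(struct point *P, size_t n) {
    assert(P != NULL);
    qsort(P, n, sizeof(struct point), &compare_points_by_x);
}
```

Only the element size and the compare function change between sorting ints and sorting points; the call to qsort has the same shape.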

More details about how to use function pointers are given in the section on Function Pointers.

Generic data types in C

A useful feature of C is the ability to define a pointer to an arbitrary type. We call this void*. Defining a pointer to an arbitrary type allows us to develop generic C libraries without having to bind data to any particular type. The type void* can also be used for multiple purposes. Here we define a stack-allocated array of pointers and initialize each element to point to an element of an array of ints.

#define N 3
int A[N]={3,1,2};
void* ptr[N];
for (int i=0; i<N; i++) {ptr[i]=&A[i];}

Now if we need to print the contents of array A using the indirect references in the ptr array, we must do the following.

for (int i=0; i<N; i++) {
  printf ("%d ", *((int*)ptr[i]));
}

Since we can never dereference a void*, we cast the pointer to an int* before dereferencing it to get an int. A longer discussion on Generic Data Types in C is given later.
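As a sketch of how void* enables generic code, the following function swaps two objects of any type, given their size in bytes. The name swap_generic is hypothetical. It works by viewing both objects as arrays of unsigned char, which C permits for any object.

```c
#include <assert.h>
#include <stddef.h>

/* Swap two non-overlapping objects of `size` bytes each, whatever their type. */
void swap_generic(void *a, void *b, size_t size) {
    assert(a != NULL && b != NULL);
    unsigned char *p = a;   /* view each object as raw bytes */
    unsigned char *q = b;
    for (size_t i = 0; i < size; i++) {
        unsigned char tmp = p[i];
        p[i] = q[i];
        q[i] = tmp;
    }
}
```

The same function serves ints, doubles, or structs; the caller supplies the addresses and sizeof the element type, much as with qsort.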

Standard IO

In C0, the conio library contained functions for performing basic console input and output. First we will show how to do the same in C. Let's consider all the print functions we used in C0. Here are the C0 functions used to write to standard output (stdout).

void print(string s); /* print s to standard output */  <--- in C0
void println(string s); /* print s with trailing newline */
void printint(int i); /* print i to standard output */
void printbool(bool b); /* print b to standard output */
void printchar(char c); /* print c to standard output */

All of the above functions are replaced in C by the printf function, declared in <stdio.h> with the following prototype.

int printf (const char* format, ... );

Here format is a string (char*), and the three dots (...) indicate a variable number of arguments. For example, a signed integer x can be printed in C0 and C as follows.

printint(x);     <--- in C0

printf("%d",x);  <--- in C

In the latter, x is printed as a signed decimal integer format %d. There are many other ways printf can be used as shown below.

printf("%d\n",x);  <--- print x and then a newline
printf("%s", s);   <--- print s as a string
printf("%c", 65);  <--- print ASCII 65 as character A
printf("%u", (unsigned int)x);   <--- print x as an unsigned integer
printf("%x", 65);  <--- print 65 in hexadecimal
printf("%s + %d", "me", 65);  <--- prints the string "me + 65"
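The format strings above work the same way with snprintf, a standard C99 function that writes into a character buffer instead of standard output. The sketch below uses it so that the formatted text can be inspected; format_pair and format_hex are hypothetical helper names.

```c
#include <assert.h>
#include <stdio.h>

/* Format "<name> + <value>" into buf, which has room for cap chars.
   Returns the number of characters the full text needs, not counting
   the terminating NUL. */
int format_pair(char *buf, size_t cap, const char *name, int value) {
    assert(buf != NULL && name != NULL && cap > 0);
    return snprintf(buf, cap, "%s + %d", name, value);
}

/* Format value in hexadecimal into buf. */
int format_hex(char *buf, size_t cap, unsigned int value) {
    assert(buf != NULL && cap > 0);
    return snprintf(buf, cap, "%x", value);
}
```

With a 32-character buffer, format_pair(buf, 32, "me", 65) leaves the string "me + 65" in buf, and format_hex(buf, 32, 65) leaves "41".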

Reading from standard input (stdin)

C provides the function scanf for reading from standard input. The scanf function prototype is given as

int scanf (const char* format, ... );

Here is an example of reading a signed integer in C

int x; <--- define a signed integer in C
scanf("%d",&x); <--- read into x using scanf. Note the &x as the second argument in scanf

Note that the second argument in scanf is the address of x. This works because memory for x was already allocated by the declaration in the previous statement. Caution: Forgetting to write & in scanf will cause the program to exhibit undefined behavior.

Here is another example of reading into a string.

string s = readline(); <--- reads a string in C0

scanf("%s",s);         <--- reads into C string s. Note that s is a char* and hence no & is used 

Caution: Unlike in C0, we need to have memory pre-allocated for reading into s. If no memory or insufficient memory is allocated for s, the program can have undefined behavior. Such statements can also cause a "buffer overflow", a serious vulnerability exploited by attackers. We provide more about input/output in the section on File I/O.
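One standard way to limit the risk is to give %s a maximum field width, so that the function never writes more characters than the buffer can hold. The sketch below uses sscanf, which reads from a string rather than from the keyboard, only so that it does not depend on user input; the same format strings work with scanf. WORD_MAX, read_word and read_int are hypothetical names.

```c
#include <assert.h>
#include <stdio.h>

#define WORD_MAX 7   /* hypothetical limit on the length of a word */

/* Read one whitespace-delimited word from input into out, which must
   have room for WORD_MAX + 1 chars. Returns the number of items
   converted (1 on success), as scanf does. */
int read_word(const char *input, char *out) {
    assert(input != NULL && out != NULL);
    /* "%7s" stores at most 7 characters plus the terminating NUL. */
    return sscanf(input, "%7s", out);
}

/* Read one decimal integer from input into *out. */
int read_int(const char *input, int *out) {
    assert(input != NULL && out != NULL);
    return sscanf(input, "%d", out);
}
```

Checking the return value matters too: read_int returns 0 when the input does not start with a number, in which case *out is left unchanged.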

Freeing heap-allocated memory

In C0, we allocated memory using alloc and alloc_array. We did not worry about freeing it because C0 is garbage collected (as are other languages like Java and C#). But C is different. You must free any heap-allocated memory. A heap-allocated memory block is obtained by calling malloc/xmalloc, calloc/xcalloc or realloc. Failure to free heap-allocated memory causes memory leaks, which can degrade system performance in a long-running program. Therefore you are encouraged to catch memory leaks using valgrind. You must NEVER free stack-allocated or read-only memory. Here are some examples.

int* ptr = xcalloc(10,sizeof(int));   <--- a heap-allocated memory to hold an integer array of size 10
/* some code */
free(ptr);                            <--- heap-memory is deallocated
int A[10];                            <--- a stack-allocated array of size 10
free(A);                              <--- ERROR, freeing of stack-allocated memory
char* s ="me";
strcpy(s,"to");                       <--- ERROR, s points to read-only memory
free(s);                              <--- ERROR, freeing of READ-ONLY memory

We also note that heap-allocated memory can be freed through any reference to the memory block (even an alias). But we must avoid freeing the same block twice, once through a pointer and again through an alias.
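A common defensive habit, sketched below, is to reset a pointer to NULL right after freeing it: the C standard specifies that free(NULL) does nothing, so a repeated release through the same variable becomes harmless. The helper name release is hypothetical, and note that this does not protect aliases stored in other variables.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: free the block that *pp points to, then set *pp
   to NULL so the same variable cannot be used to free it again.
   Aliases held in other variables are NOT reset. */
void release(int **pp) {
    assert(pp != NULL);
    free(*pp);      /* free(NULL) is a no-op, so a second call is safe */
    *pp = NULL;
}
```

A caller passes the address of its pointer variable, as in release(&ptr), after which ptr compares equal to NULL.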

If you have any comments, suggestions or questions, please send email to guna@cs.cmu.edu