Commits

Simon Claret committed c4fe339

add c/c++ projects

Comments (0)

Files changed (13)

Binary file added.
+/*
+ Background Information on Pipelining (source: csapp):
+
+ The instructions supported by a particular processor make up its Instruction Set Architecture.
+ Each instruction performs a primitive operation such as adding two numbers. The ISA model would
+ seem to imply sequential instruction execution, where each instruction is fetched and executed to
+ completion before the next one starts.  However, by executing different parts of multiple instructions
+ simultaneously, the processor can achieve higher performance than if it executed just one
+ instruction at a time.  Special mechanisms are used to make sure that the processor computes the same
+ result as it would with sequential execution.  Hence, modern processors use clever tricks to improve
+ performance while maintaining the functionality of a simpler and more abstract model.
+
+ It would be possible to design an X86 processor based on sequential operation which executes a complete
+ X86 instruction on every clock cycle, but the clock would have to run slowly enough to allow an entire
+ series of actions to complete within one clock cycle.  As a consequence, performance would be well below
+ what could be achieved with this much hardware!
+
+ By applying a series of transformations to this sequential processor, we can create a pipelined processor
+ which breaks the execution of each instruction into steps, each of which is handled by a separate section
+ (stage) of the hardware.  Instructions progress through the stages of the pipeline, with one instruction
+ entering the pipeline on each clock cycle.  Making this processor preserve the sequential behavior of the
+ X86 ISA requires handling a variety of hazard conditions, where the location or operands of one instruction
+ depend on those of other instructions that are still in the pipeline.
+
+ Stage 1: Fetch
+ The fetch stage reads the bytes of an instruction from memory, using the program counter (PC) as the memory
+ address.
+
+ Stage 2: Decode
+ The decode stage reads up to two operands from the register file.
+
+ Stage 3: Execute
+ In the execute stage the arithmetic/logic unit (ALU) either performs the operation specified by the
+ instruction, computes the effective address of a memory reference, or increments or decrements the stack
+ pointer.
+
+ Stage 4: Memory
+ The memory stage may write data to memory or read data from memory.
+
+ Stage 5: Write back
+ The write-back stage writes up to two results to the register file.
+
+ Stage 6: PC update
+ Th PC is set to the address of the next instruction.
+
+ Background Information on Caching (source: csapp):
+
+ The L1 and L2 caches are managed entirely by hardware logic built into the caches.  In most cases, they
+ operate automagically and do not require any explicit actions from the program.
+
+ Temporal locality:  Once a data item has been copied into the cache on the first miss, we can expect a
+ number of subsequent hits on that object.
+
+ Spatial locality: Cache blocks contain multiple data objects.  Because of spatial locality, we can expect
+ that the cost of copying a block after a miss will be amortized by subsequent references to other objects
+ within the block.
+
+ How does a cache work?  Well, suppose the CPU executes an instruction that reads a word w.  The CPU
+ requests the word from the L1 cache.  If the L1 cache has a cached copy of w, then we have an L1 cache
+ hit so the cache quickly extracts w and returns it to the CPU.  Otherwise, we have a cache miss and the
+ CPU must wait while the L1 cache requests a copy of the block containing w from main memory.  When the
+ requested block finally arrives from memory, the L1 cache stores the block in one of its cache lines,
+ extracts word w from the stored block, and returns it to the CPU.
+
+ The cache line size of a SUN UltraSPARC-III+ is 64 bytes
+ (Source: http://www.sun.com/processors/manuals/805-0087.pdf page 307)
+
+ Information on cache structure: http://en.wikipedia.org/wiki/CPU_cache
+ Information on out of order execution: http://en.wikipedia.org/wiki/Out-of-order_execution
+ Inspiration for methodology: http://ixbtlabs.com/articles2/amd-hammer-family/amd-hammer-family2-add1.html
+ */
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <sys/time.h>
+#include <math.h>
+#include <pthread.h>
+#include <string.h>
+
+#define DEBUG 0 //1 = enable, 0 = disable
+#define NUMTESTCYCLES 10
+
+/*
+ * sets up the list of pointers (randomized)
+ *
+ */
+void setuplist (int randomize, int list[], int numelem)
+{
+	int i;
+
+	if (randomize == 1)
+	{
+		int ua[numelem]; //unique array
+
+		for(i=0; i<numelem; i++)
+		{
+			ua[i] = (int) &list[i];
+		}
+		int unused_ua = numelem;
+
+		int target;
+		int current_slot = 0;
+		while(unused_ua > 1)
+		{
+			target = rand()%numelem;
+			while(target == current_slot || ua[target] == -1)
+			{
+				target++;
+				if (target >= numelem) target = 0; //wrap
+			}
+			list[current_slot] = ua[target];
+			ua[current_slot] = -1;
+			unused_ua--;
+			current_slot = target;
+		}
+		list[current_slot] = (int) &list[0];
+	}
+	else {
+		for(i=0; i<numelem; i++)
+		{
+			list[i] = (int) &list[i+1];
+		}
+		list[i-1] = (int) &list[0];
+	}
+
+	if (DEBUG == 1) printf("List setup complete: %d elements\n", numelem);
+}
+
+/*
+ * Ensures that the list is consistent:
+ * -each element points to another element
+ * -the correct last element points back to the first element
+ */
+void verifylist(int list[], int numelem)
+{
+	int i;
+
+	if (DEBUG == 1)
+	{
+		printf("List layout:\n");
+		for(i=0;i<numelem;i++)
+		{
+			printf("\t%d: %d\n",&list[i], list[i]);
+		}
+	}
+
+	if (DEBUG == 1)  printf("Crawl list:\n");
+	int *p = &list[0];
+	if (DEBUG == 1) printf("\t%d: %d\n", p, *p);
+	for(i=1;i<numelem;i++)
+	{
+		p = (int *) *p;
+		if (DEBUG == 1) printf("\t%d: %d\n", p, *p);
+	}
+	if (*p == (int) &list[0])
+	{
+		printf("List is well formed: %d elements, last element points to first element.\n", i);
+		//printf("Last element: %d, %d\n", *p, &list[0]);
+	}
+	else
+	{
+		printf("Error: list malformed.\n");
+	}
+
+}
+
+int dotest_10(int list[], long int sample_size)
+{
+	long int i;
+	int p = list[0];
+	int z = 0;
+	for(i=0; i<sample_size; i++)
+	{
+		p = *(int *)p;p = *(int *)p;p = *(int *)p;p = *(int *)p;p = *(int *)p;
+		p = *(int *)p;p = *(int *)p;p = *(int *)p;p = *(int *)p;p = *(int *)p;
+	}
+	char bla[10];
+	sprintf( bla, "%d\n", p); //prevent compiler optimization
+	return p;
+}
+
+int dotest_20(int list[], long int sample_size)
+{
+	long int i;
+	int p = list[0];
+	int z = 0;
+	for(i=0; i<sample_size; i++)
+	{
+		p = *(int *)p;p = *(int *)p;p = *(int *)p;p = *(int *)p;p = *(int *)p;
+		p = *(int *)p;p = *(int *)p;p = *(int *)p;p = *(int *)p;p = *(int *)p;
+		p = *(int *)p;p = *(int *)p;p = *(int *)p;p = *(int *)p;p = *(int *)p;
+		p = *(int *)p;p = *(int *)p;p = *(int *)p;p = *(int *)p;p = *(int *)p;
+	}
+	char bla[10];
+	sprintf( bla, "%d\n", p); //prevent compiler optimization
+	return p;
+}
+
+double q1a_helper(int verbose, int slist[], int ssize, long int sample_size)
+{
+	clock_t t0, t1;
+	double t10_wall;
+	double t20_wall;
+	double seconds_per_memory_access;
+	double nanoseconds_per_memory_access;
+	double accum = 0;
+	int i;
+	for (i=0; i<NUMTESTCYCLES; i++)
+	{
+		t0 = clock();
+		dotest_10(slist, sample_size);
+		t1 = clock();
+		t10_wall = (double) (t1-t0)/CLOCKS_PER_SEC;
+		t0 = clock();
+		dotest_20(slist, sample_size);
+		t1 = clock();
+		t20_wall	= (double) (t1-t0)/CLOCKS_PER_SEC;
+		seconds_per_memory_access= (double)(t20_wall - t10_wall)/(sample_size*10);
+		nanoseconds_per_memory_access = seconds_per_memory_access * 1.0e9;
+		accum += nanoseconds_per_memory_access;
+		if (DEBUG == 1) printf("\trun %d: %1.15fns (%d samples)\n", i, nanoseconds_per_memory_access, sample_size * 10);
+	}
+	nanoseconds_per_memory_access = accum/NUMTESTCYCLES;
+	if (verbose == 1) printf("time per memory accesses: %1.15fns (%d samples)\n", nanoseconds_per_memory_access, sample_size * 10);
+	return nanoseconds_per_memory_access;
+}
+
+/*
+
+ METHODOLOGY:
+
+ In order to determine the latency of a local memory access (from the L1 cache) we build an
+ array of pointers and measure the amount of time that it takes to follow the pointers.  Since each step
+ (travelling to the next pointer in the list) depends on the previous step (reading the address of the
+ pointer), the processor will perform the loads in sequential fashion without pipelining them.
+
+ In order to ensure that we are only counting the time taken by loads (and not the overhead of the loop)
+ we will substract the overhead from our final results.
+
+ We will compare the performance when the pointers in the nodes are randomized to when they are not, in order
+ to assess whether or not performance is affected by prefetching.
+
+ RESULTS:
+
+ The results from test #1 allow us to conclude that the latency of an L1 cache access is approximately 1.9ms
+ on galiano and that when all elements fit into the L1 cache (in this case 2000 elements = 8000 bytes)
+ prefetching is not performed.
+
+ The results from test #2 allow us to conclude that the latency of a L2 cache access is approximately 8.3ms
+ on galiano (it is actually slightly greater because the L1 cache is probably accelerating some of these
+ accesses). We also observer that when there are more elements than can fit in the L1 cache (in this case
+ 50000 elements = 200000 bytes) then many elements have to be brought into the L1 cache from the L2 cache.
+ Interestingly, prefetching appears to halve the effective memory access time in this situation.
+
+ The results from test #3 allow us to estimate that the L1 cache size on galiano is ~64000 bytes (technical
+ documentation indicates that it is 64 kilobytes).
+
+ The results from test #4 allow us to conclude that the latency of an L1 cache access on the core 2 duo
+ is approximately 1.9ms and that when all elements fit into the L1 cache (in this case 2000 elements =
+ 8000 bytes) prefetching is not performed. They also allow us to estimate that the L1 cache size on the
+ core 2 duo is ~32000 bytes (technical documentation indicates that it is 32 kilobytes).
+
+ The results from test #5 are interesting when compared to test #2, because they show that prefetching
+ on the core 2 duo increases performance 5.675337 / 1.264509 = 4.49x, whereas it only increases
+ performance by 8.303333 / 4.153333 = 2.00x on the UltraSPARC-III+.
+
+ TEST #1 on galiano (UltraSPARC-III+)
+
+ galiano:~/cs448b/hw3> ./q1
+ Question 1a:
+ List is well formed: 2000 elements, last element points to first element.
+ Test with randomization:
+ time per memory accesses: 1.900000000000000ns (10000000 samples)
+ time per memory accesses: 1.920000000000000ns (50000000 samples)
+ time per memory accesses: 1.920000000000000ns (100000000 samples)
+ average time per memory access: 1.913333ns
+
+ List is well formed: 2000 elements, last element points to first element.
+ Test without randomization:
+ time per memory accesses: 1.900000000000000ns (10000000 samples)
+ time per memory accesses: 1.920000000000000ns (50000000 samples)
+ time per memory accesses: 1.890000000000000ns (100000000 samples)
+ average time per memory access: 1.903333ns
+
+ TEST 2 on galiano (UltraSPARC-III+)
+
+ galiano:~/cs448b/hw3> ./q1
+ Question 1a:
+ List is well formed: 50000 elements, last element points to first element.
+ Test with randomization:
+ time per memory accesses: 8.100000000000001ns (10000000 samples)
+ time per memory accesses: 8.400000000000000ns (50000000 samples)
+ time per memory accesses: 8.410000000000000ns (100000000 samples)
+ average time per memory access: 8.303333ns
+
+ List is well formed: 50000 elements, last element points to first element.
+ Test without randomization:
+ time per memory accesses: 4.100000000000000ns (10000000 samples)
+ time per memory accesses: 4.180000000000000ns (50000000 samples)
+ time per memory accesses: 4.180000000000000ns (100000000 samples)
+ average time per memory access: 4.153333ns
+
+ TEST 3 on galiano (UltraSPARC-III+)
+
+ galiano:~/cs448b/hw3> ./q1
+ amount of data: 12000 bytes (3000 elements), time per access: 1.700000ns
+ amount of data: 16000 bytes (4000 elements), time per access: 2.000000ns
+ amount of data: 20000 bytes (5000 elements), time per access: 1.900000ns
+ amount of data: 24000 bytes (6000 elements), time per access: 2.100000ns
+ amount of data: 28000 bytes (7000 elements), time per access: 2.100000ns
+ amount of data: 32000 bytes (8000 elements), time per access: 1.900000ns
+ amount of data: 36000 bytes (9000 elements), time per access: 2.100000ns
+ amount of data: 40000 bytes (10000 elements), time per access: 1.900000ns
+ amount of data: 44000 bytes (11000 elements), time per access: 2.100000ns
+ amount of data: 48000 bytes (12000 elements), time per access: 1.700000ns
+ amount of data: 52000 bytes (13000 elements), time per access: 2.100000ns
+ amount of data: 56000 bytes (14000 elements), time per access: 1.700000ns
+ amount of data: 60000 bytes (15000 elements), time per access: 1.900000ns
+ amount of data: 64000 bytes (16000 elements), time per access: 1.900000ns
+ amount of data: 68000 bytes (17000 elements), time per access: 2.500000ns
+ amount of data: 72000 bytes (18000 elements), time per access: 3.400000ns
+ amount of data: 76000 bytes (19000 elements), time per access: 3.900000ns
+ amount of data: 80000 bytes (20000 elements), time per access: 4.400000ns
+ amount of data: 84000 bytes (21000 elements), time per access: 5.000000ns
+ amount of data: 88000 bytes (22000 elements), time per access: 5.600000ns
+ amount of data: 92000 bytes (23000 elements), time per access: 5.600000ns
+ amount of data: 96000 bytes (24000 elements), time per access: 6.500000ns
+ amount of data: 100000 bytes (25000 elements), time per access: 6.200000ns
+ amount of data: 104000 bytes (26000 elements), time per access: 7.000000ns
+ amount of data: 108000 bytes (27000 elements), time per access: 7.500000ns
+ amount of data: 112000 bytes (28000 elements), time per access: 8.000000ns
+ amount of data: 116000 bytes (29000 elements), time per access: 8.200000ns
+ amount of data: 120000 bytes (30000 elements), time per access: 8.900000ns
+ amount of data: 124000 bytes (31000 elements), time per access: 9.200000ns
+ amount of data: 128000 bytes (32000 elements), time per access: 9.600000ns
+ amount of data: 132000 bytes (33000 elements), time per access: 9.900000ns
+ amount of data: 136000 bytes (34000 elements), time per access: 10.000000ns
+ amount of data: 140000 bytes (35000 elements), time per access: 9.800000ns
+ amount of data: 144000 bytes (36000 elements), time per access: 9.400000ns
+ amount of data: 148000 bytes (37000 elements), time per access: 9.500000ns
+ amount of data: 152000 bytes (38000 elements), time per access: 9.600000ns
+ amount of data: 156000 bytes (39000 elements), time per access: 9.100000ns
+ amount of data: 160000 bytes (40000 elements), time per access: 9.100000ns
+ amount of data: 164000 bytes (41000 elements), time per access: 9.300000ns
+ amount of data: 168000 bytes (42000 elements), time per access: 9.000000ns
+ amount of data: 172000 bytes (43000 elements), time per access: 8.800000ns
+ amount of data: 176000 bytes (44000 elements), time per access: 8.900000ns
+ amount of data: 180000 bytes (45000 elements), time per access: 8.900000ns
+ amount of data: 184000 bytes (46000 elements), time per access: 8.500000ns
+ amount of data: 188000 bytes (47000 elements), time per access: 8.600000ns
+ amount of data: 192000 bytes (48000 elements), time per access: 8.500000ns
+ amount of data: 196000 bytes (49000 elements), time per access: 8.400000ns
+ amount of data: 200000 bytes (50000 elements), time per access: 9.000000ns
+ amount of data: 204000 bytes (51000 elements), time per access: 9.000000ns
+ amount of data: 208000 bytes (52000 elements), time per access: 8.300000ns
+
+ TEST 4 on a 2.4 GHz Core 2 Duo processor
+ Question 1a:
+ List is well formed: 2000 elements, last element points to first element.
+ Test with randomization:
+ time per memory accesses: 1.255570000000000ns (10000000 samples)
+ time per memory accesses: 1.257628000000000ns (50000000 samples)
+ time per memory accesses: 1.258042000000000ns (100000000 samples)
+ average time per memory access: 1.257080ns
+
+ List is well formed: 2000 elements, last element points to first element.
+ Test without randomization:
+ time per memory accesses: 1.256310000000000ns (10000000 samples)
+ time per memory accesses: 1.256774000000000ns (50000000 samples)
+ time per memory accesses: 1.257166000000000ns (100000000 samples)
+ average time per memory access: 1.256750ns
+
+ amount of data: 12000 bytes (3000 elements), time per access: 1.254370ns
+ amount of data: 16000 bytes (4000 elements), time per access: 1.254740ns
+ amount of data: 20000 bytes (5000 elements), time per access: 1.254920ns
+ amount of data: 24000 bytes (6000 elements), time per access: 1.255370ns
+ amount of data: 28000 bytes (7000 elements), time per access: 1.256040ns
+ amount of data: 32000 bytes (8000 elements), time per access: 1.263780ns
+ amount of data: 36000 bytes (9000 elements), time per access: 1.660830ns
+ amount of data: 40000 bytes (10000 elements), time per access: 2.076180ns
+ amount of data: 44000 bytes (11000 elements), time per access: 2.401060ns
+ amount of data: 48000 bytes (12000 elements), time per access: 2.644780ns
+ amount of data: 52000 bytes (13000 elements), time per access: 2.948710ns
+ amount of data: 56000 bytes (14000 elements), time per access: 3.090310ns
+ amount of data: 60000 bytes (15000 elements), time per access: 3.304530ns
+ amount of data: 64000 bytes (16000 elements), time per access: 3.457200ns
+ amount of data: 68000 bytes (17000 elements), time per access: 3.642150ns
+ amount of data: 72000 bytes (18000 elements), time per access: 3.754190ns
+ amount of data: 76000 bytes (19000 elements), time per access: 3.943920ns
+ amount of data: 80000 bytes (20000 elements), time per access: 4.039840ns
+ amount of data: 84000 bytes (21000 elements), time per access: 4.230240ns
+ amount of data: 88000 bytes (22000 elements), time per access: 4.283910ns
+ amount of data: 92000 bytes (23000 elements), time per access: 4.388550ns
+ amount of data: 96000 bytes (24000 elements), time per access: 4.492130ns
+ amount of data: 100000 bytes (25000 elements), time per access: 4.582710ns
+ amount of data: 104000 bytes (26000 elements), time per access: 4.682420ns
+ amount of data: 108000 bytes (27000 elements), time per access: 4.758610ns
+ amount of data: 112000 bytes (28000 elements), time per access: 4.794040ns
+ amount of data: 116000 bytes (29000 elements), time per access: 4.903380ns
+ amount of data: 120000 bytes (30000 elements), time per access: 4.930490ns
+ amount of data: 124000 bytes (31000 elements), time per access: 5.020640ns
+ amount of data: 128000 bytes (32000 elements), time per access: 5.061200ns
+ amount of data: 132000 bytes (33000 elements), time per access: 5.093980ns
+ amount of data: 136000 bytes (34000 elements), time per access: 5.140480ns
+ amount of data: 140000 bytes (35000 elements), time per access: 5.239320ns
+ amount of data: 144000 bytes (36000 elements), time per access: 5.234170ns
+ amount of data: 148000 bytes (37000 elements), time per access: 5.280200ns
+ amount of data: 152000 bytes (38000 elements), time per access: 5.479730ns
+ amount of data: 156000 bytes (39000 elements), time per access: 5.369050ns
+ amount of data: 160000 bytes (40000 elements), time per access: 5.394150ns
+ amount of data: 164000 bytes (41000 elements), time per access: 5.431530ns
+ amount of data: 168000 bytes (42000 elements), time per access: 5.462120ns
+ amount of data: 172000 bytes (43000 elements), time per access: 5.610370ns
+ amount of data: 176000 bytes (44000 elements), time per access: 5.537440ns
+ amount of data: 180000 bytes (45000 elements), time per access: 5.577360ns
+ amount of data: 184000 bytes (46000 elements), time per access: 5.598780ns
+ amount of data: 188000 bytes (47000 elements), time per access: 5.646900ns
+ amount of data: 192000 bytes (48000 elements), time per access: 5.651190ns
+ amount of data: 196000 bytes (49000 elements), time per access: 5.658030ns
+ amount of data: 200000 bytes (50000 elements), time per access: 5.717500ns
+ amount of data: 204000 bytes (51000 elements), time per access: 5.721160ns
+ amount of data: 208000 bytes (52000 elements), time per access: 5.784160ns
+
+ TEST 5 on a 2.4 GHz Core 2 Duo processor
+ Question 1a:
+ List is well formed: 50000 elements, last element points to first element.
+ Test with randomization:
+ time per memory accesses: 5.671050000000001ns (10000000 samples)
+ time per memory accesses: 5.669240000000000ns (50000000 samples)
+ time per memory accesses: 5.685722000000000ns (100000000 samples)
+ average time per memory access: 5.675337ns
+
+ List is well formed: 50000 elements, last element points to first element.
+ Test without randomization:
+ time per memory accesses: 1.264890000000000ns (10000000 samples)
+ time per memory accesses: 1.266108000000000ns (50000000 samples)
+ time per memory accesses: 1.262529000000000ns (100000000 samples)
+ average time per memory access: 1.264509ns
+ */
+void q1a()
+{
+	printf("Question 1a: \n\n");
+
+	int ssize = 1500; //small list size
+	int slist[100000]; //small list
+
+	double avg = 0;
+
+	setuplist(1, slist, ssize);
+	verifylist(slist, ssize);
+
+	printf("Test with randomization:\n");
+	q1a_helper(0, slist, ssize, 1000000); //WARM UP THE CACHE
+	avg=0;
+	avg += q1a_helper(1, slist, ssize, 1000000);
+	avg += q1a_helper(1, slist, ssize, 5000000);
+	avg += q1a_helper(1, slist, ssize, 10000000);
+	printf("average time per memory access: %fns\n\n", avg/3);
+
+	setuplist(0, slist, ssize);
+	verifylist(slist, ssize);
+
+	printf("Test without randomization:\n");
+	q1a_helper(0, slist, ssize, 1000000); //WARM UP THE CACHE
+	avg=0;
+	avg += q1a_helper(1, slist, ssize, 1000000);
+	avg += q1a_helper(1, slist, ssize, 5000000);
+	avg += q1a_helper(1, slist, ssize, 10000000);
+	printf("average time per memory access: %fns\n\n", avg/3);
+
+}
+
+void figureoutcachesize()
+{
+	printf("\n\nFigure out cache size (personal interest): don't worry this maxes out at 40000 elements\n\n");
+	int i;
+	int ssize = 2000;
+	int slist[100000]; //small list
+
+	for(i=0; i<38; i++)
+	{
+		ssize += 1000;
+		setuplist(1, slist, ssize);
+		q1a_helper(0, slist, ssize, 10000);
+		printf("amount of data: %d bytes (%d elements), time per access: %fns\n", ssize*4, ssize, q1a_helper(0, slist, ssize, 1000000));
+
+	}
+}
+
+/*
+ reads 400 bytes per loop iteration
+ */
+int do_reads_400(int list[], int howmany)
+{
+	int counter = 0;
+	register int *lptr = list;
+	int i;
+	register int z = 0;
+	int a=10, b=6;
+	//each loop iteration reads 400 bytes
+	for(i=0; i<howmany; i++)
+	{
+
+#define GL(x) lptr[x] +
+		z +=
+		GL(248) GL(1727) GL(221) GL(3621) GL(3773) GL(910) GL(2588) GL(3347) GL(1303) GL(610)
+		GL(3323) GL(3677) GL(73) GL(3268) GL(2760) GL(254) GL(203) GL(1085) GL(1993) GL(2166)
+		GL(1727) GL(255) GL(38) GL(2752) GL(1390) GL(3821) GL(780) GL(396) GL(628) GL(1924)
+		GL(211) GL(3630) GL(3520) GL(2910) GL(3997) GL(3573) GL(1985) GL(3846) GL(1316) GL(1216)
+		GL(3477) GL(2483) GL(30) GL(2228) GL(1173) GL(2574) GL(1443) GL(787) GL(1538) GL(639)
+		GL(2816) GL(2493) GL(1300) GL(3076) GL(3978) GL(2729) GL(175) GL(753) GL(3489) GL(3100)
+		GL(3104) GL(3472) GL(2164) GL(767) GL(2551) GL(17) GL(3887) GL(1514) GL(2960) GL(2284)
+		GL(1372) GL(1855) GL(3944) GL(428) GL(1210) GL(265) GL(2176) GL(3451) GL(904) GL(2566)
+		GL(390) GL(1212) GL(1813) GL(1621) GL(1329) GL(3078) GL(1613) GL(327) GL(359) GL(2400)
+		GL(2762) GL(479) GL(1868) GL(543) GL(2833) GL(1975) GL(54) GL(3153) GL(1107)
+		lptr[2788];
+
+		lptr+=10*sizeof(int);
+		counter++;
+		if (counter > 20)
+		{
+			counter = 0;
+			lptr = list;
+		}
+	}
+	char bla[500];
+	sprintf(bla, "%d\n", z); //prevent compiler optimization
+	return z;
+}
+
+
+/*
+ reads 200 bytes per loop iteration
+ */
+int do_reads_200(int list[], int howmany)
+{
+	int counter = 0;
+	register int *lptr = list;
+	int i;
+	register int z = 0;
+	//each loop iteration reads 200 bytes
+	for(i=0; i<howmany; i++)
+	{
+#define GL(x) lptr[x] +
+		z +=
+		GL(248) GL(1727) GL(221) GL(3621) GL(3773) GL(910) GL(2588) GL(3347) GL(1303) GL(610)
+		GL(3323) GL(3677) GL(73) GL(3268) GL(2760) GL(254) GL(203) GL(1085) GL(1993) GL(2166)
+		GL(1727) GL(255) GL(38) GL(2752) GL(1390) GL(3821) GL(780) GL(396) GL(628) GL(1924)
+		GL(211) GL(3630) GL(3520) GL(2910) GL(3997) GL(3573) GL(1985) GL(3846) GL(1316) GL(1216)
+		GL(2762) GL(479) GL(1868) GL(543) GL(2833) GL(1975) GL(54) GL(3153) GL(1107)
+		lptr[2788];
+		lptr+=10*sizeof(int);
+		counter++;
+		if (counter > 20)
+		{
+			counter = 0;
+			lptr = list;
+		}
+	}
+	char bla[500];
+	sprintf(bla, "%d\n", z); //prevent compiler optimization
+	return z;
+}
+
+
+
+double measure_bw(int slist[], int numcycles)
+{
+	clock_t t0, t1;
+	double time_200s, time_400s, bw, timediff;
+	int i;
+
+	t0 = clock();
+	do_reads_200(slist,numcycles);
+	t1 = clock();
+	time_200s = (double) (t1-t0)/CLOCKS_PER_SEC;
+	printf("time_200s: %f\n", time_200s);
+	bw = 50.0*sizeof(int)/1024.0/1024.0*numcycles/time_200s;
+	printf("bandwidth based on this test (no overhead correction): %f megabytes/sec\n", bw);
+
+	t0 = clock();
+	do_reads_400(slist,numcycles);
+	t1 = clock();
+	time_400s = (double) (t1-t0)/CLOCKS_PER_SEC;
+	printf("time_400s: %f\n", time_400s);
+	bw = 100*sizeof(int)/1024.0/1024.0*numcycles/time_400s;
+	printf("bandwidth based on this test (no overhead correction): %f megabytes/sec\n", bw);
+
+	timediff = time_400s - time_200s;
+	printf("TIME: %f\n", timediff);
+
+	return timediff;
+}
+
+/*
+ * Bandwidth was measured by creating an array of integers and reading the values in a pipelined fashion.
+ * The time taken to read a specific number of values is measured and used to calculate the bandwidth.
+ *
+ * Results:
+ *
+ * Question 1b:
+ time_200s: 0.190000
+ bandwidth based on this test (no overhead correction): 2007.735403 megabytes/sec
+ time_400s: 0.370000
+ bandwidth based on this test (no overhead correction): 2061.998522 megabytes/sec
+ TIME: 0.180000
+ L1 Cache bandwidth: 2119.276259 megabytes/second
+ L1 Cache bandwidth: 2170138.888889 kilobytes/second
+ L1 Cache bandwidth: 2222222222.222222 bytes/second
+ *
+ */
+void q1b()
+{
+	printf("\n\nQuestion 1b: \n\n");
+
+	int ssize = 5000; //small list size
+	int slist[ssize]; //small list
+
+	int i;
+	for(i=0; i<ssize; i++)
+	{
+		slist[i] = rand();
+	}
+
+	int numcycles = 2000000;
+	double timediff = measure_bw(slist, numcycles);
+
+	double megabytespersec = (double) (numcycles*(200.0/1048576.0)) / timediff;
+	printf("L1 Cache bandwidth: %f megabytes/second\n", megabytespersec);
+
+	double kilobytespersec = (double) (numcycles*(200.0/1024.0)) / timediff;
+	printf("L1 Cache bandwidth: %f kilobytes/second\n", kilobytespersec);
+
+	double bytespersec = (double) (numcycles*200.0) / timediff;
+	printf("L1 Cache bandwidth: %f bytes/second\n", bytespersec);
+}
+
+
+#define NUMTHREADS 2
+#define NUMITERPERTHREAD 1000000
+//int record[NUMITERPERTHREAD*2]; //FOR TESTING
+pthread_mutex_t lock0 = PTHREAD_MUTEX_INITIALIZER;
+int flag = 0;
+//int globalcount = 0;//FOR TESTING
+
+void q1c_thread(void *threadid)
+{
+	int tid = *((int*)threadid);
+	int othertid = (tid == 0) ? 1 : 0;
+	int i = 0;
+
+	while(1)
+	{
+		if (flag == tid)
+		{
+			pthread_mutex_lock(&lock0);
+			flag = othertid;
+			//printf("Thread %d running\n", tid);//FOR TESTING
+			//record[globalcount] = tid;//FOR TESTING
+			pthread_mutex_unlock(&lock0);
+			//globalcount++;//FOR TESTING
+			if (++i>=NUMITERPERTHREAD) break;
+		}
+	}
+
+	pthread_exit(NULL);
+}
+
+/*
+ * In order to measure the time to transfer a pthreads lock from one thread to another, we create
+ * two threads that take turns grabbing a lock.  We then time how long it takes to transfer the lock
+ * 2 million times, and use that time measurement to calculate the time it takes to transfer a single
+ * lock etween two threads in nanoseconds.
+ *
+ * The threads get the lock from each other (and never from themselves) thanks to a flag variable:
+ *
+ *  Each thread sits in a tight loop checking wether flag=mythreadid.  It only acquires the lock if
+ *  this is true.  Once the lock is acquired, the thread changes flag to the TID of the other thread,
+ *  thereby enabling the other thread to grab the lock and change the flag back.
+ *
+ *  This tight polling is obviously not efficient and would not be used in a real program.  However,
+ *  since each thread runs on a separate core, the wasted cycles should not interfere with our results
+ *  in a significant way and they enable us to transfer the lock as soon as possible in order to
+ *  measure the quantity we are trying to measure.
+ *
+ *  Multiple runs of the test on Galiano yield the following results:
+ *
+ Question 1c:
+ Time taken: 3.470000s
+ Time to transfer a pthreads lock from one thread to another: 1735.000000ns
+ Time to transfer a pthreads lock from one thread to another: 1745.000000ns
+ Time to transfer a pthreads lock from one thread to another: 1730.000000ns
+
+ It is interesting to note that acquiring a pthreads lock takes an enormous amount of time compared
+ to a simple operation like a load from the L1 cache.  This should be taken into account when
+ considering the overhead of a parallel program which uses pthreads compared to a sequential
+ implementation.
+ */
+void q1c ( void )
+{
+	printf("\n\nQuestion 1c: \n\n");
+
+	pthread_t thread0, thread1;
+	clock_t t0, t1;
+
+	int name0 = 0;
+	int name1 = 1;
+
+	t0 = clock();
+	pthread_create(&thread0, NULL, (void *)q1c_thread, (void *)&name0);
+	pthread_create(&thread1, NULL, (void *)q1c_thread, (void *)&name1);
+	pthread_join(thread0, NULL);
+	pthread_join(thread1, NULL);
+	t1 = clock();
+
+	double time = (double) (t1-t0)/CLOCKS_PER_SEC;
+	printf("Time taken: %fs\n", time);
+
+	double tpt = (time * 1.0e9)/(2*NUMITERPERTHREAD);
+	printf("Time to transfer a pthreads lock from one thread to another: %fns\n", tpt);
+
+	//int i;//FOR TESTING
+	//	for(i=0;i<NUMITERPERTHREAD; i++)//FOR TESTING
+	//	{//FOR TESTING
+	//		printf("Record: %d\n",record[i]);//FOR TESTING
+	//	}//FOR TESTING
+
+}
+
+pthread_mutex_t lockd = PTHREAD_MUTEX_INITIALIZER;
+pthread_cond_t condvard = PTHREAD_COND_INITIALIZER;
+#define NUMITERPERTHREADD 90000
+
+void q1d_thread(void *threadid)
+{
+	int tid = *((int*)threadid);
+	int othertid = (tid == 0) ? 1 : 0;
+
+	printf("Thread %d started\n", tid);
+
+	int i;
+	for(i=0; i< NUMITERPERTHREADD; i++)
+	{
+		pthread_mutex_lock(&lockd);
+		//		printf("thread %d has the lock\n", tid);
+		if (flag == othertid)
+		{
+			pthread_cond_wait(&condvard, &lockd);
+			//			printf("%d stopped waiting\n", tid);
+		}
+		flag = othertid;
+		pthread_cond_signal(&condvard);
+		//		printf("%d just sent a signal out\n", tid);
+		pthread_mutex_unlock(&lockd);
+	}
+
+	pthread_exit(NULL);
+}
+
+/*
+ * The time to wake up a thread using a pthreads signal was measured by setting up
+ * two threads that take turns waiting for and holding a lock.  When a thread
+ * is running, its last action is to wake up the other thread by sending it a signal
+ * before going to sleep itself.  We time how long it takes to do 180000 signal
+ * wakeups and use that information to calculate the number of nanoseconds per signal.
+ *
+ * Results are as follows:
+ *
+ Question 1d:
+ Thread 0 started
+ Thread 1 started
+ Time taken: 4.100000s
+ Time to wake up a thread using a pthreads signal: 22777.777778ns
+ Time to wake up a thread using a pthreads signal: 22611.111111ns
+ Time to wake up a thread using a pthreads signal: 21944.444444ns
+
+ It is worth noting that this is an order of magnitude slower than the performance we
+ achieved with polling in part c.
+ *
+ */
+void q1d ( void )
+{
+	printf("\n\nQuestion 1d: \n\n");
+
+	flag = 0;
+	pthread_t thread0, thread1;
+	clock_t t0, t1;
+
+	int name0 = 0;
+	int name1 = 1;
+
+	t0 = clock();
+	pthread_create(&thread0, NULL, (void *)q1d_thread, (void *)&name0);
+	pthread_create(&thread1, NULL, (void *)q1d_thread, (void *)&name1);
+	pthread_join(thread0, NULL);
+	pthread_join(thread1, NULL);
+	t1 = clock();
+
+	double time = (double) (t1-t0)/CLOCKS_PER_SEC;
+	printf("Time taken: %fs\n", time);
+
+	double t = (time * 1.0e9)/(2*NUMITERPERTHREADD);
+	printf("Time to wake up a thread using a pthreads signal: %fns\n", t);
+}
+
+/*
+ *  In this test, we are going to measure the throughput (bytes/second) when reading data
+ *  that was written by a thread running on a different processor.
+ *
+ *  This is a tricky question, because the performance will vary depending on the method chosen
+ *  to copy the data.  For example, it could be argued that no data needs to be copied: all we
+ *  need to do is transfer a pointer to a piece of data from one thread to another.  This test
+ *  will assume that we want a producer thread to transfer the data to a consumer thread through
+ *  an intermediate buffer.
+ *
+ *  We will experiment with different buffer sizes to see which yields the best performance.
+ *
+ *  The goal will be to transfer 500 megabytes of data between the threads.
+ *
+ Question 1e:
+
+ producer done
+ consumer received 524288000 bytes (500.0 megabytes): done.
+ Time taken: 2.000000s
+ Thread reading throughput: 250.000000 megabytes/second
+ Thread reading throughput:  262144000.000000 bytes/second
+ galiano:~/cs448b/hw3>
+ *
+ */
+char * data;
+char * buf0;
+char * buf1;
+pthread_mutex_t locke = PTHREAD_MUTEX_INITIALIZER;
+pthread_cond_t conde = PTHREAD_COND_INITIALIZER;
+int current_thread; //0=producer, 1=consumer
+
+char *buf;
+int buf_bytes_used;
+
+void q1eproducer(  int * );
+void q1econsumer(  int * );
+
+#define BYTESOFDATA 500*1024*1024
+char *data, *received_data;
+int chunksize;
+
+void q1e ( void )
+{
+	printf("\n\nQuestion 1e:\n\n");
+
+	data = malloc(BYTESOFDATA); //allocate 500 megabytes of memory
+	received_data = malloc(BYTESOFDATA); //allocate 500 megabytes of memory
+	if (data == NULL || received_data == NULL)
+	{
+		printf ("Malloc error! Hasta la vista.\n");
+		return;
+	}
+
+	chunksize = 10*1024*1024;
+
+	buf = malloc(500*1024*1024);
+	current_thread = 0;
+
+	pthread_t producer, consumer;
+	clock_t t0, t1;
+
+	t0 = clock();
+	pthread_create(&producer, NULL, (void *)q1eproducer, &chunksize);
+	pthread_create(&consumer, NULL, (void *)q1econsumer, &chunksize);
+	pthread_join(producer, NULL);
+	pthread_join(consumer, NULL);
+	t1 = clock();
+
+	double time = (double) (t1-t0)/CLOCKS_PER_SEC;
+	printf("Time taken: %fs\n", time);
+
+	double megabytespersec = (double) BYTESOFDATA/1024/1024/time;
+	printf("Thread reading throughput: %f megabytes/second\n", megabytespersec);
+
+	double bytespersec = (double) BYTESOFDATA/time;
+	printf("Thread reading throughput:  %f bytes/second\n", bytespersec);
+
+}
+
+void q1eproducer( int * chunksize )
+{
+	long int bytesremaining = BYTESOFDATA;
+	char * dptr = data;
+
+	while(bytesremaining > 0)
+	{
+		pthread_mutex_lock(&locke);
+		while (current_thread == 1) pthread_cond_wait(&conde, &locke);
+
+		if (bytesremaining >= *chunksize) buf_bytes_used = *chunksize;
+		else buf_bytes_used = bytesremaining;
+		bytesremaining -= buf_bytes_used;
+		memcpy( buf, dptr, buf_bytes_used );
+		dptr += buf_bytes_used;
+
+		pthread_cond_signal(&conde);
+		pthread_mutex_unlock(&locke);
+	}
+	printf("producer done\n");
+	pthread_exit(NULL);
+}
+
+void q1econsumer(  int * chunksize  )
+{
+	char *rdptr = received_data;
+	long int bytes_received = 0;
+
+	while(bytes_received < BYTESOFDATA)
+	{
+		pthread_mutex_lock(&locke);
+		while (current_thread == 10) pthread_cond_wait(&conde, &locke);
+		memcpy( rdptr, buf, buf_bytes_used );
+		rdptr += buf_bytes_used;
+		bytes_received += buf_bytes_used;
+		//		printf("consumer copied %d bytes into received_data\n", cons_buf_bytes_used);
+		pthread_cond_signal(&conde);
+		pthread_mutex_unlock(&locke);
+	}
+	printf("consumer received %d bytes (%.1f megabytes): done.\n", bytes_received, (double) bytes_received/1024/1024);
+	pthread_exit(NULL);
+}
+
+main()
+{
+	q1a();
+	figureoutcachesize();
+	q1b();
+	q1c();
+	q1d();
+	q1e();
+}
+
+#include "mpi.h"
+#include <stdio.h>
+#include <stdlib.h>
+
+#define ROOTPID 0
+
+struct {
+	double currentExpansion;
+	int   N;
+} out, in;
+
+double timeval_diff( struct timeval *t1, struct timeval *t0 )
+{
+	return ((t1->tv_sec - t0->tv_sec) + (t1->tv_usec - t0->tv_usec)*1.0e-6);
+}
+
+/*
+ * Note: there may be an opportunity to perform memoization inside this function.
+ * I
+ */
+double maxExpand(long long int N)
+{
+	long long int ai = N;
+	long long int largestai = N;
+	while (ai != 1)
+	{
+		if (ai%2 == 0) ai /= 2;
+		else ai = 3 * ai + 1;
+		if (ai > largestai) largestai = ai;
+	}
+	return (double) largestai/N;
+}
+
+int main(int argc, char **argv) {
+
+	int M = atoi(argv[1]);
+	if (M < 1)
+	{
+		printf("error: argument must be an integer greater than 0.  hasta luego.\n");
+		exit(1);
+	}
+
+	int numProcs, myID;
+
+	MPI_Init(&argc, &argv);
+	MPI_Comm_size(MPI_COMM_WORLD, &numProcs);
+	MPI_Comm_rank(MPI_COMM_WORLD, &myID);
+
+	char myHost[100];
+	gethostname(myHost, 100);
+
+	struct timeval t0, t1;
+	if(myID == ROOTPID) {
+		printf("Root (id: %d), is running on %s\n---> Finding the value of N between 1 and %d that maximizes expansion(N)\n", myID, myHost, M);
+		gettimeofday(&t0, NULL);
+	}
+
+	long long int i;
+	long long int i2 = myID + 1;
+	double currentExpansion;
+
+	out.currentExpansion = 1;
+	out.N = 1;
+
+//	printf("Worker %d (%s) is responsible for every %dth number starting with %d\n", myID, myHost, numProcs, i2);
+
+	for (i=i2; i<= M; i++)
+	{
+		if (i == i2)
+		{
+			currentExpansion = maxExpand(i);
+			if ( currentExpansion > out.currentExpansion)
+			{
+				out.currentExpansion = currentExpansion;
+				out.N = i;
+			}
+			i2 += numProcs;
+		}
+	}
+	printf("Worker %d (%s): the largest expansion(N) for N=%d that I have seen is %f\n", myID, myHost, out.N, out.currentExpansion);
+
+	MPI_Reduce(&out, &in, 1, MPI_DOUBLE_INT, MPI_MAXLOC, ROOTPID, MPI_COMM_WORLD);
+
+	if (myID == ROOTPID)
+	{
+		printf("The global max is: %f when N=%d\n", in.currentExpansion, in.N);
+		printf("{%d,%f}\n", in.N, in.currentExpansion);
+		gettimeofday(&t1, NULL);
+		printf("Time taken: %fs\n", timeval_diff(&t1, &t0));
+	}
+
+	MPI_Finalize();
+	return (0);
+}

graphics/.DS_Store

Binary file added.

graphics/Makefile

+#makefile for CPSC 314 project 2
+
+LIBLINUX = -lGL -lGLU -lglut
+LIBMAC = -framework OpenGL -framework GLUT
+
+OBJ = p2.o Vec3f.o
+default: p2
+
+run:
+	p2
+
+p2: $(OBJ)
+	g++ $(LIBMAC) $(OBJ) -o p2
+	
+p2.o: p2.cpp defs.h
+
+Vec3f.o: Vec3f.cpp Vec3f.h
+	g++ -c Vec3f.cpp
+
+clean:
+	rm -f *.o *~* p2
+	

graphics/Vec3f.cpp

+// Vector.cpp: implementation of the Vector class.
+//
+//////////////////////////////////////////////////////////////////////
+
+//#include "stdafx.h"
+#include "Vec3f.h"
+
+//////////////////////////////////////////////////////////////////////
+// Construction/Destruction
+//////////////////////////////////////////////////////////////////////
+
+
+
+Vec3f::~Vec3f()
+{
+
+}
+// Vector.h: interface for the Vector class.
+//
+//////////////////////////////////////////////////////////////////////
+
+#ifndef VEC_3F_H
+#define VEC_3F_H
+
+//#if _MSC_VER > 1000
+//#pragma once
+//#endif // _MSC_VER > 1000
+#include <math.h>
+class Vec3f {
+public:
+    float coords[3];
+    //   float x;
+    // float y;
+    // float z;
+    float* getCoords() {return coords;}  //const
+    const  Vec3f & operator=(const  Vec3f & rhs) {
+        if (this != &rhs) {
+            for (int i = 0; i < 3;i++) {
+                coords[i] = rhs.coords[i];
+            }
+        }
+        return *this;
+    }
+    float& operator[](int i) {return coords[i];} // const
+    Vec3f( const float * cs)
+    { coords[0] = cs[0];  coords[1] = cs[1]; coords[2]= cs[2]; }
+    Vec3f( const float _x = 0.0f,
+            const float _y = 0.0f,
+            const float _z = 0.0f )
+    { coords[0] = _x;  coords[1] = _y;   coords[2] = _z; }
+    Vec3f( const Vec3f &vec )
+    { coords[0] = vec.coords[0];  coords[1]= vec.coords[1];  coords[2] = vec.coords[2]; }
+    
+    virtual ~Vec3f();
+    // Vector Addition
+    Vec3f operator+( const Vec3f & vec ) const  // returns this + vec
+    { return Vec3f(coords[0]+vec.coords[0], coords[1]+vec.coords[1], coords[2]+vec.coords[2]); }
+    Vec3f &operator+=( const Vec3f & vec )      // this += vec
+    { coords[0]+=vec.coords[0]; coords[1]+=vec.coords[1]; coords[2]+=vec.coords[2];
+      return *this; }
+    
+    // Vector Subtraction
+    Vec3f operator-() const                     // returns -this
+    { return Vec3f(-coords[0], -coords[1], -coords[2]); }
+    Vec3f operator-( const Vec3f & vec ) const  // returns this - vec
+    { return Vec3f(coords[0]-vec.coords[0], coords[1]-vec.coords[1], coords[2]-vec.coords[2]); }
+    Vec3f &operator-=( const Vec3f & vec )      // this -= vec
+    { coords[0]-=vec.coords[0]; coords[1]-=vec.coords[1]; coords[2]-=vec.coords[2];
+      return *this; }
+    
+    // Scaling Multiplication
+    Vec3f operator*( const float f ) const      // returns this * f
+    { return Vec3f(coords[0]*f, coords[1]*f, coords[2]*f); }
+    friend Vec3f operator*( const float f, const Vec3f &vec )  // returns f * this
+    { return Vec3f(vec.coords[0]*f, vec.coords[1]*f, vec.coords[2]*f); }
+    Vec3f &operator*=( const float f )          // this *= f
+    { coords[0]*=f; coords[1]*=f; coords[2]*=f;
+      return *this; }
+    
+    // Scaling Division
+    Vec3f operator/( const float f ) const      // returns this / f
+    { return Vec3f(coords[0]/f, coords[1]/f, coords[2]/f); }
+    Vec3f &operator/=( const float f )          // this /= f
+    { coords[0]/=f; coords[1]/=f; coords[2]/=f;
+      return *this; }
+    
+    // Vector Multiplication
+    Vec3f operator*( const Vec3f & vec ) const  // !WARNING (MEMBERWISE): returns this.? * vec.?
+    { return Vec3f(coords[0]*vec.coords[0], coords[1]*vec.coords[1], coords[2]*vec.coords[2]); }
+    Vec3f &operator*=( const Vec3f & vec )      // !WARNING (MEMBERWISE): this.? *= vec.?
+    { coords[0]*=vec.coords[0]; coords[1]*=vec.coords[1]; coords[2]*=vec.coords[2];
+      return *this; }
+    
+    // Vector Division
+    Vec3f operator/( const Vec3f & vec ) const  // !WARNING (MEMBERWISE): returns this.? / vec.?
+    { return Vec3f(coords[0]/vec.coords[0], coords[1]/vec.coords[1], coords[2]/vec.coords[2]); }
+    Vec3f &operator/=( const Vec3f & vec )      // !WARNING (MEMBERWISE): this.? /= vec.?
+    { coords[0]/=vec.coords[0]; coords[1]/=vec.coords[1]; coords[2]/=vec.coords[2];
+      return *this; }
+    
+    // Dot Product
+    float dot( const Vec3f & vec ) const        // returns dot value of this and vec
+    { return coords[0]*vec.coords[0] + coords[1]*vec.coords[1] + coords[2]*vec.coords[2]; }
+    
+    // Cross Product
+    Vec3f cross( const Vec3f & vec ) const      // returns cross product of this and vec
+    { return Vec3f(coords[1]*vec.coords[2] - coords[2]*vec.coords[1],
+              coords[2]*vec.coords[0] - coords[0]*vec.coords[2],
+              coords[0]*vec.coords[1]- coords[1]*vec.coords[0]); }
+    Vec3f &cross_equals( const Vec3f & vec )    // this = this cross vec
+    { float _x = coords[1]*vec.coords[2] - coords[2]*vec.coords[1];
+      float _y = coords[2]*vec.coords[0] - coords[0]*vec.coords[2];
+      float _z = coords[0]*vec.coords[1] - coords[1]*vec.coords[0];
+      coords[0] = _x;  coords[1] = _y;  coords[2] = _z;
+      return *this; }
+    
+    // Length
+    float Length() const                // returns length
+    { return (float)sqrt(coords[0]*coords[0] + coords[1]*coords[1] + coords[2]*coords[2]); }
+    float length_square() const         // returns length squared
+    { return (coords[0]*coords[0] + coords[1]*coords[1] + coords[2]*coords[2]); }
+    
+    // Normalize
+    Vec3f &Normalize()                  // normalizes this
+    { float f=(float)sqrt(coords[0]*coords[0] + coords[1]*coords[1] + coords[2]*coords[2]);
+      if (f > .00001) {
+          coords[0]/=f; coords[1]/=f; coords[2]/=f;}
+      return *this; }
+    Vec3f norm_value() const            // returns normalized value, does not modify this
+    { float f=Length();
+      return Vec3f(coords[0]/f, coords[1]/f, coords[2]/f); }
+    
+    
+};
+
+#endif // !defined(AFX_VECTOR_H__C80EF4CA_9FBE_41E3_B109_F13F0DCC980C__INCLUDED_)
+// CPSC 314 Vjan2007
+// Project 2 Sample Code
+
+//defs.h
+//definitions and constants
+
+#ifndef DEFS_H
+#define DEFS_H
+
+//#include "stdafx.h"
+#include "Vec3f.h"
+#include <GLUT/glut.h>
+//#include <GLUT/glut.h>
+#include <iostream>
+#include <vector>
+#include <stdio.h>
+
+
+#define PI 3,1415926535
+
+struct ProgramState {
+    bool shading;
+    int cameraMode;
+    
+    int windowWidth;
+    int windowHeight;
+};
+
+//class that stores the current state of speed
+class SpeedState {
+public:
+    double tincrement, aincrement;
+    SpeedState(double tincrement, double aincrement) {
+        this->tincrement = tincrement;
+        this->aincrement = aincrement;
+    }
+    void IncreaseSpeed(void) {
+        tincrement += 0.05;
+        aincrement += 0.5;
+        PrintState();
+    }
+    void DecreaseSpeed(void) {
+        tincrement -= 0.05;
+        aincrement -= 0.5;
+        PrintState();
+    }
+    void PrintState(void) {
+        printf("Increment size: %.2f\n", tincrement);
+        printf("Angular increment size: %.2f\n", aincrement);
+    }
+};
+
+//class that stores the state of x, y and z variables during translation or rotation
+class TripleState {
+public:
+    double ox, oy, oz; //original values
+    double x, y, z;
+    TripleState(double x, double y, double z) {
+        this->ox = x;
+        this->oy = y;
+        this->oz = z;
+        this->x = x;
+        this->y = y;
+        this->z = z;
+    }
+    void Reset() {
+        x = ox;
+        y = oy;
+        z = oz;
+    }
+};
+
+//class that stores the state of x, y and z variables during translation or rotation
+class MouseState {
+public:
+    //start position
+    int state, sx, sy, ex, ey;
+    MouseState(int sx, int sy, int ex, int ey) {
+        this->sx = sx;
+        this->sy = sy;
+        this->ex = ex;
+        this->ey = ey;
+        state = 1; //1 = "mouse is up", 0="mouse is down"
+    }
+    void PrintState(void){
+//        Vec3f vx = Vec3f(ex - sx, 0, 0);
+//        Vec3f vy = Vec3f(0, ex - sy, 0);
+//        printf("sx: %d, sy: %d, ex: %d, ey: %d\n", sx, sy, ex, ey);
+//        printf("xlength: %d\nylength: %d\n\n", ex-sx, sy-ey);
+    }
+    int GetXLength() {
+        return ex-sx;
+    }
+    int GetYLength() {
+        return sy-ey;
+    }
+};
+
+#endif

graphics/mouse.cpp

+#include <GLUT/glut.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <iostream>
+
+int dumpPPM(int frameNum);
+unsigned char camera = 'r';
+
+int iCount = 0;       // used for numbering the PPM image files
+int Width = 400;      // window width (pixels)
+int Height = 400;     // window height (pixels)
+bool Dump=false;      // flag set to true when dumping animation frames
+
+int animation_system_started_up = 0; //set to 1 once animate() has been called for the first time
+int animation_enabled = 0;
+
+//class that stores the current state of an object
+class ObjectState  
+{
+private:
+    int open_angle;
+    int closed_angle;
+    int opening_direction_sign; // +1 or -1
+    int current_angle;
+    int current_direction; //1 opening, -1 closing
+    
+    //jump to the end of the animation sequence - called when the animation
+    //is turned off and we revert to jumpcut
+    void jumpToEnd()
+    {
+        current_angle = (current_direction == -1) ? closed_angle : open_angle;
+    }
+
+public:
+    ObjectState(int closed_angle_in, int open_angle_in, int opening_direction_sign_in)
+    {
+        closed_angle = closed_angle_in;
+        open_angle = open_angle_in;       
+        opening_direction_sign = opening_direction_sign_in;
+        current_angle = closed_angle;
+        current_direction = -1;
+    }
+    
+    //switches the direction that we are going in
+    void toggleDirection()
+    {
+        current_direction = (current_direction == 1) ? -1 : 1;
+    }
+    
+    //return the current angle of the object
+    int getCurrentAngle()
+    {
+        return current_angle;
+    }
+    
+    //called at regular intervals by glutTimerFuc to advance the position of the object
+    //returns 0 if nothing was updated (redraw not required)
+    //returns 1 if something updated (redraw required)
+    int tick()
+    {
+        if (current_direction == 1) //we are opening
+        {
+            if (current_angle != open_angle)
+            {
+                current_angle += opening_direction_sign;
+                if (animation_enabled == 0) jumpToEnd();
+                return 1;
+            }
+        } 
+        else //we are closing
+        {
+            if (current_angle != closed_angle)
+            {
+                current_angle += (-1)*opening_direction_sign;
+                if (animation_enabled == 0) jumpToEnd();
+                return 1;
+            }
+        }
+        return 0;
+    }
+};
+
+//set up state tracking for joints 
+ObjectState os_head(0, -30, -1);
+ObjectState os_whiskers(13, 25, 1);
+ObjectState os_leg_rearright(-50, 15, 1);
+ObjectState os_leg_rearleft(-50, 15, 1);
+ObjectState os_leg_frontright(-50, 15, 1);
+ObjectState os_leg_frontleft(-50, 15, 1);
+ObjectState os_situp(0, 45, 1);
+ObjectState os_tail(-20, -45, -1);
+
+//executes every 50 milliseconds to update the angles on various body parts
+//and redraw if necessary
+void animate(int value)
+{    
+    int redraw_required = os_head.tick() + 
+    os_whiskers.tick() +
+    os_leg_rearright.tick() +
+    os_leg_rearleft.tick() +
+    os_leg_frontright.tick() +
+    os_leg_frontleft.tick() +
+    os_situp.tick() +
+    os_tail.tick();
+    
+    if (redraw_required > 0)
+    {
+        glutPostRedisplay();
+    }
+    
+    glutTimerFunc(50, animate, 0);
+}
+
+//Toggles between animation and jumpcutmode
+void toggleAnimation()
+{
+    animation_enabled = (animation_enabled == 0) ? 1:0;
+}
+
+//draws a unit cube
+void drawCube()
+{
+    glutSolidCube(1);
+}
+
+//draws a single ear
+void drawEar()
+{
+    glPushMatrix();
+    glScalef(0.06, 0.39, 0.39);
+    glRotatef(45, 1, 0, 0);
+    drawCube();
+    glPopMatrix();
+}
+
+//draws a single eye
+void drawEye()
+{
+    glPushMatrix();
+    glColor3f(1.0, 1.0, 0.0);
+    glScalef(0.05, 0.15, 0.15);
+    drawCube();
+    glScalef(1.25, 0.5, 0.5);
+    glColor3f(0, 0, 0);
+    drawCube();
+    glPopMatrix();
+}
+
+//draws a single whisker
+//angle denotes the angle between the middle whisker and the whisker in question
+//positive angles are above, negative angles are below
+//the side is set to -1 for left, 1 for right: this is necessary for the rotation
+void drawWhisker(int angle, int side)
+{
+    glPushMatrix();
+    glTranslatef(0.25+0.1, -0.1, 0);
+    glTranslatef(0,0,side*0.15);
+    glRotatef(-angle, 1, 0, 0);
+    glTranslatef(0,0,-side*0.15);
+    glScalef(0.035,0.035,0.3);    
+    drawCube();
+    glPopMatrix();
+}
+
+//draws a single leg
+void drawLeg(ObjectState os)
+{
+  glColor3f( 0.4, 0.4, 0.4 );
+  glRotatef(os.getCurrentAngle(), 0, 0, 1);
+  
+  glPushMatrix();
+  glScalef(0.135, 0.275, 0.12);
+  drawCube(); //upper leg
+  glPopMatrix();
+  
+  glTranslatef(0.0625+0.1-0.03, -0.1375-0.03, 0);
+  glPushMatrix();
+  glScalef(0.2, 0.1, 0.12);
+  drawCube(); //lower leg
+  glPopMatrix();
+  
+  glColor3f(0.7, 0.7, 0.7);
+  glTranslatef(0.1+0.1, -0.08, 0);
+  glTranslatef(-0.135, 0, 0);//join paw to lower leg
+  glRotatef(50, 0, 0, 1);
+  glTranslatef(0.1,0,0);
+  glScalef(0.2, 0.05, 0.17);                                        
+  drawCube(); //paw
+}
+
+//draws a single segment in a tail
+void drawTailSegment()
+{
+    glTranslatef(-0.4, 0, 0);
+    glScalef(1, 0.7, 0.7); 
+    glTranslatef(0.2,0,0);
+    
+    glRotatef(os_tail.getCurrentAngle(), 0, 0, 1);
+    glTranslatef(-0.2,0,0);
+    glPushMatrix();
+    glScalef(0.4,0.13,0.13);
+    drawCube();
+    glPopMatrix();
+   
+}
+
+void keyboardCallback(unsigned char c, int x, int y) {
+  switch (c) {
+  case ' ':
+    toggleAnimation();  
+    break;
+  case 'h':
+    os_head.toggleDirection();
+    break;  
+  case 'w':
+    os_whiskers.toggleDirection();
+    break;
+  case 'o':
+    os_leg_rearright.toggleDirection();
+    break;
+  case 'n':
+    os_leg_rearleft.toggleDirection();
+    break;
+  case 'm':
+    os_leg_frontright.toggleDirection();
+    break;
+  case 'l':
+    os_leg_frontleft.toggleDirection();
+    break;
+  case 's':
+    os_situp.toggleDirection();
+    break;
+  case 't':
+    os_tail.toggleDirection();
+    break;
+  case 'q':
+    exit (0);
+    break;
+  case 'p':
+    camera = 'p';
+    break;
+  case 'f':
+    camera = 'f';
+    break;
+  case 'b':
+    camera = 'b';
+    break;
+  case 'a':
+    camera = 'a';
+    break;
+  case 'u':
+    camera = 'u';
+    break;
+  case 'r':
+    camera = 'r';
+    break;
+  case 'i':
+    dumpPPM(iCount);
+    iCount++;
+    break;
+  case 'd':               // dump animation PPM frames
+    iCount = 0;         // image file count
+    Dump = !Dump;
+    break;
+  }
+  glutPostRedisplay();
+}
+
+void reshapeCallback(int w, int h)
+{
+   Width = w;
+   Height = h;
+   glViewport(0, 0, w, h);
+}
+
+void displayCallback()
+{   
+  if (animation_system_started_up == 0)
+  {
+      animation_system_started_up = 1;
+      animate(0);
+  }
+  // clear the color buffer
+  glClear( GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT );
+
+  // set up camera
+  glMatrixMode( GL_PROJECTION );
+  glLoadIdentity();
+  gluPerspective( 45, 1.0, 0.1, 200.0 );
+
+  glMatrixMode( GL_MODELVIEW );
+  glLoadIdentity();
+  glTranslatef( 0.0, 0.0, -5.0 );
+  switch (camera) {
+  case 'p':
+    glRotatef( 180, 0.0, 1.0, 0.0 );
+    break;
+  case 'f':
+    glRotatef( -90, 0.0, 1.0, 0.0 );
+    break;
+  case 'b':
+    glRotatef( 90, 0.0, 1.0, 0.0 );
+    break;
+  case 'a':
+    glRotatef( 90, 1.0, 0.0, 0.0 );
+    break;
+  case 'u':
+    glRotatef( -90, 1.0, 0.0, 0.0 );
+    break;
+  case 'r':
+    break;
+  }
+
+  //create rear legs
+  glPushMatrix();   
+  glTranslatef(-0.35+0.1, -0.3+0.06, 0.3);
+  drawLeg(os_leg_rearright);
+  glPopMatrix();
+  glPushMatrix();   
+  glTranslatef(-0.35+0.1, -0.3+0.06, -0.3);
+  drawLeg(os_leg_rearleft);
+  glPopMatrix();
+  
+  glTranslatef(-0.35+0.1,-0.3+0.06,0);
+  glRotatef(os_situp.getCurrentAngle(), 0, 0, 1); //sit up
+  glTranslatef(0.35-0.1,0.3-0.06,0);
+  
+   //create aft body
+  glColor3f(0.2, 0.2, 0.2);
+  glPushMatrix();
+  glScalef(0.7, 0.6, 0.6);
+  drawCube();
+  glPopMatrix();
+  
+  //create front legs
+  glPushMatrix();   
+  glTranslatef(0.35+0.34, -0.3+0.06, 0.3-0.02);
+  drawLeg(os_leg_frontright);
+  glPopMatrix();
+  glPushMatrix();
+  glTranslatef(0.35+0.34, -0.3+0.06, -0.3+0.02);
+  drawLeg(os_leg_frontleft);
+  glPopMatrix();
+  
+  //create tail
+  glColor3f( 0.7, 0.7, 0.7 );
+  glPushMatrix();
+  glTranslatef(-0.35-0.20, 0, 0);
+  glPushMatrix();
+  glScalef(0.4,0.13,0.13);
+  drawCube();
+  glPopMatrix();
+  drawTailSegment();
+  drawTailSegment();
+  drawTailSegment();
+  glPopMatrix();
+  
+  //create fore body
+  glColor3f(0.2, 0.2, 0.2);
+  glTranslatef(0.35+0.25,0,0);
+  glPushMatrix();
+  glScalef(0.5, 0.5, 0.46);
+  drawCube();
+  glPopMatrix();
+  
+  //create head
+  glColor3f( 0.4, 0.4, 0.4 );
+  glTranslatef(0.25+0.25-0.1,0.38,0);
+  glTranslatef(-0.25,-0.25,0);//move origin back and to the bottom for head nod
+  
+  //keep head level while sitting up
+  glPushMatrix();
+  glRotatef((-1)*os_situp.getCurrentAngle(), 0, 0, 1); //sit up
+  
+  glRotatef(os_head.getCurrentAngle(), 0, 0, 1); //head nod
+  glTranslatef(0.25,0.25,0); //undo origin move
+  glPushMatrix();
+  glScalef(0.5,0.5,0.6);
+  drawCube();
+  glPopMatrix();
+  
+  //create ears
+  glColor3f( 0.5, 0.5, 0.5 );
+  glPushMatrix();
+  glTranslatef(0, 0.25, 0.25);
+  drawEar();
+  glTranslatef(0, 0, -0.5);
+  drawEar();
+  glPopMatrix();
+  
+  //create nose
+  glPushMatrix();
+  glTranslatef(0.25, -0.1, 0);
+  glScalef(0.25, 0.15, 0.18);
+  drawCube();
+  glPopMatrix();
+  
+  //create whiskers
+  glColor3f(0.7, 0.7, 0.7);
+  glPushMatrix();
+  glTranslatef(0, 0, 0.15);
+  drawWhisker(os_whiskers.getCurrentAngle(),-1);//left top
+  drawWhisker(0, -1);//left middle
+  drawWhisker(-os_whiskers.getCurrentAngle(), -1);//left bottom
+  glTranslatef(0, 0, -0.3);
+  drawWhisker(os_whiskers.getCurrentAngle(),1);//left top
+  drawWhisker(0, 1);//left middle
+  drawWhisker(-os_whiskers.getCurrentAngle(), 1);//left bottom
+  glPopMatrix();
+  
+  //create eyes
+  glPushMatrix();
+  glTranslatef(0.25, 0.1, -0.15);
+  drawEye();
+  glTranslatef(0, 0, 0.3);
+  drawEye();
+  glPopMatrix();
+  
+  glPopMatrix(); // end keep head level while sitting up
+
+  // draw the buffer to the screen
+  glutSwapBuffers();
+
+  if (Dump) {               // save images to file
+    dumpPPM(iCount);
+    iCount++;
+  }
+
+  GLenum error = glGetError();
+  if(error != GL_NO_ERROR)
+    printf("ERROR: %s\n", gluErrorString(error));
+}
+
+//---------------------------------------------------------------
+
+int dumpPPM(int frameNum)
+{
+  FILE *fp;
+  const int maxVal=255;
+  register int y;
+  unsigned char *pixels;
+
+  glReadBuffer( GL_FRONT );
+  char fname[100];
+  sprintf(fname,"./ppm/img%03d.ppm",frameNum);
+  fp = fopen(fname,"wb");
+  if (!fp) {
+	printf("Unable to open file '%s'\n",fname);
+	return 1;
+  }
+  printf("Saving image `%s`\n",fname);
+  fprintf(fp, "P6 ");
+  fprintf(fp, "%d %d ", Width, Height);
+  fprintf(fp, "%d", maxVal);
+  putc(13,fp);
+  pixels = new unsigned char [3*Width];
+
+  y = 0;
+  glReadPixels(0,Height-1,Width,1,GL_RGB,GL_UNSIGNED_BYTE, (GLvoid *) pixels);
+
+  for ( y = Height-1; y >=0; y-- ) {
+	glReadPixels(0,y,Width,1,GL_RGB,GL_UNSIGNED_BYTE, (GLvoid *) pixels);
+	for (int n=0; n<3*Width; n++) {
+		putc(pixels[n],fp);
+	}
+  }
+  fclose(fp);
+  delete [] pixels;
+  return 0;
+}
+
+//---------------------------------------------------------------
+
+int main(int argc, char **argv)
+{  
+    
+  // create window and rendering context
+  glutInit( &argc, argv );
+  glutInitDisplayMode( GLUT_DEPTH | GLUT_RGB | GLUT_DOUBLE );
+  glutInitWindowSize( Width, Height );
+  glutCreateWindow( "Mouse" );
+  
+  // register display callback
+  glutDisplayFunc( displayCallback );
+  glutKeyboardFunc( keyboardCallback );
+  
+  glViewport( 0, 0, Width, Height );
+  glEnable( GL_DEPTH_TEST );
+  glEnable( GL_NORMALIZE );
+  
+  // lighting stuff
+  GLfloat ambient[] = {0.0, 0.0, 0.0, 1.0};
+  GLfloat diffuse[] = {0.9, 0.9, 0.9, 1.0};
+  GLfloat specular[] = {0.7, 0.7, 0.7, 1.0};
+  GLfloat position0[] = {1.0, 1.0, 1.0, 0.0};
+  glLightfv( GL_LIGHT0, GL_POSITION, position0 );
+  glLightfv( GL_LIGHT0, GL_AMBIENT, ambient );
+  glLightfv( GL_LIGHT0, GL_DIFFUSE, diffuse );
+  glLightfv( GL_LIGHT0, GL_SPECULAR, specular );
+  GLfloat position1[] = {-1.0, -1.0, -1.0, 0.0};
+  glLightfv( GL_LIGHT1, GL_POSITION, position1 );
+  glLightfv( GL_LIGHT1, GL_AMBIENT, ambient );
+  glLightfv( GL_LIGHT1, GL_DIFFUSE, diffuse );
+  glLightfv( GL_LIGHT1, GL_SPECULAR, specular );
+  
+  glEnable( GL_LIGHTING );
+  glEnable( GL_LIGHT0 );
+  glEnable( GL_LIGHT1 );
+  glEnable( GL_COLOR_MATERIAL );
+  
+  // pass control over to GLUT
+  glutMainLoop();
+  
+  return 0;       // never reached
+}
+// CPSC 314 Vjan2007
+// Project 2 Sample Code
+
+//p2.cpp
+//main function
+
+#include "defs.h"
+
+//global struct to store the program state
+ProgramState state;
+
+//speed tracking
+SpeedState ss(0.05, 0.5);
+
+//state tracking for part 1
+TripleState ts(0, 0, 0); //translate state
+TripleState rs(0, 0, 0); //rotate state
+
+//state tracking for part 2
+TripleState es(50, 10, 10); //eye point
+TripleState ls( 50, 10, -75); //lookat point
+TripleState us(0, 1, 0); //up vector
+
+//mouse state tracking for parts 3 and 4
+MouseState ms(0, 0, 0, 0);
+
+//state tracking for part 4
+int just_warped = 0;
+
+int lastFrameTime = 0;
+float Shininess = 50;
+GLdouble mvm[16];
+GLdouble original_mvm[16];
+int mvm_cache_initialized = 0;
+int original_mvm_initialized = 0;
+
+//a utility used in part 4)
+void WarpToCenter(void)
+{
+    int xpos = state.windowWidth/2;
+    int ypos = state.windowHeight/2;
+    glutWarpPointer((int) xpos, (int) ypos);
+    ms.sx = xpos;
+    ms.sy = ypos;
+    just_warped = 1;
+}
+
+//a utility function used in part 3
+void PostmultiplyAndSave() {
+    glMultMatrixd(mvm);
+    glGetDoublev(GL_MODELVIEW_MATRIX, mvm);
+}
+
+void TimerFunc(int value) {
+    glutTimerFunc(50, TimerFunc, 0);
+    if (state.cameraMode == 3)
+    {
+        if (ms.state == 0) {
+            //change yaw
+            if(mvm_cache_initialized == 1) {
+                glLoadIdentity();
+                glRotated(0.005*ms.GetXLength(), 0, 1, 0);
+                PostmultiplyAndSave();
+            }
+            //change pitch
+            if(mvm_cache_initialized == 1) {
+                glLoadIdentity();
+                glRotated(-0.005*ms.GetYLength(), 1, 0, 0);
+                PostmultiplyAndSave();
+            }
+
+        }
+    }
+}
+
+void MouseFunc(int button, int state, int x, int y) {
+    if (button == 0) {
+        if (state == 0) {
+            ms.sx = x;
+            ms.sy = y;
+            ms.state = 0;
+        }
+        else if (state == 1) {
+            ms.ex = x;
+            ms.ey = y;
+            ms.state = 1;
+        }
+    }
+}
+
+void MotionFunc(int x, int y) {
+    ms.ex = x;
+    ms.ey = y;
+}
+
+void PassiveMotionFunc(int x, int y) {
+    if (state.cameraMode == 4) {
+        if (just_warped == 0) {
+            ms.ex = x;
+            ms.ey = y;
+            //change yaw
+            if(mvm_cache_initialized == 1) {
+                glLoadIdentity();
+                glRotated(0.05*ms.GetXLength(), 0, 1, 0);
+                PostmultiplyAndSave();
+            }
+            //change pitch
+            if(mvm_cache_initialized == 1) {
+                glLoadIdentity();
+                glRotated(-0.05*ms.GetYLength(), 1, 0, 0);
+                PostmultiplyAndSave();
+            }
+            WarpToCenter();
+        }
+        else {
+            just_warped = 0;
+        }
+    }
+}
+
+//resets the state trackers to their default values
+void ResetDefaults() {
+    ts.Reset();
+    rs.Reset();
+
+    es.Reset();
+    ls.Reset();
+    us.Reset();
+
+    mvm_cache_initialized = 0;
+
+    if (original_mvm_initialized == 1){
+        glLoadMatrixd(original_mvm);
+        original_mvm_initialized = 0;
+    }
+
+    WarpToCenter();
+
+    glutSetCursor(GLUT_CURSOR_INHERIT);
+    printf("Reset\n");
+}
+
+void DrawGrid(float width, int subdivision) {
+    //Draws a grid of GL_LINES as a reference point
+    glDisable(GL_LIGHTING);
+    
+    glBegin(GL_LINES);
+    
+    for(int i = 0; i<=subdivision; i++) {
+        float stepsize = width/((float) (subdivision));
+        float ratio = 1/((float) subdivision);
+        
+        glColor3f(1-i*ratio, i*ratio, 0);
+        glVertex3f(-width/2 + i*stepsize, 0, width/2);
+        glColor3f(i*ratio, 1-i*ratio, 0);
+        glVertex3f(-width/2 + i*stepsize, 0, -width/2);
+        
+        glColor3f(0, i*ratio, 1-i*ratio);
+        glVertex3f(width/2, 0, -width/2 + i*stepsize);
+        glColor3f(0, 1-i*ratio, i*ratio);
+        glVertex3f(-width/2, 0, -width/2 + i*stepsize);
+    }
+    glEnd();
+    
+    glEnable(GL_LIGHTING);
+}
+
+void DrawCube(float x, float y, float z, float sx, float sy, float sz) {
+    //draw a parameterised rectangular prism
+    glPushMatrix();
+    glTranslatef(x, y, z);
+    glScalef(sx, sy, sz);
+    glutSolidCube(1.0);
+    glPopMatrix();
+}
+
+void DrawPost(float x, float y, float z) {
+    //draws a chess rook at the specified coordinates
+    //used to provide wayPoints in the scene
+    
+    glPushMatrix();
+    glTranslatef(x, y, z);
+    
+    glColor3f(0.0, 0.0, 1.0);
+    
+    DrawCube(0, 0, 0, 2, 1.5, 2);
+    DrawCube(0, 2.2, 0, 1.5, 4, 1.5);
+    DrawCube(0, 4.1, 0, 1.7, 0.7, 1.7);
+    
+    glPopMatrix();
+}
+
+
+void DisplayFunc() {
+    
+    //clear screen
+    glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);
+    
+    //Set up Projection matrix
+    glMatrixMode( GL_PROJECTION );
+    glLoadIdentity();
+    float aspect = (float) state.windowWidth / (float) state.windowHeight;
+    gluPerspective(45, aspect, 1, 1000);
+    
+    //Set up Modelview matrix
+    glMatrixMode( GL_MODELVIEW );
+    
+    if (state.cameraMode == 2) {
+        glLoadIdentity();
+        gluLookAt(es.x, es.y, es.z, ls.x, ls.y, ls.z, us.x, us.y, us.z);
+    }
+    
+    if (state.cameraMode == 3 || state.cameraMode == 4) {
+        //apply incremental transformation to the current transformation from
+        //the last frame we drew
+        if (mvm_cache_initialized == 1) {
+            glLoadMatrixd(mvm);
+        }
+        if (original_mvm_initialized == 0)
+        {
+            //cache in case we have to reset
+            glGetDoublev(GL_MODELVIEW_MATRIX, original_mvm);
+            original_mvm_initialized = 1;
+        }
+        //store the model-view matrix
+        glGetDoublev(GL_MODELVIEW_MATRIX, mvm);
+        mvm_cache_initialized = 1;
+    }
+    
+    
+    //push, so our modeling transforms do not disrupt the Modelview Matrix
+    //if we do not preserve the modelview matrix, we cannot do camera-centric
+    //control
+    glPushMatrix();
+    
+    if (state.cameraMode == 1) {
+        glTranslated(-ts.x, -ts.y, -ts.z);
+        glRotated(-rs.x, 1, 0, 0);
+        glRotated(-rs.y, 0, 1, 0);
+        glRotated(-rs.z, 0, 0, 1);
+    }
+    
+    glEnable(GL_LIGHTING);
+    DrawPost(-50, 0, -50);
+    DrawPost(-50, 0, 0);
+    DrawPost(-50, 0, 50);
+    DrawPost(0, 0, -50);
+    DrawPost(0, 0, 50);
+    DrawPost(50, 0, -50);
+    DrawPost(50, 0, 0);
+    DrawPost(50, 0, -50);
+    DrawGrid(100, 100);
+    
+    glPopMatrix();
+    
+    // draw the buffer to the screen
+    glFlush();
+    glutSwapBuffers();
+    
+}
+
+void IdleFunc() {
+    glutPostRedisplay();
+}
+
+void ReshapeFunc(int w, int h) {
+    //saves the window width and height when the window changes shape
+    //this is used to compute the aspect when setting the projection matrix
+    glViewport( 0, 0, w, h );
+    state.windowWidth = w;
+    state.windowHeight = h;
+    glutPostRedisplay();
+}
+
+void KeyboardFunc(unsigned char key, int x, int y) {
+    //global commands
+    switch (key) {
+        case '`':
+            exit(0);
+            break;
+        case '1':
+            printf("Camera Mode 1\n");
+            state.cameraMode = 1;
+            ResetDefaults();
+            break;
+        case '2':
+            printf("Camera Mode 2\n");
+            state.cameraMode = 2;
+            ResetDefaults();
+            glTranslated(0, 0, ss.tincrement);
+            glMultMatrixd(mvm);
+            
+            break;
+        case '3':
+            printf("Camera Mode 3\n");
+            state.cameraMode = 3;
+            ResetDefaults();
+            break;
+        case '4':
+            printf("Camera Mode 4\n");
+            state.cameraMode = 4;
+            ResetDefaults();
+            glutSetCursor(GLUT_CURSOR_NONE);
+            break;
+        case '=':
+            ss.IncreaseSpeed();
+            break;
+        case '-':
+            ss.DecreaseSpeed();
+            break;
+        case 'r':
+            ResetDefaults();
+            break;
+    }
+    
+    if (state.cameraMode == 1) {
+        switch (key) {
+            case 'x':
+                ts.x += ss.tincrement;
+                break;
+            case 'X':
+                ts.x -= ss.tincrement;
+                break;
+            case 'y':
+                ts.y += ss.tincrement;
+                break;
+            case 'Y':
+                ts.y -= ss.tincrement;
+                break;
+            case 'z':
+                ts.z += ss.tincrement;
+                break;
+            case 'Z':
+                ts.z -= ss.tincrement;
+                break;
+            case 'a':
+                rs.x += ss.aincrement;
+                break;
+            case 'A':
+                rs.x -= ss.aincrement;
+                break;
+            case 'b':
+                rs.y += ss.aincrement;
+                break;
+            case 'B':
+                rs.y -= ss.aincrement;
+                break;
+            case 'c':
+                rs.z += ss.aincrement;
+                break;
+            case 'C':
+                rs.z -= ss.aincrement;
+                break;
+        }
+    }
+    if (state.cameraMode == 2) {
+        switch (key) {
+            case 'x':
+                es.x += ss.tincrement;
+                break;
+            case 'X':
+                es.x -= ss.tincrement;
+                break;
+            case 'y':
+                es.y += ss.tincrement;
+                break;
+            case 'Y':
+                es.y -= ss.tincrement;
+                break;
+            case 'z':
+                es.z += ss.tincrement;
+                break;
+            case 'Z':
+                es.z -= ss.tincrement;
+                break;
+            case 'a':
+                ls.x += ss.tincrement;
+                break;
+            case 'A':
+                ls.x -= ss.tincrement;
+                break;
+            case 'b':
+                ls.y += ss.tincrement;
+                break;
+            case 'B':
+                ls.y -= ss.tincrement;
+                break;
+            case 'c':
+                ls.z += ss.tincrement;
+                break;
+            case 'C':
+                ls.z -= ss.tincrement;
+                break;
+            case 'd':
+                us.x += ss.tincrement;
+                break;
+                
+            case 'D':
+                us.x -= ss.tincrement;
+                break;
+            case 'e':
+                us.y += ss.tincrement;
+                break;
+            case 'E':
+                us.y -= ss.tincrement;
+                break;
+            case 'f':
+                us.z += ss.tincrement;
+                break;
+            case 'F':
+                us.z -= ss.tincrement;
+                break;
+        }
+    }
+    
+    if (state.cameraMode == 3 || state.cameraMode == 4) {
+        switch (key) {
+            case 'w':
+                if(mvm_cache_initialized == 1) {
+                    glLoadIdentity();
+                    glTranslated(0, 0, ss.tincrement);
+                    PostmultiplyAndSave();
+                }
+                break;
+            case 's':
+                if(mvm_cache_initialized == 1) {
+                    glLoadIdentity();
+                    glTranslated(0, 0, -ss.tincrement);
+                    PostmultiplyAndSave();
+                }
+                break;
+        }
+    }
+    
+    if (state.cameraMode == 3) {
+        switch (key) {
+            case 'a':
+                if(mvm_cache_initialized == 1) {
+                    glLoadIdentity();
+                    glRotated(-ss.aincrement, 0, 0, 1);
+                    PostmultiplyAndSave();
+                }
+                break;
+            case 'd':
+                if(mvm_cache_initialized == 1) {
+                    glLoadIdentity();
+                    glRotated(ss.aincrement, 0, 0, 1);
+                    PostmultiplyAndSave();
+                }
+                break;
+        }
+    }
+    
+    if (state.cameraMode == 4) {
+        switch (key) {
+            case 'a':
+                if(mvm_cache_initialized == 1) {
+                    glLoadIdentity();
+                    glTranslated(0.2*ss.aincrement, 0, 0);
+                    PostmultiplyAndSave();
+                }
+                break;
+            case 'd':
+                if(mvm_cache_initialized == 1) {
+                    glLoadIdentity();
+                    glTranslated(0.2*-ss.aincrement, 0, 0);
+                    PostmultiplyAndSave();
+                }
+                break;
+            case 'k':
+                if(mvm_cache_initialized == 1) {
+                    glLoadIdentity();
+                    glTranslated(0, 0.2*-ss.aincrement, 0);