C4 instructional node CPUs

The instructional nodes

The instructional nodes (group cs on C4) are dual Intel Xeon E5504 boxes based on the Nehalem microarchitecture. Each CPU has four cores, for a total of eight cores per node. The clock speed is 2 GHz. Each core has separate 32 KB L1 data and instruction caches and a 256 KB L2 cache; in addition, each chip has a shared 4 MB L3 cache. This is a 64-bit processor that supports SSE through version 4.2. Because it is a little older (launched 2009), it does not support the more recent AVX instructions.

The C4 cluster is heterogeneous: several of the nodes have more recent CPUs and more memory (the pac nodes are particularly nice). In particular, the head node has a more recent CPU (an X5672). This means that codes tuned on the head node might suffer a performance hit on the instructional nodes! Plan accordingly.

Peak flop rates

The ideal peak flop rate per core on C4 is 8 GFlop/s in double precision:

(2e9 cycles/sec) * (2 SSE ops/cycle) * (2 double ops/SSE op) = 8e9 double ops/sec

Understanding the instruction issue rate and SIMD parallelism is critical! These are relatively old processors; newer instruction sets such as AVX operate on wider vectors, so newer processors can perform more floating point operations per clock.
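The arithmetic above can be written out as a small script, which makes it easy to redo the estimate for a different machine. This is only a sketch: the function name `peak_flops` and its parameters are made up for illustration, and the per-node figure simply multiplies the per-core peak by the eight cores mentioned earlier, which assumes every core runs at full rate at once.

```python
def peak_flops(clock_hz, simd_ops_per_cycle, flops_per_simd_op, cores=1):
    """Ideal peak flop rate: clock rate x issue rate x SIMD width x core count."""
    return clock_hz * simd_ops_per_cycle * flops_per_simd_op * cores


# Numbers for the instructional nodes, taken from the text above:
# 2 GHz clock, 2 SSE ops per cycle, 2 doubles per SSE op.
per_core = peak_flops(2e9, 2, 2)

# Assumption: all 8 cores of a node (2 chips x 4 cores) are busy at once.
per_node = peak_flops(2e9, 2, 2, cores=8)

print(f"per core: {per_core / 1e9:.0f} GFlop/s")
print(f"per node: {per_node / 1e9:.0f} GFlop/s")
```

These are ideal peaks; real codes will usually fall well short of them.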

How to investigate your CPU

If you are using Linux, the simplest way to get the basic specs for your CPU is to run

cat /proc/cpuinfo

This gives you a good starting point: the CPU model name (and family and stepping numbers), the clock speed, the number of cores per chip, and the capacity of the last-level cache. If you want more information, it pays to do some searching around. For optimization purposes, key features we would like to find include the micro-architecture (Nehalem, for us) and the L1 and L2 cache parameters.
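If you would rather not read the raw output by eye, a short script can pull out the fields mentioned above. The following is a minimal sketch, assuming the usual x86 Linux layout of `/proc/cpuinfo` (one block of `key : value` lines per logical processor, separated by blank lines, with fields such as `model name`, `cpu cores`, `cache size`, and `flags`). The helper names `parse_cpuinfo` and `summarize` are invented for this example, and field names can differ on other architectures.

```python
import os


def parse_cpuinfo(text):
    """Split /proc/cpuinfo-style text into one dict per logical processor."""
    cpus = []
    for block in text.strip().split("\n\n"):
        entry = {}
        for line in block.splitlines():
            # Each line looks like "model name<TAB>: Some CPU"; split on the first colon.
            key, sep, value = line.partition(":")
            if sep:
                entry[key.strip()] = value.strip()
        if entry:
            cpus.append(entry)
    return cpus


def summarize(cpus):
    """Collect a few fields of interest from the first processor entry."""
    first = cpus[0]
    flags = set(first.get("flags", "").split())
    return {
        "model name": first.get("model name"),
        "logical processors": len(cpus),
        "cores per chip": first.get("cpu cores"),
        "cache size": first.get("cache size"),
        # SIMD support shows up as entries in the flags list.
        "sse4_2": "sse4_2" in flags,
        "avx": "avx" in flags,
    }


if __name__ == "__main__" and os.path.exists("/proc/cpuinfo"):
    with open("/proc/cpuinfo") as f:
        for name, value in summarize(parse_cpuinfo(f.read())).items():
            print(f"{name}: {value}")
```

The `flags` check is a quick way to confirm which SIMD instruction sets a node supports before tuning for it. Note that `/proc/cpuinfo` reports only the last-level cache size, so the L1 and L2 parameters still have to come from elsewhere.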
