C4 instructional node CPUs
The instructional nodes
The instructional nodes (group cs on C4) are dual Intel Xeon E5504 boxes based on the Nehalem microarchitecture.
Each CPU has four cores, for a total of eight available cores. The clock speed
is 2 GHz. Each core has 32 KB L1 data and instruction caches and a 256 KB L2
cache; in addition, each chip has a shared 4 MB L3 cache. This is a 64-bit
processor, and it supports SSE through version 4.2; but since it is a little
older (launched 2009), it does not support the more recent AVX instructions.
The C4 cluster is heterogeneous, so several of the nodes have more recent CPUs
with more memory (the pac nodes are particularly nice). In particular,
the head node has a more recent CPU (an X5672). This means that
codes tuned for the head node might suffer a performance hit on the
instructional nodes! Plan accordingly.
Peak flop rates
The ideal peak flop rate per core on C4 is 8 GFlop/s in double precision:
(2e9 cycles/sec)*(2 SSE ops/cycle)*(2 double ops/SSE op)
Understanding the instruction issue rate and SIMD parallelism is critical! These are relatively old processors; newer instruction sets such as AVX can perform more floating point operations in parallel per clock.
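The arithmetic above can be checked with a small script. This is only a sketch: the function name peak_gflops and its parameters are made up for illustration, and the per-node figure simply assumes that all eight cores reach the ideal per-core rate at once, which real codes will not.

```python
def peak_gflops(clock_hz, simd_ops_per_cycle, flops_per_simd_op, cores=1):
    """Ideal peak rate in GFlop/s: clock * issue rate * SIMD width * cores."""
    return clock_hz * simd_ops_per_cycle * flops_per_simd_op * cores / 1e9

# Numbers from the text: 2 GHz clock, 2 SSE ops/cycle, 2 doubles per SSE op.
per_core = peak_gflops(2e9, 2, 2)

# Assumption: the whole node is 8 cores, each running at the ideal rate.
per_node = peak_gflops(2e9, 2, 2, cores=8)

print(f"per core: {per_core:.0f} GFlop/s")
print(f"per node: {per_node:.0f} GFlop/s")
```

Changing flops_per_simd_op from 2 to 4 shows what a wider SIMD unit would buy at the same clock and issue rate.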
How to investigate your CPU
If you are using Linux, the simplest way to get the basic specs for your CPU is to run
cat /proc/cpuinfo
This gives you a good starting point: the CPU model name (and family and stepping numbers), the clock speed, the number of cores per chip, and the capacity of the last-level cache. If you want more information, it pays to do some searching around. For optimization purposes, key features we would like to find include the microarchitecture (Nehalem, for us) and the L1 and L2 cache parameters.
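If you would rather pull out just the fields mentioned above, a short script can do it. This is a rough sketch, assuming the x86 Linux layout of /proc/cpuinfo (field names such as "model name", "cpu MHz", "cpu cores", "cache size", and "flags"); other architectures use different field names, and the helper names parse_cpuinfo and summarize are made up for illustration.

```python
import os

def parse_cpuinfo(text):
    """Return the key/value fields of the first processor entry."""
    fields = {}
    for line in text.splitlines():
        if not line.strip():
            if fields:
                break  # a blank line ends the first processor block
            continue
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields

def summarize(fields):
    """Pick out the basics, plus whether SSE 4.2 and AVX are advertised."""
    flags = set(fields.get("flags", "").split())
    return {
        "model": fields.get("model name", "unknown"),
        "mhz": fields.get("cpu MHz", "unknown"),
        "cores_per_chip": fields.get("cpu cores", "unknown"),
        "last_level_cache": fields.get("cache size", "unknown"),
        "sse4_2": "sse4_2" in flags,
        "avx": "avx" in flags,
    }

# Only try to read the file where it exists (i.e. on Linux).
if os.path.exists("/proc/cpuinfo"):
    with open("/proc/cpuinfo") as f:
        for name, value in summarize(parse_cpuinfo(f.read())).items():
            print(f"{name}: {value}")
```

Going by the specs above, an instructional node should report SSE 4.2 but not AVX. Note that this does not tell you the microarchitecture or the L1 and L2 cache parameters; for those you still need to look up the model name.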