FMA4 patch for GROMACS 4.5.5

Installation:
  Overwrite the .s files in src/gmxlib/nonbonded/nb_kernel_x86_64_sse and
  src/gmxlib/nonbonded/nb_kernel_x86_64_sse2 with the ones from this patch.
  Then build and install GROMACS as usual.

Requirements:
  GCC 4.6 or later
  A CPU supporting FMA4 and XOP (AMD "Bulldozer" family: AMD FX series, or AMD Opteron "Interlagos" or "Valencia")

Known issues & TODOs:
  Does not support 32-bit builds (nb_kernel_ia32_*; I could probably generate them, but I have no machine to test on).
  Does not support GROMACS 4.6 (it may work with the group cutoff scheme, but that is untested; the Verlet scheme needs some work, though it should be a lot easier than handling asm directly. :P)
  Does not support Intel syntax.
  Most scalar operations are not FMA-ized.
  CPUID check is not implemented.
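Since the patch performs no CPUID check, a caller may want to test for FMA4 and XOP before running the patched kernels. The following is only a minimal sketch and is not part of this patch. It assumes GCC (or Clang) with <cpuid.h>, and the function names are hypothetical. It reads the AMD extended feature leaf 0x80000001, where ECX bit 11 reports XOP and ECX bit 16 reports FMA4. It does not verify that the OS has enabled the AVX register state, which a complete check would also need.

```c
#include <cpuid.h>   /* GCC/Clang wrapper around the CPUID instruction */

/* Feature bits in ECX of CPUID leaf 0x80000001 (AMD extended features). */
#define ECX_XOP_BIT   (1u << 11)
#define ECX_FMA4_BIT  (1u << 16)

/* Hypothetical helper: returns 1 only if both the XOP and FMA4 bits
 * are set in the given ECX value, 0 otherwise. */
int ecx_has_fma4_and_xop(unsigned int ecx)
{
    return (ecx & ECX_FMA4_BIT) != 0 && (ecx & ECX_XOP_BIT) != 0;
}

/* Hypothetical entry point: returns 1 if the CPU reports FMA4 and XOP,
 * 0 if it does not or if the extended leaf is unavailable.
 * Note: OS support for the AVX state (OSXSAVE/XGETBV) is NOT checked here. */
int cpu_reports_fma4_and_xop(void)
{
    unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;

    /* __get_cpuid returns 0 when the requested leaf is not supported. */
    if (!__get_cpuid(0x80000001u, &eax, &ebx, &ecx, &edx)) {
        return 0;
    }
    return ecx_has_fma4_and_xop(ecx);
}
```

A program could call cpu_reports_fma4_and_xop() once at startup and refuse to run (or fall back to an unpatched build) when it returns 0.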

Bug reporting:
  Send email to "shun.sakuraba" at Gmail.