---
title: Documentation
author_profile: false
excerpt: Building on Intel and AMD targets
permalink: /docs/general_build/
---
{% include base_path %}

The information on this page was last updated in March 2018 and is valid for release version 0.7.0.
{% include toc icon="gears" title="Contents" %}
## Building for the Intel Knights Landing
The following configuration is recommended for the Intel Knights Landing platform:
```bash
../configure --enable-precision=double \
    --enable-simd=KNL \
    --enable-comms=mpi-auto \
    --with-gmp=<path> \
    --with-mpfr=<path> \
    --enable-mkl \
    CXX=icpc MPICXX=mpiicpc
```
where `<path>` is the UNIX prefix where GMP and MPFR are installed.

If you are working on a Cray machine that does not use the `mpiicpc` wrapper, please use:
```bash
../configure --enable-precision=double \
    --enable-simd=KNL \
    --enable-comms=mpi \
    --with-gmp=<path> \
    --with-mpfr=<path> \
    --enable-mkl \
    CXX=CC CC=cc
```
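The leading `../` assumes an out-of-tree build. A minimal sketch of the full workflow, assuming the standard autotools layout (the directory name and job count are illustrative):

```bash
./bootstrap.sh            # generate the configure script (once, after cloning)
mkdir build && cd build   # build out of tree
../configure ...          # one of the configure invocations above
make -j 8                 # adjust the job count to your machine
```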
## Building for the Intel Haswell
The following configuration is recommended for the Intel Haswell platform:
```bash
../configure --enable-precision=double \
    --enable-simd=AVX2 \
    --enable-comms=mpi-auto \
    --enable-mkl \
    CXX=icpc MPICXX=mpiicpc
```
The MKL flag enables use of BLAS and FFTW from the Intel Math Kernel Library.
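For `--enable-mkl` to be picked up, the MKL headers and libraries must be visible to the compiler. With classic Intel installations this is typically done by sourcing the MKL environment script first; the path below is an assumption and varies between sites:

```bash
# Illustrative path; adjust to your MKL installation.
source /opt/intel/mkl/bin/mklvars.sh intel64
```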
If `gmp` and `mpfr` are not installed in standard places (`/usr/`), these flags may be needed:

```bash
--with-gmp=<path> \
--with-mpfr=<path>
```
where `<path>` is the UNIX prefix where GMP and MPFR are installed.

If you are working on a Cray machine that does not use the `mpiicpc` wrapper, please use:
```bash
../configure --enable-precision=double \
    --enable-simd=AVX2 \
    --enable-comms=mpi \
    --enable-mkl \
    CXX=CC CC=cc
```
If using the Intel MPI library, threads should be pinned to NUMA domains using:

```bash
export I_MPI_PIN=1
```

This is the default.
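As a concrete illustration of running with these settings, a single-node invocation might look like the following sketch; the benchmark binary is the one used in the EPYC example below, and the rank/thread counts are assumptions to adapt to your core count:

```bash
# Illustrative: 2 ranks (one per socket), 16 OpenMP threads per rank.
export OMP_NUM_THREADS=16
export I_MPI_PIN=1
mpirun -np 2 ./Benchmark_dwf --mpi 1.1.1.2 --threads 16 --grid 16.16.16.32
```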
## Building for the Intel Skylake
The following configuration is recommended for the Intel Skylake platform:
```bash
../configure --enable-precision=double \
    --enable-simd=AVX512 \
    --enable-comms=mpi-auto \
    --enable-mkl \
    CXX=mpiicpc
```
The MKL flag enables use of BLAS and FFTW from the Intel Math Kernel Library.

If `gmp` and `mpfr` are not installed in standard places (`/usr/`), these flags may be needed:

```bash
--with-gmp=<path> \
--with-mpfr=<path>
```
where `<path>` is the UNIX prefix where GMP and MPFR are installed.

If you are working on a Cray machine that does not use the `mpiicpc` wrapper, please use:
```bash
../configure --enable-precision=double \
    --enable-simd=AVX512 \
    --enable-comms=mpi \
    --enable-mkl \
    CXX=CC CC=cc
```
If using the Intel MPI library, threads should be pinned to NUMA domains using:

```bash
export I_MPI_PIN=1
```

This is the default.
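To confirm that a host actually supports AVX-512 before building, a quick Linux-specific check (a convenience, not part of the recipe above) is:

```bash
# List the AVX-512 feature flags the CPU advertises, if any.
grep -o 'avx512[a-z0-9]*' /proc/cpuinfo | sort -u
```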
## Building for the AMD EPYC
The AMD EPYC is a multi-chip module comprising 32 cores spread over four distinct chips, each with 8 cores. Even a single-socket node therefore contains a quad-chip module, and dual-socket nodes with 64 cores in total are common. Each chip within the module exposes a separate NUMA domain, giving four NUMA domains per socket. We recommend one MPI rank per NUMA domain: MPI-3 with four ranks per socket and 8 threads per rank.
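The NUMA layout of a node can be inspected before choosing rank counts; a quick check, assuming the standard Linux tools are installed, is:

```bash
# Show the NUMA nodes and which CPUs belong to each.
numactl --hardware
lscpu | grep -i numa
```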
The following configuration is recommended for the AMD EPYC platform:
```bash
../configure --enable-precision=double \
    --enable-simd=AVX2 \
    --enable-comms=mpi3 \
    CXX=mpicxx
```
If `gmp` and `mpfr` are not installed in standard places (`/usr/`), these flags may be needed:

```bash
--with-gmp=<path> \
--with-mpfr=<path>
```

where `<path>` is the UNIX prefix where GMP and MPFR are installed.
Using MPICH and g++ v4.9.2, best performance can be obtained by setting an explicit `GOMP_CPU_AFFINITY` for each MPI rank. This can be done by launching the binary through a wrapper script, `omp_bind.sh`, shown below.
It is recommended to run 8 MPI ranks on a single dual-socket AMD EPYC, with 8 threads per rank, using MPI-3 and shared memory to communicate within the node:
```bash
mpirun -np 8 ./omp_bind.sh ./Benchmark_dwf --mpi 2.2.2.1 --dslash-unroll --threads 8 --grid 16.16.16.16 --cacheblocking 4.4.4.4
```
where `omp_bind.sh` does the following:
```bash
#!/bin/bash
# Map this MPI rank (8 ranks per dual-socket node) onto one NUMA domain of
# 16 hardware threads, then pin its 8 OpenMP threads to the even cores
# (the physical cores) of that domain.
numanode=`expr $PMI_RANK % 8`
basecore=`expr $numanode \* 16`
core0=`expr $basecore + 0`
core1=`expr $basecore + 2`
core2=`expr $basecore + 4`
core3=`expr $basecore + 6`
core4=`expr $basecore + 8`
core5=`expr $basecore + 10`
core6=`expr $basecore + 12`
core7=`expr $basecore + 14`
export GOMP_CPU_AFFINITY="$core0 $core1 $core2 $core3 $core4 $core5 $core6 $core7"
echo GOMP_CPU_AFFINITY $GOMP_CPU_AFFINITY
"$@"
```
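Modern MPI launchers can achieve similar pinning without a wrapper script. These alternatives are suggestions rather than part of the original recipe:

```bash
# Open MPI: map one rank per NUMA domain and bind it there.
mpirun -np 8 --map-by numa --bind-to numa ./Benchmark_dwf --mpi 2.2.2.1 --threads 8 --grid 16.16.16.16

# Intel MPI: pin each rank to a NUMA domain.
export I_MPI_PIN_DOMAIN=numa
```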
## Build setup for laptops, other compilers, non-cluster builds
Many versions of `g++` and `clang++` work with Grid: simply replace `CXX` (and `MPICXX`) in the configurations above, and omit the `--enable-mkl` flag.
Single-node, non-MPI builds are enabled with:

```bash
--enable-comms=none
```
FFTW support that is not in the default search path may then be enabled with:

```bash
--with-fftw=<installpath>
```
BLAS will not be compiled in by default, and Lanczos will default to Eigen diagonalisation.
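Putting these options together, a minimal laptop build might look like the following sketch; the compiler, SIMD target, and FFTW prefix are assumptions to adapt to your machine:

```bash
# Illustrative single-node build: no MPI, no MKL, clang++ as the compiler.
../configure --enable-precision=double \
    --enable-simd=AVX2 \
    --enable-comms=none \
    --with-fftw=/usr/local \
    CXX=clang++
```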
## Notes

- GMP is the GNU Multiple Precision Arithmetic Library.
- MPFR is a C library for multiple-precision floating-point computations with correct rounding.
- Both libraries are necessary for the RHMC support.
{% include paginator.html %}