# Grid
Last stable release | |
Development branch |
`: setup Grid for the SIMD target `` (default: `GEN`). A list of possible SIMD targets is detailed in a section below.
- `--enable-gen-simd-width=`: select the size (in bytes) of the generic SIMD vector type (default: 32 bytes).
- `--enable-precision={single|double}`: set the default precision (default: `double`).
- `--enable-precision=`: Use `` for message passing (default: `none`). A list of possible SIMD targets is detailed in a section below.
- `--enable-rng={sitmo|ranlux48|mt19937}`: choose the RNG (default: `sitmo `).
- `--disable-timers`: disable system dependent high-resolution timers.
- `--enable-chroma`: enable Chroma regression tests.
- `--enable-doxygen-doc`: enable the Doxygen documentation generation (build with `make doxygen-doc`)
### Possible communication interfaces
The following options can be use with the `--enable-comms=` option to target different communication interfaces:
| `` | Description |
| -------------- | ------------------------------------------------------------- |
| `none` | no communications |
| `mpi[-auto]` | MPI communications |
| `mpi3[-auto]` | MPI communications using MPI 3 shared memory |
| `shmem ` | Cray SHMEM communications |
For the MPI interfaces the optional `-auto` suffix instructs the `configure` scripts to determine all the necessary compilation and linking flags. This is done by extracting the informations from the MPI wrapper specified in the environment variable `MPICXX` (if not specified `configure` will scan though a list of default names). The `-auto` suffix is not supported by the Cray environment wrapper scripts. Use the standard versions instead.
### Possible SIMD types
The following options can be use with the `--enable-simd=` option to target different SIMD instruction sets:
| `` | Description |
| ----------- | -------------------------------------- |
| `GEN` | generic portable vector code |
| `SSE4` | SSE 4.2 (128 bit) |
| `AVX` | AVX (256 bit) |
| `AVXFMA` | AVX (256 bit) + FMA |
| `AVXFMA4` | AVX (256 bit) + FMA4 |
| `AVX2` | AVX 2 (256 bit) |
| `AVX512` | AVX 512 bit |
| `NEONv8` | [ARM NEON](http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.den0024a/ch07s03.html) (128 bit) |
| `QPX` | IBM QPX (256 bit) |
Alternatively, some CPU codenames can be directly used:
| `` | Description |
| ----------- | -------------------------------------- |
| `KNL` | [Intel Xeon Phi codename Knights Landing](http://ark.intel.com/products/codename/48999/Knights-Landing) |
| `BGQ` | Blue Gene/Q |
#### Notes:
- We currently support AVX512 only for the Intel compiler. Support for GCC and clang will appear in future versions of Grid when the AVX512 support within GCC and clang will be more advanced.
- For BG/Q only [bgclang](http://trac.alcf.anl.gov/projects/llvm-bgq) is supported. We do not presently plan to support more compilers for this platform.
- BG/Q performances are currently rather poor. This is being investigated for future versions.
- The vector size for the `GEN` target can be specified with the `configure` script option `--enable-gen-simd-width`.
### Build setup for Intel Knights Landing platform
The following configuration is recommended for the Intel Knights Landing platform:
``` bash
../configure --enable-precision=double\
--enable-simd=KNL \
--enable-comms=mpi-auto \
--enable-mkl \
CXX=icpc MPICXX=mpiicpc
```
The MKL flag enables use of BLAS and FFTW from the Intel Math Kernels Library.
If you are working on a Cray machine that does not use the `mpiicpc` wrapper, please use:
``` bash
../configure --enable-precision=double\
--enable-simd=KNL \
--enable-comms=mpi \
--enable-mkl \
CXX=CC CC=cc
```
If gmp and mpfr are NOT in standard places (/usr/) these flags may be needed:
``` bash
--with-gmp= \
--with-mpfr= \
```
where `` is the UNIX prefix where GMP and MPFR are installed.
Knight's Landing with Intel Omnipath adapters with two adapters per node
presently performs better with use of more than one rank per node, using shared memory
for interior communication. This is the mpi3 communications implementation.
We recommend four ranks per node for best performance, but optimum is local volume dependent.
``` bash
../configure --enable-precision=double\
--enable-simd=KNL \
--enable-comms=mpi3-auto \
--enable-mkl \
CC=icpc MPICXX=mpiicpc
```
### Build setup for Intel Haswell Xeon platform
The following configuration is recommended for the Intel Haswell platform:
``` bash
../configure --enable-precision=double\
--enable-simd=AVX2 \
--enable-comms=mpi3-auto \
--enable-mkl \
CXX=icpc MPICXX=mpiicpc
```
The MKL flag enables use of BLAS and FFTW from the Intel Math Kernels Library.
If gmp and mpfr are NOT in standard places (/usr/) these flags may be needed:
``` bash
--with-gmp= \
--with-mpfr= \
```
where `` is the UNIX prefix where GMP and MPFR are installed.
If you are working on a Cray machine that does not use the `mpiicpc` wrapper, please use:
``` bash
../configure --enable-precision=double\
--enable-simd=AVX2 \
--enable-comms=mpi3 \
--enable-mkl \
CXX=CC CC=cc
```
Since Dual socket nodes are commonplace, we recommend MPI-3 as the default with the use of
one rank per socket. If using the Intel MPI library, threads should be pinned to NUMA domains using
```
export I_MPI_PIN=1
```
This is the default.
### Build setup for Intel Skylake Xeon platform
The following configuration is recommended for the Intel Skylake platform:
``` bash
../configure --enable-precision=double\
--enable-simd=AVX512 \
--enable-comms=mpi3 \
--enable-mkl \
CXX=mpiicpc
```
The MKL flag enables use of BLAS and FFTW from the Intel Math Kernels Library.
If gmp and mpfr are NOT in standard places (/usr/) these flags may be needed:
``` bash
--with-gmp= \
--with-mpfr= \
```
where `` is the UNIX prefix where GMP and MPFR are installed.
If you are working on a Cray machine that does not use the `mpiicpc` wrapper, please use:
``` bash
../configure --enable-precision=double\
--enable-simd=AVX512 \
--enable-comms=mpi3 \
--enable-mkl \
CXX=CC CC=cc
```
Since Dual socket nodes are commonplace, we recommend MPI-3 as the default with the use of
one rank per socket. If using the Intel MPI library, threads should be pinned to NUMA domains using
```
export I_MPI_PIN=1
```
This is the default.