# Grid
<table>
<tr>
    <td>Last stable release</td>
    <td><a href="https://travis-ci.org/paboyle/Grid">
    <img src="https://travis-ci.org/paboyle/Grid.svg?branch=master"></a>
    </td>
</tr>
<tr>
    <td>Development branch</td>
    <td><a href="https://travis-ci.org/paboyle/Grid">
    <img src="https://travis-ci.org/paboyle/Grid.svg?branch=develop"></a>
    </td>
</tr>
</table>

**Data parallel C++ mathematical object library.**

License: GPL v2.

Last update June 2017.

_Please do not send pull requests to the `master` branch, which is reserved for releases._

### Compilers

Intel ICPC v16.0.3 and later

Clang v3.5 and later (need 3.8 and later for OpenMP)

GCC   v4.9.x (recommended)

GCC   v6.3 and later

### Important: 

Some versions of GCC appear to have a bug under high optimisation (-O2, -O3).

The safety of the following compiler versions cannot be guaranteed at this time. Follow Issue 100 for details and updates.

GCC   v5.x

GCC   v6.1, v6.2

### Bug report

_To help us track and resolve issues with Grid more efficiently, please report problems using the GitHub issue system rather than sending emails to Grid developers._

When you file an issue, please go through the following checklist:

1. Check that the code is pointing to the `HEAD` of `develop` or any commit in `master` which is tagged with a version number. 
2. Give a description of the target platform (CPU, network, compiler). Please give the full CPU part description, using for example `cat /proc/cpuinfo | grep 'model name' | uniq` (Linux) or `sysctl machdep.cpu.brand_string` (macOS), and the full output of the `--version` option of your compiler.
3. Give the exact `configure` command used.
4. Attach `config.log`.
5. Attach `grid.config.summary`.
6. Attach the output of `make V=1`.
7. Describe the issue and any previous attempt to solve it. If relevant, show how to reproduce the issue using a minimal working example.
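
The platform details requested in item 2 can be gathered with a small helper. The following is only a sketch: the function name `grid_platform_info` and the fallback compiler name `c++` are assumptions made for illustration, not part of Grid.

```bash
#!/usr/bin/env bash
# Sketch: print the CPU description and compiler version asked for in the
# bug report checklist. The function name and the `c++` fallback are
# illustrative assumptions, not something Grid provides.

grid_platform_info() {
    local cxx="${CXX:-c++}"

    echo "== CPU =="
    if [ -r /proc/cpuinfo ]; then
        # Linux: full CPU part description
        grep 'model name' /proc/cpuinfo | uniq
    elif command -v sysctl >/dev/null 2>&1; then
        # macOS
        sysctl machdep.cpu.brand_string
    else
        echo "unknown (please fill in by hand)"
    fi

    echo "== Compiler ($cxx) =="
    if command -v "$cxx" >/dev/null 2>&1; then
        # Full output of the compiler's --version option
        "$cxx" --version
    else
        echo "compiler '$cxx' not found in PATH"
    fi
}

grid_platform_info
```

Set `CXX` to the compiler you actually passed to `configure` before calling the function, and redirect the output to a file if you want to attach it to the issue.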


### Description
This library provides data parallel C++ container classes with internal memory layout
that is transformed to map efficiently to SIMD architectures. CSHIFT facilities
are provided, similar to HPF and cmfortran, and user control is given over the mapping of
array indices to both MPI tasks and SIMD processing elements.

* Identically shaped arrays can then be processed with perfect data parallelisation.
* Such identically shaped arrays are called conformable arrays.

The transformation is based on the observation that Cartesian array processing involves
performing identical processing on different regions of the Cartesian array.

The library will geometrically decompose arrays both across MPI tasks and across SIMD lanes.
Local vector loops are parallelised with OpenMP pragmas.
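
To picture what such a two-level decomposition means, here is a one-dimensional sketch. The mapping, the function name `decompose_index`, and the sizes used are all assumptions chosen for illustration; this is not taken from Grid's source, and Grid's actual layout may differ.

```bash
#!/usr/bin/env bash
# Sketch: map a global 1-D site index to (MPI rank, SIMD lane, offset).
# Each rank owns a contiguous block, and each lane owns a contiguous
# sub-block of that rank's block. Assumes n is divisible by ranks * lanes.
# This is an illustrative mapping, not Grid's implementation.

decompose_index() {
    local i=$1 n=$2 ranks=$3 lanes=$4
    local per_rank=$(( n / ranks ))          # sites owned by one rank
    local per_lane=$(( per_rank / lanes ))   # sites owned by one lane
    local rank=$(( i / per_rank ))
    local local_i=$(( i % per_rank ))        # index within the rank
    local lane=$(( local_i / per_lane ))
    local offset=$(( local_i % per_lane ))   # position within the lane
    echo "$rank $lane $offset"
}

# 16 sites, 2 ranks, 4 lanes per vector (hypothetical sizes)
for i in $(seq 0 15); do
    echo "site $i -> rank/lane/offset: $(decompose_index "$i" 16 2 4)"
done
```

In this picture, sites that share the same rank and offset but sit in different lanes would be handled by a single vector operation, which is the sense in which identical processing is applied to different regions of the array.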

Data parallel array operations can then be specified with a SINGLE data parallel paradigm, but
optimally use MPI, OpenMP and SIMD parallelism under the hood. This is a significant simplification
for most programmers.

The layout transformations are parametrised by the SIMD vector length. This adapts according to the architecture.
Presently the supported targets are SSE4 and ARM NEON (128 bits); AVX, AVX2 and QPX (256 bits); and IMCI and AVX512 (512 bits).

These are presented as `vRealF`, `vRealD`, `vComplexF`, and `vComplexD` internal vector data types. 
The corresponding scalar types are named `RealF`, `RealD`, `ComplexF` and `ComplexD`.
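
As a rough guide to how the vector length changes with the target, the number of scalars packed into one vector is the vector width divided by the scalar size. The sketch below assumes that `RealF`/`RealD` are 4-byte and 8-byte floating point numbers and that the complex types are pairs of them; these sizes and the helper name `simd_lanes` are assumptions, not statements about Grid's internals.

```bash
#!/usr/bin/env bash
# Sketch: number of scalars that fit in one SIMD vector.
# Scalar sizes below are assumed (float = 4 bytes, double = 8 bytes,
# complex = two reals); simd_lanes is a hypothetical helper.

simd_lanes() {
    local vector_bits=$1 scalar_bytes=$2
    echo $(( vector_bits / (8 * scalar_bytes) ))
}

for bits in 128 256 512; do
    printf '%3d bits: RealF=%d RealD=%d ComplexF=%d ComplexD=%d\n' \
        "$bits" \
        "$(simd_lanes "$bits" 4)" \
        "$(simd_lanes "$bits" 8)" \
        "$(simd_lanes "$bits" 8)" \
        "$(simd_lanes "$bits" 16)"
done
```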

MPI, OpenMP, and SIMD parallelism are present in the library.
Please see [this paper](https://arxiv.org/abs/1512.03487) for more detail.

### Required libraries
Grid requires [GMP](https://gmplib.org/) and [MPFR](http://www.mpfr.org/) to be installed. [HDF5](https://support.hdfgroup.org/HDF5/) and [LIME](http://usqcd-software.github.io/c-lime/) (for ILDG file format support) are optional.

### Quick start
First, start by cloning the repository:

``` bash
git clone https://github.com/paboyle/Grid.git
```

Then enter the cloned directory and set up the build system:

``` bash
cd Grid
./bootstrap.sh
```

Now you can execute the `configure` script to generate makefiles (here from a build directory):

``` bash
mkdir build; cd build
../configure --enable-precision=double --enable-simd=AVX --enable-comms=mpi-auto --prefix=<path>
```

where `--enable-precision=` sets the default precision,
`--enable-simd=` sets the SIMD type, `--enable-comms=` sets the communication
interface, and `<path>` should be replaced by the prefix path where you want to
install Grid. Other options are detailed in the next section; you can also use
`configure --help` to display them. As with any other program using GNU Autotools, the
`CXX`, `CXXFLAGS`, `LDFLAGS`, ... environment variables can be modified to
customise the build.
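
As a sketch of how the compiler variables combine with the options above, the helper below only prints a `configure` command line so that it can be inspected before running it. The function name `grid_configure_cmd`, the default compiler `clang++`, the default flag `-O3`, and the example prefix are placeholders chosen for illustration.

```bash
#!/usr/bin/env bash
# Sketch: assemble (and print, not run) a configure command line.
# grid_configure_cmd is a hypothetical helper; the clang++ and -O3
# defaults are placeholders, not recommendations.

grid_configure_cmd() {
    local prefix=$1 simd=${2:-AVX} comms=${3:-mpi-auto}
    printf '%s ' \
        "CXX=${CXX:-clang++}" \
        "CXXFLAGS=${CXXFLAGS:--O3}" \
        ../configure \
        --enable-precision=double \
        "--enable-simd=${simd}" \
        "--enable-comms=${comms}" \
        "--prefix=${prefix}"
    echo
}

# Print the command for an install under a hypothetical prefix
grid_configure_cmd "$HOME/grid-install"
```

Printing the command first makes it easy to paste the exact `configure` invocation into a bug report later.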

Finally, you can build, check, and install Grid:

``` bash
make; make check; make install
```

To minimise the build time, only the tests at the root of the `tests` directory are built by default. If you want to build tests in the sub-directory `<subdir>` you can execute:

``` bash
make -C tests/<subdir> tests
```
If you want to build all the tests at once just use `make tests`.

### Build configuration options

- `--prefix=<path>`: installation prefix for Grid.
- `--with-gmp=<path>`: look for GMP in the UNIX prefix `<path>`
- `--with-mpfr=<path>`: look for MPFR in the UNIX prefix `<path>`
- `--with-fftw=<path>`: look for FFTW in the UNIX prefix `<path>`
- `--enable-lapack[=<path>]`: enable LAPACK support in Lanczos eigensolver. A UNIX prefix containing the library can be specified (optional).
- `--enable-mkl[=<path>]`: use Intel MKL for FFT (and LAPACK if enabled) routines. A UNIX prefix containing the library can be specified (optional).
- `--enable-numa`: enable NUMA first touch optimisation
- `--enable-simd=<code>`: set up Grid for the SIMD target `<code>` (default: `GEN`). A list of possible SIMD targets is detailed in a section below.
- `--enable-gen-simd-width=<size>`: select the size (in bytes) of the generic SIMD vector type (default: 32 bytes).
- `--enable-precision={single|double}`: set the default precision (default: `double`).
- `--enable-comms=<comm>`: use `<comm>` for message passing (default: `none`). A list of possible communication interfaces is detailed in a section below.
- `--enable-rng={sitmo|ranlux48|mt19937}`: choose the RNG (default: `sitmo`).
- `--disable-timers`: disable system dependent high-resolution timers.
- `--enable-chroma`: enable Chroma regression tests.
- `--enable-doxygen-doc`: enable the Doxygen documentation generation (build with `make doxygen-doc`)

### Possible communication interfaces

The following options can be used with the `--enable-comms=` option to target different communication interfaces:

| `<comm>`       | Description                                                   |
| -------------- | ------------------------------------------------------------- |
| `none`         | no communications                                             |
| `mpi[-auto]`   | MPI communications                                            |
| `mpi3[-auto]`  | MPI communications using MPI 3 shared memory                  |
| `shmem`        | Cray SHMEM communications                                     |

For the MPI interfaces, the optional `-auto` suffix instructs the `configure` script to determine all the necessary compilation and linking flags. This is done by extracting the information from the MPI wrapper specified in the environment variable `MPICXX` (if not specified, `configure` will scan through a list of default names). The `-auto` suffix is not supported by the Cray environment wrapper scripts; use the standard versions instead.
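
When the `-auto` detection gives unexpected results, it can help to look at what the MPI wrapper itself reports. The sketch below is not what `configure` runs internally; it only relies on the `-show` option of MPICH-derived wrappers (including Intel MPI) and the `--showme` option of Open MPI wrappers. The function name `show_mpi_flags` and the fallback wrapper name `mpicxx` are assumptions.

```bash
#!/usr/bin/env bash
# Sketch: print the compile/link line an MPI compiler wrapper would use.
# show_mpi_flags is a hypothetical helper; `mpicxx` is an assumed default
# used only when MPICXX is not set.

show_mpi_flags() {
    local wrapper="${MPICXX:-mpicxx}"
    if ! command -v "$wrapper" >/dev/null 2>&1; then
        echo "MPI wrapper '$wrapper' not found in PATH"
        return 1
    fi
    # MPICH-derived wrappers accept -show; Open MPI wrappers accept --showme.
    "$wrapper" -show 2>/dev/null || "$wrapper" --showme
}

show_mpi_flags || true
```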

### Possible SIMD types

The following options can be used with the `--enable-simd=` option to target different SIMD instruction sets:

| `<code>`    | Description                            |
| ----------- | -------------------------------------- |
| `GEN`       | generic portable vector code           |
| `SSE4`      | SSE 4.2 (128 bit)                      |
| `AVX`       | AVX (256 bit)                          |
| `AVXFMA`    | AVX (256 bit) + FMA                    |
| `AVXFMA4`   | AVX (256 bit) + FMA4                   |
| `AVX2`      | AVX 2 (256 bit)                        |
| `AVX512`    | AVX 512 bit                            |
| `NEONv8`    | [ARM NEON](http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.den0024a/ch07s03.html) (128 bit)                     |
| `QPX`       | IBM QPX (256 bit)                      |

Alternatively, some CPU codenames can be directly used:

| `<code>`    | Description                            |
| ----------- | -------------------------------------- |
| `KNL`       | [Intel Xeon Phi codename Knights Landing](http://ark.intel.com/products/codename/48999/Knights-Landing) |
| `BGQ`       | Blue Gene/Q                            |

#### Notes:
- We currently support AVX512 only for the Intel compiler. Support for GCC and clang will appear in future versions of Grid once AVX512 support within GCC and clang is more advanced.
- For BG/Q only [bgclang](http://trac.alcf.anl.gov/projects/llvm-bgq) is supported. We do not presently plan to support more compilers for this platform.
- BG/Q performance is currently rather poor. This is being investigated for future versions.
- The vector size for the `GEN` target can be specified with the `configure` script option `--enable-gen-simd-width`.
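
On Linux x86 machines, the CPU feature flags in `/proc/cpuinfo` give a hint about which `<code>` from the tables above is worth trying. The mapping below is a heuristic sketch, not something Grid ships: the function name `suggest_simd` is hypothetical, treating the `avx512er` flag as an indicator of Knights Landing is an assumption, and the sketch ignores the `AVXFMA`, `AVXFMA4`, `NEONv8`, `QPX` and `BGQ` targets as well as the compiler restrictions noted above.

```bash
#!/usr/bin/env bash
# Sketch: suggest a value for --enable-simd from a list of CPU flags.
# Heuristic only; suggest_simd is a hypothetical helper.

suggest_simd() {
    # Pad with spaces so that each flag can be matched as a whole word
    local flags=" $1 "
    case "$flags" in
        *" avx512er "*) echo KNL ;;     # assumed to indicate Knights Landing
        *" avx512f "*)  echo AVX512 ;;
        *" avx2 "*)     echo AVX2 ;;
        *" avx "*)      echo AVX ;;
        *" sse4_2 "*)   echo SSE4 ;;
        *)              echo GEN ;;     # portable fallback
    esac
}

if [ -r /proc/cpuinfo ]; then
    # Take the flag list of the first CPU entry
    cpu_flags=$(grep -m1 '^flags' /proc/cpuinfo | cut -d: -f2)
    echo "suggested: --enable-simd=$(suggest_simd "$cpu_flags")"
fi
```

If no known flag is found, the sketch falls back to `GEN`, the generic portable vector code.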

### Build setup for Intel Knights Landing platform

The following configuration is recommended for the Intel Knights Landing platform:

``` bash
../configure --enable-precision=double\
             --enable-simd=KNL        \
             --enable-comms=mpi-auto  \
             --enable-mkl             \
             CXX=icpc MPICXX=mpiicpc
```
The MKL flag enables use of BLAS and FFTW from the Intel Math Kernel Library.

If you are working on a Cray machine that does not use the `mpiicpc` wrapper, please use:

``` bash
../configure --enable-precision=double\
             --enable-simd=KNL        \
             --enable-comms=mpi       \
             --enable-mkl             \
             CXX=CC CC=cc
```

If gmp and mpfr are NOT in standard places (/usr/) these flags may be needed:
``` bash
             --with-gmp=<path>        \
             --with-mpfr=<path>       \
```
where `<path>` is the UNIX prefix where GMP and MPFR are installed. 

Knights Landing nodes with two Intel Omni-Path adapters presently perform better
with more than one rank per node, using shared memory for interior communication.
This is the mpi3 communications implementation.
We recommend four ranks per node for best performance, but the optimum depends on the local volume.

``` bash
../configure --enable-precision=double\
             --enable-simd=KNL        \
             --enable-comms=mpi3      \
             --enable-mkl             \
             CXX=mpiicpc
```
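
When running several ranks per node, the OpenMP thread count per rank usually needs to be reduced accordingly. The sketch below simply divides the cores of a node between the ranks; the 64-core node size and the helper name `threads_per_rank` are hypothetical, and the result is only a starting point for tuning.

```bash
#!/usr/bin/env bash
# Sketch: pick an OpenMP thread count when several MPI ranks share a node.
# threads_per_rank is a hypothetical helper; 64 cores per node is an
# assumed example value.

threads_per_rank() {
    local cores_per_node=$1 ranks_per_node=$2
    echo $(( cores_per_node / ranks_per_node ))
}

# Four ranks per node on a hypothetical 64-core node
OMP_NUM_THREADS=$(threads_per_rank 64 4)
export OMP_NUM_THREADS
echo "OMP_NUM_THREADS=$OMP_NUM_THREADS"
```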

### Build setup for Intel Haswell Xeon platform

The following configuration is recommended for the Intel Haswell Xeon platform:

``` bash
../configure --enable-precision=double\
             --enable-simd=AVX2       \
             --enable-comms=mpi3      \
             --enable-mkl             \
             CXX=mpiicpc
```
The MKL flag enables use of BLAS and FFTW from the Intel Math Kernel Library.

If gmp and mpfr are NOT in standard places (/usr/) these flags may be needed:
``` bash
             --with-gmp=<path>        \
             --with-mpfr=<path>       \
```
where `<path>` is the UNIX prefix where GMP and MPFR are installed. 

If you are working on a Cray machine that does not use the `mpiicpc` wrapper, please use:

``` bash
../configure --enable-precision=double\
             --enable-simd=AVX2       \
             --enable-comms=mpi3      \
             --enable-mkl             \
             CXX=CC CC=cc
```
Since dual-socket nodes are commonplace, we recommend MPI-3 as the default, with
one rank per socket. If using the Intel MPI library, threads should be pinned to NUMA domains using
```
        export I_MPI_PIN=1
```
This is the default.

### Build setup for Intel Skylake Xeon platform

The following configuration is recommended for the Intel Skylake Xeon platform:

``` bash
../configure --enable-precision=double\
             --enable-simd=AVX512     \
             --enable-comms=mpi3      \
             --enable-mkl             \
             CXX=mpiicpc
```
The MKL flag enables use of BLAS and FFTW from the Intel Math Kernel Library.

If gmp and mpfr are NOT in standard places (/usr/) these flags may be needed:
``` bash
             --with-gmp=<path>        \
             --with-mpfr=<path>       \
```
where `<path>` is the UNIX prefix where GMP and MPFR are installed. 

If you are working on a Cray machine that does not use the `mpiicpc` wrapper, please use:

``` bash
../configure --enable-precision=double\
             --enable-simd=AVX512     \
             --enable-comms=mpi3      \
             --enable-mkl             \
             CXX=CC CC=cc
```
Since dual-socket nodes are commonplace, we recommend MPI-3 as the default, with
one rank per socket. If using the Intel MPI library, threads should be pinned to NUMA domains using
```
        export I_MPI_PIN=1
```
This is the default.