mirror of https://github.com/paboyle/Grid.git synced 2025-06-16 23:07:05 +01:00

Adding Dslash flags explanation

Guido Cossu
2016-11-04 19:20:14 +00:00
parent 9fdb209b7e
commit ff057145b4


@@ -14,6 +14,8 @@ sidebar:
Here are a few suggestions for getting the best performance on the Intel Knights Landing (KNL).
### Bind the memory allocation to the MCDRAM NUMA node
The KNL has two memory systems: DDR4 (~90 GB/s) and the High Bandwidth Memory (MCDRAM, ~400 GB/s).
@@ -58,6 +60,20 @@ A typical setting for the best performance on a single node is to use **62 cores
export KMP_HW_SUBSETS=62c,1t
```

### Using the optimised Wilson Dslash kernels

Besides the generic implementation using stencils, GRID has optimised versions of the Dslash kernels (for Wilson and DWF fermions).

The optimised paths are selected with runtime flags:

| Flag | Description |
| ----------- | -------------------------------------- |
| `--dslash-generic` | The default option; uses the implementation with stencils. |
| `--dslash-unroll` | Explicitly unrolls the colour loops. It is tied to `Nc=3`. |
| `--dslash-asm` | Specific to AVX512-F architectures and `Nc=3`. |
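
As a rough sketch of how the three kernel variants might be compared, the snippet below prints one command line per flag. The `BENCH` variable, its default path `./Benchmark_dwf`, and the `dslash_cmd` helper are assumptions made for illustration; only the `--dslash-*` flags are taken from the table above.

```shell
#!/bin/sh
# Assumed path to a Grid benchmark binary; adjust it to your own build.
BENCH="${BENCH:-./Benchmark_dwf}"

# Hypothetical helper: print the command line for one Dslash variant.
# Accepts only the three variants listed in the table above.
dslash_cmd() {
  case "$1" in
    generic|unroll|asm)
      printf '%s --dslash-%s\n' "$BENCH" "$1"
      ;;
    *)
      echo "unknown Dslash variant: $1" >&2
      return 1
      ;;
  esac
}

# Print one command line per variant without running anything.
for variant in generic unroll asm; do
  dslash_cmd "$variant"
done
```

To actually run a variant, the printed line can be executed on the KNL node, for example with `eval "$(dslash_cmd asm)"`, once `BENCH` points at a real binary. Printing first makes it easy to check the exact command before launching a long benchmark.
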
The information on this page was last updated in *November 2016* and is valid for release version 0.6.0.
{: .notice}