fb45eb2eb2
5d ls vec rename of impl class
2016-07-14 23:57:26 +01:00
a307274c96
Fermion impl rename for ls vectorised 5d approaches
2016-07-14 23:56:13 +01:00
3f2c44a5fe
Updating the class to 5d selection based on impl type
2016-07-14 23:55:26 +01:00
48fb1cdc11
Update domain 5d vectorised impl type, move the type over to 4d redblack with
...
the dense OO inverse
2016-07-14 23:54:35 +01:00
8a79e93cc2
Rename the 5d domain wall fermion vectorised Ls impl class
2016-07-14 23:53:00 +01:00
adbc7c1188
Adding files for multiple implementations (cache opt) and Ls vectorisation
...
of the 5D cayley form chiral fermions for the 5d matrix. With Ls entirely
in the vector direction, s-hopping terms involve rotations.
The serial dependence of the LDU inversion for Mobius and 4d even odd
checkerboarding is removed by simply applying Ls^2 operations (vectorised
many ways) as a dense matrix operation.
This should give similar throughput but high flops (non-compulsory flops)
but enable use of the KNL cache friendly kernels throughout the code.
Ls is still constrained to be a multiple of Nsimd, which is as much as 8 for AVX512
with single precision.
2016-07-14 22:59:21 +01:00
ef97e32152
Adding persistent communicators
2016-07-08 17:16:08 +01:00
a0676beeb1
Open up dependency on Eigen and FFTW
2016-07-07 22:31:07 +01:00
680645f849
Merge branch 'release/v0.5.0'
2016-06-30 15:15:03 -07:00
712b9a3489
Asm only for avx512
2016-06-30 14:35:02 -07:00
bdaa5b1767
Updated to have perfect prefetching for the s-vectorised kernel with any cache blocking.
2016-06-30 14:35:02 -07:00
8fcefc021a
Improved the prefetching when using cache blocking codes
2016-06-30 14:35:02 -07:00
05c884a62a
Prefetch change
2016-06-30 14:35:01 -07:00
2d8bb4c594
Tweaks
2016-06-30 14:35:01 -07:00
6d58cb2a68
Enable reordering of the loops in the assembler for cache friendly.
...
This gets in the way of L2 prefetching however. Do next next link in stencil
prefetching.
2016-06-30 14:35:01 -07:00
5e02392f9c
Fixed compilation error for benchmark_dwf
...
Some parts were assuming floating point precision
2016-06-20 12:30:51 +01:00
87418e7df1
Slightly faster prefetching perf.
2016-06-13 02:32:52 -07:00
55f65b81b5
Improvements to the assembler interface that let us move chunks of the
...
site and s loop into the kernels. This will save on function call overhead and
guarantee L2 prefetching strategy is right since OMP can't distribute the
sub-chunks of work.
2016-06-09 01:12:36 -07:00
d9408893b3
Prefetching in the normal kernel implementation.
2016-06-08 05:43:48 -07:00
8ac021de73
Added a test an fixed it for red black precon Ls innermost vectorised DWF
2016-06-07 13:16:56 -07:00
e503ef5590
Cleaned up
2016-06-07 00:11:36 +01:00
a7682b0060
Only instantiate the one routine to avoid duplicate symbol under g++5/MacOS
2016-06-06 23:48:21 +01:00
53d06046b0
Compiling updates for KNL
2016-06-03 03:47:54 -07:00
139cc5f1ae
Large change with KNL preparation
2016-06-03 03:24:26 -07:00
c698b16d75
function to generate Chroma-style gamma matrix products
2016-05-01 18:30:35 -07:00
5341977948
IMCI fixes. Thought I had committed these. The "real" disambiguation
...
between std::real and Grid::real shouldn't have been necessary and I don't
know why only the icpc v16.0 on babbage hits it.
May need a longer term rename of Grid::real or some careful EnableIf work.
2016-04-30 03:34:16 -07:00
f6c53e5039
Merge commit '1e554350acae0e67fa7177ed0db9d4f684a54af2'
2016-04-30 00:17:52 -07:00
6aa000176f
Fermion <-> Propagator functions
2016-04-30 00:14:33 -07:00
1e554350ac
The threaded coms didn't agree with GCC. Suprised, and looks like GCC bug.
2016-04-29 16:49:18 -07:00
c79ea0dcef
Fixingn IMCI
2016-04-22 21:52:54 -07:00
8fd8bc25e9
simd 5th dim with rotation
2016-04-19 15:39:00 -07:00
ba427abde9
simd 5d
2016-04-19 15:38:39 -07:00
9b6ab6db16
simd in 5th dimension support
2016-04-19 15:38:01 -07:00
806a83d38b
simd in fifth dim support for dwf
2016-04-19 15:36:19 -07:00
b1192a8908
Benchmark_zmm added
2016-04-06 03:00:07 -07:00
e8dddb1596
Adding extra benchmark
2016-04-06 10:32:54 +01:00
e67fc2be18
Adding a trial for openmp overhead minimisation
2016-03-31 16:00:37 +01:00
8052556275
Cleaning up the single/double kernel implementation switch
2016-03-31 14:51:32 +01:00
60d965f79e
AVX512 improvements; sigfpe trapping too
2016-03-30 08:42:34 +01:00
1ecbf9794d
Merge branch 'master' of https://github.com/paboyle/Grid
2016-03-30 08:37:55 +01:00
c77b7ee897
AddSub based alternate SU3 routine
2016-03-28 17:55:22 -06:00
1e355a51e1
Interface change
2016-03-27 23:46:55 -07:00
21abaf7e91
Gamma sign change
2016-03-28 00:35:45 -06:00
165bffc2e7
Avx512 changes for assembler kernels
2016-03-26 22:25:45 -06:00
644fd6d32e
Build avx512 clean
2016-03-25 09:35:33 -07:00
60d4564151
ICC no compile fix
2016-03-16 02:30:40 -07:00
090e7aa930
Merge remote-tracking branch 'origin/chulwoo-dec12-2015'
...
Merge Chulwoo's Lanczos related improvements.
Merge Nd!=4 fixes for pure gauge HMC from Evan.
2016-03-08 09:55:14 +00:00
325e745daa
Merge branch 'master' of https://github.com/paboyle/Grid
2016-03-02 07:04:03 -08:00
61413565d0
Back off the inlined spin proj as not working
2016-03-02 07:03:09 -08:00
497e7e4c53
BG/Q compatibility fix
2016-02-23 15:57:38 +00:00