1
0
mirror of https://github.com/paboyle/Grid.git synced 2025-06-10 19:36:56 +01:00
Commit Graph

443 Commits

Author SHA1 Message Date
a307274c96 Fermion impl rename for ls vectorised 5d approaches 2016-07-14 23:56:13 +01:00
3f2c44a5fe Updating the class to 5d selection based on impl type 2016-07-14 23:55:26 +01:00
48fb1cdc11 Update domain 5d vectorised impl type, move the type over to 4d redblack with
the dense OO inverse
2016-07-14 23:54:35 +01:00
8a79e93cc2 Rename the 5d domain wall fermion vectorised Ls impl class 2016-07-14 23:53:00 +01:00
adbc7c1188 Adding files for multiple implementations (cache opt) and Ls vectorisation
of the 5D cayley form chiral fermions for the 5d matrix. With Ls entirely
in the vector direction, s-hopping terms involve rotations.

The serial dependence of the LDU inversion for Mobius and 4d even odd
checkerboarding is removed by simply applying Ls^2 operations (vectorised
many ways) as a dense matrix operation.

This should give similar throughput but high flops (non-compulsory flops)
but enable use of the KNL cache friendly kernels throughout the code.

Ls is still constrained to be a multiple of Nsimd, which is as much as 8 for AVX512
with single precision.
2016-07-14 22:59:21 +01:00
ef97e32152 Adding persistent communicators 2016-07-08 17:16:08 +01:00
a0676beeb1 Open up dependency on Eigen and FFTW 2016-07-07 22:31:07 +01:00
680645f849 Merge branch 'release/v0.5.0' 2016-06-30 15:15:03 -07:00
712b9a3489 Asm only for avx512 2016-06-30 14:35:02 -07:00
bdaa5b1767 Updated to have perfect prefetching for the s-vectorised kernel with any cache blocking. 2016-06-30 14:35:02 -07:00
8fcefc021a Improved the prefetching when using cache blocking codes 2016-06-30 14:35:02 -07:00
05c884a62a Prefetch change 2016-06-30 14:35:01 -07:00
2d8bb4c594 Tweaks 2016-06-30 14:35:01 -07:00
6d58cb2a68 Enable reordering of the loops in the assembler for cache friendly.
This gets in the way of L2 prefetching however. Do next next link in stencil
prefetching.
2016-06-30 14:35:01 -07:00
5e02392f9c Fixed compilation error for benchmark_dwf
Some parts were assuming floating point precision
2016-06-20 12:30:51 +01:00
87418e7df1 Slightly faster prefetching perf. 2016-06-13 02:32:52 -07:00
55f65b81b5 Improvements to the assembler interface that let us move chunks of the
site and s loop into the kernels. This will save on function call overhead and
guarantee L2 prefetching strategy is right since OMP can't distribute the
sub-chunks of work.
2016-06-09 01:12:36 -07:00
d9408893b3 Prefetching in the normal kernel implementation. 2016-06-08 05:43:48 -07:00
8ac021de73 Added a test an fixed it for red black precon Ls innermost vectorised DWF 2016-06-07 13:16:56 -07:00
e503ef5590 Cleaned up 2016-06-07 00:11:36 +01:00
a7682b0060 Only instantiate the one routine to avoid duplicate symbol under g++5/MacOS 2016-06-06 23:48:21 +01:00
53d06046b0 Compiling updates for KNL 2016-06-03 03:47:54 -07:00
139cc5f1ae Large change with KNL preparation 2016-06-03 03:24:26 -07:00
c698b16d75 function to generate Chroma-style gamma matrix products 2016-05-01 18:30:35 -07:00
5341977948 IMCI fixes. Thought I had committed these. The "real" disambiguation
between std::real and Grid::real shouldn't have been necessary and I don't
know why only the icpc v16.0 on babbage hits it.
May need a longer term rename of Grid::real or some careful EnableIf work.
2016-04-30 03:34:16 -07:00
f6c53e5039 Merge commit '1e554350acae0e67fa7177ed0db9d4f684a54af2' 2016-04-30 00:17:52 -07:00
6aa000176f Fermion <-> Propagator functions 2016-04-30 00:14:33 -07:00
1e554350ac The threaded coms didn't agree with GCC. Suprised, and looks like GCC bug. 2016-04-29 16:49:18 -07:00
c79ea0dcef Fixingn IMCI 2016-04-22 21:52:54 -07:00
8fd8bc25e9 simd 5th dim with rotation 2016-04-19 15:39:00 -07:00
ba427abde9 simd 5d 2016-04-19 15:38:39 -07:00
9b6ab6db16 simd in 5th dimension support 2016-04-19 15:38:01 -07:00
806a83d38b simd in fifth dim support for dwf 2016-04-19 15:36:19 -07:00
b1192a8908 Benchmark_zmm added 2016-04-06 03:00:07 -07:00
e8dddb1596 Adding extra benchmark 2016-04-06 10:32:54 +01:00
e67fc2be18 Adding a trial for openmp overhead minimisation 2016-03-31 16:00:37 +01:00
8052556275 Cleaning up the single/double kernel implementation switch 2016-03-31 14:51:32 +01:00
60d965f79e AVX512 improvements; sigfpe trapping too 2016-03-30 08:42:34 +01:00
1ecbf9794d Merge branch 'master' of https://github.com/paboyle/Grid 2016-03-30 08:37:55 +01:00
c77b7ee897 AddSub based alternate SU3 routine 2016-03-28 17:55:22 -06:00
1e355a51e1 Interface change 2016-03-27 23:46:55 -07:00
21abaf7e91 Gamma sign change 2016-03-28 00:35:45 -06:00
165bffc2e7 Avx512 changes for assembler kernels 2016-03-26 22:25:45 -06:00
644fd6d32e Build avx512 clean 2016-03-25 09:35:33 -07:00
60d4564151 ICC no compile fix 2016-03-16 02:30:40 -07:00
090e7aa930 Merge remote-tracking branch 'origin/chulwoo-dec12-2015'
Merge Chulwoo's Lanczos related improvements.
Merge Nd!=4 fixes for pure gauge HMC from Evan.
2016-03-08 09:55:14 +00:00
325e745daa Merge branch 'master' of https://github.com/paboyle/Grid 2016-03-02 07:04:03 -08:00
61413565d0 Back off the inlined spin proj as not working 2016-03-02 07:03:09 -08:00
497e7e4c53 BG/Q compatibility fix 2016-02-23 15:57:38 +00:00
6aeaf6f568 Parallel IO worked on. I'm puzzled because I already thought I shook this out on MacOS + OpenMPI and then
turned up problems on the BlueWaters Cray.

Gets 75MB/s from home filesystem on parallel configuration read. Need to make the RNG IO parallel,
and also to look at aggregating bigger writes for the parallel write.
Not sure what the home filesystem is.
2016-02-21 08:03:21 -06:00