5c0c8efb9e
Updated file list
2016-07-15 00:02:11 +01:00
dfd714e1ef
Multiple implementations for the 5d hopping terms, depending on cache-friendly ops and/or the 5th direction being vectorised.
All use 4d redblack.
2016-07-15 00:00:09 +01:00
79a8ca1a62
Rewrite for performance. Implementation-dependent instantiations give:
4d linalg implementations of the 5d hopping terms (and inverse)
Cache-friendly loop orderings of the above
A stored dense matrix, applied in the above
-- Switch to Ls vectorised, and use the dense matrix approach for MooeeInv
and rotate/shift in the Mooee and M5D routines.
2016-07-14 23:58:15 +01:00
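A minimal sketch of the dense-matrix idea for MooeeInv, assuming a real Ls x Ls matrix stands in for the fifth-dimension operator at a single 4d site (real fields carry spin, colour and complex SIMD data, and none of Grid's actual classes are shown; `Dense`, `invert` and `apply` are hypothetical names). The inverse is formed once; applying it costs Ls^2 multiply-adds per site, but every output row is independent, unlike a serial forward/backward LDU sweep along s.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical Ls x Ls fifth-dimension operator, stored row-major.
using Dense = std::vector<double>;

// Invert an n x n matrix by Gauss-Jordan elimination with partial pivoting.
// Done once per operator, so the cost is amortised over every 4d site.
Dense invert(Dense a, std::size_t n) {
  Dense inv(n * n, 0.0);
  for (std::size_t i = 0; i < n; ++i) inv[i * n + i] = 1.0;
  for (std::size_t c = 0; c < n; ++c) {
    std::size_t p = c;  // pick the largest pivot in column c
    for (std::size_t r = c + 1; r < n; ++r)
      if (std::fabs(a[r * n + c]) > std::fabs(a[p * n + c])) p = r;
    for (std::size_t k = 0; k < n; ++k) {
      std::swap(a[c * n + k], a[p * n + k]);
      std::swap(inv[c * n + k], inv[p * n + k]);
    }
    const double d = a[c * n + c];  // normalise the pivot row
    for (std::size_t k = 0; k < n; ++k) { a[c * n + k] /= d; inv[c * n + k] /= d; }
    for (std::size_t r = 0; r < n; ++r) {  // eliminate column c elsewhere
      if (r == c) continue;
      const double f = a[r * n + c];
      for (std::size_t k = 0; k < n; ++k) {
        a[r * n + k] -= f * a[c * n + k];
        inv[r * n + k] -= f * inv[c * n + k];
      }
    }
  }
  return inv;
}

// Apply a dense matrix to one site's Ls-vector: Ls^2 multiply-adds,
// with no dependence of one output row on another.
std::vector<double> apply(const Dense& m, const std::vector<double>& psi) {
  const std::size_t n = psi.size();
  std::vector<double> out(n, 0.0);
  for (std::size_t r = 0; r < n; ++r)
    for (std::size_t c = 0; c < n; ++c) out[r] += m[r * n + c] * psi[c];
  return out;
}
```

The trade is deliberate: more flops than the recursive solve, but a regular, vectorisable loop that suits the same cache-friendly kernels as the rest of the operator.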
fb45eb2eb2
5d ls vec rename of impl class
2016-07-14 23:57:26 +01:00
a307274c96
Fermion impl rename for ls vectorised 5d approaches
2016-07-14 23:56:13 +01:00
3f2c44a5fe
Update the class to select the 5d implementation based on impl type
2016-07-14 23:55:26 +01:00
48fb1cdc11
Update domain 5d vectorised impl type; move the type over to 4d redblack with the dense OO inverse
2016-07-14 23:54:35 +01:00
8a79e93cc2
Rename the 5d domain wall fermion vectorised Ls impl class
2016-07-14 23:53:00 +01:00
dd62a61c5c
Added broadcast and rotation of simd vectors
2016-07-14 23:49:00 +01:00
8f47d0b5ab
Rotation needed for hopping term in fifth dim with Ls vectorised fields
2016-07-14 23:45:36 +01:00
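A minimal sketch of why the fifth-dimension hop becomes a lane rotation, assuming Ls equals the SIMD width so that lane s holds psi(s). A `std::array` stands in for a SIMD register; `Vec`, `rotate_lanes`, `broadcast`, `hop_forward` and the `boundary` coefficient are all hypothetical, and a real kernel would use permute/blend intrinsics rather than scalar loops.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a SIMD vector with N lanes.
template <std::size_t N> using Vec = std::array<double, N>;

// Lane rotation: out[i] = in[(i + k) mod N].
template <std::size_t N>
Vec<N> rotate_lanes(const Vec<N>& in, std::size_t k) {
  Vec<N> out{};
  for (std::size_t i = 0; i < N; ++i) out[i] = in[(i + k) % N];
  return out;
}

// Broadcast one scalar to every lane.
template <std::size_t N>
Vec<N> broadcast(double x) {
  Vec<N> out{};
  out.fill(x);
  return out;
}

// Forward s-hop: lane s receives psi(s+1). The wrapped lane (s = N-1, which
// receives psi(0)) is scaled by a hypothetical boundary coefficient.
template <std::size_t N>
Vec<N> hop_forward(const Vec<N>& psi, double boundary) {
  Vec<N> coeff = broadcast<N>(1.0);  // per-lane coefficients
  coeff[N - 1] = boundary;           // only the wrapped lane differs
  const Vec<N> shifted = rotate_lanes(psi, 1);
  Vec<N> out{};
  for (std::size_t i = 0; i < N; ++i) out[i] = coeff[i] * shifted[i];  // lane-wise multiply
  return out;
}
```

The backward hop is the same rotation by N-1 (lane s receives psi(s-1)), with the special coefficient on lane 0 instead.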
42af132dab
Fix for Chris Kelly's request to peek/poke on checkerboarded fields
2016-07-14 23:44:48 +01:00
adbc7c1188
Adding files for multiple implementations (cache opt) and Ls vectorisation
of the 5D Cayley form chiral fermions for the 5d matrix. With Ls entirely
in the vector direction, s-hopping terms involve rotations.
The serial dependence of the LDU inversion for Mobius and 4d even-odd
checkerboarding is removed by simply applying Ls^2 operations (vectorised
many ways) as a dense matrix operation.
This should give similar throughput, at the cost of more (non-compulsory) flops,
while enabling use of the KNL cache-friendly kernels throughout the code.
Ls is still constrained to be a multiple of Nsimd, which is as much as 8 for AVX512
with single precision.
2016-07-14 22:59:21 +01:00
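The Ls constraint above is simple arithmetic. The sketch below assumes Nsimd counts complex numbers per SIMD register (two reals each), which reproduces the figure of 8 for AVX512 single precision; `nsimd_complex` and `ls_ok` are hypothetical helpers, not Grid's API.

```cpp
#include <cassert>
#include <cstddef>

// Complex numbers of a given real type that fit in one SIMD register
// (assumption: 2 reals per complex, register width given in bits).
constexpr std::size_t nsimd_complex(std::size_t register_bits, std::size_t real_bytes) {
  return register_bits / (8 * 2 * real_bytes);
}

// With Ls vectorised, Ls must fill whole SIMD vectors.
constexpr bool ls_ok(std::size_t ls, std::size_t nsimd) { return ls % nsimd == 0; }

static_assert(nsimd_complex(512, 4) == 8, "AVX512, single precision complex");
static_assert(nsimd_complex(512, 8) == 4, "AVX512, double precision complex");
```

So on AVX512 in single precision Ls = 16 is allowed while Ls = 12 is not; halving the register width or doubling the precision relaxes the constraint.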
62601bb649
Bug fix
2016-07-08 20:46:29 +01:00
ef97e32152
Adding persistent communicators
2016-07-08 17:16:08 +01:00
a0676beeb1
Open up dependency on Eigen and FFTW
2016-07-07 22:31:07 +01:00
fc4a043663
Colors and banner clean up
2016-07-02 16:15:38 +01:00
680645f849
Merge branch 'release/v0.5.0'
2016-06-30 15:15:03 -07:00
712b9a3489
Asm only for avx512
2016-06-30 14:35:02 -07:00
bdaa5b1767
Updated to have perfect prefetching for the s-vectorised kernel with any cache blocking.
2016-06-30 14:35:02 -07:00
8fcefc021a
Improved the prefetching when using cache blocking codes
2016-06-30 14:35:02 -07:00
1445189361
Control the prefetch strategy
2016-06-30 14:35:02 -07:00
05c884a62a
Prefetch change
2016-06-30 14:35:01 -07:00
a25bec87d9
Prefetch during save
2016-06-30 14:35:01 -07:00
2d8bb4c594
Tweaks
2016-06-30 14:35:01 -07:00
51cb2d4328
update file lists
2016-06-30 14:35:01 -07:00
6d58cb2a68
Enable reordering of the loops in the assembler for cache friendliness.
This gets in the way of L2 prefetching, however. Do next next link in stencil
prefetching.
2016-06-30 14:35:01 -07:00
5e02392f9c
Fixed compilation error for benchmark_dwf.
Some parts were assuming single (float) precision.
2016-06-20 12:30:51 +01:00
86187d7cca
Removed write to stdout in constructor for MPI CartesianCommunicator
2016-06-14 15:34:20 +01:00
87418e7df1
Slightly faster prefetching perf.
2016-06-13 02:32:52 -07:00
55f65b81b5
Improvements to the assembler interface that let us move chunks of the
site and s loops into the kernels. This will save on function call overhead and
guarantee the L2 prefetching strategy is right, since OMP can't distribute the
sub-chunks of work.
2016-06-09 01:12:36 -07:00
d9408893b3
Prefetching in the normal kernel implementation.
2016-06-08 05:43:48 -07:00
8ac021de73
Added a test and fixed it for red black precon Ls innermost vectorised DWF
2016-06-07 13:16:56 -07:00
e503ef5590
Cleaned up
2016-06-07 00:11:36 +01:00
a7682b0060
Only instantiate the one routine to avoid duplicate symbol under g++5/MacOS
2016-06-06 23:48:21 +01:00
d4c9d71fc8
Merge branch 'master' of https://github.com/paboyle/Grid
2016-06-06 07:06:54 -07:00
786ca52c43
Problems remain in the red black preconditioning of the Ls vectorisation
2016-06-06 07:05:51 -07:00
f78d89bcbe
Update Lebesgue.cc
kill verbose
2016-06-03 13:33:42 +01:00
53d06046b0
Compiling updates for KNL
2016-06-03 03:47:54 -07:00
139cc5f1ae
Large change with KNL preparation
2016-06-03 03:24:26 -07:00
1c0e922585
Merge pull request #35 from aportelli/master
empty SIMD fix
2016-05-27 16:49:13 +01:00
9d5f693cbe
empty SIMD fix
2016-05-24 10:56:27 +01:00
5c90c3b457
Merge pull request #34 from aportelli/master
Polymorphic lattices & various small updates
2016-05-24 10:50:04 +01:00
91e04056f9
fix of the empty SIMD
2016-05-12 19:24:10 +01:00
3789e3f31c
additional fixes in slice functions
2016-05-12 18:35:38 +01:00
0c66719210
const fix in slice functions
2016-05-12 13:01:35 +01:00
3a5b5c8bec
Save an old tar of tree
2016-05-12 03:20:17 -07:00
4bc21ec7cb
thread CL argument fix
2016-05-11 15:21:29 +01:00
e3083b6dfc
Merge commit 'ab894186589224d570e0ecef8eea06443194a8ab'
2016-05-11 15:20:41 +01:00
ab89418658
Precision change going in; useful for mixed precision algorithms for example.
2016-05-11 15:18:47 +01:00
28cd99882c
Subslicing
2016-05-11 15:06:54 +01:00