1
0
mirror of https://github.com/paboyle/Grid.git synced 2026-04-19 02:01:02 +01:00
Commit Graph

269 Commits

Author SHA1 Message Date
paboyle fb45eb2eb2 5d ls vec rename of impl class 2016-07-14 23:57:26 +01:00
paboyle a307274c96 Fermion impl rename for ls vectorised 5d approaches 2016-07-14 23:56:13 +01:00
paboyle 3f2c44a5fe Updating the class to 5d selection based on impl type 2016-07-14 23:55:26 +01:00
paboyle 48fb1cdc11 Update domain 5d vectorised impl type, move the type over to 4d redblack with
the dense OO inverse
2016-07-14 23:54:35 +01:00
paboyle 8a79e93cc2 Rename the 5d domain wall fermion vectorised Ls impl class 2016-07-14 23:53:00 +01:00
paboyle adbc7c1188 Adding files for multiple implementations (cache opt) and Ls vectorisation
of the 5D cayley form chiral fermions for the 5d matrix. With Ls entirely
in the vector direction, s-hopping terms involve rotations.

The serial dependence of the LDU inversion for Mobius and 4d even odd
checkerboarding is removed by simply applying Ls^2 operations (vectorised
many ways) as a dense matrix operation.

This should give similar throughput but high flops (non-compulsory flops)
but enable use of the KNL cache friendly kernels throughout the code.

Ls is still constrained to be a multiple of Nsimd, which is as much as 8 for AVX512
with single precision.
2016-07-14 22:59:21 +01:00
paboyle ef97e32152 Adding persistent communicators 2016-07-08 17:16:08 +01:00
paboyle a0676beeb1 Open up dependency on Eigen and FFTW 2016-07-07 22:31:07 +01:00
paboyle 680645f849 Merge branch 'release/v0.5.0' 2016-06-30 15:15:03 -07:00
paboyle 712b9a3489 Asm only for avx512 2016-06-30 14:35:02 -07:00
paboyle bdaa5b1767 Updated to have perfect prefetching for the s-vectorised kernel with any cache blocking. 2016-06-30 14:35:02 -07:00
paboyle 8fcefc021a Improved the prefetching when using cache blocking codes 2016-06-30 14:35:02 -07:00
paboyle 05c884a62a Prefetch change 2016-06-30 14:35:01 -07:00
paboyle 2d8bb4c594 Tweaks 2016-06-30 14:35:01 -07:00
paboyle 6d58cb2a68 Enable reordering of the loops in the assembler for cache friendly.
This gets in the way of L2 prefetching however. Do next next link in stencil
prefetching.
2016-06-30 14:35:01 -07:00
Guido Cossu 5e02392f9c Fixed compilation error for benchmark_dwf
Some parts were assuming floating point precision
2016-06-20 12:30:51 +01:00
paboyle 87418e7df1 Slightly faster prefetching perf. 2016-06-13 02:32:52 -07:00
paboyle 55f65b81b5 Improvements to the assembler interface that let us move chunks of the
site and s loop into the kernels. This will save on function call overhead and
guarantee L2 prefetching strategy is right since OMP can't distribute the
sub-chunks of work.
2016-06-09 01:12:36 -07:00
Azusa Yamaguchi d9408893b3 Prefetching in the normal kernel implementation. 2016-06-08 05:43:48 -07:00
paboyle 8ac021de73 Added a test an fixed it for red black precon Ls innermost vectorised DWF 2016-06-07 13:16:56 -07:00
paboyle e503ef5590 Cleaned up 2016-06-07 00:11:36 +01:00
paboyle a7682b0060 Only instantiate the one routine to avoid duplicate symbol under g++5/MacOS 2016-06-06 23:48:21 +01:00
paboyle 53d06046b0 Compiling updates for KNL 2016-06-03 03:47:54 -07:00
paboyle 139cc5f1ae Large change with KNL preparation 2016-06-03 03:24:26 -07:00
paboyle 5341977948 IMCI fixes. Thought I had committed these. The "real" disambiguation
between std::real and Grid::real shouldn't have been necessary and I don't
know why only the icpc v16.0 on babbage hits it.
May need a longer term rename of Grid::real or some careful EnableIf work.
2016-04-30 03:34:16 -07:00
paboyle 1e554350ac The threaded coms didn't agree with GCC. Suprised, and looks like GCC bug. 2016-04-29 16:49:18 -07:00
paboyle c79ea0dcef Fixingn IMCI 2016-04-22 21:52:54 -07:00
paboyle ba427abde9 simd 5d 2016-04-19 15:38:39 -07:00
paboyle 9b6ab6db16 simd in 5th dimension support 2016-04-19 15:38:01 -07:00
paboyle 806a83d38b simd in fifth dim support for dwf 2016-04-19 15:36:19 -07:00
paboyle b1192a8908 Benchmark_zmm added 2016-04-06 03:00:07 -07:00
paboyle e8dddb1596 Adding extra benchmark 2016-04-06 10:32:54 +01:00
paboyle e67fc2be18 Adding a trial for openmp overhead minimisation 2016-03-31 16:00:37 +01:00
paboyle 8052556275 Cleaning up the single/double kernel implementation switch 2016-03-31 14:51:32 +01:00
paboyle 60d965f79e AVX512 improvements; sigfpe trapping too 2016-03-30 08:42:34 +01:00
paboyle 1ecbf9794d Merge branch 'master' of https://github.com/paboyle/Grid 2016-03-30 08:37:55 +01:00
paboyle c77b7ee897 AddSub based alternate SU3 routine 2016-03-28 17:55:22 -06:00
paboyle 1e355a51e1 Interface change 2016-03-27 23:46:55 -07:00
paboyle 21abaf7e91 Gamma sign change 2016-03-28 00:35:45 -06:00
paboyle 165bffc2e7 Avx512 changes for assembler kernels 2016-03-26 22:25:45 -06:00
paboyle 644fd6d32e Build avx512 clean 2016-03-25 09:35:33 -07:00
paboyle 090e7aa930 Merge remote-tracking branch 'origin/chulwoo-dec12-2015'
Merge Chulwoo's Lanczos related improvements.
Merge Nd!=4 fixes for pure gauge HMC from Evan.
2016-03-08 09:55:14 +00:00
paboyle 325e745daa Merge branch 'master' of https://github.com/paboyle/Grid 2016-03-02 07:04:03 -08:00
paboyle 61413565d0 Back off the inlined spin proj as not working 2016-03-02 07:03:09 -08:00
Antonin Portelli 497e7e4c53 BG/Q compatibility fix 2016-02-23 15:57:38 +00:00
Peter Boyle 6aeaf6f568 Parallel IO worked on. I'm puzzled because I already thought I shook this out on MacOS + OpenMPI and then
turned up problems on the BlueWaters Cray.

Gets 75MB/s from home filesystem on parallel configuration read. Need to make the RNG IO parallel,
and also to look at aggregating bigger writes for the parallel write.
Not sure what the home filesystem is.
2016-02-21 08:03:21 -06:00
Jung 9f0d9ade68 Added configure flag for LAPACK. Tested ImplicitlyRestartedLanczos::calc()
Checking in before cleaning up
2016-02-20 02:50:32 -05:00
paboyle 3425751cb8 Missing return value 2016-02-19 01:06:03 +00:00
Peter Boyle 22422a84d9 Small problem in compressor fix 2016-02-17 19:03:09 -06:00
Peter Boyle c9fadf97a5 Simplify the compressor interface again. 2016-02-17 18:16:45 -06:00