1
0
mirror of https://github.com/paboyle/Grid.git synced 2026-04-19 02:01:02 +01:00
Commit Graph

437 Commits

Author SHA1 Message Date
Christopher Kelly 4774a3bcd2 Generalized HotConfiguration and functions it calls to accept gauge fields with precision other than the default. 2016-07-06 18:01:08 -04:00
paboyle 680645f849 Merge branch 'release/v0.5.0' 2016-06-30 15:15:03 -07:00
paboyle 712b9a3489 Asm only for avx512 2016-06-30 14:35:02 -07:00
paboyle bdaa5b1767 Updated to have perfect prefetching for the s-vectorised kernel with any cache blocking. 2016-06-30 14:35:02 -07:00
paboyle 8fcefc021a Improved the prefetching when using cache blocking codes 2016-06-30 14:35:02 -07:00
paboyle 05c884a62a Prefetch change 2016-06-30 14:35:01 -07:00
paboyle 2d8bb4c594 Tweaks 2016-06-30 14:35:01 -07:00
paboyle 6d58cb2a68 Enable reordering of the loops in the assembler for cache friendly.
This gets in the way of L2 prefetching however. Do next next link in stencil
prefetching.
2016-06-30 14:35:01 -07:00
Guido Cossu 5e02392f9c Fixed compilation error for benchmark_dwf
Some parts were assuming floating point precision
2016-06-20 12:30:51 +01:00
paboyle 87418e7df1 Slightly faster prefetching perf. 2016-06-13 02:32:52 -07:00
paboyle 55f65b81b5 Improvements to the assembler interface that let us move chunks of the
site and s loop into the kernels. This will save on function call overhead and
guarantee L2 prefetching strategy is right since OMP can't distribute the
sub-chunks of work.
2016-06-09 01:12:36 -07:00
Azusa Yamaguchi d9408893b3 Prefetching in the normal kernel implementation. 2016-06-08 05:43:48 -07:00
paboyle 8ac021de73 Added a test an fixed it for red black precon Ls innermost vectorised DWF 2016-06-07 13:16:56 -07:00
paboyle e503ef5590 Cleaned up 2016-06-07 00:11:36 +01:00
paboyle a7682b0060 Only instantiate the one routine to avoid duplicate symbol under g++5/MacOS 2016-06-06 23:48:21 +01:00
paboyle 53d06046b0 Compiling updates for KNL 2016-06-03 03:47:54 -07:00
paboyle 139cc5f1ae Large change with KNL preparation 2016-06-03 03:24:26 -07:00
portelli c698b16d75 function to generate Chroma-style gamma matrix products 2016-05-01 18:30:35 -07:00
paboyle 5341977948 IMCI fixes. Thought I had committed these. The "real" disambiguation
between std::real and Grid::real shouldn't have been necessary and I don't
know why only the icpc v16.0 on babbage hits it.
May need a longer term rename of Grid::real or some careful EnableIf work.
2016-04-30 03:34:16 -07:00
portelli f6c53e5039 Merge commit '1e554350acae0e67fa7177ed0db9d4f684a54af2' 2016-04-30 00:17:52 -07:00
portelli 6aa000176f Fermion <-> Propagator functions 2016-04-30 00:14:33 -07:00
paboyle 1e554350ac The threaded coms didn't agree with GCC. Suprised, and looks like GCC bug. 2016-04-29 16:49:18 -07:00
paboyle c79ea0dcef Fixingn IMCI 2016-04-22 21:52:54 -07:00
paboyle 8fd8bc25e9 simd 5th dim with rotation 2016-04-19 15:39:00 -07:00
paboyle ba427abde9 simd 5d 2016-04-19 15:38:39 -07:00
paboyle 9b6ab6db16 simd in 5th dimension support 2016-04-19 15:38:01 -07:00
paboyle 806a83d38b simd in fifth dim support for dwf 2016-04-19 15:36:19 -07:00
paboyle b1192a8908 Benchmark_zmm added 2016-04-06 03:00:07 -07:00
paboyle e8dddb1596 Adding extra benchmark 2016-04-06 10:32:54 +01:00
paboyle e67fc2be18 Adding a trial for openmp overhead minimisation 2016-03-31 16:00:37 +01:00
paboyle 8052556275 Cleaning up the single/double kernel implementation switch 2016-03-31 14:51:32 +01:00
paboyle 60d965f79e AVX512 improvements; sigfpe trapping too 2016-03-30 08:42:34 +01:00
paboyle 1ecbf9794d Merge branch 'master' of https://github.com/paboyle/Grid 2016-03-30 08:37:55 +01:00
paboyle c77b7ee897 AddSub based alternate SU3 routine 2016-03-28 17:55:22 -06:00
paboyle 1e355a51e1 Interface change 2016-03-27 23:46:55 -07:00
paboyle 21abaf7e91 Gamma sign change 2016-03-28 00:35:45 -06:00
paboyle 165bffc2e7 Avx512 changes for assembler kernels 2016-03-26 22:25:45 -06:00
paboyle 644fd6d32e Build avx512 clean 2016-03-25 09:35:33 -07:00
paboyle 60d4564151 ICC no compile fix 2016-03-16 02:30:40 -07:00
paboyle 090e7aa930 Merge remote-tracking branch 'origin/chulwoo-dec12-2015'
Merge Chulwoo's Lanczos related improvements.
Merge Nd!=4 fixes for pure gauge HMC from Evan.
2016-03-08 09:55:14 +00:00
paboyle 325e745daa Merge branch 'master' of https://github.com/paboyle/Grid 2016-03-02 07:04:03 -08:00
paboyle 61413565d0 Back off the inlined spin proj as not working 2016-03-02 07:03:09 -08:00
Antonin Portelli 497e7e4c53 BG/Q compatibility fix 2016-02-23 15:57:38 +00:00
Peter Boyle 6aeaf6f568 Parallel IO worked on. I'm puzzled because I already thought I shook this out on MacOS + OpenMPI and then
turned up problems on the BlueWaters Cray.

Gets 75MB/s from home filesystem on parallel configuration read. Need to make the RNG IO parallel,
and also to look at aggregating bigger writes for the parallel write.
Not sure what the home filesystem is.
2016-02-21 08:03:21 -06:00
Peter Boyle 40f2db9bc0 Disable metropolis step until 10 traj covered. Should move to exposing these
in XML input and start having "applications" directory.
2016-02-21 08:01:44 -06:00
Jung 9f0d9ade68 Added configure flag for LAPACK. Tested ImplicitlyRestartedLanczos::calc()
Checking in before cleaning up
2016-02-20 02:50:32 -05:00
paboyle 3425751cb8 Missing return value 2016-02-19 01:06:03 +00:00
Peter Boyle 22422a84d9 Small problem in compressor fix 2016-02-17 19:03:09 -06:00
Peter Boyle c9fadf97a5 Simplify the compressor interface again. 2016-02-17 18:16:45 -06:00
Peter Boyle 81395e85d1 Regressing to not overlap comms and compute becasue bluewaters, edison, and cori are so rubbish at it. 2016-02-16 13:56:44 -06:00