mirror of https://github.com/paboyle/Grid.git synced 2026-04-20 02:31:01 +01:00
Commit Graph

253 Commits

Author SHA1 Message Date
paboyle 87418e7df1 Slightly faster prefetching perf. 2016-06-13 02:32:52 -07:00
paboyle 55f65b81b5 Improvements to the assembler interface that let us move chunks of the
site and s loop into the kernels. This will save on function call overhead and
guarantee L2 prefetching strategy is right since OMP can't distribute the
sub-chunks of work.
2016-06-09 01:12:36 -07:00
Azusa Yamaguchi d9408893b3 Prefetching in the normal kernel implementation. 2016-06-08 05:43:48 -07:00
paboyle 8ac021de73 Added a test and fixed it for red black precon Ls innermost vectorised DWF 2016-06-07 13:16:56 -07:00
paboyle e503ef5590 Cleaned up 2016-06-07 00:11:36 +01:00
paboyle a7682b0060 Only instantiate the one routine to avoid duplicate symbol under g++5/MacOS 2016-06-06 23:48:21 +01:00
paboyle 53d06046b0 Compiling updates for KNL 2016-06-03 03:47:54 -07:00
paboyle 139cc5f1ae Large change with KNL preparation 2016-06-03 03:24:26 -07:00
paboyle 5341977948 IMCI fixes. Thought I had committed these. The "real" disambiguation
between std::real and Grid::real shouldn't have been necessary and I don't
know why only the icpc v16.0 on babbage hits it.
May need a longer term rename of Grid::real or some careful EnableIf work.
2016-04-30 03:34:16 -07:00
paboyle 1e554350ac The threaded comms didn't agree with GCC. Surprised, and looks like GCC bug. 2016-04-29 16:49:18 -07:00
paboyle c79ea0dcef Fixing IMCI 2016-04-22 21:52:54 -07:00
paboyle ba427abde9 simd 5d 2016-04-19 15:38:39 -07:00
paboyle 9b6ab6db16 simd in 5th dimension support 2016-04-19 15:38:01 -07:00
paboyle 806a83d38b simd in fifth dim support for dwf 2016-04-19 15:36:19 -07:00
paboyle b1192a8908 Benchmark_zmm added 2016-04-06 03:00:07 -07:00
paboyle e8dddb1596 Adding extra benchmark 2016-04-06 10:32:54 +01:00
paboyle e67fc2be18 Adding a trial for openmp overhead minimisation 2016-03-31 16:00:37 +01:00
paboyle 8052556275 Cleaning up the single/double kernel implementation switch 2016-03-31 14:51:32 +01:00
paboyle 60d965f79e AVX512 improvements; sigfpe trapping too 2016-03-30 08:42:34 +01:00
paboyle 1ecbf9794d Merge branch 'master' of https://github.com/paboyle/Grid 2016-03-30 08:37:55 +01:00
paboyle c77b7ee897 AddSub based alternate SU3 routine 2016-03-28 17:55:22 -06:00
paboyle 1e355a51e1 Interface change 2016-03-27 23:46:55 -07:00
paboyle 21abaf7e91 Gamma sign change 2016-03-28 00:35:45 -06:00
paboyle 165bffc2e7 Avx512 changes for assembler kernels 2016-03-26 22:25:45 -06:00
paboyle 644fd6d32e Build avx512 clean 2016-03-25 09:35:33 -07:00
paboyle 090e7aa930 Merge remote-tracking branch 'origin/chulwoo-dec12-2015'
Merge Chulwoo's Lanczos related improvements.
Merge Nd!=4 fixes for pure gauge HMC from Evan.
2016-03-08 09:55:14 +00:00
paboyle 325e745daa Merge branch 'master' of https://github.com/paboyle/Grid 2016-03-02 07:04:03 -08:00
paboyle 61413565d0 Back off the inlined spin proj as not working 2016-03-02 07:03:09 -08:00
Antonin Portelli 497e7e4c53 BG/Q compatibility fix 2016-02-23 15:57:38 +00:00
Peter Boyle 6aeaf6f568 Parallel IO worked on. I'm puzzled because I already thought I shook this out on MacOS + OpenMPI and then
turned up problems on the BlueWaters Cray.

Gets 75MB/s from home filesystem on parallel configuration read. Need to make the RNG IO parallel,
and also to look at aggregating bigger writes for the parallel write.
Not sure what the home filesystem is.
2016-02-21 08:03:21 -06:00
Jung 9f0d9ade68 Added configure flag for LAPACK. Tested ImplicitlyRestartedLanczos::calc()
Checking in before cleaning up
2016-02-20 02:50:32 -05:00
paboyle 3425751cb8 Missing return value 2016-02-19 01:06:03 +00:00
Peter Boyle 22422a84d9 Small problem in compressor fix 2016-02-17 19:03:09 -06:00
Peter Boyle c9fadf97a5 Simplify the compressor interface again. 2016-02-17 18:16:45 -06:00
Peter Boyle 81395e85d1 Regressing to not overlap comms and compute because bluewaters, edison, and cori are so rubbish at it. 2016-02-16 13:56:44 -06:00
Peter Boyle a0fc47c6f9 Cheaper implementation 2016-02-15 16:02:36 -06:00
paboyle e2f73e3ead Updates for shmem 2016-02-10 16:50:32 -08:00
neo 6371676a75 Correcting some compilation errors for clang-sse 2016-02-10 11:37:03 +09:00
Jung bd84c23298 definitions reconciled. 2016-01-25 16:30:59 -05:00
Jung 7aa8d5e8af Failing to compile, comparing with master 2016-01-25 16:03:02 -05:00
Jung 6012b0ec23 Checking in changes before changing to chulwoo-dec12-2015 2016-01-25 09:40:58 -05:00
Jung 411ac49dd7 GparityWilsonTM typedef added. Not yet tested
Conflicts:
	configure
	lib/qcd/action/fermion/WilsonKernels.h
2016-01-25 01:36:28 -05:00
Jung 5c57d4f403 Merge branch 'master' of https://github.com/paboyle/Grid into scidac1_2
Conflicts:
	lib/qcd/action/fermion/WilsonKernels.h
2016-01-11 11:36:45 -05:00
paboyle fc6ad65751 Pushed the overlap comms tweaks 2016-01-11 06:34:22 -08:00
paboyle dafc74020c Overlap comms compute improvements in hand op kernels, and better timing from Edison and Cori 2016-01-10 16:54:27 -08:00
paboyle d19321dfde Overlap comms compute changes 2016-01-10 19:20:16 +00:00
Jung 5924e5a562 Merge branch 'master' of https://github.com/paboyle/Grid into scidac1_2
Conflicts:
	configure
	lib/qcd/action/Actions.h
	lib/qcd/action/fermion/WilsonKernels.h
2016-01-06 03:44:57 -05:00
paboyle c99d748da6 Timing reports in benchmarks now reflect the asynch comms thread statistics 2016-01-04 14:42:16 +00:00
paboyle 02452afd36 Optional overlap of comms with compute 2016-01-04 14:18:40 +00:00
paboyle 331768dcff Added overlap comms compute mode 2016-01-03 01:38:11 +00:00