1
0
mirror of https://github.com/paboyle/Grid.git synced 2026-05-21 01:24:16 +01:00
Commit Graph

67 Commits

Author SHA1 Message Date
Peter Boyle 3d21297bbb Call the fast path compressor for wilson kernels to avoid if else on projector 2016-12-27 11:23:13 +00:00
Peter Boyle 25efefc5b4 Back to original thread policy post test 2016-12-23 09:49:04 +00:00
Peter Boyle b8cdb3e90a Debug hack; raises from 62GF/s to 72 GF/s per node on BG/Q 2016-12-22 17:50:14 +00:00
Peter Boyle fb8d4b2357 Lots of debug on performance Mobius 2016-12-08 17:28:28 +00:00
Guido Cossu ae9688e343 Reporting also the total mflops 2016-11-28 11:37:02 +00:00
portelli ca21003f01 Merge branch 'feature/fft-opt' into feature/feynman-rules
# Conflicts:
#	lib/FFT.h
#	lib/qcd/action/fermion/WilsonFermion5D.h
#	tests/core/Test_fft.cc
2016-10-26 18:44:47 +01:00
azusayamaguchi c190221fd3 Internal SHM comms in non-simd directions working
Need to fix simd directions
2016-10-22 18:14:27 +01:00
azusayamaguchi 6a9eae6b6b Reporting improvements 2016-10-21 13:36:18 +01:00
portelli 997fd882ff Merge branch 'develop' into feature/feynman-rules
# Conflicts:
#	lib/Threads.h
#	lib/qcd/action/fermion/WilsonFermion.cc
#	lib/qcd/action/fermion/WilsonFermion.h
#	lib/qcd/utils/SUn.h
#	lib/simd/Grid_avx.h
#	lib/simd/Intel512common.h
2016-10-19 18:35:18 +01:00
azusayamaguchi 81f2aeaece KNL streaming stores, and KNL performance coutners 2016-10-12 11:45:22 +01:00
paboyle 96f1d1b828 Debugged Domain wall and Overlap feynman rules (infinite Ls, finite mass). 2016-10-10 23:46:45 +01:00
Guido Cossu 2e453dfbf5 Added some instrumentation to benchmark the force computation 2016-10-06 17:52:45 +01:00
paboyle 4089984431 Timing hooks 2016-10-06 09:25:12 +01:00
paboyle b6713ecb60 Momentum space rules for Overlap, DWF untested to date 2016-09-26 09:39:09 +01:00
paboyle 48fb1cdc11 Update domain 5d vectorised impl type, move the type over to 4d redblack with
the dense OO inverse
2016-07-14 23:54:35 +01:00
paboyle 6d58cb2a68 Enable reordering of the loops in the assembler for cache friendly.
This gets in the way of L2 prefetching however. Do next next link in stencil
prefetching.
2016-06-30 14:35:01 -07:00
paboyle 55f65b81b5 Improvements to the assembler interface that let us move chunks of the
site and s loop into the kernels. This will save on function call overhead and
guarantee L2 prefetching strategy is right since OMP can't distribute the
sub-chunks of work.
2016-06-09 01:12:36 -07:00
paboyle 8ac021de73 Added a test an fixed it for red black precon Ls innermost vectorised DWF 2016-06-07 13:16:56 -07:00
paboyle 53d06046b0 Compiling updates for KNL 2016-06-03 03:47:54 -07:00
paboyle 139cc5f1ae Large change with KNL preparation 2016-06-03 03:24:26 -07:00
paboyle 1e554350ac The threaded coms didn't agree with GCC. Suprised, and looks like GCC bug. 2016-04-29 16:49:18 -07:00
paboyle 9b6ab6db16 simd in 5th dimension support 2016-04-19 15:38:01 -07:00
paboyle e8dddb1596 Adding extra benchmark 2016-04-06 10:32:54 +01:00
paboyle e67fc2be18 Adding a trial for openmp overhead minimisation 2016-03-31 16:00:37 +01:00
paboyle 165bffc2e7 Avx512 changes for assembler kernels 2016-03-26 22:25:45 -06:00
paboyle 090e7aa930 Merge remote-tracking branch 'origin/chulwoo-dec12-2015'
Merge Chulwoo's Lanczos related improvements.
Merge Nd!=4 fixes for pure gauge HMC from Evan.
2016-03-08 09:55:14 +00:00
paboyle 61413565d0 Back off the inlined spin proj as not working 2016-03-02 07:03:09 -08:00
Peter Boyle 6aeaf6f568 Parallel IO worked on. I'm puzzled because I already thought I shook this out on MacOS + OpenMPI and then
turned up problems on the BlueWaters Cray.

Gets 75MB/s from home filesystem on parallel configuration read. Need to make the RNG IO parallel,
and also to look at aggregating bigger writes for the parallel write.
Not sure what the home filesystem is.
2016-02-21 08:03:21 -06:00
Peter Boyle 22422a84d9 Small problem in compressor fix 2016-02-17 19:03:09 -06:00
Peter Boyle c9fadf97a5 Simplify the compressor interface again. 2016-02-17 18:16:45 -06:00
Peter Boyle 81395e85d1 Regressing to not overlap comms and compute becasue bluewaters, edison, and cori are so rubbish at it. 2016-02-16 13:56:44 -06:00
Peter Boyle a0fc47c6f9 Cheaper implementation 2016-02-15 16:02:36 -06:00
paboyle e2f73e3ead Updates for shmem 2016-02-10 16:50:32 -08:00
Jung 411ac49dd7 GparityWilsonTM typedef added. Not yet tested
Conflicts:
	configure
	lib/qcd/action/fermion/WilsonKernels.h
2016-01-25 01:36:28 -05:00
Jung 5c57d4f403 Merge branch 'master' of https://github.com/paboyle/Grid into scidac1_2
Conflicts:
	lib/qcd/action/fermion/WilsonKernels.h
2016-01-11 11:36:45 -05:00
paboyle fc6ad65751 Pushed the overlap comms tweaks 2016-01-11 06:34:22 -08:00
paboyle dafc74020c Overlap comms compute improvements in hand op kernels, and better timing from Edison and Cori 2016-01-10 16:54:27 -08:00
paboyle d19321dfde Overlap comms compute changes 2016-01-10 19:20:16 +00:00
Jung 5924e5a562 Merge branch 'master' of https://github.com/paboyle/Grid into scidac1_2
Conflicts:
	configure
	lib/qcd/action/Actions.h
	lib/qcd/action/fermion/WilsonKernels.h
2016-01-06 03:44:57 -05:00
paboyle c99d748da6 Timing reports in benchmarks now reflect the asynch comms thread statistics 2016-01-04 14:42:16 +00:00
paboyle 02452afd36 Optional overlap of comms with compute 2016-01-04 14:18:40 +00:00
paboyle 331768dcff Added overlap comms compute mode 2016-01-03 01:38:11 +00:00
paboyle aae8bf31a7 Global edit adding copyright and license info to every source file. 2016-01-02 14:51:32 +00:00
paboyle 34a0fde2ad Fixes to fermion force terms after sign of gamma_mu (0...3) change.
Thought I had already committed these.

Believe I have got the Gparity fermion force working.

* tests/Test_gpdwf_force.cc     -- correctly predicts dS for two flavour pseudofermion
                                   based on a small dt update of U field.

* tests/Test_hmc_EODWFRatio_Gparity.cc -- ran 1 trajectory on 8^4 with dH=0.21.

Need to accumulate a full plaquette log to believe fully which will take some hours of run time.
2015-12-15 23:14:12 +00:00
Jung f2b4edc090 Fixes for Gparity comparison with CPS (Instantiation, Gamma matrix convention) 2015-12-07 02:04:57 -05:00
paboyle b2c02a6106 Runs fastst on cori 2015-11-28 16:58:16 -08:00
paboyle e9ff25b06b Small threading change makes a difference on Cori. 2015-11-07 00:07:05 -08:00
paboyle 899ca41cb8 Merge branch 'master' of github.com:paboyle/Grid
Conflicts:
	lib/qcd/action/fermion/WilsonFermion5D.cc
2015-11-06 03:50:04 -08:00
paboyle a2ff068e29 Asm and threading for many core 2015-11-06 03:47:14 -08:00
Peter Boyle 28022755ae Stencil class name global change to StencilImpl typedef 2015-11-06 05:30:17 -06:00