1
0
mirror of https://github.com/paboyle/Grid.git synced 2025-06-19 16:27:05 +01:00
Commit Graph

89 Commits

Author SHA1 Message Date
99220f6531 Fixes and better timing 2017-04-26 17:24:11 -04:00
fd1eb7de13 Clean implementation of the exterior faces listing only those points on the boudary 2017-04-26 02:34:52 -04:00
ab66bac4e6 Think I'm getting on top of the reduced cost exterior precomputed list of links 2017-04-25 08:50:26 +01:00
56277a11c8 Build a list of whats on the surface 2017-04-24 17:06:15 +01:00
e3d0e31525 Debugged assemply split phase with interior suppression 2017-04-23 19:29:27 -04:00
b722889234 Try a better load balancing loop 2017-04-22 19:27:41 +01:00
736bf3c866 Major rework of stencil. Half precision and MPI3 now working. 2017-04-22 11:33:50 +01:00
fc4ab9ccd5 Working half precision comms 2017-04-20 11:20:26 +01:00
4a340aa5ca Massive compressor rework to support reduced precision comms 2017-04-20 09:28:27 +01:00
4b17e8eba8 Merge branch 'develop' into feature/bgq-asm
Conflicts:
	lib/qcd/action/fermion/Fermion.h
	lib/qcd/action/fermion/WilsonFermion.cc
	lib/util/Init.cc
	tests/Test_cayley_even_odd_vec.cc
2017-03-28 04:49:30 -04:00
18bde08d1b Merge branch 'feature/staggering' into develop 2017-03-28 15:25:55 +09:00
af230a1fb8 Average the time across the whole machine for outliers 2017-02-28 17:05:22 -05:00
e099dcdae7 Merge branch 'develop' into feature/bgq-asm 2017-02-23 00:25:29 +00:00
4e7ab3166f Refactoring header layout 2017-02-22 18:09:33 +00:00
3ae92fa2e6 Global changes to parallel_for structure.
Move the comms flags to more sensible names
2017-02-21 05:24:27 -05:00
2c246551d0 Overlap comms and compute options in wilson kernels 2017-02-07 01:37:10 -05:00
a0cfbb6e88 Merge branch 'feature/gammas' into feature/hadrons
# Conflicts:
#	.gitignore
#	lib/qcd/spin/Dirac.cc
#	scripts/filelist
2017-01-30 09:10:49 -08:00
fad743fbb1 Build system sanity check: corrected several headers not in the <Grid/*> format 2017-01-26 17:00:41 -08:00
a37e71f362 New automatic implementation of gamma matrices, Meson and SeqGamma are broken 2017-01-23 19:13:43 -08:00
03c81bd902 Merge branch 'feature/bgq-asm' of https://github.com/paboyle/Grid into feature/bgq-asm 2016-12-27 11:25:35 +00:00
a869addef1 Stats switch off 2016-12-27 11:25:22 +00:00
3d21297bbb Call the fast path compressor for wilson kernels to avoid if else on projector 2016-12-27 11:23:13 +00:00
25efefc5b4 Back to original thread policy post test 2016-12-23 09:49:04 +00:00
b8cdb3e90a Debug hack; raises from 62GF/s to 72 GF/s per node on BG/Q 2016-12-22 17:50:14 +00:00
eabc577940 Assembler possibly working 2016-12-16 16:55:36 +00:00
fb8d4b2357 Lots of debug on performance Mobius 2016-12-08 17:28:28 +00:00
ae9688e343 Reporting also the total mflops 2016-11-28 11:37:02 +00:00
ca21003f01 Merge branch 'feature/fft-opt' into feature/feynman-rules
# Conflicts:
#	lib/FFT.h
#	lib/qcd/action/fermion/WilsonFermion5D.h
#	tests/core/Test_fft.cc
2016-10-26 18:44:47 +01:00
c190221fd3 Internal SHM comms in non-simd directions working
Need to fix simd directions
2016-10-22 18:14:27 +01:00
6a9eae6b6b Reporting improvements 2016-10-21 13:36:18 +01:00
997fd882ff Merge branch 'develop' into feature/feynman-rules
# Conflicts:
#	lib/Threads.h
#	lib/qcd/action/fermion/WilsonFermion.cc
#	lib/qcd/action/fermion/WilsonFermion.h
#	lib/qcd/utils/SUn.h
#	lib/simd/Grid_avx.h
#	lib/simd/Intel512common.h
2016-10-19 18:35:18 +01:00
81f2aeaece KNL streaming stores, and KNL performance coutners 2016-10-12 11:45:22 +01:00
96f1d1b828 Debugged Domain wall and Overlap feynman rules (infinite Ls, finite mass). 2016-10-10 23:46:45 +01:00
2e453dfbf5 Added some instrumentation to benchmark the force computation 2016-10-06 17:52:45 +01:00
4089984431 Timing hooks 2016-10-06 09:25:12 +01:00
b6713ecb60 Momentum space rules for Overlap, DWF untested to date 2016-09-26 09:39:09 +01:00
48fb1cdc11 Update domain 5d vectorised impl type, move the type over to 4d redblack with
the dense OO inverse
2016-07-14 23:54:35 +01:00
6d58cb2a68 Enable reordering of the loops in the assembler for cache friendly.
This gets in the way of L2 prefetching however. Do next next link in stencil
prefetching.
2016-06-30 14:35:01 -07:00
55f65b81b5 Improvements to the assembler interface that let us move chunks of the
site and s loop into the kernels. This will save on function call overhead and
guarantee L2 prefetching strategy is right since OMP can't distribute the
sub-chunks of work.
2016-06-09 01:12:36 -07:00
8ac021de73 Added a test an fixed it for red black precon Ls innermost vectorised DWF 2016-06-07 13:16:56 -07:00
53d06046b0 Compiling updates for KNL 2016-06-03 03:47:54 -07:00
139cc5f1ae Large change with KNL preparation 2016-06-03 03:24:26 -07:00
1e554350ac The threaded coms didn't agree with GCC. Suprised, and looks like GCC bug. 2016-04-29 16:49:18 -07:00
9b6ab6db16 simd in 5th dimension support 2016-04-19 15:38:01 -07:00
e8dddb1596 Adding extra benchmark 2016-04-06 10:32:54 +01:00
e67fc2be18 Adding a trial for openmp overhead minimisation 2016-03-31 16:00:37 +01:00
165bffc2e7 Avx512 changes for assembler kernels 2016-03-26 22:25:45 -06:00
090e7aa930 Merge remote-tracking branch 'origin/chulwoo-dec12-2015'
Merge Chulwoo's Lanczos related improvements.
Merge Nd!=4 fixes for pure gauge HMC from Evan.
2016-03-08 09:55:14 +00:00
61413565d0 Back off the inlined spin proj as not working 2016-03-02 07:03:09 -08:00
6aeaf6f568 Parallel IO worked on. I'm puzzled because I already thought I shook this out on MacOS + OpenMPI and then
turned up problems on the BlueWaters Cray.

Gets 75MB/s from home filesystem on parallel configuration read. Need to make the RNG IO parallel,
and also to look at aggregating bigger writes for the parallel write.
Not sure what the home filesystem is.
2016-02-21 08:03:21 -06:00