mirror of https://github.com/paboyle/Grid.git synced 2024-11-13 01:05:36 +00:00
Commit Graph

100 Commits

Author SHA1 Message Date
paboyle
f68b5de9c8 No compile fix on Clang 2017-08-25 19:35:21 +01:00
Peter Boyle
c3b1263e75 Benchmark prep 2017-08-25 09:25:54 +01:00
paboyle
a446d95c33 Trying to pass TeamCity and Travis 2017-08-20 01:10:50 +01:00
Peter Boyle
14d53e1c9e Threaded MPI calls patches 2017-07-29 13:08:10 -04:00
paboyle
54e94360ad Experimental: Multiple communicators to see if we can avoid thread locks in --enable-comms=mpit 2017-06-24 23:10:24 +01:00
Guido Cossu
3344788fa1 Merge branch 'develop' into feature/hmc_generalise 2017-05-01 12:13:56 +01:00
Peter Boyle
99220f6531 Fixes and better timing 2017-04-26 17:24:11 -04:00
Peter Boyle
fd1eb7de13 Clean implementation of the exterior faces listing only those points on the boundary 2017-04-26 02:34:52 -04:00
paboyle
ab66bac4e6 Think I'm getting on top of the reduced cost exterior precomputed list of links 2017-04-25 08:50:26 +01:00
paboyle
56277a11c8 Build a list of what's on the surface 2017-04-24 17:06:15 +01:00
Peter Boyle
e3d0e31525 Debugged assembly split phase with interior suppression 2017-04-23 19:29:27 -04:00
paboyle
b722889234 Try a better load balancing loop 2017-04-22 19:27:41 +01:00
paboyle
736bf3c866 Major rework of stencil. Half precision and MPI3 now working. 2017-04-22 11:33:50 +01:00
paboyle
fc4ab9ccd5 Working half precision comms 2017-04-20 11:20:26 +01:00
paboyle
4a340aa5ca Massive compressor rework to support reduced precision comms 2017-04-20 09:28:27 +01:00
Guido Cossu
8c540333d5 Merge branch 'develop' into feature/hmc_generalise 2017-04-05 14:41:04 +01:00
paboyle
4b17e8eba8 Merge branch 'develop' into feature/bgq-asm
Conflicts:
	lib/qcd/action/fermion/Fermion.h
	lib/qcd/action/fermion/WilsonFermion.cc
	lib/util/Init.cc
	tests/Test_cayley_even_odd_vec.cc
2017-03-28 04:49:30 -04:00
paboyle
18bde08d1b Merge branch 'feature/staggering' into develop 2017-03-28 15:25:55 +09:00
paboyle
af230a1fb8 Average the time across the whole machine for outliers 2017-02-28 17:05:22 -05:00
paboyle
e099dcdae7 Merge branch 'develop' into feature/bgq-asm 2017-02-23 00:25:29 +00:00
paboyle
4e7ab3166f Refactoring header layout 2017-02-22 18:09:33 +00:00
paboyle
3ae92fa2e6 Global changes to parallel_for structure.
Move the comms flags to more sensible names
2017-02-21 05:24:27 -05:00
Guido Cossu
e0571c872b Merge branch 'develop' into feature/hmc_generalise 2017-02-09 16:12:00 +00:00
paboyle
2c246551d0 Overlap comms and compute options in wilson kernels 2017-02-07 01:37:10 -05:00
a0cfbb6e88 Merge branch 'feature/gammas' into feature/hadrons
# Conflicts:
#	.gitignore
#	lib/qcd/spin/Dirac.cc
#	scripts/filelist
2017-01-30 09:10:49 -08:00
fad743fbb1 Build system sanity check: corrected several headers not in the <Grid/*> format 2017-01-26 17:00:41 -08:00
Guido Cossu
17629b8d9e Merge branch 'develop' into feature/hmc_generalise 2017-01-25 11:33:53 +00:00
a37e71f362 New automatic implementation of gamma matrices, Meson and SeqGamma are broken 2017-01-23 19:13:43 -08:00
Peter Boyle
03c81bd902 Merge branch 'feature/bgq-asm' of https://github.com/paboyle/Grid into feature/bgq-asm 2016-12-27 11:25:35 +00:00
Peter Boyle
a869addef1 Stats switch off 2016-12-27 11:25:22 +00:00
Peter Boyle
3d21297bbb Call the fast path compressor for wilson kernels to avoid if else on projector 2016-12-27 11:23:13 +00:00
Peter Boyle
25efefc5b4 Back to original thread policy post test 2016-12-23 09:49:04 +00:00
Peter Boyle
b8cdb3e90a Debug hack; raises from 62GF/s to 72 GF/s per node on BG/Q 2016-12-22 17:50:14 +00:00
azusayamaguchi
eabc577940 Assembler possibly working 2016-12-16 16:55:36 +00:00
Peter Boyle
fb8d4b2357 Lots of debug on performance Mobius 2016-12-08 17:28:28 +00:00
Guido Cossu
143c70e29f Debugged the threaded version. Cleaning up 2016-12-07 04:40:25 +00:00
Guido Cossu
b812d5e39c Added single threaded version of the derivative for the Ls vectorised DWF 2016-12-06 16:31:13 +00:00
Guido Cossu
ae9688e343 Reporting also the total mflops 2016-11-28 11:37:02 +00:00
ca21003f01 Merge branch 'feature/fft-opt' into feature/feynman-rules
# Conflicts:
#	lib/FFT.h
#	lib/qcd/action/fermion/WilsonFermion5D.h
#	tests/core/Test_fft.cc
2016-10-26 18:44:47 +01:00
azusayamaguchi
c190221fd3 Internal SHM comms in non-simd directions working
Need to fix simd directions
2016-10-22 18:14:27 +01:00
azusayamaguchi
6a9eae6b6b Reporting improvements 2016-10-21 13:36:18 +01:00
997fd882ff Merge branch 'develop' into feature/feynman-rules
# Conflicts:
#	lib/Threads.h
#	lib/qcd/action/fermion/WilsonFermion.cc
#	lib/qcd/action/fermion/WilsonFermion.h
#	lib/qcd/utils/SUn.h
#	lib/simd/Grid_avx.h
#	lib/simd/Intel512common.h
2016-10-19 18:35:18 +01:00
azusayamaguchi
81f2aeaece KNL streaming stores, and KNL performance counters 2016-10-12 11:45:22 +01:00
paboyle
96f1d1b828 Debugged Domain wall and Overlap feynman rules (infinite Ls, finite mass). 2016-10-10 23:46:45 +01:00
Guido Cossu
2e453dfbf5 Added some instrumentation to benchmark the force computation 2016-10-06 17:52:45 +01:00
paboyle
4089984431 Timing hooks 2016-10-06 09:25:12 +01:00
paboyle
b6713ecb60 Momentum space rules for Overlap, DWF untested to date 2016-09-26 09:39:09 +01:00
paboyle
48fb1cdc11 Update domain 5d vectorised impl type, move the type over to 4d redblack with
the dense OO inverse
2016-07-14 23:54:35 +01:00
paboyle
6d58cb2a68 Enable reordering of the loops in the assembler for cache friendliness.
This gets in the way of L2 prefetching however. Do next next link in stencil
prefetching.
2016-06-30 14:35:01 -07:00
paboyle
55f65b81b5 Improvements to the assembler interface that let us move chunks of the
site and s loop into the kernels. This will save on function call overhead and
guarantee L2 prefetching strategy is right since OMP can't distribute the
sub-chunks of work.
2016-06-09 01:12:36 -07:00