paboyle
f68b5de9c8
No compile fix on Clang
2017-08-25 19:35:21 +01:00
Peter Boyle
c3b1263e75
Benchmark prep
2017-08-25 09:25:54 +01:00
paboyle
a446d95c33
Trying to pass TeamCity and Travis
2017-08-20 01:10:50 +01:00
Peter Boyle
14d53e1c9e
Threaded MPI calls patches
2017-07-29 13:08:10 -04:00
paboyle
54e94360ad
Experimental: Multiple communicators to see if we can avoid thread locks in --enable-comms=mpit
2017-06-24 23:10:24 +01:00
Guido Cossu
3344788fa1
Merge branch 'develop' into feature/hmc_generalise
2017-05-01 12:13:56 +01:00
Peter Boyle
99220f6531
Fixes and better timing
2017-04-26 17:24:11 -04:00
Peter Boyle
fd1eb7de13
Clean implementation of the exterior faces listing only those points on the boudary
2017-04-26 02:34:52 -04:00
paboyle
ab66bac4e6
Think I'm getting on top of the reduced cost exterior precomputed list of links
2017-04-25 08:50:26 +01:00
paboyle
56277a11c8
Build a list of whats on the surface
2017-04-24 17:06:15 +01:00
Peter Boyle
e3d0e31525
Debugged assemply split phase with interior suppression
2017-04-23 19:29:27 -04:00
paboyle
b722889234
Try a better load balancing loop
2017-04-22 19:27:41 +01:00
paboyle
736bf3c866
Major rework of stencil. Half precision and MPI3 now working.
2017-04-22 11:33:50 +01:00
paboyle
fc4ab9ccd5
Working half precision comms
2017-04-20 11:20:26 +01:00
paboyle
4a340aa5ca
Massive compressor rework to support reduced precision comms
2017-04-20 09:28:27 +01:00
Guido Cossu
8c540333d5
Merge branch 'develop' into feature/hmc_generalise
2017-04-05 14:41:04 +01:00
paboyle
4b17e8eba8
Merge branch 'develop' into feature/bgq-asm
...
Conflicts:
lib/qcd/action/fermion/Fermion.h
lib/qcd/action/fermion/WilsonFermion.cc
lib/util/Init.cc
tests/Test_cayley_even_odd_vec.cc
2017-03-28 04:49:30 -04:00
paboyle
18bde08d1b
Merge branch 'feature/staggering' into develop
2017-03-28 15:25:55 +09:00
paboyle
af230a1fb8
Average the time across the whole machine for outliers
2017-02-28 17:05:22 -05:00
paboyle
e099dcdae7
Merge branch 'develop' into feature/bgq-asm
2017-02-23 00:25:29 +00:00
paboyle
4e7ab3166f
Refactoring header layout
2017-02-22 18:09:33 +00:00
paboyle
3ae92fa2e6
Global changes to parallel_for structure.
...
Move the comms flags to more sensible names
2017-02-21 05:24:27 -05:00
Guido Cossu
e0571c872b
Merge branch 'develop' into feature/hmc_generalise
2017-02-09 16:12:00 +00:00
paboyle
2c246551d0
Overlap comms and compute options in wilson kernels
2017-02-07 01:37:10 -05:00
a0cfbb6e88
Merge branch 'feature/gammas' into feature/hadrons
...
# Conflicts:
# .gitignore
# lib/qcd/spin/Dirac.cc
# scripts/filelist
2017-01-30 09:10:49 -08:00
fad743fbb1
Build system sanity check: corrected several headers not in the <Grid/*> format
2017-01-26 17:00:41 -08:00
Guido Cossu
17629b8d9e
Merge branch 'develop' into feature/hmc_generalise
2017-01-25 11:33:53 +00:00
a37e71f362
New automatic implementation of gamma matrices, Meson and SeqGamma are broken
2017-01-23 19:13:43 -08:00
Peter Boyle
03c81bd902
Merge branch 'feature/bgq-asm' of https://github.com/paboyle/Grid into feature/bgq-asm
2016-12-27 11:25:35 +00:00
Peter Boyle
a869addef1
Stats switch off
2016-12-27 11:25:22 +00:00
Peter Boyle
3d21297bbb
Call the fast path compressor for wilson kernels to avoid if else on projector
2016-12-27 11:23:13 +00:00
Peter Boyle
25efefc5b4
Back to original thread policy post test
2016-12-23 09:49:04 +00:00
Peter Boyle
b8cdb3e90a
Debug hack; raises from 62GF/s to 72 GF/s per node on BG/Q
2016-12-22 17:50:14 +00:00
azusayamaguchi
eabc577940
Assembler possibly working
2016-12-16 16:55:36 +00:00
Peter Boyle
fb8d4b2357
Lots of debug on performance Mobius
2016-12-08 17:28:28 +00:00
Guido Cossu
143c70e29f
Debugged the threaded version. Cleaning up
2016-12-07 04:40:25 +00:00
Guido Cossu
b812d5e39c
Added single threaded version of the derivative for the Ls vectorised DWF
2016-12-06 16:31:13 +00:00
Guido Cossu
ae9688e343
Reporting also the total mflops
2016-11-28 11:37:02 +00:00
ca21003f01
Merge branch 'feature/fft-opt' into feature/feynman-rules
...
# Conflicts:
# lib/FFT.h
# lib/qcd/action/fermion/WilsonFermion5D.h
# tests/core/Test_fft.cc
2016-10-26 18:44:47 +01:00
azusayamaguchi
c190221fd3
Internal SHM comms in non-simd directions working
...
Need to fix simd directions
2016-10-22 18:14:27 +01:00
azusayamaguchi
6a9eae6b6b
Reporting improvements
2016-10-21 13:36:18 +01:00
997fd882ff
Merge branch 'develop' into feature/feynman-rules
...
# Conflicts:
# lib/Threads.h
# lib/qcd/action/fermion/WilsonFermion.cc
# lib/qcd/action/fermion/WilsonFermion.h
# lib/qcd/utils/SUn.h
# lib/simd/Grid_avx.h
# lib/simd/Intel512common.h
2016-10-19 18:35:18 +01:00
azusayamaguchi
81f2aeaece
KNL streaming stores, and KNL performance coutners
2016-10-12 11:45:22 +01:00
paboyle
96f1d1b828
Debugged Domain wall and Overlap feynman rules (infinite Ls, finite mass).
2016-10-10 23:46:45 +01:00
Guido Cossu
2e453dfbf5
Added some instrumentation to benchmark the force computation
2016-10-06 17:52:45 +01:00
paboyle
4089984431
Timing hooks
2016-10-06 09:25:12 +01:00
paboyle
b6713ecb60
Momentum space rules for Overlap, DWF untested to date
2016-09-26 09:39:09 +01:00
paboyle
48fb1cdc11
Update domain 5d vectorised impl type, move the type over to 4d redblack with
...
the dense OO inverse
2016-07-14 23:54:35 +01:00
paboyle
6d58cb2a68
Enable reordering of the loops in the assembler for cache friendly.
...
This gets in the way of L2 prefetching however. Do next next link in stencil
prefetching.
2016-06-30 14:35:01 -07:00
paboyle
55f65b81b5
Improvements to the assembler interface that let us move chunks of the
...
site and s loop into the kernels. This will save on function call overhead and
guarantee L2 prefetching strategy is right since OMP can't distribute the
sub-chunks of work.
2016-06-09 01:12:36 -07:00