Peter Boyle
c289699d9a
updated from cambridge mpi3 shakeout
2017-08-25 11:41:01 +01:00
Peter Boyle
c3b1263e75
Benchmark prep
2017-08-25 09:25:54 +01:00
Christopher Kelly
34a9aeb331
Reduced number of if-statement evaluations in G-parity unrolled kernel
2017-08-24 13:53:50 -07:00
21b02760c3
Merge branch 'develop' into feature/hadrons
2017-08-24 17:05:45 +01:00
Christopher Kelly
ce5df177ee
Removed superfluous implementation of G-parity twist for hand-unrolled kernel from GparityWilsonImpl
2017-08-23 15:05:22 -04:00
Christopher Kelly
a0bb8e5b46
Added hand-unrolled kernel implementations of all the other dslash precision / comms precision combinations with G-parity
2017-08-23 14:44:40 -04:00
Christopher Kelly
46f88e6d72
G-parity hand-unrolled intrinsics twist now uses one less permute and one less temporary
2017-08-23 13:21:10 -04:00
David Murphy
dd8f1ea189
Vectorized Mobius EOFA Dperp + shift operation
2017-08-23 13:17:26 -04:00
Christopher Kelly
b61835c1a5
Added inplace version of intrinsic G-parity twist to hand-unrolled kernel
2017-08-23 12:33:48 -04:00
Azusa Yamaguchi
d9cd4f0273
Staggered multinode block cg debugged. Missing global sum.
...
Code stalls and resumes on KNL at Cambridge. Curious.
CG iterations take 23 ms each, then pause for 3200 ms. Mean bandwidth is reported
as 200 MB/s. Comms are dominant in the report. However, the time behaviour suggests it
is *bursty*. Could it be swapping to disk?
2017-08-23 15:07:18 +01:00
David Murphy
459f70e8d4
Check-in of working Mobius EOFA class and tests
2017-08-22 22:38:30 -04:00
Christopher Kelly
061e48fd73
Replaced slow unpack-repack in G-parity BC twist with intrinsics version
2017-08-22 18:12:12 -04:00
Christopher Kelly
ab50145001
Implemented first, unoptimized version of hand-unrolled G-parity kernels
...
Improved Test_gparity
2017-08-22 17:12:25 -04:00
paboyle
a446d95c33
Trying to pass TeamCity and Travis
2017-08-20 01:10:50 +01:00
David Murphy
9d45fca8bc
Implement MobiusEOFAFermioncache.cc
2017-08-17 23:45:36 -04:00
David Murphy
ac9e6b63c0
More re-import of Mobius EOFA
2017-08-17 19:28:53 -04:00
David Murphy
e140b3f802
Beginning to re-import Mobius EOFA
2017-08-16 23:36:23 -04:00
David Murphy
d9d3d30cc7
Minor clean-up
2017-08-16 20:57:51 -04:00
David Murphy
6d0786ff9d
Typo fixes and check-in of G-parity action test for DWF
2017-08-15 22:47:00 -04:00
David Murphy
b7f93aeb4d
Change CayleyFermion5D::SetCoefficientsInternal to virtual to allow overriding in derived EOFA classes
2017-08-15 14:18:51 -04:00
David Murphy
202a7fe900
Re-import DWF and abstract base EOFA fermion classes and tests
2017-08-15 13:36:08 -04:00
Lanny91
67b34e5789
Modified conserved current 5th dimension loop for compatibility with 5D vectorisation.
2017-07-31 11:35:01 +01:00
Peter Boyle
14d53e1c9e
Threaded MPI calls patches
2017-07-29 13:08:10 -04:00
paboyle
54e94360ad
Experimental: Multiple communicators to see if we can avoid thread locks in --enable-comms=mpit
2017-06-24 23:10:24 +01:00
Lanny91
c11d69787e
Merge branch 'feature/hadrons' of https://github.com/paboyle/Grid into feature/rare_kaon
...
# Conflicts:
# extras/Hadrons/Modules.hpp
# extras/Hadrons/Modules/MFermion/GaugeProp.hpp
# extras/Hadrons/modules.inc
# tests/hadrons/Test_hadrons.hpp
# tests/hadrons/Test_hadrons_meson_3pt.cc
2017-06-22 16:26:31 +02:00
7587df831a
Merge branch 'develop' into feature/hadrons
...
# Conflicts:
# lib/qcd/action/scalar/ScalarImpl.h
2017-06-20 15:50:39 +01:00
paboyle
46879e1658
Complex defined in Impl even for gauge.
2017-06-18 00:11:45 +01:00
Lanny91
1bd311ba9c
Faster sequential conserved current implementation, now compatible with 5D vectorisation & G-parity.
2017-06-16 16:43:15 +01:00
Lanny91
41af8c12d7
Code cleaning for conserved current contractions. Will now be easier to implement mobius conserved current.
2017-06-16 16:38:59 +01:00
Lanny91
5633a2db20
Faster implementation of conserved current site contraction. Added 5D vectorised support, but not G-parity.
2017-06-12 10:41:02 +01:00
Lanny91
b35fc4e7f9
Merge branch 'feature/hadrons' of https://github.com/paboyle/Grid into feature/rare_kaon
...
# Conflicts:
# extras/Hadrons/Global.hpp
# tests/hadrons/Test_hadrons_rarekaon.cc
2017-06-07 14:38:51 +01:00
Lanny91
8d442b502d
Sequential current fix for spatial indices.
2017-06-06 17:06:40 +01:00
0503c028be
Merge branch 'feature/qed-fvol' into feature/hadrons (non-trivial conflicts on scalar Impl)
...
# Conflicts:
# configure.ac
# lib/qcd/action/scalar/Scalar.h
2017-06-05 16:37:47 -05:00
Lanny91
622a21bec6
Improvements to sequential conserved current test and small bugfix.
2017-06-05 15:55:32 +01:00
Lanny91
eec79e0a1e
Ward Identity test improvements and conserved current bug fixes
2017-06-05 11:55:41 +01:00
Lanny91
23135aa58a
Merge branch 'feature/hadrons' of https://github.com/paboyle/Grid into feature/rare_kaon
2017-05-26 16:00:50 +01:00
Guido Cossu
9c12c37aaf
Confirming the fix on the complex boundary conditions
2017-05-09 08:41:29 +01:00
paboyle
529e78d43f
Restart the v0.7.0 release
2017-05-08 18:20:04 +01:00
paboyle
2439999ec8
Warning elimination; drop to -O2 on bad G++ versions
2017-05-06 14:44:49 +01:00
paboyle
1d96f662e3
Fixed 4d fermion gparity force. Put strong tests on make check force tests
2017-05-06 00:46:31 +01:00
Guido Cossu
20999c1370
Merge branch 'develop' into feature/hmc_generalise
2017-05-05 12:47:17 +01:00
Lanny91
77e0af9c2e
Compilation fix after merge - conserved current code not yet operational for vectorised 5D or Gparity Impl.
2017-05-05 12:27:50 +01:00
paboyle
78ef10e60f
Mobius force improvement
2017-05-04 19:53:21 +01:00
Lanny91
ca1077c560
Merge branch 'develop' of https://github.com/paboyle/Grid into feature/rare_kaon
...
# Conflicts:
# lib/qcd/action/fermion/WilsonFermion5D.cc
# tests/hadrons/Test_hadrons_rarekaon.cc
2017-05-04 16:22:33 +01:00
paboyle
90f6bc16bb
No compile clang fix
2017-05-04 12:15:06 +01:00
Peter Boyle
422cdf4979
Some checks
2017-05-03 18:37:38 -04:00
Peter Boyle
38db174f3b
Print statement
2017-05-03 18:25:26 -04:00
Guido Cossu
4063238943
Adding HMC test file example for Mobius + smearing
2017-05-01 13:44:00 +01:00
Guido Cossu
3344788fa1
Merge branch 'develop' into feature/hmc_generalise
2017-05-01 12:13:56 +01:00
Lanny91
51d84ec057
Bugfixes in Wilson 5D sequential conserved current insertion
2017-04-28 16:49:14 +01:00
Peter Boyle
99220f6531
Fixes and better timing
2017-04-26 17:24:11 -04:00
Lanny91
d2003f24f4
Corrected incorrect usage of ExtractSlice for conserved current code.
2017-04-26 17:25:28 +01:00
Peter Boyle
f8797e1e3e
bug fix. works now and great face performance
2017-04-26 03:14:02 -04:00
Peter Boyle
fd1eb7de13
Clean implementation of the exterior faces listing only those points on the boundary
2017-04-26 02:34:52 -04:00
Peter Boyle
2ce898efa3
Pretty code
2017-04-26 02:34:25 -04:00
Lanny91
44260643f6
First conserved current implementation for Wilson fermions only. Not implemented for Gparity or 5D-vectorised Wilson fermions.
2017-04-25 18:00:24 +01:00
paboyle
ab66bac4e6
Think I'm getting on top of the reduced cost exterior precomputed list of links
2017-04-25 08:50:26 +01:00
paboyle
56277a11c8
Build a list of what's on the surface
2017-04-24 17:06:15 +01:00
Peter Boyle
5b55867a7a
Slightly cheaper Ext assembly
2017-04-24 05:36:11 -04:00
Peter Boyle
3accb1ef89
Debugged assembly split phase with interior suppression
2017-04-23 19:30:19 -04:00
Peter Boyle
e3d0e31525
Debugged assembly split phase with interior suppression
2017-04-23 19:29:27 -04:00
Peter Boyle
5812eb8a8c
Partially fixed. But the comms-overlap does not work yet.
2017-04-22 18:50:25 -04:00
paboyle
ac58565d0a
Dangerous rewrite of the assembly. If I make a mistake the debug will be painful.
2017-04-22 19:31:04 +01:00
paboyle
b722889234
Try a better load balancing loop
2017-04-22 19:27:41 +01:00
paboyle
abba44a837
Hand unrolled for overlapped comms
2017-04-22 17:45:17 +01:00
paboyle
f301be94ce
Fixed
2017-04-22 17:42:31 +01:00
Peter Boyle
1d1b225497
Hand unrolled Nc=3 kernels support split phase compute (on-node, off-node).
2017-04-22 09:05:28 -04:00
Peter Boyle
53a785a3dd
Fixing the KNL compile
2017-04-22 08:11:51 -04:00
paboyle
736bf3c866
Major rework of stencil. Half precision and MPI3 now working.
2017-04-22 11:33:50 +01:00
paboyle
fc4ab9ccd5
Working half precision comms
2017-04-20 11:20:26 +01:00
paboyle
4a340aa5ca
Massive compressor rework to support reduced precision comms
2017-04-20 09:28:27 +01:00
a6a0da873f
Merge branch 'feature/hadrons' into feature/qed-fvol
2017-04-13 15:31:06 +01:00
paboyle
42fb49d3fd
Merge branch 'develop' of https://github.com/paboyle/Grid into develop
2017-04-13 14:12:47 +01:00
8ef4300412
spurious .dirstamp files removed
2017-04-10 17:00:22 +01:00
paboyle
db5f6d3ae3
Verbose fix
2017-04-09 23:41:30 +09:00
paboyle
86aaa35294
Christoph needs SchurDiagTwoKappa which is mobius specific.
2017-04-07 11:07:40 +09:00
Guido Cossu
8c540333d5
Merge branch 'develop' into feature/hmc_generalise
2017-04-05 14:41:04 +01:00
paboyle
1c4bc7ed38
Debugged staggered conventions
2017-03-31 14:41:48 +09:00
paboyle
9fd23faadf
Pretty layout
2017-03-30 13:44:45 +09:00
paboyle
10e4fa0dc8
Template instantiation improvements
2017-03-30 13:44:25 +09:00
paboyle
c4aca1dde4
Conjugate coefficients on adjoint
2017-03-30 13:44:05 +09:00
paboyle
b9e8ea3aaa
conjugate coefficient on the dagger
2017-03-30 13:43:13 +09:00
paboyle
077aa728b9
Fix the ZMobius (I think)
2017-03-30 13:42:09 +09:00
paboyle
a8d83d886e
Macro controls
2017-03-30 13:31:34 +09:00
paboyle
7fd46eeec4
Trailing whitespace removal
2017-03-30 13:31:10 +09:00
paboyle
2b115929dc
Small AVX512 asm ifdef patch
2017-03-29 18:51:23 +09:00
paboyle
d805867e02
Better init
2017-03-28 13:25:05 -04:00
paboyle
98f9318279
Build on AVX2 and MPI passing with clang++
2017-03-28 23:16:04 +09:00
paboyle
4b17e8eba8
Merge branch 'develop' into feature/bgq-asm
...
Conflicts:
lib/qcd/action/fermion/Fermion.h
lib/qcd/action/fermion/WilsonFermion.cc
lib/util/Init.cc
tests/Test_cayley_even_odd_vec.cc
2017-03-28 04:49:30 -04:00
paboyle
18bde08d1b
Merge branch 'feature/staggering' into develop
2017-03-28 15:25:55 +09:00
paboyle
e7c36771ed
ZMobius prep for asm
2017-03-15 14:23:33 -04:00
paboyle
8dc57a1e25
Layout change
2017-03-13 11:11:46 +00:00
paboyle
f57bd770b0
Merge branch 'bugfix/dminus' into feature/bgq-asm
2017-03-13 11:11:03 +00:00
Chulwoo Jung
33edde245d
Changing Dminus(Dag) to use full vectors to work correctly
2017-03-12 23:02:42 -04:00
paboyle
447c5e6cd7
Z mobius hermiticity correction
2017-03-13 01:30:43 +00:00
paboyle
8b99d80d8c
Merge branch 'bgq-asm-shmemfixes' into feature/bgq-asm
2017-03-12 23:30:09 +00:00
paboyle
af230a1fb8
Average the time across the whole machine for outliers
2017-02-28 17:05:22 -05:00
Christopher Kelly
06a132e3f9
Fixes to SHMEM comms
2017-02-28 13:31:54 -08:00
paboyle
e099dcdae7
Merge branch 'develop' into feature/bgq-asm
2017-02-23 00:25:29 +00:00
paboyle
4e7ab3166f
Refactoring header layout
2017-02-22 18:09:33 +00:00
azusayamaguchi
1c30e9a961
Verified
2017-02-21 23:01:25 +00:00
azusayamaguchi
bf7e3f20d4
Staggered fermion optimised version
2017-02-21 14:35:42 +00:00
paboyle
3ae92fa2e6
Global changes to parallel_for structure.
...
Move the comms flags to more sensible names
2017-02-21 05:24:27 -05:00
Guido Cossu
e0571c872b
Merge branch 'develop' into feature/hmc_generalise
2017-02-09 16:12:00 +00:00
paboyle
2c246551d0
Overlap comms and compute options in wilson kernels
2017-02-07 01:37:10 -05:00
a0cfbb6e88
Merge branch 'feature/gammas' into feature/hadrons
...
# Conflicts:
# .gitignore
# lib/qcd/spin/Dirac.cc
# scripts/filelist
2017-01-30 09:10:49 -08:00
fad743fbb1
Build system sanity check: corrected several headers not in the <Grid/*> format
2017-01-26 17:00:41 -08:00
Guido Cossu
17629b8d9e
Merge branch 'develop' into feature/hmc_generalise
2017-01-25 11:33:53 +00:00
a37e71f362
New automatic implementation of gamma matrices, Meson and SeqGamma are broken
2017-01-23 19:13:43 -08:00
Guido Cossu
27dfe816fa
Added TwoFlavorsEO
...
Had to remove a conformability check in the Derivative of SchurDiff,
see the comments in the file
2017-01-20 16:59:31 +00:00
Peter Boyle
03c81bd902
Merge branch 'feature/bgq-asm' of https://github.com/paboyle/Grid into feature/bgq-asm
2016-12-27 11:25:35 +00:00
Peter Boyle
a869addef1
Stats switch off
2016-12-27 11:25:22 +00:00
Peter Boyle
3d21297bbb
Call the fast path compressor for wilson kernels to avoid if else on projector
2016-12-27 11:23:13 +00:00
Peter Boyle
25efefc5b4
Back to original thread policy post test
2016-12-23 09:49:04 +00:00
Peter Boyle
eabf316ed9
BGQ performance ASM
2016-12-22 21:56:08 +00:00
Peter Boyle
04ae7929a3
BGQ or KNL assembler now
2016-12-22 17:53:22 +00:00
Peter Boyle
caba0d42a5
L1p controls
2016-12-22 17:52:55 +00:00
Peter Boyle
9ae81c06d2
L1p controls for BG/Q
2016-12-22 17:52:21 +00:00
Peter Boyle
b8cdb3e90a
Debug hack; raises from 62GF/s to 72 GF/s per node on BG/Q
2016-12-22 17:50:14 +00:00
paboyle
3e6945cd65
Fixing AVX Z-mobius
2016-12-18 02:05:11 +00:00
paboyle
87be03006a
AVX 512 code broke other compiles; fixing
2016-12-18 01:45:09 +00:00
Peter Boyle
4d8b01b7ed
Merge branch 'develop' of https://github.com/paboyle/Grid into develop
2016-12-18 00:56:57 +00:00
Peter Boyle
fa6acccf55
Zmobius asm
2016-12-18 00:56:19 +00:00
azusayamaguchi
df9108154d
Debugged 2 versions of assembler; ls vectorised, xyzt vectorised
2016-12-17 23:47:51 +00:00
azusayamaguchi
b3e7f600da
Partial implementation of 4d vectorisation assembler
2016-12-16 23:50:30 +00:00
azusayamaguchi
d4071daf2a
Template specialise
2016-12-16 22:28:29 +00:00
azusayamaguchi
a2a6329094
AVX512 only for ASM compilation
2016-12-16 22:03:29 +00:00
azusayamaguchi
eabc577940
Assembler possibly working
2016-12-16 16:55:36 +00:00
91e98b1dd5
Merge branch 'feature/hadrons' into develop
2016-12-15 18:15:56 +00:00
Guido Cossu
2fb92dbc6e
Cleaning up previous debug lines
2016-12-13 07:53:43 +00:00
Guido Cossu
5c74b6028b
Commit for debugging, lots of IO
2016-12-13 06:35:30 +00:00
Azusa Yamaguchi
426197e446
Nc=3
2016-12-12 09:10:54 +00:00
Azusa Yamaguchi
99e2c1e666
Kernels options
2016-12-12 09:08:53 +00:00
Azusa Yamaguchi
1440565a10
Decrease verbosity
2016-12-12 09:08:04 +00:00
Peter Boyle
fe187e9ed3
Compiles and passes under ZMobius with assembler
2016-12-10 00:47:48 +00:00
Peter Boyle
0091b50f49
Zmobius working -- not asm yet
2016-12-09 22:51:32 +00:00
Peter Boyle
fb8d4b2357
Lots of debug on performance Mobius
2016-12-08 17:28:28 +00:00
Guido Cossu
2bd4233919
Completed testing of the HMC for Ls vectorised version (on AVX2)
2016-12-07 04:56:37 +00:00
Guido Cossu
143c70e29f
Debugged the threaded version. Cleaning up
2016-12-07 04:40:25 +00:00
Guido Cossu
b812d5e39c
Added single threaded version of the derivative for the Ls vectorised DWF
2016-12-06 16:31:13 +00:00
Peter Boyle
e27c6b217c
Updating
2016-12-01 12:42:53 +00:00
paboyle
6adf35da54
Faster Mobius
2016-12-01 11:39:04 +00:00
paboyle
bd0430b34f
Serialisation in malloc fixed
2016-11-29 22:27:55 +00:00
Azusa Yamaguchi
c097fd041a
Merge branch 'develop' of https://github.com/paboyle/Grid into feature/staggering
2016-11-29 13:44:17 +00:00
Azusa Yamaguchi
77fb25fb29
Push 5d tests
2016-11-29 13:43:56 +00:00
Azusa Yamaguchi
389e0a77bd
Staggered Fermion 5D
2016-11-29 13:13:56 +00:00
Guido Cossu
ae9688e343
Reporting also the total mflops
2016-11-28 11:37:02 +00:00
fabcd4179d
Hadrons: propagator type coming from the fermion implementation
2016-11-28 14:02:10 +09:00
Azusa Yamaguchi
668ca57702
Merge branch 'develop' of https://github.com/paboyle/Grid into feature/staggering
2016-11-22 13:49:11 +00:00
azusayamaguchi
f7b60004f3
Merge branch 'develop' into release/v0.6.0
2016-11-04 16:08:07 +00:00