mirror of https://github.com/paboyle/Grid.git synced 2026-04-26 21:46:00 +01:00
Commit Graph

351 Commits

Author SHA1 Message Date
Peter Boyle bfb68e6f02 Merge pull request #130 from giltirn/gparity-handunroll
Gparity handunroll
2017-09-21 10:11:00 +01:00
Christopher Kelly 59bd1fe21b Fix for 'perm' and 'local' not being set for hand-unrolled external-site Dslash, which caused incorrect behavior of G-parity kernel 2017-08-29 13:07:37 -07:00
Christopher Kelly 74af885d4e Removed some no-longer-needed associated with G-parity hand unrolled kernel 2017-08-29 09:50:37 -04:00
paboyle 80c5bce5bb Merge branch 'develop' into feature/multi-communicator 2017-08-25 20:21:26 +01:00
paboyle f68b5de9c8 No compile fix on Clang 2017-08-25 19:35:21 +01:00
Christopher Kelly f365a83fae In G-parity unrolled kernel, replaced calls to permute and exchange with run-time-evaluated permute type with explicit calls to appropriate underlying functions 2017-08-25 14:24:11 -04:00
Peter Boyle c289699d9a updated from cambridge mpi3 shakeout 2017-08-25 11:41:01 +01:00
Peter Boyle c3b1263e75 Benchmark prep 2017-08-25 09:25:54 +01:00
Christopher Kelly 34a9aeb331 Reduced number of if-statement evaluations in G-parity unrolled kernel 2017-08-24 13:53:50 -07:00
Christopher Kelly ce5df177ee Removed superfluous implementation of G-parity twist for hand-unrolled kernel from GparityWilsonImpl 2017-08-23 15:05:22 -04:00
Christopher Kelly a0bb8e5b46 Added hand-unrolled kernel implementations of all the other dslash precision / comms precision combinations with G-parity 2017-08-23 14:44:40 -04:00
Christopher Kelly 46f88e6d72 G-parity hand-unrolled intrinsics twist now uses one less permute and one less temporary 2017-08-23 13:21:10 -04:00
Christopher Kelly b61835c1a5 Added inplace version of intrinsic G-parity twist to hand-unrolled kernel 2017-08-23 12:33:48 -04:00
Azusa Yamaguchi d9cd4f0273 Staggered multinode block CG debugged. Missing global sum.
Code stalls and resumes on KNL at Cambridge. Curious.

CG iterations 23ms each, then 3200 ms pauses. Mean bandwidth reports
as 200MB/s. Comms dominant in the report. However, the time behaviour suggests it
is *bursty*.... Could be swap to disk?
2017-08-23 15:07:18 +01:00
Christopher Kelly 061e48fd73 Replaced slow unpack-repack in G-parity BC twist with intrinsics version 2017-08-22 18:12:12 -04:00
Christopher Kelly ab50145001 Implemented first, unoptimized version of hand-unrolled G-parity kernels
Improved Test_gparity
2017-08-22 17:12:25 -04:00
paboyle a446d95c33 Trying to pass TeamCity and Travis 2017-08-20 01:10:50 +01:00
Peter Boyle 14d53e1c9e Threaded MPI calls patches 2017-07-29 13:08:10 -04:00
paboyle 54e94360ad Experimental: Multiple communicators to see if we can avoid thread locks in --enable-comms=mpit 2017-06-24 23:10:24 +01:00
portelli 7587df831a Merge branch 'develop' into feature/hadrons
# Conflicts:
#	lib/qcd/action/scalar/ScalarImpl.h
2017-06-20 15:50:39 +01:00
paboyle 46879e1658 Complex defined in Impl even for gauge. 2017-06-18 00:11:45 +01:00
portelli 0503c028be Merge branch 'feature/qed-fvol' into feature/hadrons (non-trivial conflicts on scalar Impl)
# Conflicts:
#	configure.ac
#	lib/qcd/action/scalar/Scalar.h
2017-06-05 16:37:47 -05:00
Guido Cossu 9c12c37aaf Confirming the fix on the complex boundary conditions 2017-05-09 08:41:29 +01:00
paboyle 529e78d43f Restart the v0.7.0 release 2017-05-08 18:20:04 +01:00
paboyle 2439999ec8 Warning elimination; drop to -O2 on G++ bad versions 2017-05-06 14:44:49 +01:00
paboyle 1d96f662e3 Fixed 4d fermion gparity force. Put strong tests on make check force tests 2017-05-06 00:46:31 +01:00
Guido Cossu 20999c1370 Merge branch 'develop' into feature/hmc_generalise 2017-05-05 12:47:17 +01:00
paboyle 78ef10e60f Mobius force improvement 2017-05-04 19:53:21 +01:00
paboyle 90f6bc16bb No compile clang fix 2017-05-04 12:15:06 +01:00
Peter Boyle 422cdf4979 Some checks 2017-05-03 18:37:38 -04:00
Peter Boyle 38db174f3b Print statement 2017-05-03 18:25:26 -04:00
Guido Cossu 4063238943 Adding HMC test file example for Mobius + smearing 2017-05-01 13:44:00 +01:00
Guido Cossu 3344788fa1 Merge branch 'develop' into feature/hmc_generalise 2017-05-01 12:13:56 +01:00
Peter Boyle 99220f6531 Fixes and better timing 2017-04-26 17:24:11 -04:00
Peter Boyle f8797e1e3e bug fix. works now and great face performance 2017-04-26 03:14:02 -04:00
Peter Boyle fd1eb7de13 Clean implementation of the exterior faces listing only those points on the boundary 2017-04-26 02:34:52 -04:00
Peter Boyle 2ce898efa3 Pretty code 2017-04-26 02:34:25 -04:00
paboyle ab66bac4e6 Think I'm getting on top of the reduced cost exterior precomputed list of links 2017-04-25 08:50:26 +01:00
paboyle 56277a11c8 Build a list of what's on the surface 2017-04-24 17:06:15 +01:00
Peter Boyle 5b55867a7a Slightly cheaper Ext assembly 2017-04-24 05:36:11 -04:00
Peter Boyle 3accb1ef89 Debugged assembly split phase with interior suppression 2017-04-23 19:30:19 -04:00
Peter Boyle e3d0e31525 Debugged assembly split phase with interior suppression 2017-04-23 19:29:27 -04:00
Peter Boyle 5812eb8a8c Partially fixed. But the comms-overlap does not work yet. 2017-04-22 18:50:25 -04:00
paboyle ac58565d0a Dangerous rewrite of the assembly. If I make a mistake the debug will be painful. 2017-04-22 19:31:04 +01:00
paboyle b722889234 Try a better load balancing loop 2017-04-22 19:27:41 +01:00
paboyle abba44a837 Hand unrolled for overlapped comms 2017-04-22 17:45:17 +01:00
paboyle f301be94ce Fixed 2017-04-22 17:42:31 +01:00
Peter Boyle 1d1b225497 Hand unrolled Nc=3 kernels support split phase compute (on-node, off-node). 2017-04-22 09:05:28 -04:00
Peter Boyle 53a785a3dd Fixing the KNL compile 2017-04-22 08:11:51 -04:00
paboyle 736bf3c866 Major rework of stencil. Half precision and MPI3 now working. 2017-04-22 11:33:50 +01:00