c12a67030a
980 GiB/s Wilson; 680 GiB/s DW (DP)
2020-04-15 10:55:06 +02:00
581392f2f2
now with pf, best results so far using intrinsics+pf
2020-04-12 22:06:14 +02:00
113f277b6a
enable dslash asm using -DA64FXASM, additionally -DDSLASHINTRIN for intrinsics impl
2020-04-11 04:55:01 +02:00
974586bedc
Dslash finally works; cleaned up; uses MOVPRFX in assembly
2020-04-10 22:26:40 +02:00
8e81a811d0
Merge branch 'feature/hdcr' into develop
2020-04-10 11:14:49 -04:00
160f78c1e4
changed debug output to variable direct 3
2020-04-10 12:23:07 +02:00
7e4e1bbbc2
changed debug output to variable direct 2
2020-04-10 12:22:04 +02:00
e699b7e9f9
changed debug output to variable direct
2020-04-10 12:18:30 +02:00
a28bc0de90
debug register address test in WilsonHand
2020-04-10 12:07:45 +02:00
14d0fe4d6c
added predication in WilsonHand
2020-04-10 12:04:00 +02:00
0ad2e0815c
debug output in WilsonHand
2020-04-10 11:56:29 +02:00
dc9c8340bb
switched to DSLASHINTRIN for A64FX Dslash intrinsics
2020-04-09 23:30:23 +02:00
19eef97503
specialized A64FX Dslash kernels
2020-04-09 23:25:25 +02:00
5cdbb7e71e
fixed A64FX Dslash; compiles, but does not specialize -> assertion
2020-04-09 21:23:39 +02:00
86c9c4da8b
changes
2020-04-09 16:40:06 +02:00
bd310932f7
changes
2020-04-09 16:32:31 +02:00
77fa586f6c
introduced A64FX Wilson kernels
2020-04-09 13:30:06 +02:00
2c22db841a
Added momentum scaling to scalar HMC theories in order to follow UKQCD/CPS conventions
2020-04-02 17:38:47 +01:00
b6cbdd2aa3
Merge pull request #1 from DanielRichtmann/feature/read-openqcd
...
Feature/read openqcd
2020-03-26 17:39:04 +01:00
a2188ea875
remove debugging printf from WilsonKernelsImplementation
2020-03-26 09:12:36 -04:00
989af65807
Check in parallel reader for openqcd configs
2020-03-24 11:20:54 +01:00
c9b737a4e7
make trace,adj,transpose unary operators
2020-03-16 17:58:30 -04:00
037bb6ea73
Check in reader for openqcd configs
...
This reader is suboptimal in the sense that it opens the entire config on every MPI rank.
2020-03-16 14:28:02 +01:00
7c061e20c9
All directions of dirac operator for fast coarsening
2020-01-27 12:40:13 -05:00
e5d1c09665
Faster DhopDirAll for little dirac operator coarsening
2020-01-27 12:38:54 -05:00
8016a465ae
Remove extraneous variable
2020-01-27 12:35:37 -05:00
d8b9742092
DhopDirAll for faster matrix elements of little Dirac operator
2020-01-27 12:34:54 -05:00
96671bbb24
Added ability to pass callback to MADWF that is called every inner iteration and allows user to, for example, adjust the inner solver tolerance depending on residual
...
Added a general implementation of the Remez algorithm for producing arbitrary rational polynomial approximations with optional restriction to even/odd polynomials
Added implementation of computation of ZMobius parameters
Added Test_zMADWF_prec to test ZMobius in MADWF
2020-01-17 12:45:30 -08:00
e583035614
Change to interface to minimise comms in evaluating coarse space operator
2020-01-06 11:43:59 -05:00
3c3d6a94f3
Optimising the force term a bit
2020-01-04 03:16:23 -05:00
039eb7b2eb
Make the force term and coarsening multigrid more optimised
2020-01-04 03:12:17 -05:00
f7373e97a4
Missing conjugate in MooeeInvDag
2019-12-16 10:05:50 +01:00
848079e8ba
Merge pull request #235 from grid-test-organisation/feature/5d-improvement
...
MooeeInv and M5D optimisations + enable threading with nvcc
2019-12-10 21:45:03 -05:00
4180a4a8a7
Import BiCGSTAB solvers and tests
2019-12-10 17:20:35 -05:00
6446671a9c
Merge pull request #241 from nils-asmussen/fix/remQCDns_ignore_ws
...
Undo whitespace changes in fix/removeQCDremnants to allow comparing relevant changes
2019-12-09 18:02:21 +00:00
9b6b0caa55
Junk commit fix
2019-12-09 03:01:58 -05:00
2a48617ac5
Merge branch 'develop' of https://github.com/paboyle/Grid into develop
2019-12-09 03:00:00 -05:00
3d2fe80780
Temporary size depends on checkerboard/uncheckerboard. The Mdir cares
2019-12-09 02:58:24 -05:00
f7698b93ca
corrected comments about quark line directions
2019-12-06 09:46:52 +00:00
a54157e682
more definitions changed
2019-12-05 17:08:09 +00:00
b766038810
new syntax after merge
2019-12-04 18:08:00 +00:00
cd9fd80a5d
merged in develop
2019-12-04 17:12:46 +00:00
e940f4db7e
removed unused parameter parity
2019-12-03 12:01:31 +00:00
7983ff2fdd
Merge branch 'develop' into feature/distil
...
* develop:
Change to reporting
NVCC timer support
Fix nocompile under NVCC
--enable-summit flag
IBM summit optimisation. Synchronise in node is still between 2 halves of AC922, so could be a little faster
Sliced propagator contraction was not producing any results because buf.size()=0
several typos in hadrons
2019-11-30 16:47:03 +00:00
2db814f2b7
Resolve conflicts in BaryonUtils (just use latest from develop)
2019-11-29 18:19:35 +00:00
799ff0c96e
speed-up
2019-11-26 15:28:47 +00:00
5fd5c25114
now two separate functions for Eye and NonEye
2019-11-26 13:44:55 +00:00
feb1ff3494
Fix nocompile under NVCC
2019-11-21 20:03:39 +00:00
421a4395af
Sigma to Nucleon contractions
2019-11-21 17:25:37 +00:00
22c654182a
Fixes for GPU compile
2019-11-04 17:24:34 +00:00