fionnoh
24128ff109
Changes needed for MF benchmark to work with comms correctly
2018-07-23 15:51:37 +01:00
Peter Boyle
da17a015c7
Pack the stencil smaller for 128 bit access
2018-07-23 06:12:45 -04:00
Peter Boyle
1fd08c21ac
make simd width configure time option for GPU
2018-07-23 06:10:55 -04:00
Peter Boyle
28db0631ff
Hack to force 128bit accesses
2018-07-23 06:10:27 -04:00
Peter Boyle
b35401b86b
Fix CUDA_ARCH. Need to simplify. See when new eigen release happens
2018-07-23 06:09:33 -04:00
Peter Boyle
a0714de8ec
Define vector length for GPU
2018-07-23 06:09:05 -04:00
Peter Boyle
21a1710b43
Verbose vector length
2018-07-23 06:08:39 -04:00
fionnoh
34e9d3f0ca
Moved the creation and resizing of the v and w high modes from the A2A class to the A2A module and made them an output of the module. This means that they have to be inputs of the contraction modules and that they will be freed from memory when they are no longer needed.
2018-07-22 14:40:31 +01:00
fionnoh
c995788259
Added ImportUnphysicalFermion and included appropriate logic for 5d w vectors in A2A code
2018-07-21 00:08:11 +01:00
fionnoh
94c7198001
Added ZFIMPL to A2AMeson contraction
2018-07-20 23:08:22 +01:00
fionnoh
04d86fe9f3
Removed overly verbose print statement
2018-07-20 21:38:19 +01:00
fionnoh
b78074b6a0
Removed a Dminus from high mode v and removed duplication of D_oo code
2018-07-20 16:55:24 +01:00
fionnoh
7dfd3cdae8
Inclusion of ExportPhysicalFermionSource that fixes a bug in the low mode w vectors
2018-07-20 15:45:43 +01:00
fionnoh
cecee1ef2c
Merge branch 'develop' of github.com:paboyle/Grid into feature/hadrons
2018-07-20 13:37:50 +01:00
fionnoh
355d4b58be
Merge branch 'feature/hadrons' of github.com:fionnoh/Grid into feature/hadrons
2018-07-19 16:07:54 +01:00
fionnoh
2c54a536f3
Moved the meson field inner product to its own header file
2018-07-19 15:56:52 +01:00
fionnoh
d868a45120
Cleaned up some stuff that was erroneously included in a previous "trash" commit. Leaving in the mySliceInnerProdct function for now as it speeds up mesonfield creation quite a lot for 24^3 tests
2018-07-16 16:19:59 +01:00
fionnoh
9deae8c962
A2A meson field contraction code
2018-07-16 14:18:45 +01:00
Peter Boyle
b2b5137d28
Finally starting to get decent performance on Volta
2018-07-13 12:06:18 -04:00
fionnoh
db86cdd7bd
Possible trash commit
2018-07-10 13:30:45 +01:00
paboyle
ec9939c1ba
Test for faster implementation of meson field inner loop
This should be possible to cache block at outer levels, global sum across nodes not performed
and deferred to caller to block them all into a big all reduce.
Nc=3 and Fermion is hard coded in an ugly way. We might think about benchmarking whether
a product without the conjugate should be made available by Grid.
It is not clear whether the explicit unroll, or the performing of conjugate on left once
was the real source of the speed up.
Gives 70-80 GF/s on my laptop in single precision, half that in double, and 70 GB/s to cache.
This is competitive with dslash and a reasonable stopping point for the optimisation. If necessary we can revisit.
2018-07-10 12:38:51 +01:00
Peter Boyle
2cc07450f4
Fastest option for the dslash
2018-07-05 09:57:55 -04:00
Peter Boyle
c0e8bc9da9
Current version gets 250 - 320 GF/s on Volta on the target 12^4 volume.
2018-07-05 07:10:25 -04:00
Peter Boyle
b1265ae867
Prettify code
2018-07-05 07:08:06 -04:00
Peter Boyle
32bb85ea4c
Standard extractLane is fast
2018-07-05 07:07:30 -04:00
Peter Boyle
ca0607b6ef
Clearer kernel call meaning
2018-07-05 07:06:15 -04:00
Peter Boyle
19b527e83f
Better extract merge for GPU. Let the SIMD header files define the pointer type for access. GPU redirects through builtin float2, double2 for complex
2018-07-05 07:05:13 -04:00
Peter Boyle
4730d4692a
Fast lane extract, saturates bandwidth on Volta for SU3 benchmarks
2018-07-05 07:03:33 -04:00
Peter Boyle
1bb456c0c5
Minor GPU vector width change
2018-07-05 07:02:04 -04:00
Peter Boyle
4b04ae3611
Printing improvement
2018-07-05 06:59:38 -04:00
Peter Boyle
2f776d51c6
GPU-specific benchmark saturates memory. Can enhance Grid to do this for expressions, but a bit of (known) work.
2018-07-05 06:58:37 -04:00
fionnoh
f74617c124
Added ZFIMPL to meson field module
2018-07-03 14:04:53 +01:00
fionnoh
8c6a3921ed
Merge remote-tracking branch 'upstream/feature/hadrons' into feature/hadrons
2018-07-03 11:35:14 +01:00
a8a15dd9d0
Hadrons: code cleaning
2018-07-02 17:52:39 +01:00
3ce68a751a
Hadrons: stout smearing module
2018-07-02 17:52:04 +01:00
fionnoh
daa0977d01
Included a print statement that indicates that the guess is being subtracted from the solve.
2018-06-28 16:34:56 +01:00
fionnoh
a2929f4384
Removed A2A contraction module and replaced it with the beginnings of a meson field module
2018-06-28 16:17:26 +01:00
fionnoh
7fe3974c0a
Included eigenPacks and action as references, not inputs, of A2A module. They now no longer need to be parameters in the meson field modules.
2018-06-28 16:14:49 +01:00
fionnoh
f7e86f81a0
Changed A2A class to make use of the new Solver class
2018-06-28 16:14:16 +01:00
fionnoh
fecec803d9
Merge branch 'feature/hadrons' of https://github.com/paboyle/Grid into feature/hadrons
2018-06-28 16:13:43 +01:00
fionnoh
8fe9a13cdd
Merge branch 'feature/hadrons' of https://github.com/paboyle/Grid into feature/hadrons
2018-06-28 16:13:07 +01:00
paboyle
3a50afe7e7
GPU dslash updates
2018-06-27 22:32:21 +01:00
paboyle
f8e880b445
Loop for s and xyzt offload
2018-06-27 21:49:57 +01:00
paboyle
3e947527cb
Move looping over "s" and "site" into kernels for GPU optimisation
2018-06-27 21:29:43 +01:00
paboyle
31f65beac8
Move site and Ls looping into the kernels
2018-06-27 21:28:48 +01:00
paboyle
38e2a32ac9
Single SIMD lane operations for CUDA
2018-06-27 21:28:06 +01:00
paboyle
efa84ca50a
Keep Cuda 9.1 happy
2018-06-27 21:27:32 +01:00
paboyle
5e96d6d04c
Keep CUDA happy
2018-06-27 21:27:11 +01:00
paboyle
df30bdc599
CUDA happy
2018-06-27 21:26:49 +01:00
paboyle
7f45222924
Diagnostics on memory alloc fail
2018-06-27 21:26:20 +01:00