Peter Boyle
b2b5137d28
Finally starting to get decent performance on Volta
2018-07-13 12:06:18 -04:00
fionnoh
db86cdd7bd
Possible trash commit
2018-07-10 13:30:45 +01:00
paboyle
ec9939c1ba
Test for faster implementation of meson field inner loop
It should be possible to cache block this at outer levels. The global sum across nodes is not performed;
it is deferred to the caller, which can block them all into one big all-reduce.
Nc=3 and Fermion are hard coded in an ugly way. We might think about benchmarking whether
a product without the conjugate should be made available by Grid.
It is not clear whether the explicit unroll or performing the conjugate on the left only once
was the real source of the speed up.
Gives 70-80 GF/s on my laptop in single precision, half that in double, and 70 GB/s to cache.
This is competitive with dslash and a reasonable stopping point for the optimisation. If necessary we can revisit.
2018-07-10 12:38:51 +01:00
Peter Boyle
2cc07450f4
Fastest option for the dslash
2018-07-05 09:57:55 -04:00
Peter Boyle
c0e8bc9da9
Current version gets 250 - 320 GF/s on Volta on the target 12^4 volume.
2018-07-05 07:10:25 -04:00
Peter Boyle
b1265ae867
Prettify code
2018-07-05 07:08:06 -04:00
Peter Boyle
32bb85ea4c
Standard extractLane is fast
2018-07-05 07:07:30 -04:00
Peter Boyle
ca0607b6ef
Clearer kernel call meaning
2018-07-05 07:06:15 -04:00
Peter Boyle
19b527e83f
Better extract merge for GPU. Let the SIMD header files define the pointer type for
access. GPU redirects through builtin float2, double2 for complex
2018-07-05 07:05:13 -04:00
Peter Boyle
4730d4692a
Fast lane extract, saturates bandwidth on Volta for SU3 benchmarks
2018-07-05 07:03:33 -04:00
Peter Boyle
1bb456c0c5
Minor GPU vector width change
2018-07-05 07:02:04 -04:00
Peter Boyle
4b04ae3611
Printing improvement
2018-07-05 06:59:38 -04:00
Peter Boyle
2f776d51c6
GPU-specific benchmark saturates memory. Can enhance Grid to do this for expressions,
but a bit of (known) work.
2018-07-05 06:58:37 -04:00
fionnoh
f74617c124
Added ZFIMPL to meson field module
2018-07-03 14:04:53 +01:00
fionnoh
8c6a3921ed
Merge remote-tracking branch 'upstream/feature/hadrons' into feature/hadrons
2018-07-03 11:35:14 +01:00
a8a15dd9d0
Hadrons: code cleaning
2018-07-02 17:52:39 +01:00
3ce68a751a
Hadrons: stout smearing module
2018-07-02 17:52:04 +01:00
fionnoh
daa0977d01
Included a print statement that indicates that the guess is being subtracted from the solve.
2018-06-28 16:34:56 +01:00
fionnoh
a2929f4384
Removed A2A contraction module and replaced it with the beginnings of a meson field module
2018-06-28 16:17:26 +01:00
fionnoh
7fe3974c0a
Included eigenPacks and action as references, not inputs, of A2A module. They no longer need to be parameters in the meson field modules.
2018-06-28 16:14:49 +01:00
fionnoh
f7e86f81a0
Changes A2A class to make use of the new Solver class
2018-06-28 16:14:16 +01:00
fionnoh
fecec803d9
Merge branch 'feature/hadrons' of https://github.com/paboyle/Grid into feature/hadrons
2018-06-28 16:13:43 +01:00
fionnoh
8fe9a13cdd
Merge branch 'feature/hadrons' of https://github.com/paboyle/Grid into feature/hadrons
2018-06-28 16:13:07 +01:00
paboyle
3a50afe7e7
GPU dslash updates
2018-06-27 22:32:21 +01:00
paboyle
f8e880b445
Loop for s and xyzt offload
2018-06-27 21:49:57 +01:00
paboyle
3e947527cb
Move looping over "s" and "site" into kernels for GPU optimisation
2018-06-27 21:29:43 +01:00
paboyle
31f65beac8
Move site and Ls looping into the kernels
2018-06-27 21:28:48 +01:00
paboyle
38e2a32ac9
Single SIMD lane operations for CUDA
2018-06-27 21:28:06 +01:00
paboyle
efa84ca50a
Keep Cuda 9.1 happy
2018-06-27 21:27:32 +01:00
paboyle
5e96d6d04c
Keep CUDA happy
2018-06-27 21:27:11 +01:00
paboyle
df30bdc599
CUDA happy
2018-06-27 21:26:49 +01:00
paboyle
7f45222924
Diagnostics on memory alloc fail
2018-06-27 21:26:20 +01:00
paboyle
dd891f5e3b
Use NVCC to suppress device Eigen
2018-06-27 21:25:17 +01:00
d2c42e6f42
Hadrons: scaled DWF action
2018-06-26 14:59:33 +01:00
Daniel Richtmann
2881b3e8e5
WilsonMG: Remove unnecessary static assertions
2018-06-26 14:42:30 +02:00
049cc518f4
Hadrons: introduction message 2
2018-06-25 19:08:39 +01:00
2e1c66897f
Hadrons: introduction message
2018-06-25 19:08:22 +01:00
adcef36189
Hadrons: Möbius DWF action
2018-06-25 15:58:35 +01:00
fionnoh
2f121c41c9
Committing creation of meson field code before a merge with the upstream branch feature/hadrons
2018-06-25 12:20:46 +01:00
e0ed7e300f
Hadrons: spurious Dminus removed
2018-06-22 16:33:43 +02:00
485207901b
Merge branch 'develop' into feature/hadrons
2018-06-22 16:15:32 +02:00
c760f0a4c3
Hadrons: remove make_5D/4D functions and FreeProp fix
2018-06-22 16:12:46 +02:00
c84eeedec3
Hadrons: GaugeProp module for z-Wilson actions
2018-06-22 15:53:22 +02:00
fionnoh
1ac3526f33
Small changes to the A2A header and module
2018-06-22 12:29:42 +01:00
fionnoh
0de090ee74
Temporarily added in the contraction code that produced the working 2-pt function. This is committed for reference only and will be removed in the next push.
2018-06-22 12:28:41 +01:00
91405de3f7
Hadrons: new solver exposing fermion matrix and generic source/solve import/export
2018-06-22 12:14:37 +02:00
fionnoh
8fccda301a
Fixed a bug where the guess was always subtracted after the solve and included appropriate weights for the sources in the one case we're looking at now. More work needs to be done to make the 5d/4d source logic less brittle.
2018-06-21 16:36:59 +01:00
fionnoh
7a0abfac89
Restructured the class that computes and returns the A2A vectors.
2018-06-21 16:36:06 +01:00
fionnoh
ae37fda699
A more elegant way to subtract guesses from solve and a bool check before verifying residual
2018-06-20 16:07:40 +01:00
fionnoh
b5fc5e2030
All-to-all module update that hit a promising milestone. Committing as a reference for future changes.
2018-06-20 10:59:07 +01:00