1
0
mirror of https://github.com/paboyle/Grid.git synced 2026-04-19 10:11:02 +01:00
Commit Graph

5998 Commits

Author SHA1 Message Date
Peter Boyle 28db0631ff Hack to force 128bit accesses 2018-07-23 06:10:27 -04:00
Peter Boyle b35401b86b Fix CUDA_ARCH. Need to simplify. See when new eigen release happens 2018-07-23 06:09:33 -04:00
Peter Boyle a0714de8ec Define vector length for GPU 2018-07-23 06:09:05 -04:00
Peter Boyle 21a1710b43 Verbose vector length 2018-07-23 06:08:39 -04:00
fionnoh 34e9d3f0ca Moved the creation and resizing of the v and w high modes from the A2A class to the A2A module and made them an output of the module. This means that they have to be inputs of the contration modules and they will freed from memory if they are no longer needed. 2018-07-22 14:40:31 +01:00
fionnoh c995788259 Added ImportUnphysicalFermion and included appropriate logic for 5d w vectors in A2A code 2018-07-21 00:08:11 +01:00
fionnoh 94c7198001 Added ZFIMPL to A2AMeson contraction 2018-07-20 23:08:22 +01:00
fionnoh 04d86fe9f3 Removed overly verbose print statement 2018-07-20 21:38:19 +01:00
fionnoh b78074b6a0 Removed a Dminus from high mode v and removed duplication pf D_oo code 2018-07-20 16:55:24 +01:00
fionnoh 7dfd3cdae8 Inclusion of ExportPhysicalFermionSource that fixes a bug in the low mode w vectors 2018-07-20 15:45:43 +01:00
fionnoh cecee1ef2c Merge branch 'develop' of github.com:paboyle/Grid into feature/hadrons 2018-07-20 13:37:50 +01:00
fionnoh 355d4b58be Merge branch 'feature/hadrons' of github.com:fionnoh/Grid into feature/hadrons 2018-07-19 16:07:54 +01:00
fionnoh 2c54a536f3 Moved the meson field inner product to its own header file 2018-07-19 15:56:52 +01:00
fionnoh d868a45120 Cleaned up some stuff that was erroneously included in a previous "trash" commit. Leaving in the mySliceInnerProdct function for now as it speeds up mesonfield creation quite a lot for 24^3 tests 2018-07-16 16:19:59 +01:00
fionnoh 9deae8c962 A2A meson field contraction code 2018-07-16 14:18:45 +01:00
Peter Boyle b2b5137d28 Finally starting to get decent performance on Volta 2018-07-13 12:06:18 -04:00
fionnoh db86cdd7bd Possible trash commit 2018-07-10 13:30:45 +01:00
paboyle ec9939c1ba Test for faster implementation of meson field inner loop
This should be possible to cache block at outer levels, global sum across nodes not performed
and deferred to caller to block them all into a big all reduce.
Nc=3 and Fermion is hard coded in an ugly way. We might think about benchmarking whether
a product without the conjugate should be made available by Grid.

It is not clear whether the explicit unroll, or the performing of conjugate on left once
was the real source of the speed up.

Gives 70-80 GF/s on my laptop (single) half that double, and 70GB/s to cache.

This is competitive with dslash and a reasonable stopping point for the optimisation. If necessary we can revisit.
2018-07-10 12:38:51 +01:00
Peter Boyle 2cc07450f4 Fastest option for the dslash 2018-07-05 09:57:55 -04:00
Peter Boyle c0e8bc9da9 Current version gets 250 - 320 GF/s on Volta on the target 12^4 volume. 2018-07-05 07:10:25 -04:00
Peter Boyle b1265ae867 Prettify code 2018-07-05 07:08:06 -04:00
Peter Boyle 32bb85ea4c Standard extractLane is fast 2018-07-05 07:07:30 -04:00
Peter Boyle ca0607b6ef Clearer kernel call meaning 2018-07-05 07:06:15 -04:00
Peter Boyle 19b527e83f Better extract merge for GPU. Let the SIMD header files define the pointer type for
access. GPU redirects through builtin float2, double2 for complex
2018-07-05 07:05:13 -04:00
Peter Boyle 4730d4692a Fast lane extract, saturates bandwidth on Volta for SU3 benchmarks 2018-07-05 07:03:33 -04:00
Peter Boyle 1bb456c0c5 Minor GPU vector width change 2018-07-05 07:02:04 -04:00
Peter Boyle 4b04ae3611 Printing improvement 2018-07-05 06:59:38 -04:00
Peter Boyle 2f776d51c6 Gpu specific benchmark saturates memory. Can enhance Grid to do this for expressions,
but a bitof (known) work.
2018-07-05 06:58:37 -04:00
fionnoh f74617c124 Added ZFIMPL to meson field module 2018-07-03 14:04:53 +01:00
fionnoh 8c6a3921ed Merge remote-tracking branch 'upstream/feature/hadrons' into feature/hadrons 2018-07-03 11:35:14 +01:00
portelli a8a15dd9d0 Hadrons: code cleaning 2018-07-02 17:52:39 +01:00
portelli 3ce68a751a Hadrons: stout smearing module 2018-07-02 17:52:04 +01:00
fionnoh daa0977d01 Included a print statement that indicates that the guess is being subtracted from the solve. 2018-06-28 16:34:56 +01:00
fionnoh a2929f4384 Removed A2A contraction module and replaced it with the beginnings of a meson field module 2018-06-28 16:17:26 +01:00
fionnoh 7fe3974c0a Included eigenPacks and action as references, not inputs, of A2A module. They now now longer need to be parameters in the meson field modules. 2018-06-28 16:14:49 +01:00
fionnoh f7e86f81a0 Changes A2A class to make use of the new Solver class 2018-06-28 16:14:16 +01:00
fionnoh fecec803d9 Merge branch 'feature/hadrons' of https://github.com/paboyle/Grid into feature/hadrons 2018-06-28 16:13:43 +01:00
fionnoh 8fe9a13cdd Merge branch 'feature/hadrons' of https://github.com/paboyle/Grid into feature/hadrons 2018-06-28 16:13:07 +01:00
paboyle 3a50afe7e7 GPU dslash updates 2018-06-27 22:32:21 +01:00
paboyle f8e880b445 Loop for s and xyzt offlow 2018-06-27 21:49:57 +01:00
paboyle 3e947527cb Move looping over "s" and "site" into kernels for GPU optimisatoin 2018-06-27 21:29:43 +01:00
paboyle 31f65beac8 Move site and Ls looping into the kernels 2018-06-27 21:28:48 +01:00
paboyle 38e2a32ac9 Single SIMD lane operations for CUDA 2018-06-27 21:28:06 +01:00
paboyle efa84ca50a Keep Cuda 9.1 happy 2018-06-27 21:27:32 +01:00
paboyle 5e96d6d04c Keep CUDA happy 2018-06-27 21:27:11 +01:00
paboyle df30bdc599 CUDA happy 2018-06-27 21:26:49 +01:00
paboyle 7f45222924 Diagnostics on memory alloc fail 2018-06-27 21:26:20 +01:00
paboyle dd891f5e3b Use NVCC to suppress device Eigen 2018-06-27 21:25:17 +01:00
portelli d2c42e6f42 Hadrons: scaled DWF action 2018-06-26 14:59:33 +01:00
Daniel Richtmann 2881b3e8e5 WilsonMG: Remove unnecessary static assertions 2018-06-26 14:42:30 +02:00