da17a015c7
Pack the stencil smaller for 128 bit access
2018-07-23 06:12:45 -04:00
1fd08c21ac
make simd width configure time option for GPU
2018-07-23 06:10:55 -04:00
28db0631ff
Hack to force 128bit accesses
2018-07-23 06:10:27 -04:00
b35401b86b
Fix CUDA_ARCH. Need to simplify. See when new eigen release happens
2018-07-23 06:09:33 -04:00
a0714de8ec
Define vector length for GPU
2018-07-23 06:09:05 -04:00
21a1710b43
Verbose vector length
2018-07-23 06:08:39 -04:00
34e9d3f0ca
Moved the creation and resizing of the v and w high modes from the A2A class to the A2A module and made them an output of the module. This means that they have to be inputs of the contraction modules and they will be freed from memory if they are no longer needed.
2018-07-22 14:40:31 +01:00
c995788259
Added ImportUnphysicalFermion and included appropriate logic for 5d w vectors in A2A code
2018-07-21 00:08:11 +01:00
94c7198001
Added ZFIMPL to A2AMeson contraction
2018-07-20 23:08:22 +01:00
04d86fe9f3
Removed overly verbose print statement
2018-07-20 21:38:19 +01:00
b78074b6a0
Removed a Dminus from high mode v and removed duplication of D_oo code
2018-07-20 16:55:24 +01:00
7dfd3cdae8
Inclusion of ExportPhysicalFermionSource that fixes a bug in the low mode w vectors
2018-07-20 15:45:43 +01:00
cecee1ef2c
Merge branch 'develop' of github.com:paboyle/Grid into feature/hadrons
2018-07-20 13:37:50 +01:00
355d4b58be
Merge branch 'feature/hadrons' of github.com:fionnoh/Grid into feature/hadrons
2018-07-19 16:07:54 +01:00
2c54a536f3
Moved the meson field inner product to its own header file
2018-07-19 15:56:52 +01:00
d868a45120
Cleaned up some stuff that was erroneously included in a previous "trash" commit. Leaving in the mySliceInnerProdct function for now as it speeds up mesonfield creation quite a lot for 24^3 tests
2018-07-16 16:19:59 +01:00
9deae8c962
A2A meson field contraction code
2018-07-16 14:18:45 +01:00
b2b5137d28
Finally starting to get decent performance on Volta
2018-07-13 12:06:18 -04:00
db86cdd7bd
Possible trash commit
2018-07-10 13:30:45 +01:00
ec9939c1ba
Test for faster implementation of meson field inner loop
It should be possible to cache block this at outer levels. The global sum across nodes is not performed;
it is deferred to the caller, which can block them all into one big all-reduce.
Nc=3 and Fermion is hard coded in an ugly way. We might think about benchmarking whether
a product without the conjugate should be made available by Grid.
It is not clear whether the explicit unroll, or performing the conjugate on the left only once,
was the real source of the speed up.
Gives 70-80 GF/s on my laptop in single precision, half that in double, and 70 GB/s to cache.
This is competitive with dslash and a reasonable stopping point for the optimisation. If necessary we can revisit.
2018-07-10 12:38:51 +01:00
2cc07450f4
Fastest option for the dslash
2018-07-05 09:57:55 -04:00
c0e8bc9da9
Current version gets 250 - 320 GF/s on Volta on the target 12^4 volume.
2018-07-05 07:10:25 -04:00
b1265ae867
Prettify code
2018-07-05 07:08:06 -04:00
32bb85ea4c
Standard extractLane is fast
2018-07-05 07:07:30 -04:00
ca0607b6ef
Clearer kernel call meaning
2018-07-05 07:06:15 -04:00
19b527e83f
Better extract merge for GPU. Let the SIMD header files define the pointer type for
access. GPU redirects through builtin float2, double2 for complex
2018-07-05 07:05:13 -04:00
4730d4692a
Fast lane extract, saturates bandwidth on Volta for SU3 benchmarks
2018-07-05 07:03:33 -04:00
1bb456c0c5
Minor GPU vector width change
2018-07-05 07:02:04 -04:00
4b04ae3611
Printing improvement
2018-07-05 06:59:38 -04:00
2f776d51c6
GPU specific benchmark saturates memory. Can enhance Grid to do this for expressions,
but a bit of (known) work.
2018-07-05 06:58:37 -04:00
f74617c124
Added ZFIMPL to meson field module
2018-07-03 14:04:53 +01:00
8c6a3921ed
Merge remote-tracking branch 'upstream/feature/hadrons' into feature/hadrons
2018-07-03 11:35:14 +01:00
a8a15dd9d0
Hadrons: code cleaning
2018-07-02 17:52:39 +01:00
3ce68a751a
Hadrons: stout smearing module
2018-07-02 17:52:04 +01:00
daa0977d01
Included a print statement that indicates that the guess is being subtracted from the solve.
2018-06-28 16:34:56 +01:00
a2929f4384
Removed A2A contraction module and replaced it with the beginnings of a meson field module
2018-06-28 16:17:26 +01:00
7fe3974c0a
Included eigenPacks and action as references, not inputs, of A2A module. They now no longer need to be parameters in the meson field modules.
2018-06-28 16:14:49 +01:00
f7e86f81a0
Changes A2A class to make use of the new Solver class
2018-06-28 16:14:16 +01:00
fecec803d9
Merge branch 'feature/hadrons' of https://github.com/paboyle/Grid into feature/hadrons
2018-06-28 16:13:43 +01:00
8fe9a13cdd
Merge branch 'feature/hadrons' of https://github.com/paboyle/Grid into feature/hadrons
2018-06-28 16:13:07 +01:00
3a50afe7e7
GPU dslash updates
2018-06-27 22:32:21 +01:00
f8e880b445
Loop for s and xyzt offlow
2018-06-27 21:49:57 +01:00
3e947527cb
Move looping over "s" and "site" into kernels for GPU optimisation
2018-06-27 21:29:43 +01:00
31f65beac8
Move site and Ls looping into the kernels
2018-06-27 21:28:48 +01:00
38e2a32ac9
Single SIMD lane operations for CUDA
2018-06-27 21:28:06 +01:00
efa84ca50a
Keep Cuda 9.1 happy
2018-06-27 21:27:32 +01:00
5e96d6d04c
Keep CUDA happy
2018-06-27 21:27:11 +01:00
df30bdc599
CUDA happy
2018-06-27 21:26:49 +01:00
7f45222924
Diagnostics on memory alloc fail
2018-06-27 21:26:20 +01:00
dd891f5e3b
Use NVCC to suppress device Eigen
2018-06-27 21:25:17 +01:00