fionnoh
8d1679c6b8
Merge branch 'feature/hadrons-a2a' of https://github.com/paboyle/Grid into a2a_basics
2018-08-03 15:12:24 +01:00
Peter Boyle
3791a38f7c
Optimised the MesonField a bit more
2018-08-01 08:27:27 +01:00
Peter Boyle
142f7b0c86
Updated the A2A Meson Field module
2018-07-31 15:58:02 +01:00
fionnoh
891ad66eab
Included changes to Hadrons RBPrecCG solver needed for subtraction of guess
2018-07-31 11:26:07 +01:00
Peter Boyle
60c43151c5
Merge branch 'feature/hadrons-a2a' of https://github.com/paboyle/Grid into feature/hadrons-a2a
2018-07-31 01:09:02 +01:00
paboyle
e036800261
Eigen fix
2018-07-31 01:08:42 +01:00
Peter Boyle
62900def36
Merge branch 'feature/hadrons-a2a' of https://github.com/paboyle/Grid into feature/hadrons-a2a
2018-07-31 00:36:26 +01:00
paboyle
e3a309a73f
Eigen happiness
2018-07-31 00:35:17 +01:00
fionnoh
ad6c1c0c4e
The basics of what is needed in Grid and Hadrons for the A2A class and module, with none of the contraction or MF code.
2018-07-30 18:40:50 +01:00
Peter Boyle
00b92a91b5
Optimising
2018-07-28 23:46:22 +01:00
paboyle
65533741f7
7 moms
2018-07-28 16:17:47 +01:00
Peter Boyle
dc0259fbda
Merge pull request #173 from fionnoh/feature/hadrons-a2a
Changes to meson field benchmark. Now includes the gammas in the fina…
2018-07-27 23:03:56 +01:00
Peter Boyle
131a6785d4
Merge branch 'feature/hadrons-a2a' into feature/hadrons-a2a
2018-07-27 23:03:42 +01:00
paboyle
44f4f5c8e2
Momentum loop
2018-07-27 23:00:16 +01:00
fionnoh
2679df034f
Changes to meson field benchmark. Now includes the gammas in the final part of the naive method, both methods compute lhs^dag*Gamma*rhs (previously Gamma*lhs^dag*rhs), and checks results.
2018-07-27 18:31:10 +01:00
bf71162b97
Hadrons: backtrace on abort
2018-07-26 19:20:12 +01:00
299e828d83
Merge branch 'develop' into feature/hadrons
2018-07-26 16:49:49 +01:00
ef5452cddf
Hadrons: smarter memory profiler
2018-07-26 16:47:45 +01:00
80de748737
Hadrons: new exceptions which can save an integer
2018-07-26 16:47:25 +01:00
paboyle
71e1006ba8
Updated meson field benchmark for dirac structures
2018-07-26 09:09:29 +01:00
00f31ae83f
Merge pull request #163 from goracle/unstaged
Add printing of whether there are unstaged changes in the git hash print
2018-07-25 19:00:00 +00:00
cce339deaf
Merge pull request #172 from fionnoh/feature/hadrons
feature/hadrons -> feature/hadrons-a2a
2018-07-25 17:20:19 +00:00
fionnoh
24128ff109
Changes needed for MF benchmark to work with comms correctly
2018-07-23 15:51:37 +01:00
Peter Boyle
da17a015c7
Pack the stencil smaller for 128 bit access
2018-07-23 06:12:45 -04:00
Peter Boyle
1fd08c21ac
make simd width configure time option for GPU
2018-07-23 06:10:55 -04:00
Peter Boyle
28db0631ff
Hack to force 128bit accesses
2018-07-23 06:10:27 -04:00
Peter Boyle
b35401b86b
Fix CUDA_ARCH. Need to simplify. See when new eigen release happens
2018-07-23 06:09:33 -04:00
Peter Boyle
a0714de8ec
Define vector length for GPU
2018-07-23 06:09:05 -04:00
Peter Boyle
21a1710b43
Verbose vector length
2018-07-23 06:08:39 -04:00
fionnoh
34e9d3f0ca
Moved the creation and resizing of the v and w high modes from the A2A class to the A2A module and made them an output of the module. This means that they have to be inputs of the contraction modules and they will be freed from memory if they are no longer needed.
2018-07-22 14:40:31 +01:00
fionnoh
c995788259
Added ImportUnphysicalFermion and included appropriate logic for 5d w vectors in A2A code
2018-07-21 00:08:11 +01:00
fionnoh
94c7198001
Added ZFIMPL to A2AMeson contraction
2018-07-20 23:08:22 +01:00
fionnoh
04d86fe9f3
Removed overly verbose print statement
2018-07-20 21:38:19 +01:00
fionnoh
b78074b6a0
Removed a Dminus from high mode v and removed duplication of D_oo code
2018-07-20 16:55:24 +01:00
fionnoh
7dfd3cdae8
Inclusion of ExportPhysicalFermionSource that fixes a bug in the low mode w vectors
2018-07-20 15:45:43 +01:00
fionnoh
cecee1ef2c
Merge branch 'develop' of github.com:paboyle/Grid into feature/hadrons
2018-07-20 13:37:50 +01:00
fionnoh
355d4b58be
Merge branch 'feature/hadrons' of github.com:fionnoh/Grid into feature/hadrons
2018-07-19 16:07:54 +01:00
fionnoh
2c54a536f3
Moved the meson field inner product to its own header file
2018-07-19 15:56:52 +01:00
fionnoh
d868a45120
Cleaned up some stuff that was erroneously included in a previous "trash" commit. Leaving in the mySliceInnerProdct function for now as it speeds up mesonfield creation quite a lot for 24^3 tests
2018-07-16 16:19:59 +01:00
fionnoh
9deae8c962
A2A meson field contraction code
2018-07-16 14:18:45 +01:00
Peter Boyle
b2b5137d28
Finally starting to get decent performance on Volta
2018-07-13 12:06:18 -04:00
fionnoh
db86cdd7bd
Possible trash commit
2018-07-10 13:30:45 +01:00
paboyle
ec9939c1ba
Test for faster implementation of meson field inner loop
This should be possible to cache block at outer levels, global sum across nodes not performed
and deferred to caller to block them all into a big all reduce.
Nc=3 and Fermion is hard coded in an ugly way. We might think about benchmarking whether
a product without the conjugate should be made available by Grid.
It is not clear whether the explicit unroll, or the performing of conjugate on left once
was the real source of the speed up.
Gives 70-80 GF/s on my laptop (single) half that double, and 70GB/s to cache.
This is competitive with dslash and a reasonable stopping point for the optimisation. If necessary we can revisit.
2018-07-10 12:38:51 +01:00
Peter Boyle
2cc07450f4
Fastest option for the dslash
2018-07-05 09:57:55 -04:00
Peter Boyle
c0e8bc9da9
Current version gets 250 - 320 GF/s on Volta on the target 12^4 volume.
2018-07-05 07:10:25 -04:00
Peter Boyle
b1265ae867
Prettify code
2018-07-05 07:08:06 -04:00
Peter Boyle
32bb85ea4c
Standard extractLane is fast
2018-07-05 07:07:30 -04:00
Peter Boyle
ca0607b6ef
Clearer kernel call meaning
2018-07-05 07:06:15 -04:00
Peter Boyle
19b527e83f
Better extract merge for GPU. Let the SIMD header files define the pointer type for access. GPU redirects through builtin float2, double2 for complex
2018-07-05 07:05:13 -04:00
Peter Boyle
4730d4692a
Fast lane extract, saturates bandwidth on Volta for SU3 benchmarks
2018-07-05 07:03:33 -04:00