1
0
mirror of https://github.com/paboyle/Grid.git synced 2025-06-14 05:07:05 +01:00
Commit Graph

14 Commits

Author SHA1 Message Date
1a4c8c3387 Global edit with change to View usage. autoView() creates a wrapper object that closes the view when scope closes. 2020-06-05 18:52:35 -04:00
a7abda89e2 View location & access mode 2020-05-21 16:13:59 -04:00
0561c2edeb Benchmarks modified for new GPU constructs 2019-06-15 12:52:56 +01:00
4e9df9e93c GPU patches 2019-05-18 17:43:11 +01:00
c43a2b599a GPU support 2019-01-01 15:07:29 +00:00
b57a4d32aa Merge branch 'develop' into feature/gpu-port 2018-12-13 05:11:34 +00:00
00b92a91b5 Optimising 2018-07-28 23:46:22 +01:00
65533741f7 7 moms 2018-07-28 16:17:47 +01:00
131a6785d4 Merge branch 'feature/hadrons-a2a' into feature/hadrons-a2a 2018-07-27 23:03:42 +01:00
44f4f5c8e2 Momentum loop 2018-07-27 23:00:16 +01:00
2679df034f Changes to meson field benchmark. Now includes the gammas in the final part of the naive method, both methods compute
lhs^dag*Gamma*rhs (previously Gamma*lhs^dag*rhs), and checks results.
2018-07-27 18:31:10 +01:00
71e1006ba8 Updated meson field benchmark for dirac structures 2018-07-26 09:09:29 +01:00
24128ff109 Changes needed for MF benchmark to work with comms correctly 2018-07-23 15:51:37 +01:00
ec9939c1ba Test for faster implementation of meson field inner loop
This should be possible to cache block at outer levels, global sum across nodes not performed
and deferred to caller to block them all into a big all reduce.
Nc=3 and Fermion is hard coded in an ugly way. We might think about benchmarking whether
a product without the conjugate should be made available by Grid.

It is not clear whether the explicit unroll, or the performing of conjugate on left once
was the real source of the speed up.

Gives 70-80 GF/s on my laptop (single) half that double, and 70GB/s to cache.

This is competitive with dslash and a reasonable stopping point for the optimisation. If necessary we can revisit.
2018-07-10 12:38:51 +01:00