1
0
mirror of https://github.com/paboyle/Grid.git synced 2024-09-20 01:05:38 +01:00
Commit Graph

2 Commits

Author SHA1 Message Date
fionnoh
24128ff109 Changes needed for MF benchmark to work with comms correctly 2018-07-23 15:51:37 +01:00
paboyle
ec9939c1ba Test for faster implementation of meson field inner loop
This should be possible to cache block at outer levels, global sum across nodes not performed
and deferred to caller to block them all into a big all reduce.
Nc=3 and Fermion is hard coded in an ugly way. We might think about benchmarking whether
a product without the conjugate should be made available by Grid.

It is not clear whether the explicit unroll, or the performing of conjugate on left once
was the real source of the speed up.

Gives 70-80 GF/s on my laptop (single) half that double, and 70GB/s to cache.

This is competitive with dslash and a reasonable stopping point for the optimisation. If necessary we can revisit.
2018-07-10 12:38:51 +01:00