portelli/Grid - Grid - DiRAC Tursa git server

mirror of https://github.com/paboyle/Grid.git synced 2026-05-14 22:24:30 +01:00

Commit Graph

Author: fionnoh
SHA1: 24128ff109
Date: 2018-07-23 15:51:37 +01:00

Changes needed for MF benchmark to work with comms correctly

Author: paboyle
SHA1: ec9939c1ba
Date: 2018-07-10 12:38:51 +01:00

Test for faster implementation of meson field inner loop

It should be possible to cache-block this at outer levels. The global sum across nodes is not performed;
it is deferred to the caller, which can block all the sums into one big all-reduce.
Nc=3 and Fermion are hard-coded in an ugly way. We might think about benchmarking whether
a product without the conjugate should be made available by Grid.

It is not clear whether the explicit unroll or performing the conjugate on the left once
was the real source of the speed-up.

Gives 70-80 GF/s on my laptop in single precision, half that in double, and 70 GB/s to cache.

This is competitive with dslash and a reasonable stopping point for the optimisation. If necessary we can revisit.
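
The ideas named in the ec9939c1ba message (conjugating the left operand once, unrolling the colour loop for Nc=3, and leaving the cross-node sum to the caller) can be sketched in plain C++. This is an illustration only, not the code from the commit and not Grid's API: `ColourVec`, `Field`, `conjugate_field`, `local_inner` and `local_block` are hypothetical names, and the all-reduce itself is left out.

```cpp
#include <array>
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

using Cplx = std::complex<double>;
constexpr int Nc = 3;                     // colour count, hard-coded as the commit message describes
using ColourVec = std::array<Cplx, Nc>;   // hypothetical stand-in for one colour vector per site
using Field = std::vector<ColourVec>;     // hypothetical stand-in for a node-local field

// Conjugate the left field once so the result can be reused against many right fields.
Field conjugate_field(const Field& f) {
    Field out(f.size());
    for (std::size_t s = 0; s < f.size(); ++s)
        for (int c = 0; c < Nc; ++c) out[s][c] = std::conj(f[s][c]);
    return out;
}

// Node-local sum over sites of conj(left) . right, with the colour loop unrolled by hand.
// No cross-node reduction happens here.
Cplx local_inner(const Field& left_conj, const Field& right) {
    assert(left_conj.size() == right.size());
    Cplx acc(0.0, 0.0);
    for (std::size_t s = 0; s < right.size(); ++s) {
        acc += left_conj[s][0] * right[s][0]
             + left_conj[s][1] * right[s][1]
             + left_conj[s][2] * right[s][2];
    }
    return acc;
}

// Fill a buffer of local partial sums, one per (i, j) pair of left and right fields.
// The caller would hand the whole buffer to a single all-reduce (not shown).
std::vector<Cplx> local_block(const std::vector<Field>& lefts,
                              const std::vector<Field>& rights) {
    std::vector<Cplx> buf(lefts.size() * rights.size());
    for (std::size_t i = 0; i < lefts.size(); ++i) {
        const Field lc = conjugate_field(lefts[i]);   // conjugate once per left field
        for (std::size_t j = 0; j < rights.size(); ++j)
            buf[i * rights.size() + j] = local_inner(lc, rights[j]);
    }
    return buf;
}
```

Returning the whole buffer of partial sums lets the caller issue one reduction for the entire block instead of one per inner product, which is the deferral the message refers to. Whether the unroll or the hoisted conjugate matters more for speed is left open here, as it is in the commit message.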

2 Commits