Grid/benchmarks at cecee1ef2cc2c67741aecf5abf2a060cf363d307 - Grid

mirror of https://github.com/paboyle/Grid.git synced 2026-07-09 11:53:29 +01:00

paboyle ec9939c1ba Test for faster implementation of meson field inner loop

It should be possible to cache block this at outer levels. The global sum across nodes is not performed;
it is deferred to the caller, which can block all the sums into one big all-reduce.
Nc=3 and the Fermion type are hard coded in an ugly way. We might consider benchmarking whether
Grid should provide a product without the conjugate.

It is not clear whether the explicit unroll or performing the conjugate on the left operand once
was the real source of the speed-up.

Gives 70-80 GF/s on my laptop in single precision, half that in double, and 70 GB/s to cache.

This is competitive with dslash and a reasonable stopping point for the optimisation. If necessary we can revisit.

2018-07-10 12:38:51 +01:00
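
The commit message above names three ideas: conjugating the left operand once, unrolling explicitly over Nc=3, and returning a node-local sum so the caller can batch the global reductions. The following is a minimal sketch of those ideas in plain C++, not the code from the commit. The types and function names (ColourVector, conjugateField, innerProductNaive, innerProductUnrolled) are hypothetical stand-ins rather than Grid's API, spin indices are omitted, and no SIMD or threading is shown.

```cpp
#include <array>
#include <complex>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for one site's colour vector; not a Grid type.
constexpr int Nc = 3;
using Complex = std::complex<double>;
using ColourVector = std::array<Complex, Nc>;
using Field = std::vector<ColourVector>;

// Reference version: conjugates the left operand inside the colour loop.
Complex innerProductNaive(const Field& left, const Field& right) {
  Complex sum(0.0, 0.0);
  for (std::size_t s = 0; s < left.size(); ++s)
    for (int c = 0; c < Nc; ++c)
      sum += std::conj(left[s][c]) * right[s][c];
  return sum;
}

// Conjugate the left field once so it can be reused against many right fields.
Field conjugateField(const Field& f) {
  Field out(f.size());
  for (std::size_t s = 0; s < f.size(); ++s)
    for (int c = 0; c < Nc; ++c)
      out[s][c] = std::conj(f[s][c]);
  return out;
}

// Colour loop unrolled by hand for Nc = 3, with no conjugate in the loop.
// Returns only the node-local partial sum; any reduction across nodes is
// left to the caller, who can combine many partial sums in one all-reduce.
Complex innerProductUnrolled(const Field& leftConj, const Field& right) {
  Complex sum(0.0, 0.0);
  for (std::size_t s = 0; s < right.size(); ++s) {
    sum += leftConj[s][0] * right[s][0]
         + leftConj[s][1] * right[s][1]
         + leftConj[s][2] * right[s][2];
  }
  return sum;
}
```

In this sketch a caller would compute conjugateField(left) once, call innerProductUnrolled for each right-hand field, collect the partial sums in a buffer, and reduce that buffer across nodes in a single step.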

Benchmark_comms.cc

no threaded stencil benchmark if OpenMP is not supported

2018-05-03 16:20:01 +01:00

Benchmark_dwf_sweep.cc

Adding support for general Nc in the benchmark outputs

2018-01-25 13:46:31 +01:00

Benchmark_dwf.cc

Adding support for general Nc in the benchmark outputs

2018-01-25 13:46:31 +01:00

Benchmark_gparity.cc

Adding support for general Nc in the benchmark outputs