mirror of https://github.com/paboyle/Grid.git synced 2024-09-20 09:15:38 +01:00
Grid/benchmarks
paboyle ec9939c1ba Test for faster implementation of meson field inner loop
It should be possible to cache block this at outer levels. The global sum across nodes is not performed;
it is deferred to the caller, which can block all the sums into one big all-reduce.
Nc=3 and Fermion are hard coded in an ugly way. We might consider benchmarking whether
Grid should make a product without the conjugate available.

It is not clear whether the explicit unroll or performing the conjugate on the left once
was the real source of the speed-up.

Gives 70-80 GF/s on my laptop in single precision, half that in double, and 70 GB/s to cache.

This is competitive with dslash and is a reasonable stopping point for the optimisation. We can revisit it if necessary.
2018-07-10 12:38:51 +01:00
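The two candidate optimisations named in the commit message (conjugating the left operand once, and unrolling the colour loop for Nc=3) can be illustrated with a minimal sketch. This is not the code in Benchmark_meson_field.cc and does not use Grid's types or API: `ColourVec`, `local_inner_naive`, `conjugate_once` and `local_inner_unrolled` are hypothetical names chosen for illustration, and plain `std::complex<double>` stands in for Grid's SIMD types. As in the commit, the sketch returns only a node-local sum and leaves any reduction across nodes to the caller.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

using Cplx = std::complex<double>;

constexpr int Nc = 3;  // number of colours, hard coded as in the commit message

// Hypothetical stand-in for one site's colour vector; not a Grid type.
struct ColourVec {
  Cplx c[Nc];
};

// Baseline: conjugate the left operand inside the inner loop.
Cplx local_inner_naive(const std::vector<ColourVec>& left,
                       const std::vector<ColourVec>& right) {
  assert(left.size() == right.size());
  Cplx sum(0.0, 0.0);
  for (std::size_t s = 0; s < left.size(); ++s) {
    for (int a = 0; a < Nc; ++a) {
      sum += std::conj(left[s].c[a]) * right[s].c[a];
    }
  }
  return sum;
}

// Conjugate the left operand once, so the result can be reused
// against many right operands.
std::vector<ColourVec> conjugate_once(const std::vector<ColourVec>& left) {
  std::vector<ColourVec> out(left.size());
  for (std::size_t s = 0; s < left.size(); ++s) {
    for (int a = 0; a < Nc; ++a) {
      out[s].c[a] = std::conj(left[s].c[a]);
    }
  }
  return out;
}

// Product without the conjugate, with the colour loop unrolled by hand
// for Nc = 3. Returns the node-local sum only; a reduction across nodes
// is deliberately left to the caller.
Cplx local_inner_unrolled(const std::vector<ColourVec>& left_conj,
                          const std::vector<ColourVec>& right) {
  assert(left_conj.size() == right.size());
  Cplx sum(0.0, 0.0);
  for (std::size_t s = 0; s < right.size(); ++s) {
    sum += left_conj[s].c[0] * right[s].c[0]
         + left_conj[s].c[1] * right[s].c[1]
         + left_conj[s].c[2] * right[s].c[2];
  }
  return sum;
}
```

A caller would invoke `conjugate_once` a single time per left operand and then call `local_inner_unrolled` for each right operand, collecting the local sums into one buffer so that a single reduction can cover all of them. Timing the two variants separately is one way to tell which change accounts for a speed-up.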
Benchmark_comms.cc no threaded stencil benchmark if OpenMP is not supported 2018-05-03 16:20:01 +01:00
Benchmark_dwf_sweep.cc Adding support for general Nc in the benchmark outputs 2018-01-25 13:46:31 +01:00
Benchmark_dwf.cc Adding support for general Nc in the benchmark outputs 2018-01-25 13:46:31 +01:00
Benchmark_gparity.cc Adding support for general Nc in the benchmark outputs 2018-01-25 13:46:31 +01:00
Benchmark_IO.cc IO uses master boss node for metadata. 2018-03-30 16:17:05 +01:00
Benchmark_ITT.cc Benchmark improvements from tesseract 2018-04-27 11:44:46 +01:00
Benchmark_memory_asynch.cc Drop random device 2017-04-02 00:26:26 +09:00
Benchmark_memory_bandwidth.cc Improvement 2018-04-26 14:48:35 +01:00
Benchmark_meson_field.cc Test for faster implementation of meson field inner loop 2018-07-10 12:38:51 +01:00
Benchmark_mooee.cc Z mobius bmark 2016-12-18 00:55:37 +00:00
Benchmark_staggered.cc Splitting communicators first cut 2017-06-22 08:14:34 +01:00
Benchmark_su3.cc Performance of CovariantCshift now non-embarrassing. 2018-04-26 17:56:27 +01:00
Benchmark_wilson_sweep.cc Adding support for general Nc in the benchmark outputs 2018-01-25 13:46:31 +01:00
Benchmark_wilson.cc Adding utilities for perf profiling 2018-01-29 11:11:45 +01:00
Makefile.am Better check and benchmark driving 2017-05-05 19:54:38 +01:00
simple_simd_test.cc Makefile rule for simple_* objects 2016-11-19 01:33:13 +01:00
simple_su3_expr.cc Makefile rule for simple_* objects 2016-11-19 01:33:13 +01:00
simple_su3_test.cc generic 256bits SIMD 2016-11-15 12:16:15 +00:00