paboyle
ec9939c1ba
Test for faster implementation of meson field inner loop
It should be possible to cache-block this at the outer levels. The global sum across nodes is not performed here;
it is deferred to the caller so that all of them can be batched into one large all-reduce.
Nc=3 and Fermion are hard-coded in an ugly way. We might consider benchmarking whether
a product without the conjugate should be made available by Grid.
It is not clear whether the explicit unrolling or performing the conjugate on the left once
was the real source of the speed-up.
Gives 70-80 GF/s on my laptop in single precision, half that in double, and 70 GB/s to cache.
This is competitive with dslash and a reasonable stopping point for the optimisation. If necessary, we can revisit.
2018-07-10 12:38:51 +01:00
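The inner-loop structure described in the commit message above can be sketched as follows. This is a hypothetical illustration, not the code from commit ec9939c1ba. It assumes a flat array layout (nsite * ncomp complex numbers per vector) and uses std::complex<double> in place of Grid's vectorised types. The names mesonFieldLocal and Field are invented. The sketch shows the two ideas the message mentions: conjugating each left vector once, outside the site loop, so the innermost loop is a plain multiply-add; and returning node-local partial sums, leaving the cross-node reduction to the caller.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

using Complex = std::complex<double>;
using Field = std::vector<Complex>;  // hypothetical flat layout: nsite * ncomp entries

// Node-local meson-field contraction: M[i*nr + j] = sum_x sum_c conj(L_i(x,c)) * R_j(x,c).
// No cross-node sum is done here; the caller reduces all entries at once.
std::vector<Complex> mesonFieldLocal(const std::vector<Field>& left,
                                     const std::vector<Field>& right,
                                     std::size_t nsite, std::size_t ncomp) {
  const std::size_t nl = left.size();
  const std::size_t nr = right.size();
  std::vector<Complex> result(nl * nr, Complex(0.0, 0.0));

  // Conjugate every left vector once, outside the site loop.
  std::vector<Field> leftConj(nl);
  for (std::size_t i = 0; i < nl; ++i) {
    assert(left[i].size() >= nsite * ncomp);
    leftConj[i].reserve(left[i].size());
    for (const Complex& z : left[i]) leftConj[i].push_back(std::conj(z));
  }
  for (const Field& rv : right) assert(rv.size() >= nsite * ncomp);

  // Outer loop over sites; the inner loops use a plain product (no conjugate).
  for (std::size_t x = 0; x < nsite; ++x) {
    for (std::size_t i = 0; i < nl; ++i) {
      const Complex* l = leftConj[i].data() + x * ncomp;
      for (std::size_t j = 0; j < nr; ++j) {
        const Complex* r = right[j].data() + x * ncomp;
        Complex acc(0.0, 0.0);
        for (std::size_t c = 0; c < ncomp; ++c) acc += l[c] * r[c];
        result[i * nr + j] += acc;
      }
    }
  }
  return result;
}
```

Because std::complex<double> stores two contiguous doubles, a caller could sum the whole result array across nodes in one call. One way to do this is a single MPI_Allreduce with MPI_SUM over 2 * nl * nr MPI_DOUBLE values, rather than reducing each entry separately.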
bfbf2f1fa0
no threaded stencil benchmark if OpenMP is not supported
2018-05-03 16:20:01 +01:00
Dr Peter Boyle
1dddd17e3c
Benchmark improvements from tesseract
2018-04-27 11:44:46 +01:00
Peter Boyle
fa0d8feff4
Performance of CovariantCshift now non-embarrassing.
2018-04-26 17:56:27 +01:00
Peter Boyle
05b44aef6b
Merge branch 'develop' of https://github.com/paboyle/Grid into develop
Conflicts:
benchmarks/Benchmark_su3.cc
2018-04-26 15:38:49 +01:00
Peter Boyle
91a0a3f820
Improvement
2018-04-26 14:48:35 +01:00
Peter Boyle
8f44c799a6
Saving the benchmarking tests for Cshift
2018-04-26 14:48:03 +01:00
Guido Cossu
43f5a0df50
More timers in the integrator
2018-04-26 12:01:56 +09:00
paboyle
2baf193031
Merge branch 'develop' of https://github.com/paboyle/Grid into develop
2018-04-25 00:14:03 +01:00
paboyle
362ba0443a
Cshift updates
2018-04-25 00:12:11 +01:00
Guido Cossu
c5b9147b53
Correction of a minor bug in the su3 benchmark
2018-04-24 08:03:57 -07:00
Guido Cossu
a1be533329
Corrected Flop count in Benchmark su3 and expanded the Wilson flow output
2018-04-24 01:19:53 -07:00
paboyle
b5510427f9
physical fermion interface, cshift benchmark in SU3.
2018-04-18 01:43:29 +01:00
paboyle
276f113f28
IO uses master boss node for metadata.
2018-03-30 16:17:05 +01:00
paboyle
ab6afd18ac
Still compile if no LIME
2018-03-30 13:39:20 +01:00
c5a885dcd6
I/O benchmark
2018-03-29 19:57:41 +01:00
Guido Cossu
fb24e3a7d2
Adding utilities for perf profiling
2018-01-29 11:11:45 +01:00
Guido Cossu
cff3bae155
Adding support for general Nc in the benchmark outputs
2018-01-25 13:46:31 +01:00
paboyle
9b32d51cd1
Simplify comms layer proliferation
2018-01-08 11:27:14 +00:00
paboyle
4f8b6f26b4
Merge branch 'develop' into feature/dwf-multirhs
2017-10-02 11:41:49 +01:00
Peter Boyle
bfb68e6f02
Merge pull request #130 from giltirn/gparity-handunroll
Gparity handunroll
2017-09-21 10:11:00 +01:00
paboyle
17c5b0f152
Patching comparison point
2017-09-16 18:18:07 +01:00
Peter Boyle
b331be9101
Better reporting
2017-08-31 11:32:57 +01:00
Peter Boyle
49c20a9fa8
Patch to reporting
2017-08-31 11:32:21 +01:00
paboyle
7359df3501
Full reporting for benchmark; save robustness factor
2017-08-31 10:42:35 +01:00
Christopher Kelly
d36d2fb40d
Added ability to override default Ls in Benchmark_dwf
2017-08-28 06:53:56 -07:00
Peter Boyle
5b9267e88d
Cleaner comms benchmark treatment for one node runs
2017-08-27 18:24:48 -04:00
paboyle
15fd4003ef
Improving presentation of results
2017-08-27 13:46:02 +01:00
paboyle
ad89abb018
Fix
2017-08-25 20:43:37 +01:00
paboyle
80c5bce5bb
Merge branch 'develop' into feature/multi-communicator
2017-08-25 20:21:26 +01:00
Peter Boyle
d0f3d525d5
Optimal block size for KNL
2017-08-25 19:33:54 +01:00
Peter Boyle
3a58217405
Updated
2017-08-25 14:29:53 +01:00
Peter Boyle
c289699d9a
updated from Cambridge MPI3 shakeout
2017-08-25 11:41:01 +01:00
Peter Boyle
c3b1263e75
Benchmark prep
2017-08-25 09:25:54 +01:00
Christopher Kelly
edabb3577f
Imported Benchmark_gparity
2017-08-23 16:54:06 -04:00
paboyle
ae56e556c6
finalise issue on new OPA revert
2017-08-20 02:53:12 +01:00
paboyle
383ca7d392
Switch off comms for now until feature/multi-communicator is merged
2017-08-20 01:27:48 +01:00
paboyle
a446d95c33
Trying to pass TeamCity and Travis
2017-08-20 01:10:50 +01:00
paboyle
be66e7dd95
Merge branch 'develop' into feature/multi-communicator
2017-08-19 23:12:38 +01:00
paboyle
bfef525ed2
New benchmark prep
2017-08-19 23:10:12 +01:00
Peter Boyle
7d88198387
Merge branch 'develop' into feature/multi-communicator
2017-08-19 13:03:35 -04:00
Peter Boyle
9e658de238
Use Vector
2017-08-19 12:52:44 -04:00
Peter Boyle
14d53e1c9e
Threaded MPI calls patches
2017-07-29 13:08:10 -04:00
Peter Boyle
40e119c61c
NUMA improvements worth preserving from AMD EPYC tests
2017-07-08 22:27:11 -04:00
Peter Boyle
b73bd151bb
Switch off counters by default
2017-06-30 10:16:35 +01:00
Peter Boyle
694b305cab
Update to reporting
2017-06-30 10:16:13 +01:00
paboyle
6f5a5cd9b3
Improved threaded comms benchmark
2017-06-28 23:27:02 +01:00
Peter Boyle
08e04b9676
Better benchmarks
2017-06-28 15:30:06 +01:00
paboyle
54e94360ad
Experimental: Multiple communicators to see if we can avoid thread locks in --enable-comms=mpit
2017-06-24 23:10:24 +01:00
paboyle
6ebf9f15b7
Splitting communicators first cut
2017-06-22 08:14:34 +01:00