mirror of https://github.com/paboyle/Grid.git synced 2026-04-19 10:11:02 +01:00
Commit Graph

225 Commits

Author SHA1 Message Date
portelli 0ba3d469c7 Benchmark IO in single and double precision 2018-10-17 20:27:34 +01:00
portelli 291bc2a1f0 IO benchmark on a list of directories 2018-10-15 17:25:08 +01:00
portelli a15a2dfd29 Merge branch 'develop' into feature/hadrons 2018-08-10 16:08:22 +01:00
paboyle 27cdb79063 Sha used to seed from a unique string 2018-08-10 15:11:01 +01:00
Peter Boyle 00b92a91b5 Optimising 2018-07-28 23:46:22 +01:00
paboyle 65533741f7 7 moms 2018-07-28 16:17:47 +01:00
Peter Boyle 131a6785d4 Merge branch 'feature/hadrons-a2a' into feature/hadrons-a2a 2018-07-27 23:03:42 +01:00
paboyle 44f4f5c8e2 Momentum loop 2018-07-27 23:00:16 +01:00
fionnoh 2679df034f Changes to meson field benchmark. Now includes the gammas in the final part of the naive method, both methods compute
lhs^dag*Gamma*rhs (previously Gamma*lhs^dag*rhs), and checks results.
2018-07-27 18:31:10 +01:00
paboyle 71e1006ba8 Updated meson field benchmark for dirac structures 2018-07-26 09:09:29 +01:00
fionnoh 24128ff109 Changes needed for MF benchmark to work with comms correctly 2018-07-23 15:51:37 +01:00
paboyle ec9939c1ba Test for faster implementation of meson field inner loop
This should be possible to cache block at outer levels, global sum across nodes not performed
and deferred to caller to block them all into a big all reduce.
Nc=3 and Fermion is hard coded in an ugly way. We might think about benchmarking whether
a product without the conjugate should be made available by Grid.

It is not clear whether the explicit unroll, or the performing of conjugate on left once
was the real source of the speed up.

Gives 70-80 GF/s on my laptop (single) half that double, and 70GB/s to cache.

This is competitive with dslash and a reasonable stopping point for the optimisation. If necessary we can revisit.
2018-07-10 12:38:51 +01:00
portelli bfbf2f1fa0 no threaded stencil benchmark if OpenMP is not supported 2018-05-03 16:20:01 +01:00
Dr Peter Boyle 1dddd17e3c Benchmark improvements from tesseract 2018-04-27 11:44:46 +01:00
Peter Boyle fa0d8feff4 Performance of CovariantCshift now non-embarrassing. 2018-04-26 17:56:27 +01:00
Peter Boyle 05b44aef6b Merge branch 'develop' of https://github.com/paboyle/Grid into develop
Conflicts:
	benchmarks/Benchmark_su3.cc
2018-04-26 15:38:49 +01:00
Peter Boyle 91a0a3f820 Improvement 2018-04-26 14:48:35 +01:00
Peter Boyle 8f44c799a6 Saving the benchmarking tests for Cshift 2018-04-26 14:48:03 +01:00
Guido Cossu 43f5a0df50 More timers in the integrator 2018-04-26 12:01:56 +09:00
paboyle 2baf193031 Merge branch 'develop' of https://github.com/paboyle/Grid into develop 2018-04-25 00:14:03 +01:00
paboyle 362ba0443a Cshift updates 2018-04-25 00:12:11 +01:00
Guido Cossu c5b9147b53 Correction of a minor bug in the su3 benchmark 2018-04-24 08:03:57 -07:00
Guido Cossu a1be533329 Corrected Flop count in Benchmark su3 and expanded the Wilson flow output 2018-04-24 01:19:53 -07:00
paboyle b5510427f9 physical fermion interface, cshift benchmark in SU3. 2018-04-18 01:43:29 +01:00
paboyle 276f113f28 IO uses master boss node for metadata. 2018-03-30 16:17:05 +01:00
paboyle ab6afd18ac Still compile if no LIME 2018-03-30 13:39:20 +01:00
portelli c5a885dcd6 I/O benchmark 2018-03-29 19:57:41 +01:00
Guido Cossu fb24e3a7d2 Adding utilities for perf profiling 2018-01-29 11:11:45 +01:00
Guido Cossu cff3bae155 Adding support for general Nc in the benchmark outputs 2018-01-25 13:46:31 +01:00
paboyle 9b32d51cd1 Simplify comms layer proliferation 2018-01-08 11:27:14 +00:00
paboyle 4f8b6f26b4 Merge branch 'develop' into feature/dwf-multirhs 2017-10-02 11:41:49 +01:00
Peter Boyle bfb68e6f02 Merge pull request #130 from giltirn/gparity-handunroll
Gparity handunroll
2017-09-21 10:11:00 +01:00
paboyle 17c5b0f152 Patching comparison point 2017-09-16 18:18:07 +01:00
Peter Boyle b331be9101 Better reporting 2017-08-31 11:32:57 +01:00
Peter Boyle 49c20a9fa8 Patch to reporting 2017-08-31 11:32:21 +01:00
paboyle 7359df3501 Full reporting for benchmark; save robustness factor 2017-08-31 10:42:35 +01:00
Christopher Kelly d36d2fb40d Added ability to override default Ls in Benchmark_dwf 2017-08-28 06:53:56 -07:00
Peter Boyle 5b9267e88d Cleaner comms benchmark treatment for one node runs 2017-08-27 18:24:48 -04:00
paboyle 15fd4003ef Improving presentation of results 2017-08-27 13:46:02 +01:00
paboyle ad89abb018 Fix 2017-08-25 20:43:37 +01:00
paboyle 80c5bce5bb Merge branch 'develop' into feature/multi-communicator 2017-08-25 20:21:26 +01:00
Peter Boyle d0f3d525d5 Optimal block size for KNL 2017-08-25 19:33:54 +01:00
Peter Boyle 3a58217405 Updated 2017-08-25 14:29:53 +01:00
Peter Boyle c289699d9a updated from cambridge mpi3 shakeout 2017-08-25 11:41:01 +01:00
Peter Boyle c3b1263e75 Benchmark prep 2017-08-25 09:25:54 +01:00
Christopher Kelly edabb3577f Imported Benchmark_gparity 2017-08-23 16:54:06 -04:00
paboyle ae56e556c6 finalise issue on new OPA revert 2017-08-20 02:53:12 +01:00
paboyle 383ca7d392 Switch off comms for now until feature/multi-communicator is merged 2017-08-20 01:27:48 +01:00
paboyle a446d95c33 Trying to pass TeamCity and Travis 2017-08-20 01:10:50 +01:00
paboyle be66e7dd95 Merge branch 'develop' into feature/multi-communicator 2017-08-19 23:12:38 +01:00