1
0
mirror of https://github.com/paboyle/Grid.git synced 2025-06-11 03:46:55 +01:00
Commit Graph

334 Commits

Author SHA1 Message Date
21a1710b43 Verbose vector length 2018-07-23 06:08:39 -04:00
ec9939c1ba Test for faster implementation of meson field inner loop
This should be possible to cache block at outer levels, global sum across nodes not performed
and deferred to caller to block them all into a big all reduce.
Nc=3 and Fermion is hard coded in an ugly way. We might think about benchmarking whether
a product without the conjugate should be made available by Grid.

It is not clear whether the explicit unroll, or the performing of conjugate on left once
was the real source of the speed up.

Gives 70-80 GF/s on my laptop (single) half that double, and 70GB/s to cache.

This is competitive with dslash and a reasonable stopping point for the optimisation. If necessary we can revisit.
2018-07-10 12:38:51 +01:00
4b04ae3611 Printing improvement 2018-07-05 06:59:38 -04:00
2f776d51c6 Gpu specific benchmark saturates memory. Can enhance Grid to do this for expressions,
but a bitof (known) work.
2018-07-05 06:58:37 -04:00
25becc9324 GPU tweaks for benchmarking; really necessary? 2018-06-13 20:26:07 +01:00
eb921041d0 Perf count control 2018-05-12 17:57:32 -04:00
bfbf2f1fa0 no threaded stencil benchmark if OpenMP is not supported 2018-05-03 16:20:01 +01:00
1dddd17e3c Benchmark improvements from tesseract 2018-04-27 11:44:46 +01:00
fa0d8feff4 Performance of CovariantCshift now non-embarrassing. 2018-04-26 17:56:27 +01:00
05b44aef6b Merge branch 'develop' of https://github.com/paboyle/Grid into develop
Conflicts:
	benchmarks/Benchmark_su3.cc
2018-04-26 15:38:49 +01:00
91a0a3f820 Improvement 2018-04-26 14:48:35 +01:00
8f44c799a6 Saving the benchmarking tests for Cshift 2018-04-26 14:48:03 +01:00
43f5a0df50 More timers in the integrator 2018-04-26 12:01:56 +09:00
2baf193031 Merge branch 'develop' of https://github.com/paboyle/Grid into develop 2018-04-25 00:14:03 +01:00
362ba0443a Cshift updates 2018-04-25 00:12:11 +01:00
c5b9147b53 Correction of a minor bug in the su3 benchmark 2018-04-24 08:03:57 -07:00
a1be533329 Corrected Flop count in Benchmark su3 and expanded the Wilson flow output 2018-04-24 01:19:53 -07:00
b5510427f9 physical fermion interface, cshift benchmark in SU3. 2018-04-18 01:43:29 +01:00
276f113f28 IO uses master boss node for metadata. 2018-03-30 16:17:05 +01:00
ab6afd18ac Still compile if no LIME 2018-03-30 13:39:20 +01:00
c5a885dcd6 I/O benchmark 2018-03-29 19:57:41 +01:00
6fe9b28a82 Cosmetic 2018-03-24 19:27:14 -04:00
b002587d7c Simplify 2018-03-24 19:26:44 -04:00
6c08385782 Simplify 2018-03-24 19:26:19 -04:00
a3690071b4 Warm up GPu 2018-03-22 18:05:20 -04:00
5ac96dbdc6 Warm behaviour in SU3 benchmark 2018-03-20 07:18:31 -04:00
aead94e9a7 View introduced 2018-03-04 16:39:29 +00:00
36ea5f6b77 gpu friendly coordinates ; no std::vector on GPU 2018-02-24 22:20:14 +00:00
fb24e3a7d2 Adding utilities for perf profiling 2018-01-29 11:11:45 +01:00
604c05f4b8 parallel_for elimination -> thread_loop 2018-01-28 01:01:36 +00:00
ce4da83bc2 Zero changes, literally 2018-01-27 23:51:10 +00:00
c4f82e072b _grid becomes private ; use Grid()§ 2018-01-27 00:04:12 +00:00
2a4a0e43c1 Hide internals 2018-01-26 23:08:27 +00:00
f4010023ca Warning fixes 2018-01-25 23:46:47 +00:00
e7cba358c2 Temporary update to reflect the new dropping of std::vector in Lattice
Will update again to hide the internals in an interface
2018-01-25 23:31:41 +00:00
cff3bae155 Adding support for general Nc in the benchmark outputs 2018-01-25 13:46:31 +01:00
918c105c57 NVCC warning elimination 2018-01-24 13:23:59 +00:00
d74c21a386 GLobal edit for QCD namespace removal & NAMESPACE macros 2018-01-15 09:37:58 +00:00
9b32d51cd1 Simplify comms layer proliferatoin 2018-01-08 11:27:14 +00:00
4f8b6f26b4 Merge branch 'develop' into feature/dwf-multirhs 2017-10-02 11:41:49 +01:00
bfb68e6f02 Merge pull request #130 from giltirn/gparity-handunroll
Gparity handunroll
2017-09-21 10:11:00 +01:00
17c5b0f152 Patching comparison point 2017-09-16 18:18:07 +01:00
b331be9101 Better reporting 2017-08-31 11:32:57 +01:00
49c20a9fa8 Patch to reporting 2017-08-31 11:32:21 +01:00
7359df3501 Full reporting for benchmark; save robustness factor 2017-08-31 10:42:35 +01:00
d36d2fb40d Added ability to override default Ls in Benchmark_dwf 2017-08-28 06:53:56 -07:00
5b9267e88d Cleaner comms benchmark treatment for one node runs 2017-08-27 18:24:48 -04:00
15fd4003ef Improving presentation of results 2017-08-27 13:46:02 +01:00
ad89abb018 Fix 2017-08-25 20:43:37 +01:00
80c5bce5bb Merge branch 'develop' into feature/multi-communicator 2017-08-25 20:21:26 +01:00