portelli/Grid - Grid - DiRAC Tursa git server

mirror of https://github.com/paboyle/Grid.git synced 2026-05-13 05:34:30 +01:00

Author	SHA1	Message	Date
paboyle	ec9939c1ba	Test for faster implementation of meson field inner loop. This should be possible to cache block at outer levels, global sum across nodes not performed and deferred to caller to block them all into a big all reduce. Nc=3 and Fermion is hard coded in an ugly way. We might think about benchmarking whether a product without the conjugate should be made available by Grid. It is not clear whether the explicit unroll, or the performing of conjugate on left once was the real source of the speed up. Gives 70-80 GF/s on my laptop (single) half that double, and 70GB/s to cache. This is competitive with dslash and a reasonable stopping point for the optimisation. If necessary we can revisit.	2018-07-10 12:38:51 +01:00
portelli	bfbf2f1fa0	no threaded stencil benchmark if OpenMP is not supported	2018-05-03 16:20:01 +01:00
Dr Peter Boyle	1dddd17e3c	Benchmark improvements from tesseract	2018-04-27 11:44:46 +01:00
Peter Boyle	fa0d8feff4	Performance of CovariantCshift now non-embarrassing.	2018-04-26 17:56:27 +01:00
Peter Boyle	05b44aef6b	Merge branch 'develop' of https://github.com/paboyle/Grid into develop Conflicts: benchmarks/Benchmark_su3.cc	2018-04-26 15:38:49 +01:00
Peter Boyle	91a0a3f820	Improvement	2018-04-26 14:48:35 +01:00
Peter Boyle	8f44c799a6	Saving the benchmarking tests for Cshift	2018-04-26 14:48:03 +01:00
Guido Cossu	43f5a0df50	More timers in the integrator	2018-04-26 12:01:56 +09:00
paboyle	2baf193031	Merge branch 'develop' of https://github.com/paboyle/Grid into develop	2018-04-25 00:14:03 +01:00
paboyle	362ba0443a	Cshift updates	2018-04-25 00:12:11 +01:00
Guido Cossu	c5b9147b53	Correction of a minor bug in the su3 benchmark	2018-04-24 08:03:57 -07:00
Guido Cossu	a1be533329	Corrected Flop count in Benchmark su3 and expanded the Wilson flow output	2018-04-24 01:19:53 -07:00
paboyle	b5510427f9	physical fermion interface, cshift benchmark in SU3.	2018-04-18 01:43:29 +01:00
paboyle	276f113f28	IO uses master boss node for metadata.	2018-03-30 16:17:05 +01:00
paboyle	ab6afd18ac	Still compile if no LIME	2018-03-30 13:39:20 +01:00
portelli	c5a885dcd6	I/O benchmark	2018-03-29 19:57:41 +01:00
Guido Cossu	fb24e3a7d2	Adding utilities for perf profiling	2018-01-29 11:11:45 +01:00
Guido Cossu	cff3bae155	Adding support for general Nc in the benchmark outputs	2018-01-25 13:46:31 +01:00
paboyle	9b32d51cd1	Simplify comms layer proliferation	2018-01-08 11:27:14 +00:00
paboyle	4f8b6f26b4	Merge branch 'develop' into feature/dwf-multirhs	2017-10-02 11:41:49 +01:00
Peter Boyle	bfb68e6f02	Merge pull request #130 from giltirn/gparity-handunroll Gparity handunroll	2017-09-21 10:11:00 +01:00
paboyle	17c5b0f152	Patching comparison point	2017-09-16 18:18:07 +01:00
Peter Boyle	b331be9101	Better reporting	2017-08-31 11:32:57 +01:00
Peter Boyle	49c20a9fa8	Patch to reporting	2017-08-31 11:32:21 +01:00
paboyle	7359df3501	Full reporting for benchmark; save robustness factor	2017-08-31 10:42:35 +01:00
Christopher Kelly	d36d2fb40d	Added ability to override default Ls in Benchmark_dwf	2017-08-28 06:53:56 -07:00
Peter Boyle	5b9267e88d	Cleaner comms benchmark treatment for one node runs	2017-08-27 18:24:48 -04:00
paboyle	15fd4003ef	Improving presentation of results	2017-08-27 13:46:02 +01:00
paboyle	ad89abb018	Fix	2017-08-25 20:43:37 +01:00
paboyle	80c5bce5bb	Merge branch 'develop' into feature/multi-communicator	2017-08-25 20:21:26 +01:00
Peter Boyle	d0f3d525d5	Optimal block size for KNL	2017-08-25 19:33:54 +01:00
Peter Boyle	3a58217405	Updated	2017-08-25 14:29:53 +01:00
Peter Boyle	c289699d9a	updated from cambridge mpi3 shakeout	2017-08-25 11:41:01 +01:00
Peter Boyle	c3b1263e75	Benchmark prep	2017-08-25 09:25:54 +01:00
Christopher Kelly	edabb3577f	Imported Benchmark_gparity	2017-08-23 16:54:06 -04:00
paboyle	ae56e556c6	finalise issue on new OPA revert	2017-08-20 02:53:12 +01:00
paboyle	383ca7d392	Switch off comms for now until feature/multi-communicator is merged	2017-08-20 01:27:48 +01:00
paboyle	a446d95c33	Trying to pass TeamCity and Travis	2017-08-20 01:10:50 +01:00
paboyle	be66e7dd95	Merge branch 'develop' into feature/multi-communicator	2017-08-19 23:12:38 +01:00
paboyle	bfef525ed2	New benchmark prep	2017-08-19 23:10:12 +01:00
Peter Boyle	7d88198387	Merge branch 'develop' into feature/multi-communicator	2017-08-19 13:03:35 -04:00
Peter Boyle	9e658de238	Use Vector	2017-08-19 12:52:44 -04:00
Peter Boyle	14d53e1c9e	Threaded MPI calls patches	2017-07-29 13:08:10 -04:00
Peter Boyle	40e119c61c	NUMA improvements worth preserving from AMD EPYC tests	2017-07-08 22:27:11 -04:00
Peter Boyle	b73bd151bb	Switch off counters by default	2017-06-30 10:16:35 +01:00
Peter Boyle	694b305cab	Update to reporting	2017-06-30 10:16:13 +01:00
paboyle	6f5a5cd9b3	Improved threaded comms benchmark	2017-06-28 23:27:02 +01:00
Peter Boyle	08e04b9676	Better benchmarks	2017-06-28 15:30:06 +01:00
paboyle	54e94360ad	Experimental: Multiple communicators to see if we can avoid thread locks in --enable-comms=mpit	2017-06-24 23:10:24 +01:00
paboyle	6ebf9f15b7	Splitting communicators first cut	2017-06-22 08:14:34 +01:00

214 Commits