portelli/Grid - DiRAC Tursa git server

Mirror of https://github.com/paboyle/Grid.git, synced 2024-09-20 09:15:38 +01:00

Author	SHA1	Message	Date
Peter Boyle	4e9df9e93c	GPU patches	2019-05-18 17:43:11 +01:00
Peter Boyle	c43a2b599a	GPU support	2019-01-01 15:07:29 +00:00
Peter Boyle	b57a4d32aa	Merge branch 'develop' into feature/gpu-port	2018-12-13 05:11:34 +00:00
Peter Boyle	00b92a91b5	Optimising	2018-07-28 23:46:22 +01:00
paboyle	65533741f7	7 moms	2018-07-28 16:17:47 +01:00
Peter Boyle	131a6785d4	Merge branch 'feature/hadrons-a2a' into feature/hadrons-a2a	2018-07-27 23:03:42 +01:00
paboyle	44f4f5c8e2	Momentum loop	2018-07-27 23:00:16 +01:00
fionnoh	2679df034f	Changes to meson field benchmark. Now includes the gammas in the final part of the naive method, both methods compute lhs^dag Gamma rhs (previously Gamma lhs^dag rhs), and checks results.	2018-07-27 18:31:10 +01:00
paboyle	71e1006ba8	Updated meson field benchmark for dirac structures	2018-07-26 09:09:29 +01:00
fionnoh	24128ff109	Changes needed for MF benchmark to work with comms correctly	2018-07-23 15:51:37 +01:00
paboyle	ec9939c1ba	Test for faster implementation of meson field inner loop. This should be possible to cache block at outer levels; the global sum across nodes is not performed and is deferred to the caller, to block them all into a big all reduce. Nc=3 and Fermion is hard coded in an ugly way. We might think about benchmarking whether a product without the conjugate should be made available by Grid. It is not clear whether the explicit unroll, or the performing of conjugate on left once, was the real source of the speed up. Gives 70-80 GF/s on my laptop (single), half that in double, and 70GB/s to cache. This is competitive with dslash and a reasonable stopping point for the optimisation. If necessary we can revisit.	2018-07-10 12:38:51 +01:00

11 Commits
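
The message on commit ec9939c1ba describes two ideas: conjugate the left-hand vectors once rather than inside the inner loop, and return only node-local sums so the caller can batch the global reduction. The sketch below illustrates those two ideas in plain C++ and is not Grid's implementation. The names `Field`, `conjugateAll` and `localMesonField` are hypothetical, the flat one-value-per-site layout is an assumption, and the spin, colour and Gamma structure of the real benchmark is omitted.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

using Complex = std::complex<double>;
// Hypothetical flat layout: one complex value per local lattice site.
using Field = std::vector<Complex>;

// Conjugate every left-hand vector once, outside the inner loop.
std::vector<Field> conjugateAll(const std::vector<Field>& lhs) {
    std::vector<Field> out(lhs);
    for (auto& v : out)
        for (auto& z : v) z = std::conj(z);
    return out;
}

// Node-local block M[i][j] = sum_x conj(lhs_i(x)) * rhs_j(x).
// No global sum is done here; the caller is expected to reduce
// many such blocks across nodes in one batched all-reduce.
std::vector<std::vector<Complex>> localMesonField(const std::vector<Field>& lhs,
                                                  const std::vector<Field>& rhs) {
    const std::vector<Field> lhsConj = conjugateAll(lhs);
    std::vector<std::vector<Complex>> M(
        lhs.size(), std::vector<Complex>(rhs.size(), Complex(0.0, 0.0)));
    for (std::size_t i = 0; i < lhsConj.size(); ++i) {
        for (std::size_t j = 0; j < rhs.size(); ++j) {
            assert(lhsConj[i].size() == rhs[j].size());
            Complex acc(0.0, 0.0);
            // Inner loop is a plain product: the conjugate was applied above.
            for (std::size_t x = 0; x < rhs[j].size(); ++x)
                acc += lhsConj[i][x] * rhs[j][x];
            M[i][j] = acc;
        }
    }
    return M;
}
```

In a multi-node run, the caller would collect the `M` blocks for all momenta and gamma structures and sum them across nodes in a single reduction, which is the deferral the commit message refers to.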