portelli/Grid - Grid - DiRAC Tursa git server

mirror of https://github.com/paboyle/Grid.git synced 2026-05-15 06:34:31 +01:00

Author	SHA1	Message	Date
fionnoh	8d1679c6b8	Merge branch 'feature/hadrons-a2a' of https://github.com/paboyle/Grid into a2a_basics	2018-08-03 15:12:24 +01:00
Peter Boyle	3791a38f7c	Optimised the MesonField a bit more	2018-08-01 08:27:27 +01:00
Peter Boyle	142f7b0c86	Updated the A2A Meson Field module	2018-07-31 15:58:02 +01:00
fionnoh	891ad66eab	Included changes to Hadrons RBPrecCG solver needed for subtraction of guess	2018-07-31 11:26:07 +01:00
Peter Boyle	60c43151c5	Merge branch 'feature/hadrons-a2a' of https://github.com/paboyle/Grid into feature/hadrons-a2a	2018-07-31 01:09:02 +01:00
paboyle	e036800261	Eigen fix	2018-07-31 01:08:42 +01:00
Peter Boyle	62900def36	Merge branch 'feature/hadrons-a2a' of https://github.com/paboyle/Grid into feature/hadrons-a2a	2018-07-31 00:36:26 +01:00
paboyle	e3a309a73f	Eigen happiness	2018-07-31 00:35:17 +01:00
fionnoh	ad6c1c0c4e	The basics of what is needed in Grid and Hadrons for the A2A class and module, with none of the contraction or MF code.	2018-07-30 18:40:50 +01:00
Peter Boyle	00b92a91b5	Optimising	2018-07-28 23:46:22 +01:00
paboyle	65533741f7	7 moms	2018-07-28 16:17:47 +01:00
Peter Boyle	dc0259fbda	Merge pull request #173 from fionnoh/feature/hadrons-a2a Changes to meson field benchmark. Now includes the gammas in the fina…	2018-07-27 23:03:56 +01:00
Peter Boyle	131a6785d4	Merge branch 'feature/hadrons-a2a' into feature/hadrons-a2a	2018-07-27 23:03:42 +01:00
paboyle	44f4f5c8e2	Momentum loop	2018-07-27 23:00:16 +01:00
fionnoh	2679df034f	Changes to meson field benchmark. Now includes the gammas in the final part of the naive method, both methods compute lhs^dag Gamma rhs (previously Gamma lhs^dag rhs), and checks results.	2018-07-27 18:31:10 +01:00
portelli	bf71162b97	Hadrons: backtrace on abort	2018-07-26 19:20:12 +01:00
portelli	299e828d83	Merge branch 'develop' into feature/hadrons	2018-07-26 16:49:49 +01:00
portelli	ef5452cddf	Hadrons: smarter memory profiler	2018-07-26 16:47:45 +01:00
portelli	80de748737	Hadrons: new exceptions which can save an integer	2018-07-26 16:47:25 +01:00
paboyle	71e1006ba8	Updated meson field benchmark for dirac structures	2018-07-26 09:09:29 +01:00
portelli	00f31ae83f	Merge pull request #163 from goracle/unstaged Add printing of whether there are unstaged changes in the git hash print	2018-07-25 19:00:00 +00:00
portelli	cce339deaf	Merge pull request #172 from fionnoh/feature/hadrons feature/hadrons -> feature/hadrons-a2a	2018-07-25 17:20:19 +00:00
fionnoh	24128ff109	Changes needed for MF benchmark to work with comms correctly	2018-07-23 15:51:37 +01:00
Peter Boyle	da17a015c7	Pack the stencil smaller for 128 bit access	2018-07-23 06:12:45 -04:00
Peter Boyle	1fd08c21ac	make simd width configure time option for GPU	2018-07-23 06:10:55 -04:00
Peter Boyle	28db0631ff	Hack to force 128bit accesses	2018-07-23 06:10:27 -04:00
Peter Boyle	b35401b86b	Fix CUDA_ARCH. Need to simplify. See when new eigen release happens	2018-07-23 06:09:33 -04:00
Peter Boyle	a0714de8ec	Define vector length for GPU	2018-07-23 06:09:05 -04:00
Peter Boyle	21a1710b43	Verbose vector length	2018-07-23 06:08:39 -04:00
fionnoh	34e9d3f0ca	Moved the creation and resizing of the v and w high modes from the A2A class to the A2A module and made them an output of the module. This means that they have to be inputs of the contraction modules and they will be freed from memory if they are no longer needed.	2018-07-22 14:40:31 +01:00
fionnoh	c995788259	Added ImportUnphysicalFermion and included appropriate logic for 5d w vectors in A2A code	2018-07-21 00:08:11 +01:00
fionnoh	94c7198001	Added ZFIMPL to A2AMeson contraction	2018-07-20 23:08:22 +01:00
fionnoh	04d86fe9f3	Removed overly verbose print statement	2018-07-20 21:38:19 +01:00
fionnoh	b78074b6a0	Removed a Dminus from high mode v and removed duplication of D_oo code	2018-07-20 16:55:24 +01:00
fionnoh	7dfd3cdae8	Inclusion of ExportPhysicalFermionSource that fixes a bug in the low mode w vectors	2018-07-20 15:45:43 +01:00
fionnoh	cecee1ef2c	Merge branch 'develop' of github.com:paboyle/Grid into feature/hadrons	2018-07-20 13:37:50 +01:00
fionnoh	355d4b58be	Merge branch 'feature/hadrons' of github.com:fionnoh/Grid into feature/hadrons	2018-07-19 16:07:54 +01:00
fionnoh	2c54a536f3	Moved the meson field inner product to its own header file	2018-07-19 15:56:52 +01:00
fionnoh	d868a45120	Cleaned up some stuff that was erroneously included in a previous "trash" commit. Leaving in the mySliceInnerProdct function for now as it speeds up mesonfield creation quite a lot for 24^3 tests	2018-07-16 16:19:59 +01:00
fionnoh	9deae8c962	A2A meson field contraction code	2018-07-16 14:18:45 +01:00
Peter Boyle	b2b5137d28	Finally starting to get decent performance on Volta	2018-07-13 12:06:18 -04:00
fionnoh	db86cdd7bd	Possible trash commit	2018-07-10 13:30:45 +01:00
paboyle	ec9939c1ba	Test for faster implementation of meson field inner loop. This should be possible to cache block at outer levels; the global sum across nodes is not performed and is deferred to the caller to block them all into a big all reduce. Nc=3 and Fermion is hard coded in an ugly way. We might think about benchmarking whether a product without the conjugate should be made available by Grid. It is not clear whether the explicit unroll, or the performing of conjugate on left once was the real source of the speed up. Gives 70-80 GF/s on my laptop (single), half that in double, and 70GB/s to cache. This is competitive with dslash and a reasonable stopping point for the optimisation. If necessary we can revisit.	2018-07-10 12:38:51 +01:00
Peter Boyle	2cc07450f4	Fastest option for the dslash	2018-07-05 09:57:55 -04:00
Peter Boyle	c0e8bc9da9	Current version gets 250 - 320 GF/s on Volta on the target 12^4 volume.	2018-07-05 07:10:25 -04:00
Peter Boyle	b1265ae867	Prettify code	2018-07-05 07:08:06 -04:00
Peter Boyle	32bb85ea4c	Standard extractLane is fast	2018-07-05 07:07:30 -04:00
Peter Boyle	ca0607b6ef	Clearer kernel call meaning	2018-07-05 07:06:15 -04:00
Peter Boyle	19b527e83f	Better extract merge for GPU. Let the SIMD header files define the pointer type for access. GPU redirects through builtin float2, double2 for complex	2018-07-05 07:05:13 -04:00
Peter Boyle	4730d4692a	Fast lane extract, saturates bandwidth on Volta for SU3 benchmarks	2018-07-05 07:03:33 -04:00

4966 Commits