portelli/Grid - Grid - DiRAC Tursa git server

mirror of https://github.com/paboyle/Grid.git synced 2024-11-16 02:35:36 +00:00

Author	SHA1	Message	Date
Antonin Portelli	ef5452cddf	Hadrons: smarter memory profiler	2018-07-26 16:47:45 +01:00
Antonin Portelli	80de748737	Hadrons: new exceptions which can save an integer	2018-07-26 16:47:25 +01:00
paboyle	71e1006ba8	Updated meson field benchmark for dirac structures	2018-07-26 09:09:29 +01:00
Antonin Portelli	00f31ae83f	Merge pull request #163 from goracle/unstaged Add printing of whether there are unstaged changes in the git hash print	2018-07-25 19:00:00 +00:00
Antonin Portelli	cce339deaf	Merge pull request #172 from fionnoh/feature/hadrons feature/hadrons -> feature/hadrons-a2a	2018-07-25 17:20:19 +00:00
fionnoh	24128ff109	Changes needed for MF benchmark to work with comms correctly	2018-07-23 15:51:37 +01:00
Peter Boyle	da17a015c7	Pack the stencil smaller for 128 bit access	2018-07-23 06:12:45 -04:00
Peter Boyle	1fd08c21ac	make simd width configure time option for GPU	2018-07-23 06:10:55 -04:00
Peter Boyle	28db0631ff	Hack to force 128bit accesses	2018-07-23 06:10:27 -04:00
Peter Boyle	b35401b86b	Fix CUDA_ARCH. Need to simplify. See when new eigen release happens	2018-07-23 06:09:33 -04:00
Peter Boyle	a0714de8ec	Define vector length for GPU	2018-07-23 06:09:05 -04:00
Peter Boyle	21a1710b43	Verbose vector length	2018-07-23 06:08:39 -04:00
fionnoh	34e9d3f0ca	Moved the creation and resizing of the v and w high modes from the A2A class to the A2A module and made them an output of the module. This means that they have to be inputs of the contraction modules and they will be freed from memory if they are no longer needed.	2018-07-22 14:40:31 +01:00
fionnoh	c995788259	Added ImportUnphysicalFermion and included appropriate logic for 5d w vectors in A2A code	2018-07-21 00:08:11 +01:00
fionnoh	94c7198001	Added ZFIMPL to A2AMeson contraction	2018-07-20 23:08:22 +01:00
fionnoh	04d86fe9f3	Removed overly verbose print statement	2018-07-20 21:38:19 +01:00
fionnoh	b78074b6a0	Removed a Dminus from high mode v and removed duplication of D_oo code	2018-07-20 16:55:24 +01:00
fionnoh	7dfd3cdae8	Inclusion of ExportPhysicalFermionSource that fixes a bug in the low mode w vectors	2018-07-20 15:45:43 +01:00
fionnoh	cecee1ef2c	Merge branch 'develop' of github.com:paboyle/Grid into feature/hadrons	2018-07-20 13:37:50 +01:00
fionnoh	355d4b58be	Merge branch 'feature/hadrons' of github.com:fionnoh/Grid into feature/hadrons	2018-07-19 16:07:54 +01:00
fionnoh	2c54a536f3	Moved the meson field inner product to its own header file	2018-07-19 15:56:52 +01:00
fionnoh	d868a45120	Cleaned up some stuff that was erroneously included in a previous "trash" commit. Leaving in the mySliceInnerProdct function for now as it speeds up mesonfield creation quite a lot for 24^3 tests	2018-07-16 16:19:59 +01:00
fionnoh	9deae8c962	A2A meson field contraction code	2018-07-16 14:18:45 +01:00
Peter Boyle	b2b5137d28	Finally starting to get decent performance on Volta	2018-07-13 12:06:18 -04:00
fionnoh	db86cdd7bd	Possible trash commit	2018-07-10 13:30:45 +01:00
paboyle	ec9939c1ba	Test for faster implementation of meson field inner loop This should be possible to cache block at outer levels, global sum across nodes not performed and deferred to caller to block them all into a big all reduce. Nc=3 and Fermion is hard coded in an ugly way. We might think about benchmarking whether a product without the conjugate should be made available by Grid. It is not clear whether the explicit unroll, or the performing of conjugate on left once was the real source of the speed up. Gives 70-80 GF/s on my laptop (single) half that double, and 70GB/s to cache. This is competitive with dslash and a reasonable stopping point for the optimisation. If necessary we can revisit.	2018-07-10 12:38:51 +01:00
Peter Boyle	2cc07450f4	Fastest option for the dslash	2018-07-05 09:57:55 -04:00
Peter Boyle	c0e8bc9da9	Current version gets 250 - 320 GF/s on Volta on the target 12^4 volume.	2018-07-05 07:10:25 -04:00
Peter Boyle	b1265ae867	Prettify code	2018-07-05 07:08:06 -04:00
Peter Boyle	32bb85ea4c	Standard extractLane is fast	2018-07-05 07:07:30 -04:00
Peter Boyle	ca0607b6ef	Clearer kernel call meaning	2018-07-05 07:06:15 -04:00
Peter Boyle	19b527e83f	Better extract merge for GPU. Let the SIMD header files define the pointer type for access. GPU redirects through builtin float2, double2 for complex	2018-07-05 07:05:13 -04:00
Peter Boyle	4730d4692a	Fast lane extract, saturates bandwidth on Volta for SU3 benchmarks	2018-07-05 07:03:33 -04:00
Peter Boyle	1bb456c0c5	Minor GPU vector width change	2018-07-05 07:02:04 -04:00
Peter Boyle	4b04ae3611	Printing improvement	2018-07-05 06:59:38 -04:00
Peter Boyle	2f776d51c6	Gpu specific benchmark saturates memory. Can enhance Grid to do this for expressions, but a bit of (known) work.	2018-07-05 06:58:37 -04:00
fionnoh	f74617c124	Added ZFIMPL to meson field module	2018-07-03 14:04:53 +01:00
fionnoh	8c6a3921ed	Merge remote-tracking branch 'upstream/feature/hadrons' into feature/hadrons	2018-07-03 11:35:14 +01:00
Antonin Portelli	a8a15dd9d0	Hadrons: code cleaning	2018-07-02 17:52:39 +01:00
Antonin Portelli	3ce68a751a	Hadrons: stout smearing module	2018-07-02 17:52:04 +01:00
fionnoh	daa0977d01	Included a print statement that indicates that the guess is being subtracted from the solve.	2018-06-28 16:34:56 +01:00
fionnoh	a2929f4384	Removed A2A contraction module and replaced it with the beginnings of a meson field module	2018-06-28 16:17:26 +01:00
fionnoh	7fe3974c0a	Included eigenPacks and action as references, not inputs, of A2A module. They now no longer need to be parameters in the meson field modules.	2018-06-28 16:14:49 +01:00
fionnoh	f7e86f81a0	Changes A2A class to make use of the new Solver class	2018-06-28 16:14:16 +01:00
fionnoh	fecec803d9	Merge branch 'feature/hadrons' of https://github.com/paboyle/Grid into feature/hadrons	2018-06-28 16:13:43 +01:00
fionnoh	8fe9a13cdd	Merge branch 'feature/hadrons' of https://github.com/paboyle/Grid into feature/hadrons	2018-06-28 16:13:07 +01:00
paboyle	3a50afe7e7	GPU dslash updates	2018-06-27 22:32:21 +01:00
paboyle	f8e880b445	Loop for s and xyzt offlow	2018-06-27 21:49:57 +01:00
paboyle	3e947527cb	Move looping over "s" and "site" into kernels for GPU optimisation	2018-06-27 21:29:43 +01:00
paboyle	31f65beac8	Move site and Ls looping into the kernels	2018-06-27 21:28:48 +01:00

6556 Commits