portelli/Grid - Grid - DiRAC Tursa git server

mirror of https://github.com/paboyle/Grid.git synced 2026-05-15 14:44:30 +01:00

Author	SHA1	Message	Date
Peter Boyle	b2b5137d28	Finally starting to get decent performance on Volta	2018-07-13 12:06:18 -04:00
fionnoh	db86cdd7bd	Possible trash commit	2018-07-10 13:30:45 +01:00
paboyle	ec9939c1ba	Test for faster implementation of meson field inner loop. This should be possible to cache block at outer levels, global sum across nodes not performed and deferred to caller to block them all into a big all reduce. Nc=3 and Fermion is hard coded in an ugly way. We might think about benchmarking whether a product without the conjugate should be made available by Grid. It is not clear whether the explicit unroll, or the performing of conjugate on left once was the real source of the speed up. Gives 70-80 GF/s on my laptop (single) half that double, and 70GB/s to cache. This is competitive with dslash and a reasonable stopping point for the optimisation. If necessary we can revisit.	2018-07-10 12:38:51 +01:00
Peter Boyle	2cc07450f4	Fastest option for the dslash	2018-07-05 09:57:55 -04:00
Peter Boyle	c0e8bc9da9	Current version gets 250 - 320 GF/s on Volta on the target 12^4 volume.	2018-07-05 07:10:25 -04:00
Peter Boyle	b1265ae867	Prettify code	2018-07-05 07:08:06 -04:00
Peter Boyle	32bb85ea4c	Standard extractLane is fast	2018-07-05 07:07:30 -04:00
Peter Boyle	ca0607b6ef	Clearer kernel call meaning	2018-07-05 07:06:15 -04:00
Peter Boyle	19b527e83f	Better extract merge for GPU. Let the SIMD header files define the pointer type for access. GPU redirects through builtin float2, double2 for complex	2018-07-05 07:05:13 -04:00
Peter Boyle	4730d4692a	Fast lane extract, saturates bandwidth on Volta for SU3 benchmarks	2018-07-05 07:03:33 -04:00
Peter Boyle	1bb456c0c5	Minor GPU vector width change	2018-07-05 07:02:04 -04:00
Peter Boyle	4b04ae3611	Printing improvement	2018-07-05 06:59:38 -04:00
Peter Boyle	2f776d51c6	GPU specific benchmark saturates memory. Can enhance Grid to do this for expressions, but a bit of (known) work.	2018-07-05 06:58:37 -04:00
fionnoh	f74617c124	Added ZFIMPL to meson field module	2018-07-03 14:04:53 +01:00
fionnoh	8c6a3921ed	Merge remote-tracking branch 'upstream/feature/hadrons' into feature/hadrons	2018-07-03 11:35:14 +01:00
portelli	a8a15dd9d0	Hadrons: code cleaning	2018-07-02 17:52:39 +01:00
portelli	3ce68a751a	Hadrons: stout smearing module	2018-07-02 17:52:04 +01:00
fionnoh	daa0977d01	Included a print statement that indicates that the guess is being subtracted from the solve.	2018-06-28 16:34:56 +01:00
fionnoh	a2929f4384	Removed A2A contraction module and replaced it with the beginnings of a meson field module	2018-06-28 16:17:26 +01:00
fionnoh	7fe3974c0a	Included eigenPacks and action as references, not inputs, of A2A module. They now no longer need to be parameters in the meson field modules.	2018-06-28 16:14:49 +01:00
fionnoh	f7e86f81a0	Changes A2A class to make use of the new Solver class	2018-06-28 16:14:16 +01:00
fionnoh	fecec803d9	Merge branch 'feature/hadrons' of https://github.com/paboyle/Grid into feature/hadrons	2018-06-28 16:13:43 +01:00
fionnoh	8fe9a13cdd	Merge branch 'feature/hadrons' of https://github.com/paboyle/Grid into feature/hadrons	2018-06-28 16:13:07 +01:00
paboyle	3a50afe7e7	GPU dslash updates	2018-06-27 22:32:21 +01:00
paboyle	f8e880b445	Loop for s and xyzt offlow	2018-06-27 21:49:57 +01:00
paboyle	3e947527cb	Move looping over "s" and "site" into kernels for GPU optimisation	2018-06-27 21:29:43 +01:00
paboyle	31f65beac8	Move site and Ls looping into the kernels	2018-06-27 21:28:48 +01:00
paboyle	38e2a32ac9	Single SIMD lane operations for CUDA	2018-06-27 21:28:06 +01:00
paboyle	efa84ca50a	Keep Cuda 9.1 happy	2018-06-27 21:27:32 +01:00
paboyle	5e96d6d04c	Keep CUDA happy	2018-06-27 21:27:11 +01:00
paboyle	df30bdc599	CUDA happy	2018-06-27 21:26:49 +01:00
paboyle	7f45222924	Diagnostics on memory alloc fail	2018-06-27 21:26:20 +01:00
paboyle	dd891f5e3b	Use NVCC to suppress device Eigen	2018-06-27 21:25:17 +01:00
portelli	d2c42e6f42	Hadrons: scaled DWF action	2018-06-26 14:59:33 +01:00
Daniel Richtmann	2881b3e8e5	WilsonMG: Remove unnecessary static assertions	2018-06-26 14:42:30 +02:00
portelli	049cc518f4	Hadrons: introduction message 2	2018-06-25 19:08:39 +01:00
portelli	2e1c66897f	Hadrons: introduction message	2018-06-25 19:08:22 +01:00
portelli	adcef36189	Hadrons: Möbius DWF action	2018-06-25 15:58:35 +01:00
fionnoh	2f121c41c9	Committing creation of meson field code before a merge with the upstream branch feature/hadrons	2018-06-25 12:20:46 +01:00
portelli	e0ed7e300f	Hadrons: spurious Dminus removed	2018-06-22 16:33:43 +02:00
portelli	485207901b	Merge branch 'develop' into feature/hadrons	2018-06-22 16:15:32 +02:00
portelli	c760f0a4c3	Hadrons: remove make_5D/4D functions and FreeProp fix	2018-06-22 16:12:46 +02:00
portelli	c84eeedec3	Hadrons: GaugeProp module for z-Wilson actions	2018-06-22 15:53:22 +02:00
fionnoh	1ac3526f33	Small changes to the A2A header and module	2018-06-22 12:29:42 +01:00
fionnoh	0de090ee74	Temporarily added in the contraction code that produced the working 2-pt function. This is committed for reference only and will be removed in the next push.	2018-06-22 12:28:41 +01:00
portelli	91405de3f7	Hadrons: new solver exposing fermion matrix and generic source/solve import/export	2018-06-22 12:14:37 +02:00
fionnoh	8fccda301a	Fixed a bug where the guess was always subtracted after the solve and included appropriate weights for the sources in the one case we're looking at now. More work needs to be done to make the 5d/4d source logic less brittle.	2018-06-21 16:36:59 +01:00
fionnoh	7a0abfac89	Restructured the class that computes and returns the A2A vectors.	2018-06-21 16:36:06 +01:00
fionnoh	ae37fda699	A more elegant way to subtract guesses from solve and a bool check before verifying residual	2018-06-20 16:07:40 +01:00
fionnoh	b5fc5e2030	All to all module update that hit a promising milestone. Committing as a reference for future changes.	2018-06-20 10:59:07 +01:00


6683 Commits