mirror of https://github.com/paboyle/Grid.git synced 2025-06-20 00:36:55 +01:00
Commit Graph

3821 Commits

Author SHA1 Message Date
19b527e83f Better extract merge for GPU. Let the SIMD header files define the pointer type for
access. GPU redirects through builtin float2, double2 for complex
2018-07-05 07:05:13 -04:00
4730d4692a Fast lane extract, saturates bandwidth on Volta for SU3 benchmarks 2018-07-05 07:03:33 -04:00
1bb456c0c5 Minor GPU vector width change 2018-07-05 07:02:04 -04:00
4b04ae3611 Printing improvement 2018-07-05 06:59:38 -04:00
2f776d51c6 GPU specific benchmark saturates memory. Can enhance Grid to do this for expressions,
but a bit of (known) work.
2018-07-05 06:58:37 -04:00
3a50afe7e7 GPU dslash updates 2018-06-27 22:32:21 +01:00
f8e880b445 Loop for s and xyzt offload 2018-06-27 21:49:57 +01:00
3e947527cb Move looping over "s" and "site" into kernels for GPU optimisation 2018-06-27 21:29:43 +01:00
31f65beac8 Move site and Ls looping into the kernels 2018-06-27 21:28:48 +01:00
38e2a32ac9 Single SIMD lane operations for CUDA 2018-06-27 21:28:06 +01:00
efa84ca50a Keep Cuda 9.1 happy 2018-06-27 21:27:32 +01:00
5e96d6d04c Keep CUDA happy 2018-06-27 21:27:11 +01:00
df30bdc599 CUDA happy 2018-06-27 21:26:49 +01:00
7f45222924 Diagnostics on memory alloc fail 2018-06-27 21:26:20 +01:00
dd891f5e3b Use NVCC to suppress device Eigen 2018-06-27 21:25:17 +01:00
6c97a6a071 Coalescing version of the kernel 2018-06-13 20:52:29 +01:00
73bb2d5128 Ugly hack to speed up compile on GPU; we don't use the hand kernels on GPU anyway so why compile 2018-06-13 20:35:28 +01:00
b710fec6ea GPU code first version of specialised kernel 2018-06-13 20:34:39 +01:00
b2a8cd60f5 Doubled gauge field is useful 2018-06-13 20:27:47 +01:00
867ee364ab Explicit instantiation hooks 2018-06-13 20:27:12 +01:00
25becc9324 GPU tweaks for benchmarking; really necessary? 2018-06-13 20:26:07 +01:00
94d1ae4c82 Some prep work for GPU shared memory. Need to be careful, as will try GPU direct
RDMA and inter-GPU memory sharing on Summit later
2018-06-13 20:24:06 +01:00
2075b177ef CUDA_ARCH more careful treatment 2018-06-13 20:22:34 +01:00
847c761ccc Move sfw IEEE fp16 into central location 2018-06-13 20:22:01 +01:00
8287ed8383 New GPU vector targets 2018-06-13 20:21:35 +01:00
e6be7416f4 Use managed memory 2018-06-13 20:14:00 +01:00
26863b6d95 Use managed memory 2018-06-13 20:13:42 +01:00
ebd730bd54 Adding 2D loops 2018-06-13 20:13:01 +01:00
066be31a3b Optional GPU target SIMD types; work in progress and trying experiments 2018-06-13 20:07:55 +01:00
7a4c142955 Add GPU specific simd targets 2018-06-13 19:55:30 +01:00
eb7d34a4cc GPU version 2018-05-14 19:41:47 -04:00
aab27a655a Start of GPU kernels 2018-05-14 19:41:17 -04:00
93280bae85 GPU option 2018-05-14 19:40:58 -04:00
c5f93abcd7 GPU clean up 2018-05-14 19:40:33 -04:00
d5deef782d Useful debug comments 2018-05-14 19:39:52 -04:00
5f50473c0d Clean up 2018-05-14 19:39:11 -04:00
13f50406e3 Suppress print statement 2018-05-12 18:00:00 -04:00
09cd46d337 Lane by Lane operation 2018-05-12 17:59:35 -04:00
d3f51065c2 Give command line control of blocks/threads split 2018-05-12 17:58:56 -04:00
925ac4173d Thread count control for warp scheduler thingy doodaa thing 2018-05-12 17:58:22 -04:00
eb921041d0 Perf count control 2018-05-12 17:57:32 -04:00
87c5c0271b Fixing Eigen 2018-04-16 19:08:07 -04:00
a3f5a13591 Better Eigen handling 2018-04-16 18:02:55 -04:00
9fe28f00eb Eigen symlink off head revision 2018-04-16 17:54:46 -04:00
a8a0bb85cc Control scalar execution or vector under generic. Disable Eigen vectorisation on powerpc / Summit 2018-04-12 12:32:57 -04:00
6411caad67 work distribution 2018-04-12 11:41:41 -04:00
7533035a99 Control Eigen vectorisation 2018-04-12 11:40:56 -04:00
b15db11c60 Kernels -> pure static object to enable device execution 2018-03-24 19:35:20 -04:00
f6077f9d48 Kernels -> not instantiaed otherwise object ref on GPU 2018-03-24 19:33:44 -04:00
572954ef12 Kernels not an instantiated object, just static 2018-03-24 19:33:13 -04:00