portelli/Grid - Grid - DiRAC Tursa git server

mirror of https://github.com/paboyle/Grid.git synced 2025-11-25 00:49:31 +00:00

Author	SHA1	Message	Date
Peter Boyle	adbdc4e65b	Half comms not working on GPU yet, so disable.	2018-09-11 05:15:22 +01:00
Peter Boyle	e4deea4b94	Weird bug appears with Vector<Vector<>>. "fix" with std::vector<Vector<>> Lies in the face table code. But think there is some latent problem. Possibly in my allocator since it is caching, but could simplify or eliminate the caching option and retest. One to look at later.	2018-09-11 04:36:57 +01:00
Peter Boyle	94d721a20b	Comments on further topology discovery work	2018-09-11 04:20:04 +01:00
Peter Boyle	7bf82f5b37	Offload the face handling to GPU	2018-09-10 11:28:42 +01:00
Peter Boyle	f02c7ea534	Peer to peer on GPU's setup	2018-09-10 11:26:20 +01:00
Peter Boyle	bc503b60e6	Offloadable gather code	2018-09-10 11:21:25 +01:00
Peter Boyle	704ca162c1	Offloadable compression	2018-09-10 11:20:50 +01:00
Peter Boyle	b5329d8852	Protect against zero length loops giving a kernel call failure	2018-09-10 11:20:07 +01:00
Peter Boyle	f27b9347ff	Better unquiesce MPI coverage	2018-09-10 11:19:39 +01:00
Peter Boyle	b4967f0231	Verbose and error trapping cleaner	2018-09-09 14:28:02 +01:00
Peter Boyle	6d0f1aabb1	Fix the multi-node path	2018-09-09 14:27:37 +01:00
Peter Boyle	f4bfeb835d	Drop back to smaller Ls	2018-09-09 14:25:06 +01:00
Peter Boyle	394b7b6276	Verbose decrease	2018-09-09 14:24:46 +01:00
Peter Boyle	da17a015c7	Pack the stencil smaller for 128 bit access	2018-07-23 06:12:45 -04:00
Peter Boyle	1fd08c21ac	make simd width configure time option for GPU	2018-07-23 06:10:55 -04:00
Peter Boyle	28db0631ff	Hack to force 128bit accesses	2018-07-23 06:10:27 -04:00
Peter Boyle	b35401b86b	Fix CUDA_ARCH. Need to simplify. See when new eigen release happens	2018-07-23 06:09:33 -04:00
Peter Boyle	a0714de8ec	Define vector length for GPU	2018-07-23 06:09:05 -04:00
Peter Boyle	21a1710b43	Verbose vector length	2018-07-23 06:08:39 -04:00
Peter Boyle	b2b5137d28	Finally starting to get decent performance on Volta	2018-07-13 12:06:18 -04:00
Peter Boyle	2cc07450f4	Fastest option for the dslash	2018-07-05 09:57:55 -04:00
Peter Boyle	c0e8bc9da9	Current version gets 250 - 320 GF/s on Volta on the target 12^4 volume.	2018-07-05 07:10:25 -04:00
Peter Boyle	b1265ae867	Prettify code	2018-07-05 07:08:06 -04:00
Peter Boyle	32bb85ea4c	Standard extractLane is fast	2018-07-05 07:07:30 -04:00
Peter Boyle	ca0607b6ef	Clearer kernel call meaning	2018-07-05 07:06:15 -04:00
Peter Boyle	19b527e83f	Better extract merge for GPU. Let the SIMD header files define the pointer type for access. GPU redirects through builtin float2, double2 for complex	2018-07-05 07:05:13 -04:00
Peter Boyle	4730d4692a	Fast lane extract, saturates bandwidth on Volta for SU3 benchmarks	2018-07-05 07:03:33 -04:00
Peter Boyle	1bb456c0c5	Minor GPU vector width change	2018-07-05 07:02:04 -04:00
Peter Boyle	4b04ae3611	Printing improvement	2018-07-05 06:59:38 -04:00
Peter Boyle	2f776d51c6	Gpu specific benchmark saturates memory. Can enhance Grid to do this for expressions, but a bit of (known) work.	2018-07-05 06:58:37 -04:00
paboyle	3a50afe7e7	GPU dslash updates	2018-06-27 22:32:21 +01:00
paboyle	f8e880b445	Loop for s and xyzt offlow	2018-06-27 21:49:57 +01:00
paboyle	3e947527cb	Move looping over "s" and "site" into kernels for GPU optimisation	2018-06-27 21:29:43 +01:00
paboyle	31f65beac8	Move site and Ls looping into the kernels	2018-06-27 21:28:48 +01:00
paboyle	38e2a32ac9	Single SIMD lane operations for CUDA	2018-06-27 21:28:06 +01:00
paboyle	efa84ca50a	Keep Cuda 9.1 happy	2018-06-27 21:27:32 +01:00
paboyle	5e96d6d04c	Keep CUDA happy	2018-06-27 21:27:11 +01:00
paboyle	df30bdc599	CUDA happy	2018-06-27 21:26:49 +01:00
paboyle	7f45222924	Diagnostics on memory alloc fail	2018-06-27 21:26:20 +01:00
paboyle	dd891f5e3b	Use NVCC to suppress device Eigen	2018-06-27 21:25:17 +01:00
paboyle	6c97a6a071	Coalescing version of the kernel	2018-06-13 20:52:29 +01:00
paboyle	73bb2d5128	Ugly hack to speed up compile on GPU; we don't use the hand kernels on GPU anyway so why compile	2018-06-13 20:35:28 +01:00
paboyle	b710fec6ea	Gpu code first version of specialised kernel	2018-06-13 20:34:39 +01:00
paboyle	b2a8cd60f5	Doubled gauge field is useful	2018-06-13 20:27:47 +01:00
paboyle	867ee364ab	Explicit instantiation hooks	2018-06-13 20:27:12 +01:00
paboyle	25becc9324	GPU tweaks for benchmarking; really necessary?	2018-06-13 20:26:07 +01:00
paboyle	94d1ae4c82	Some prep work for GPU shared memory. Need to be careful, as will try GPU direct RDMA and inter-GPU memory sharing on Summit later	2018-06-13 20:24:06 +01:00
paboyle	2075b177ef	CUDA_ARCH more careful treatment	2018-06-13 20:22:34 +01:00
paboyle	847c761ccc	Move sfw IEEE fp16 into central location	2018-06-13 20:22:01 +01:00
paboyle	8287ed8383	New GPU vector targets	2018-06-13 20:21:35 +01:00

3846 Commits