portelli/Grid - Grid - DiRAC Tursa git server

mirror of https://github.com/paboyle/Grid.git synced 2026-06-27 05:53:30 +01:00

Author	SHA1	Message	Date
Peter Boyle	6d0f1aabb1	Fix the multi-node path	2018-09-09 14:27:37 +01:00
Peter Boyle	28db0631ff	Hack to force 128bit accesses	2018-07-23 06:10:27 -04:00
Peter Boyle	b2b5137d28	Finally starting to get decent performance on Volta	2018-07-13 12:06:18 -04:00
Peter Boyle	c0e8bc9da9	Current version gets 250 - 320 GF/s on Volta on the target 12^4 volume.	2018-07-05 07:10:25 -04:00
Peter Boyle	b1265ae867	Prettify code	2018-07-05 07:08:06 -04:00
Peter Boyle	32bb85ea4c	Standard extractLane is fast	2018-07-05 07:07:30 -04:00
Peter Boyle	ca0607b6ef	Clearer kernel call meaning	2018-07-05 07:06:15 -04:00
paboyle	3a50afe7e7	GPU dslash updates	2018-06-27 22:32:21 +01:00
paboyle	3e947527cb	Move looping over "s" and "site" into kernels for GPU optimisation	2018-06-27 21:29:43 +01:00
paboyle	31f65beac8	Move site and Ls looping into the kernels	2018-06-27 21:28:48 +01:00
paboyle	38e2a32ac9	Single SIMD lane operations for CUDA	2018-06-27 21:28:06 +01:00
paboyle	6c97a6a071	Coalescing version of the kernel	2018-06-13 20:52:29 +01:00
paboyle	73bb2d5128	Ugly hack to speed up compile on GPU; we don't use the hand kernels on GPU anyway so why compile	2018-06-13 20:35:28 +01:00
paboyle	b710fec6ea	Gpu code first version of specialised kernel	2018-06-13 20:34:39 +01:00
paboyle	b2a8cd60f5	Doubled gauge field is useful	2018-06-13 20:27:47 +01:00
paboyle	867ee364ab	Explicit instantiation hooks	2018-06-13 20:27:12 +01:00
Peter Boyle	eb7d34a4cc	GPU version	2018-05-14 19:41:47 -04:00
Peter Boyle	aab27a655a	Start of GPU kernels	2018-05-14 19:41:17 -04:00
Peter Boyle	13f50406e3	Suppress print statement	2018-05-12 18:00:00 -04:00
Peter Boyle	b15db11c60	Kernels -> pure static object to enable device execution	2018-03-24 19:35:20 -04:00
Peter Boyle	f6077f9d48	Kernels -> not instantiated otherwise object ref on GPU	2018-03-24 19:33:44 -04:00
Peter Boyle	572954ef12	Kernels not an instantiated object, just static	2018-03-24 19:33:13 -04:00
Peter Boyle	cedeaae7db	Lebesgue -> StencilView if necessary	2018-03-24 19:32:41 -04:00
Peter Boyle	e6cf0b1e17	View typedefs go to OperatorImpl	2018-03-24 19:32:11 -04:00
Peter Boyle	1f70cedbab	Have to make all kernel called routines static since object reference will be a host pointer on GPU	2018-03-24 19:29:26 -04:00
Peter Boyle	4e1272fabf	Kernels need to be static to work on GPU. No reference to host resident data	2018-03-22 18:44:53 -04:00
Peter Boyle	607dc2d3c6	Remove lebesgue order	2018-03-22 18:23:09 -04:00
Peter Boyle	23c880b009	Remove lebesgue order; stick in stencil if need	2018-03-22 18:13:41 -04:00
Peter Boyle	334bb6792f	Lebesgue order removed. Stick in the stencil view	2018-03-22 18:12:12 -04:00
Peter Boyle	8a1d303ab9	GPU friendly stencil improvements	2018-03-19 07:11:03 -04:00
paboyle	4d60b92b7f	Update oSites	2018-03-08 21:00:25 +00:00
paboyle	c159c70c84	View introduced	2018-03-08 14:58:04 +00:00
Peter Boyle	4548523ecc	This modification eliminates what looks like a compiler bug on Intel 2017.	2018-03-08 04:41:16 -08:00
paboyle	44188a5c6f	AVX512 fix	2018-03-05 00:32:24 +00:00
paboyle	3277bda130	View introduction to prepare for accelerator offload. Probably same problem exists for stencil object	2018-03-04 16:38:08 +00:00
paboyle	078901278c	Coordinate handling gpu friendly	2018-02-24 22:22:02 +00:00
paboyle	aa6de818e2	Copy data needed by Kernels out of the grid object to avoid host reference	2018-02-02 11:36:11 +00:00
paboyle	dcf6517a93	Accelerator offload and copy Opt into the kernel for GPU host var safety	2018-02-02 11:35:35 +00:00
paboyle	a308dff410	accelerator loop, copy Opt into the GPU	2018-02-02 11:34:37 +00:00
paboyle	14ba20898a	Accelerator loop the key kernel call	2018-02-02 11:30:07 +00:00
paboyle	a53d3ee19a	Add Opt to the lambda capture to get it into the GPU	2018-02-02 11:28:39 +00:00
paboyle	e4df025d01	Accelerator related	2018-02-01 23:20:05 +00:00
paboyle	cfeda9d536	constexpr on const ints	2018-02-01 22:59:12 +00:00
paboyle	8ae77d3706	Small simplification of FermionOperatorImpl towards GPU but not there yet	2018-02-01 22:41:54 +00:00
paboyle	70e276e1ab	parallel_for elimination -> thread_loop	2018-01-28 01:01:14 +00:00
paboyle	2d0bcc2606	Zero changes, accelerator on kernels and some thread loop changes	2018-01-27 23:47:38 +00:00
paboyle	c4f82e072b	_grid becomes private; use Grid()	2018-01-27 00:04:12 +00:00
paboyle	85771e97e9	Hide internal data	2018-01-26 23:04:46 +00:00
paboyle	87ee592176	Pragma changes and layout and warning elimination for nvcc	2018-01-24 13:14:09 +00:00
paboyle	e5535f4d72	Namespace, indent	2018-01-14 23:46:51 +00:00


495 Commits