1
0
mirror of https://github.com/paboyle/Grid.git synced 2025-04-08 13:10:45 +01:00

2566 Commits

Author SHA1 Message Date
Peter Boyle
19b527e83f Better extract merge for GPU. Let the SIMD header files define the pointer type for
access. GPU redirects through builtin float2, double2 for complex
2018-07-05 07:05:13 -04:00
Peter Boyle
4730d4692a Fast lane extract, saturates bandwidth on Volta for SU3 benchmarks 2018-07-05 07:03:33 -04:00
Peter Boyle
1bb456c0c5 Minor GPU vector width change 2018-07-05 07:02:04 -04:00
paboyle
3a50afe7e7 GPU dslash updates 2018-06-27 22:32:21 +01:00
paboyle
f8e880b445 Loop for s and xyzt offlow 2018-06-27 21:49:57 +01:00
paboyle
3e947527cb Move looping over "s" and "site" into kernels for GPU optimisatoin 2018-06-27 21:29:43 +01:00
paboyle
31f65beac8 Move site and Ls looping into the kernels 2018-06-27 21:28:48 +01:00
paboyle
38e2a32ac9 Single SIMD lane operations for CUDA 2018-06-27 21:28:06 +01:00
paboyle
efa84ca50a Keep Cuda 9.1 happy 2018-06-27 21:27:32 +01:00
paboyle
5e96d6d04c Keep CUDA happy 2018-06-27 21:27:11 +01:00
paboyle
df30bdc599 CUDA happy 2018-06-27 21:26:49 +01:00
paboyle
7f45222924 Diagnostics on memory alloc fail 2018-06-27 21:26:20 +01:00
paboyle
dd891f5e3b Use NVCC to suppress device Eigen 2018-06-27 21:25:17 +01:00
paboyle
6c97a6a071 Coalescing version of the kernel 2018-06-13 20:52:29 +01:00
paboyle
73bb2d5128 Ugly hack to speed up compile on GPU; we don't use the hand kernels on GPU anyway so why compile 2018-06-13 20:35:28 +01:00
paboyle
b710fec6ea Gpu code first version of specialised kernel 2018-06-13 20:34:39 +01:00
paboyle
b2a8cd60f5 Doubled gauge field is useful 2018-06-13 20:27:47 +01:00
paboyle
867ee364ab Explicit instantiation hooks 2018-06-13 20:27:12 +01:00
paboyle
94d1ae4c82 Some prep work for GPU shared memory. Need to be careful, as will try GPU direct
RDMA and inter-GPU memory sharing on SUmmit later
2018-06-13 20:24:06 +01:00
paboyle
2075b177ef CUDA_ARCH more carefule treatment 2018-06-13 20:22:34 +01:00
paboyle
847c761ccc Move sfw IEEE fp16 into central location 2018-06-13 20:22:01 +01:00
paboyle
8287ed8383 New GPU vector targets 2018-06-13 20:21:35 +01:00
paboyle
e6be7416f4 Use managed memory 2018-06-13 20:14:00 +01:00
paboyle
26863b6d95 User Managed memory 2018-06-13 20:13:42 +01:00
paboyle
ebd730bd54 Adding 2D loops 2018-06-13 20:13:01 +01:00
paboyle
066be31a3b Optional GPU target SIMD types; work in progress and trying experiments 2018-06-13 20:07:55 +01:00
Peter Boyle
eb7d34a4cc GPU version 2018-05-14 19:41:47 -04:00
Peter Boyle
aab27a655a Start of GPU kernels 2018-05-14 19:41:17 -04:00
Peter Boyle
93280bae85 Gpu option 2018-05-14 19:40:58 -04:00
Peter Boyle
c5f93abcd7 GPU clean up 2018-05-14 19:40:33 -04:00
Peter Boyle
d5deef782d Useful debug comments 2018-05-14 19:39:52 -04:00
Peter Boyle
5f50473c0d Clean up 2018-05-14 19:39:11 -04:00
Peter Boyle
13f50406e3 Suppress print statement 2018-05-12 18:00:00 -04:00
Peter Boyle
09cd46d337 Lane by Lane operation 2018-05-12 17:59:35 -04:00
Peter Boyle
d3f51065c2 Give command line control of blocks/threads split 2018-05-12 17:58:56 -04:00
Peter Boyle
925ac4173d Thread count control for warp scheduler thingy doodaa thing 2018-05-12 17:58:22 -04:00
Peter Boyle
a8a0bb85cc Control scalar execution or vector under generic. Disable Eigen vectorisation on powerpc / SUmmit 2018-04-12 12:32:57 -04:00
Peter Boyle
6411caad67 work distribution 2018-04-12 11:41:41 -04:00
Peter Boyle
7533035a99 Control Eigen vectorisatoin 2018-04-12 11:40:56 -04:00
Peter Boyle
b15db11c60 Kernels -> pure static object to enable device execution 2018-03-24 19:35:20 -04:00
Peter Boyle
f6077f9d48 Kernels -> not instantiaed otherwise object ref on GPU 2018-03-24 19:33:44 -04:00
Peter Boyle
572954ef12 Kernels not an instantiated object, just static 2018-03-24 19:33:13 -04:00
Peter Boyle
cedeaae7db Lebesge -> StencilView if necessary 2018-03-24 19:32:41 -04:00
Peter Boyle
e6cf0b1e17 View typedefs go to OperatorImpl 2018-03-24 19:32:11 -04:00
Peter Boyle
5412628ea6 begin end lamda 2018-03-24 19:31:45 -04:00
Peter Boyle
1f70cedbab Have to make all kernel called routines static since object reference will be a host pointer on GPU 2018-03-24 19:29:26 -04:00
Peter Boyle
b50f37cfb4 Remove overlap comms flag 2018-03-24 19:28:53 -04:00
Peter Boyle
cb0d2a1b03 threaded rng init; I thought this was on 2018-03-24 19:28:17 -04:00
Peter Boyle
4e1272fabf Kernels need to be static to work on GPU. No reference to host resident data 2018-03-22 18:44:53 -04:00
Peter Boyle
607dc2d3c6 Remove lebesgue order 2018-03-22 18:23:09 -04:00