mirror of https://github.com/paboyle/Grid.git synced 2024-09-20 17:25:37 +01:00
Commit Graph

3836 Commits

Author SHA1 Message Date
Peter Boyle
6d0f1aabb1 Fix the multi-node path 2018-09-09 14:27:37 +01:00
Peter Boyle
f4bfeb835d Drop back to smaller Ls 2018-09-09 14:25:06 +01:00
Peter Boyle
394b7b6276 Verbose decrease 2018-09-09 14:24:46 +01:00
Peter Boyle
da17a015c7 Pack the stencil smaller for 128 bit access 2018-07-23 06:12:45 -04:00
Peter Boyle
1fd08c21ac make simd width configure time option for GPU 2018-07-23 06:10:55 -04:00
Peter Boyle
28db0631ff Hack to force 128bit accesses 2018-07-23 06:10:27 -04:00
Peter Boyle
b35401b86b Fix CUDA_ARCH. Need to simplify. See when new eigen release happens 2018-07-23 06:09:33 -04:00
Peter Boyle
a0714de8ec Define vector length for GPU 2018-07-23 06:09:05 -04:00
Peter Boyle
21a1710b43 Verbose vector length 2018-07-23 06:08:39 -04:00
Peter Boyle
b2b5137d28 Finally starting to get decent performance on Volta 2018-07-13 12:06:18 -04:00
Peter Boyle
2cc07450f4 Fastest option for the dslash 2018-07-05 09:57:55 -04:00
Peter Boyle
c0e8bc9da9 Current version gets 250 - 320 GF/s on Volta on the target 12^4 volume. 2018-07-05 07:10:25 -04:00
Peter Boyle
b1265ae867 Prettify code 2018-07-05 07:08:06 -04:00
Peter Boyle
32bb85ea4c Standard extractLane is fast 2018-07-05 07:07:30 -04:00
Peter Boyle
ca0607b6ef Clearer kernel call meaning 2018-07-05 07:06:15 -04:00
Peter Boyle
19b527e83f Better extract merge for GPU. Let the SIMD header files define the pointer type for access. GPU redirects through builtin float2, double2 for complex 2018-07-05 07:05:13 -04:00
Peter Boyle
4730d4692a Fast lane extract, saturates bandwidth on Volta for SU3 benchmarks 2018-07-05 07:03:33 -04:00
Peter Boyle
1bb456c0c5 Minor GPU vector width change 2018-07-05 07:02:04 -04:00
Peter Boyle
4b04ae3611 Printing improvement 2018-07-05 06:59:38 -04:00
Peter Boyle
2f776d51c6 GPU specific benchmark saturates memory. Can enhance Grid to do this for expressions, but a bit of (known) work. 2018-07-05 06:58:37 -04:00
paboyle
3a50afe7e7 GPU dslash updates 2018-06-27 22:32:21 +01:00
paboyle
f8e880b445 Loop for s and xyzt offlow 2018-06-27 21:49:57 +01:00
paboyle
3e947527cb Move looping over "s" and "site" into kernels for GPU optimisation 2018-06-27 21:29:43 +01:00
paboyle
31f65beac8 Move site and Ls looping into the kernels 2018-06-27 21:28:48 +01:00
paboyle
38e2a32ac9 Single SIMD lane operations for CUDA 2018-06-27 21:28:06 +01:00
paboyle
efa84ca50a Keep Cuda 9.1 happy 2018-06-27 21:27:32 +01:00
paboyle
5e96d6d04c Keep CUDA happy 2018-06-27 21:27:11 +01:00
paboyle
df30bdc599 CUDA happy 2018-06-27 21:26:49 +01:00
paboyle
7f45222924 Diagnostics on memory alloc fail 2018-06-27 21:26:20 +01:00
paboyle
dd891f5e3b Use NVCC to suppress device Eigen 2018-06-27 21:25:17 +01:00
paboyle
6c97a6a071 Coalescing version of the kernel 2018-06-13 20:52:29 +01:00
paboyle
73bb2d5128 Ugly hack to speed up compile on GPU; we don't use the hand kernels on GPU anyway so why compile 2018-06-13 20:35:28 +01:00
paboyle
b710fec6ea GPU code first version of specialised kernel 2018-06-13 20:34:39 +01:00
paboyle
b2a8cd60f5 Doubled gauge field is useful 2018-06-13 20:27:47 +01:00
paboyle
867ee364ab Explicit instantiation hooks 2018-06-13 20:27:12 +01:00
paboyle
25becc9324 GPU tweaks for benchmarking; really necessary? 2018-06-13 20:26:07 +01:00
paboyle
94d1ae4c82 Some prep work for GPU shared memory. Need to be careful, as will try GPU direct RDMA and inter-GPU memory sharing on Summit later 2018-06-13 20:24:06 +01:00
paboyle
2075b177ef CUDA_ARCH more careful treatment 2018-06-13 20:22:34 +01:00
paboyle
847c761ccc Move sfw IEEE fp16 into central location 2018-06-13 20:22:01 +01:00
paboyle
8287ed8383 New GPU vector targets 2018-06-13 20:21:35 +01:00
paboyle
e6be7416f4 Use managed memory 2018-06-13 20:14:00 +01:00
paboyle
26863b6d95 User Managed memory 2018-06-13 20:13:42 +01:00
paboyle
ebd730bd54 Adding 2D loops 2018-06-13 20:13:01 +01:00
paboyle
066be31a3b Optional GPU target SIMD types; work in progress and trying experiments 2018-06-13 20:07:55 +01:00
paboyle
7a4c142955 Add GPU specific simd targets 2018-06-13 19:55:30 +01:00
Peter Boyle
eb7d34a4cc GPU version 2018-05-14 19:41:47 -04:00
Peter Boyle
aab27a655a Start of GPU kernels 2018-05-14 19:41:17 -04:00
Peter Boyle
93280bae85 Gpu option 2018-05-14 19:40:58 -04:00
Peter Boyle
c5f93abcd7 GPU clean up 2018-05-14 19:40:33 -04:00
Peter Boyle
d5deef782d Useful debug comments 2018-05-14 19:39:52 -04:00