mirror of https://github.com/paboyle/Grid.git synced 2025-06-19 08:17:05 +01:00
Commit Graph

128 Commits

Author SHA1 Message Date
22c5168d70 Sycl happier 2020-05-25 08:35:56 -07:00
32be2b13d3 Updates for HiP 2020-05-24 14:00:55 -04:00
7860a50f70 Make view specify where and drive data motion - first cut.
This is a compile time option --enable-unified=yes/no
2020-05-21 16:13:16 -04:00
ebb60330c9 Automatic data motion options beginning 2020-05-17 16:34:25 -04:00
d24d8e8398 Use X-direction as more bits meaningful on CUDA.
2^31-1 should always be enough for SIMD and thread reduced local volume

e.g. 32*2^31 = 2^36 = (2^9)^4 or 512^4 is big enough.

Where 32 is gpu_threads * Nsimd = 8*4
2020-05-12 10:35:49 -04:00
07c0c02f8c Speed up Cshift 2020-05-11 17:02:01 -04:00
bbbee5660d First compile on HiP 2020-05-10 05:28:09 -04:00
52081acfa5 NVCC compile fixes 2020-05-08 13:14:12 -04:00
f8b8e00090 Systematise the accelerator primitives and locate to Grid/threads/Accelerator.h / Accelerator.cc
Aim to reduce the amount of cuda and other code variations floating around all over the place.

Will move GpuInit into Accelerator.cc from Init.cc
Need to worry about SharedMemoryMPI.cc and the Peer2Peer windows
2020-05-08 06:23:55 -07:00
28a1fcaaff First compile against SYCL 2020-05-05 11:13:27 -07:00
9b7a6d197f Fix for GCC preprocessor/pragma handling bug 2019-08-23 14:37:46 +01:00
275c1c920f More info dump on error from CUDA 2019-07-26 12:18:53 +01:00
f15eeb0283 localise scope of variables declared in macro 2019-07-12 06:47:01 +01:00
7379047482 Threading and acceleration primitives further changes. accelerator_barrier() needed and used 2019-06-15 08:22:48 +01:00
d836ce3b78 Clean up of acceleration and threading primitives 2019-06-15 08:14:21 +01:00
8adc5da7dd Testing out approaches to kernel writing, introducing SIMT_loop temporarily 2019-06-08 13:47:04 +01:00
8113845f9c coalesce loop. Need to rationalise this file 2019-06-04 23:49:29 +01:00
c2625a127e Non blocking loop. Want to change the naming here. 2019-06-04 20:52:59 +01:00
a584b16c4a Adding a non-blocking kernel launch 2019-05-18 17:39:54 +01:00
8c91e82ee8 GPU clean up, remove parallel_for, which collides with parallel_for in thrust.
Split into accelerator_loop and thread_loop cases.
2019-01-01 15:06:46 +00:00
2fcedb13dd Step size modification in HMC; ICC happy thread pragmas 2018-12-20 09:32:33 +00:00
afc462bd58 Bracketing issue in macro 2018-12-13 10:53:22 +00:00
b57a4d32aa Merge branch 'develop' into feature/gpu-port 2018-12-13 05:11:34 +00:00
f592ec8baa Hadrons: contractor performance fix 2018-11-16 20:59:49 +00:00
8b007b5c24 Hadrons: remove the use of OpenMP reductions 2018-11-16 20:00:29 +00:00
88d9922e4f Hadrons: fast A2A matrix contraction kernels 2018-11-06 19:49:09 +00:00
1651111d18 Hadrons: final, portable form of the contractor benchmark 2018-11-05 21:29:13 +00:00
fb7d021b9d Hadrons: moving Hadrons to root directory, build system improvements 2018-08-28 15:00:40 +01:00