Peter Boyle
6c5fa8dcd8
Aligned allocate on CPU put through this interface
2020-06-20 14:34:29 -04:00
Peter Boyle
0d2f913a1a
String.h for linux
2020-06-20 09:37:31 -04:00
Peter Boyle
11bc1aeadc
Thread count default to fastest
2020-06-19 14:30:35 -04:00
Peter Boyle
66005929af
Set up the cache size on all ranks
2020-06-19 12:50:54 -04:00
Peter Boyle
2b1e259441
Decode of SYCL devices fix
2020-06-04 17:16:55 -07:00
Peter Boyle
f39c2a240b
Printing and device memory size detection
2020-06-04 14:58:03 -04:00
Peter Boyle
e93e12b6a4
More verbose SYCL setup
2020-06-03 09:12:11 -04:00
Peter Boyle
ee63721bad
int unhappiness sycl fix
2020-05-25 08:36:24 -07:00
Peter Boyle
22c5168d70
Sycl happier
2020-05-25 08:35:56 -07:00
Peter Boyle
32be2b13d3
Updates for HiP
2020-05-24 14:00:55 -04:00
Peter Boyle
7860a50f70
Make view specify where and drive data motion - first cut.
...
This is a compile time option --enable-unified=yes/no
2020-05-21 16:13:16 -04:00
Peter Boyle
ebb60330c9
Automatic data motion options beginning
2020-05-17 16:34:25 -04:00
Peter Boyle
d24d8e8398
Use X-direction as more bits meaningful on CUDA.
...
2^31-1 should always be enough for SIMD and thread reduced local volume
e.g. 32*2^31 = 2^36 = (2^9)^4 or 512^4 is big enough.
Where 32 is gpu_threads * Nsimd = 8*4
2020-05-12 10:35:49 -04:00
Peter Boyle
07c0c02f8c
Speed up Cshift
2020-05-11 17:02:01 -04:00
Peter Boyle
bbbee5660d
First compile on HiP
2020-05-10 05:28:09 -04:00
Peter Boyle
52081acfa5
NVCC compile fixes
2020-05-08 13:14:12 -04:00
Peter Boyle
f8b8e00090
Systematise the accelerator primitives and locate to Grid/threads/Accelerator.h / Accelerator.cc
...
Aim to reduce the amount of cuda and other code variations floating around all over the place.
Will move GpuInit into Accelerator.cc from Init.cc
Need to worry about SharedMemoryMPI.cc and the Peer2Peer windows
2020-05-08 06:23:55 -07:00
Peter Boyle
28a1fcaaff
First compile against SYCL
2020-05-05 11:13:27 -07:00
Peter Boyle
9b7a6d197f
Fix for GCC preprocessor/pragma handling bug
2019-08-23 14:37:46 +01:00
Peter Boyle
275c1c920f
More info dump on error from CUDA
2019-07-26 12:18:53 +01:00
Peter Boyle
f15eeb0283
localise scope of variables declared in macro
2019-07-12 06:47:01 +01:00
Peter Boyle
7379047482
Threading and acceleration primitives further changes. accelerator_barrier() needed and used
2019-06-15 08:22:48 +01:00
Peter Boyle
d836ce3b78
Clean up of acceleration and threading primitives
2019-06-15 08:14:21 +01:00
Peter Boyle
8adc5da7dd
Testing out approaches to kernel writing introducing SIMT_loop temporarily
2019-06-08 13:47:04 +01:00
Peter Boyle
8113845f9c
coalesce loop. Need to rationalise this file
2019-06-04 23:49:29 +01:00
Peter Boyle
c2625a127e
Non blocking loop. Want to change the naming here.
2019-06-04 20:52:59 +01:00
Peter Boyle
a584b16c4a
Adding a non-blocking kernel launch
2019-05-18 17:39:54 +01:00
Peter Boyle
8c91e82ee8
GPU clean up, remove parallel_for. Split into accelerator_loop, thread_loop
...
cases, and collides with parallel_for in thrust
2019-01-01 15:06:46 +00:00
Peter Boyle
2fcedb13dd
Step size modification in HMC; ICC happy thread pragmas
2018-12-20 09:32:33 +00:00
Peter Boyle
afc462bd58
Bracketing issue in macro
2018-12-13 10:53:22 +00:00
Peter Boyle
b57a4d32aa
Merge branch 'develop' into feature/gpu-port
2018-12-13 05:11:34 +00:00
f592ec8baa
Hadrons: contractor performance fix
2018-11-16 20:59:49 +00:00
8b007b5c24
Hadrons: remove the use of OpenMP reductions
2018-11-16 20:00:29 +00:00
88d9922e4f
Hadrons: fast A2A matrix contraction kernels
2018-11-06 19:49:09 +00:00
1651111d18
Hadrons: final, portable form of the contractor benchmark
2018-11-05 21:29:13 +00:00
fb7d021b9d
Hadrons: moving Hadrons to root directory, build system improvements
2018-08-28 15:00:40 +01:00