| Peter Boyle | d66a9af6a3 | No compile fix | 2025-04-04 18:35:05 -04:00 |
| paboyle | 68f112d576 | New software moves cl::sycl | 2024-10-10 22:03:04 +00:00 |
| paboyle | 066544281f | Deprecate UVM | 2024-09-17 13:34:27 +00:00 |
| dbollweg | 461cd045c6 | sliceSum cleanup | 2024-03-13 18:18:44 -04:00 |
| dbollweg | 31f9971dbf | avoid PI_ERROR_OUT_OF_RESOURCES in sycl sliceSum | 2024-03-13 13:39:26 -04:00 |
| dbollweg | be94cf1c6f | Fewer wait-calls in sycl slicesum | 2024-03-06 16:53:13 -05:00 |
| dbollweg | 3c9012676a | CUDA cub refuses to reduce vSpinColourMatrix, breaking up into smaller parts like already done for HIP case. | 2024-02-27 12:41:45 -05:00 |
| Dennis Bollweg | 6cd2d8fcd5 | Replace cuda/hip memcpy with Grid functions | 2024-02-26 09:55:07 -05:00 |
| dbollweg | 0a816b5509 | Merge branch 'feature/sliceSum_gpu' of https://github.com/dbollweg/Grid into feature/sliceSum_gpu | 2024-02-22 21:43:06 -05:00 |
| dbollweg | 1c8b807c2e | free malloc'd memory | 2024-02-22 21:42:44 -05:00 |
| Dennis Bollweg | 15878f7613 | sliceSumReduction_cub_large now also faster than CPU on Frontier | 2024-02-16 13:55:21 -05:00 |
| dbollweg | 6f3455900e | Adding sliceSumReduction_cub_small/large since hipcub cannot deal with arb. large vobjs | 2024-02-16 13:15:02 -05:00 |
| dbollweg | 9514035b87 | refactor slicesum: slicesum uses GPU version by default now | 2024-02-09 13:02:28 -05:00 |