Simplify Impl

2025-07-27 17:57:08 +01:00 · 2019-06-09 22:26:27 +01:00
parent d6c0e0756d
commit 36f06555a2
6 changed files with 33 additions and 44 deletions
--- a/14
+++ b/14
@@ -6,17 +6,20 @@ GPU branch code item work list
 * 0) Single GPU
 - 128 bit integer table load in GPU code.
 - coalescedRead <- threadIdx.x
+- Gianluca's changes to Cayley into gpu-port
+- GPU accelerate EOFA
 - Clean up PRAGMAS, and SIMT_loop
  thread_loop interface revisit.
  for_n
  for
+- Staggered kernels -> GPU coalesced loop
+- Staggered kernels inline for GPU -- DONE


-* 2) 5D terms
+* 2) 5D terms & Gianluca
  - Cayley coefficients -> GPU retention or prefetch
-  - Gianluca's changes to Cayley into gpu-port
-  - GPU accelerate EOFA
  - Mobius kernel fusion. -- Gianluca?
+  - Make GPU offload reductions optionally deterministic -- Gianluca

 * 3) Comms/NVlink
 - OpenMP tasks to run comms threads. 
@@ -30,12 +33,9 @@ GPU branch code item work list

 * 5) Misc

- SIMD dirs in stencil
 - Conserved current clean up.
 - multLinkProp eliminate
- Staggered kernels -> GPU coalesced loop
- Staggered kernels inline for GPU -- DONE
- Make GPU offload reductions optionally deterministic -- Gianluca
+- SIMD dirs in stencil
 
 7) Accelerate the cshift