TODO updates

2025-07-26 09:17:08 +01:00 · 2019-07-12 17:11:15 +01:00
parent 78ebd93281
commit c0d89a2dbb
1 changed file with 21 additions and 29 deletions
--- a/50
+++ b/50
@@ -1,66 +1,58 @@
 - Lattice_arith - are the mult, mac etc.. still needed after ET engine?
 - LinalgUtils  ssp loop not offloaded
 - Mobius/Domain EOFA cache header implementation has thread_loop
 - ImprovedStaggered accelerate
 - Lattice_reduction - remnant thread_loops must offload. Audit thread_loop in main code for non-accelerated code  
  Lattice_rng
  Lattice_transfer.h
- Stencil.h : Thread loops in exchange code. Need to offload these
+- accelerate A2Autils -- off critical path for HMC
-
+- Lebesgue order reintroduction. StencilView should have pointer to it
 - Lebesgue order reintroduction. StencilView should have pointer
 - accelerate A2Autils
 GPU branch code item work list
 -----------------------------
-7) Accelerate the cshift
+7) Accelerate the cshift & benchmark
 * 0) Single GPU
 - 128 bit integer table load in GPU code.
- coalescedRead <- threadIdx.x
+- Staggered kernels -> GPU coalesced loop, loop in kernels
 - Gianluca's changes to Cayley into gpu-port
 - GPU accelerate EOFA
 - Staggered kernels -> GPU coalesced loop
 - Staggered kernels inline for GPU -- DONE
-
+* Gianluca merger
 * 2) 5D terms & Gianluca
  - Cayley coefficients -> GPU retention or prefetch
-  - Mobius kernel fusion. -- Gianluca?
+  - Gianluca's changes to Cayley into gpu-port
-  - Make GPU offload reductions optionally deterministic -- Gianluca
+  - Mobius kernel fusion.                     -- Gianluca?
  - Make GPU offload reductions deterministic -- Gianluca merge
  - Lattice_reduction - remnant thread_loops must offload. Audit thread_loop in main code for non-accelerated code  
 * 3) Comms/NVlink
- OpenMP tasks to run comms threads. 
+- OpenMP tasks to run comms threads. Experiment with it 
 - Remove explicit openMP in staggered. 
- Single parallel region around both the Kernel call
+- Single parallel region around both the Kernel call and the comms.
  and the comms.
 - Fix the halo exchange SIMT loop
- Stencil gather
+- Stencil gather ??
 - SIMD dirs in stencil
 * 4) ET enhancements
 - eval -> scalar ops in ET engine
-   - coalescedRead, coalescedWrite in expressions.
+- coalescedRead, coalescedWrite in expressions.
 * 5) Misc
 - Conserved current clean up.
 - multLinkProp eliminate
 8) Merge develop and test HMC
-
+9) Gamma tables on GPU; check this. Appear to work, but no idea why. Are these done on CPU?
 9) Gamma tables on GPU; check this.
 10) Audit
 -     pragma once uniformly
 -     Audit NAMESPACE CHANGES
 -     Audit changes
 =============================================================================================
 - GPU accelerate EOFA                                                  -- DONE
 - LinalgUtils  ssp loop not offloaded                                  -- DONE
 - coalescedRead <- threadIdx.x                                         -- DONE
 - Stencil.h : Thread loops in exchange code. Need to offload these     -- DONE ; pending debug
 - Mobius/Domain EOFA cache header implementation has thread_loop       -- DONE ; pending test
 - Differentiate non-temporal coalescedWrite from temporal              -- DONE
 - Clean up PRAGMAS, and SIMT_loop                                      -- DONE
  thread_loop interface revisit.
  _foreach