Update TODO after Gianluca merge

2025-11-18 12:29:31 +00:00 · 2019-05-18 22:58:25 +01:00
parent ee6f96d85c
commit a9342c6ae5
1 changed file with 34 additions and 6 deletions
--- a/40
+++ b/40
@@ -1,32 +1,60 @@
 GPU branch code item work list
 -----------------------------
-TODO:
+- Investigate why it is slower than in September
---------------
+
 - Common source GPU and CPU generic kernels???
   - coalescedRead, coalescedWrite in expressions.
   - Uniform coding between GPU kernels and CPU kernels attempt
 - SIMD dirs in stencil
 - Merge develop and test HMC
 - GPU accelerate EOFA
 - Make GPU offload reductions optionally deterministic
 - Accelerate the cshift
- Accelerate non-dslash elements of Mobius; check accelerator_loop uniformly used in fermion operators
+
- Gamma tables on GPU
+- Gamma tables on GPU; check this.
 - Mobius kernel fusion.
 - Reread WilsonKernels and check diffs
 - thread_loop interface revisit.
 - pragma once uniformly
 - Audit changes
 - Audit NAMESPACE CHANGES
- Verify HMC one flavour ratio; suspect dH too big; verify timestep with Guido.
+
 - Staggered kernels inline for GPU
 - Single GPU simd target (VGPU)
 -----
 Gianluca's changes
 - Performance impact of construct in aligned allocator???
 - Inner product compare to Summit inner product optimisation
 - CayleyFermion5D.cc - flop count line 166 odd. Shouldn't depend on arch
 -                    - Review Vector use
 - CayleyFermion5D.h  - DperpGPU unify coding style
 ---------
 - Lebesgue reorder in all kernels
 - merge2 where is it used. Audit routines, comment out and check compile.
 - AVX512 still broken; Lebesgue order missing?
- Neon ??
+
 DONE:
 -----------------------------
 - Committed my modifications
 - Accelerate non-dslash elements of Mobius; check accelerator_loop uniformly used in fermion operators
   - Merged Gianluca modifications
 - Verify HMC one flavour ratio
 - GPU offload reductions: using thrust::reduce?
 - Deprecate JSON.
 - pugixml difficult.