Update TODO after Gianluca merge

2025-07-26 17:27:08 +01:00 · 2019-05-18 22:58:25 +01:00
parent ee6f96d85c
commit a9342c6ae5
1 changed file with 34 additions and 6 deletions
--- a/40
+++ b/40
@@ -1,32 +1,60 @@

+
 GPU branch code item work list
 -----------------------------

-TODO:
---------------
+- Investigate why slower than september
+
+- Common source GPU and CPU generic kernels???
+   - coalescedRead, coalescedWrite in expressions.
+   - Uniform coding between GPU kernels and CPU kernels attempt
+
+- SIMD dirs in stencil
+
+- Merge develop and test HMC
+
+- GPU accelerate EOFA

 - Make GPU offload reductions optionally deterministic
+
 - Accelerate the cshift
- Accelerate non-dslash elements of Mobius; check accelerator_loop uniformly used in fermion operators
- Gamma tables on GPU
+
+- Gamma tables on GPU; check this.

 - Mobius kernel fusion.
+
 - Reread WilsonKernels and check diffs

 - thread_loop interface revisit.
 - pragma once uniformly
 - Audit changes
 - Audit NAMESPACE CHANGES
- Verify HMC one flavour ratio; suspect dH too big; verify timestep with Guido.
+
 - Staggered kernels inline for GPU
+
 - Single GPU simd target (VGPU)
+
+-----
+Gianluca's changes
+- Performance impact of construct in aligned allocator???
+- Inner product compare to Summit inner product optimisation
+- CayleyFermion5D.cc - flop count line 166 odd. Shouldn't depend on arch
+-                    - Review Vector use
+- CayleyFermion5D.h  - DperpGPU unify coding style
+---------
+
 - Lebesgue reorder in all kernels
 - merge2 where is it used. Audit routines, comment out and check compile.
 - AVX512 still broken, lebesgue order missing ?
- Neon ??
+

 DONE:
 -----------------------------
+
+- Committed my modifications
+- Accelerate non-dslash elements of Mobius; check accelerator_loop uniformly used in fermion operators
+   - Merged Gianluca modifications
+- Verify HMC one flavour ratio
 - GPU offload reductions: using thrust::reduce?
 - Deprecate JSON.
 - pugixml difficult.