Update todo list

2026-02-26 16:46:12 +00:00 · 2019-01-01 13:48:06 +00:00
parent 4a96c067ae
commit bf5685eb11
1 changed file with 34 additions and 28 deletions
--- a/62
+++ b/62
@@ -1,18 +1,45 @@
 TODO:
 ---------------

-
 GPU branch code item work list
 -----------------------------

-- Audit NAMESPACE CHANGES
-- Audit HMC timestep / traj length size
-- Verify HMC one flavour ratio; suspect dH too big
-- pragma once uniformly
 - GPU offload reductions; thrust initial ; inclusive_scan vs reduce?
-- Audit changes
-- thread_loop interface revisit.
+- Accelerate the cshift
+- Accelerate non-dslash elements of Mobius; check accelerator_loop uniformly used in fermion operators
+- Gamma tables on GPU

+- Mobius kernel fusion.
+- Reread WilsonKernels and check diffs
+
+- thread_loop interface revisit.
+- pragma once uniformly
+- Audit changes
+- Audit NAMESPACE CHANGES
+- Verify HMC one flavour ratio; suspect dH too big; verify timestep with Guido.
+- Staggered kernels inline for GPU
+- Single GPU simd target (VGPU)
+- Lebesgue reorder in all kernels
+- merge2 where is it used. Audit routines, comment out and check compile.
+- AVX512 still broken, lebesgue order missing ?
+- Neon ??
+
+-----------------------------
+Physics item work list:
+
+2)- Consistent linear solver flop count/rate -- PARTIAL, time but no flop/s yet
+4)- Multigrid Wilson and DWF, compare to other Multigrid implementations
+5)- HDCR resume
+
+-----------------------------
+
+DONE
+
+- Audit HMC timestep / traj length size
+- GPU offload reductions; thrust initial ; inclusive_scan vs reduce?
+- Pragmas.h - prune and remove strong_inline (?)
+- GPU offload reductions; thrust initial ; inclusive_scan vs reduce?
+- Remove old parallel_for macros, fix errors
 - - Need (1) omp parallel for     <-- thread_loop
 - -      (2) omp for
 - -      (3) omp for collapse(n)
@@ -37,27 +64,6 @@ GPU branch code item work list

    and same "in_region".

-- Remove old parallel_for macros, fix errors
-- check accelerator_loop uniformly used in fermion operators
-- Gamma tables on GPU
-- Accelerate the cshift
-- Accelerate non-dslash elements of Mobius
-- Mobius kernel fusion.
-- Staggered kernels inline for GPU
-- Reread WilsonKernels and check diffs
-- Single GPU simd target (VGPU)
-- Lebesgue reorder in all kernels
-- merge2 where used. Audit routines, comment out and check compile.
-- Pragmas.h - prune and remove strong_inline (?)
- 
-----------------------------
-Physics item work list:
-
-2)- Consistent linear solver flop count/rate -- PARTIAL, time but no flop/s yet
-4)- Multigrid Wilson and DWF, compare to other Multigrid implementations
-5)- HDCR resume
-
-----------------------------
 Nov 2018

 1)- BG/Q port and check ; Andrew says ok.