mirror of https://github.com/paboyle/Grid.git synced 2025-04-25 13:15:55 +01:00

Update TODO after Gianluca merge

Peter Boyle 2019-05-18 22:58:25 +01:00
parent ee6f96d85c
commit a9342c6ae5

TODO

@@ -1,32 +1,60 @@
GPU branch code item work list
-----------------------------
TODO: - Investigate why slower than september
---------------
- Common source GPU and CPU generic kernels???
- coalescedRead, coalescedWrite in expressions.
- Uniform coding between GPU kernels and CPU kernels attempt
- SIMD dirs in stencil
- Merge develop and test HMC
- GPU accelerate EOFA
- Make GPU offload reductions optionally deterministic
- Accelerate the cshift
- Accelerate non-dslash elements of Mobius; check accelerator_loop uniformly used in fermion operators
- Gamma tables on GPU - Gamma tables on GPU; check this.
- Mobius kernel fusion.
- Reread WilsonKernels and check diffs
- thread_loop interface revisit.
- pragma once uniformly
- Audit changes
- Audit NAMESPACE CHANGES
- Verify HMC one flavour ratio; suspect dH too big; verify timestep with Guido.
- Staggered kernels inline for GPU
- Single GPU simd target (VGPU)
-----
Gianluca's changes
- Performance impact of construct in aligned allocator???
- Inner product compare to Summit inner product optimisation
- CayleyFermion5D.cc - flop count line 166 odd. Shouldn't depend on arch
- - Review Vector use
- CayleyFermion5D.h - DperpGPU unify coding style
---------
- Lebesgue reorder in all kernels
- merge2 where is it used. Audit routines, comment out and check compile.
- AVX512 still broken, lebesgue order missing ?
- Neon ??
DONE:
-----------------------------
- Committed my modifications
- Accelerate non-dslash elements of Mobius; check accelerator_loop uniformly used in fermion operators
- Merged Gianluca modifications
- Verify HMC one flavour ratio
- GPU offload reductions: using thrust::reduce?
- Deprecate JSON.
- pugixml difficult.
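
Note on the "optionally deterministic" reductions item: floating-point addition is not associative, so a reduction is only bitwise reproducible if the order in which partial sums are combined is fixed. A minimal sketch of one common approach, a fixed-shape pairwise tree, is given below in plain C++. The name deterministic_sum is hypothetical and is not part of Grid or Thrust; this only illustrates the idea and is not the method used in the repository.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of Grid): sums `data` with a fixed pairwise
// tree. The order in which elements are combined depends only on
// data.size(), never on how the work is scheduled, so repeated runs on the
// same input give bitwise-identical results.
double deterministic_sum(std::vector<double> data) {
  if (data.empty()) return 0.0;
  for (std::size_t stride = 1; stride < data.size(); stride *= 2) {
    // Additions within one level touch disjoint elements, so this inner
    // loop is the part that could be run in parallel or offloaded.
    for (std::size_t i = 0; i + stride < data.size(); i += 2 * stride) {
      data[i] += data[i + stride];
    }
  }
  return data[0];
}
```

The vector is taken by value so the caller's data is left untouched. The trade-off is a scratch copy and one pass per tree level, in exchange for a result that does not vary with thread count or scheduling.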