Mirror of https://github.com/paboyle/Grid.git (synced 2025-04-25 13:15:55 +01:00)
Update TODO after Gianluca merge
parent ee6f96d85c
commit a9342c6ae5
40
TODO
@@ -1,32 +1,60 @@


 GPU branch code item work list
 -----------------------------

-TODO:
----------------
+- Investigate why slower than september
+
+- Common source GPU and CPU generic kernels???
+- coalescedRead, coalescedWrite in expressions.
+- Uniform coding between GPU kernels and CPU kernels attempt

+- SIMD dirs in stencil

+- Merge develop and test HMC

+- GPU accelerate EOFA

 - Make GPU offload reductions optionally deterministic

 - Accelerate the cshift
-- Accelerate non-dslash elements of Mobius; check accelerator_loop uniformly used in fermion operators
-- Gamma tables on GPU
+
+- Gamma tables on GPU; check this.

 - Mobius kernel fusion.

 - Reread WilsonKernels and check diffs

 - thread_loop interface revisit.
 - pragma once uniformly
 - Audit changes
 - Audit NAMESPACE CHANGES
-- Verify HMC one flavour ratio; suspect dH too big; verify timestep with Guido.
+
 - Staggered kernels inline for GPU

 - Single GPU simd target (VGPU)

+-----
+Gianluca's changes
+- Performance impact of construct in aligned allocator???
+- Inner product compare to Summit inner product optimisation
+- CayleyFermion5D.cc - flop count line 166 odd. Shouldn't depend on arch
+- - Review Vector use
+- CayleyFermion5D.h - DperpGPU unify coding style
+---------

 - Lebesgue reorder in all kernels
 - merge2 where is it used. Audit routines, comment out and check compile.
 - AVX512 still broken, lebesgue order missing ?
-- Neon ??
+

 DONE:
 -----------------------------

+- Committed my modifications
+- Accelerate non-dslash elements of Mobius; check accelerator_loop uniformly used in fermion operators
+- Merged Gianluca modifications
+- Verify HMC one flavour ratio
 - GPU offload reductions: using thrust::reduce?
 - Deprecate JSON.
 - pugixml difficult.